Oracle Architecture
New Term: The Oracle Relational Database
Management System, or RDBMS, is designed to allow simultaneous access to large
amounts of stored information. The RDBMS consists of the database (the
information) and the instance (the embodiment of the system). The database
contains the physical files that reside on the system and the logical pieces
such as the database schema. These database files take various forms, as
described in the following section. The instance is the method used to access
the data and consists of processes and system memory.
--------------------------------------------------------------------------------
NOTE: Object extensions have been added
to the RDBMS with Oracle8. The object extension to tables is covered in detail
on Day 12, "Working with Tables, Views, and Synonyms." Oracle refers
to Oracle8 as an O-RDBMS (Object-Relational Database Management System). In
this book, I refer to Oracle as an RDBMS for clarity.
--------------------------------------------------------------------------------
The Database
The Oracle database has a logical layer
and a physical layer. The physical layer consists of the files that reside on
the disk; the components of the logical layer map the data to these physical
components.
The Physical Layer
The physical layer of the database
consists of three types of files:
One or more datafiles--Datafiles
store the information contained in the database. You can have as few as one
datafile or as many as hundreds of datafiles. The information for a single
table can span many datafiles or many tables can share a set of datafiles.
Spreading tablespaces over many datafiles can have a significant positive
effect on performance. The number of datafiles that can be configured is
limited by the Oracle parameter MAXDATAFILES.
Two or more redo log files--Redo log
files hold information used for recovery in the event of a system failure. Redo
log files, known as the redo log, store a log of all changes made to the
database. This information is used in the event of a system failure to reapply
changes that have been made and committed but that might not have been made to
the datafiles. The redo log files must perform well and be protected against
hardware failures (through software or hardware fault tolerance). If redo log
information is lost, you cannot recover the system.
One or more control files--Control
files contain information used to start an instance, such as the location of datafiles
and redo log files; Oracle needs this information to start the database
instance. Control files must be protected. Oracle provides a mechanism for
storing multiple copies of control files.
The Logical Layer
The logical layer of the database consists
of the following elements:
One or more tablespaces.
The database schema, which consists of
items such as tables, clusters, indexes, views, stored procedures, database
triggers, sequences, and so on.
Tablespaces and Datafiles
New Term: The database is divided into
one or more logical pieces known as tablespaces. A tablespace is used to
logically group data together. For example, you can create one tablespace for
accounting and a separate tablespace for purchasing. Segmenting groups into
different tablespaces simplifies the administration of these groups (see Figure
2.1). Tablespaces consist of one or more datafiles. By using more than one
datafile per tablespace, you can spread data over many different disks to
distribute the I/O load and improve performance.
The relationship between the database, tablespaces, and datafiles.
As part of the process of creating the
database, Oracle automatically creates the SYSTEM tablespace for you. Although
a small database can fit within the SYSTEM tablespace, it's recommended that
you create a separate tablespace for user data. The SYSTEM tablespace is where
the data dictionary is kept. The data dictionary contains information about
tables, indexes, clusters, and so on.
Datafiles can be operating system files
or, in the case of some operating systems, RAW devices. Datafiles and data
access methods are described in detail on Day 12.
The Database Schema
New Term: The database schema is a
collection of logical-structure objects, known as schema objects, that define
how you see the database's data. These schema objects consist of structures
such as tables, clusters, indexes, views, stored procedures, database triggers,
and sequences.
Table--A table, which consists of
a tablename and rows and columns of data, is the basic logical storage unit in
the Oracle database. Columns are defined by name and data type. A table is
stored within a tablespace; often, many tables share a tablespace.
Cluster--A cluster is a set of
tables physically stored together as one table that shares a common column. If
data in two or more tables is frequently retrieved together based on data in
the common column, using a clustered table can be quite efficient. Tables can
be accessed separately even though they are part of a clustered table. Because
of the structure of the cluster, related data requires much less I/O overhead
if accessed simultaneously.
Index--An index is a structure
created to help retrieve data more quickly and efficiently (just as the index
in this book allows you to find a particular section more quickly). An index is
declared on a column or set of columns. Access to the table based on the value
of the indexed column(s) (as in a WHERE clause) will use the index to locate
the table data.
--------------------------------------------------------------------------------
NOTE: A new feature in Oracle8 is the
index-only table. In an index-only table, the data and index are stored
together. This is discussed in detail on Day 13, "Using Indexes and
Sequences."
--------------------------------------------------------------------------------
View--A view is a window into
one or more tables. A view does not store any data; it presents table data. A
view can be queried, updated, and deleted as a table without restriction. Views
are typically used to simplify the user's perception of data access by
providing limited information from one table, or a set of information from
several tables transparently. Views can also be used to prevent some data from
being accessed by the user or to create a join from multiple tables.
Stored procedure--A stored
procedure is a predefined SQL query that is stored in the data dictionary.
Stored procedures are designed to allow more efficient queries. Using stored
procedures, you can reduce the amount of information that must be passed to the
RDBMS and thus reduce network traffic and improve performance.
Database trigger--A
database trigger is a procedure that is run automatically when an event occurs.
This procedure, which is defined by the administrator or developer, triggers,
or is run whenever this event occurs. This procedure could be an insert, a
deletion, or even a selection of data from a table.
Sequence--The Oracle sequence
generator is used to automatically generate a unique sequence of numbers in
cache. By using the sequence generator you can avoid the steps necessary to
create this sequence on your own such as locking the record that has the last
value of the sequence, generating a new value, and then unlocking the record.
Segments, Extents, and Data Blocks
Within Oracle, the space used to store
data is controlled by the use of logical structures. These structures consist
of the following:
Data blocks--A block is the
smallest unit of storage in an Oracle database. The database block contains
header information concerning the block itself as well as the data.
Extents--Extents consist of data
blocks.
Segments--A segment is a set of
extents used to store a particular type of data, as shown in Figure 2.2.
Segments, extents, and data blocks.
Segments
An Oracle database can use four types of
segments:
Data segment--Stores user data
within the database.
Index segment--Stores indexes.
Rollback segment--Stores
rollback information used when data must be rolled back.
Temporary segment--Created
when a SQL statement needs a temporary work area; these segments are destroyed
when the SQL statement is finished. These segments are used during various
database operations, such as sorts.
Extents
Extents are the building blocks of segments; in turn, they consist of data blocks. An extent is used to
minimize the amount of wasted (empty) storage. As more and more data is entered
into tablespaces in your database, the extents used to store that data can grow
or shrink as necessary. In this manner, many tablespaces can share the same
storage space without preallocating the divisions between those tablespaces.
At tablespace-creation time, you can
specify the minimum number of extents to allocate as well as the number of
extents to add at a time when that allocation has been used. This arrangement
gives you efficient control over the space used in your database.
Data Blocks
Data blocks are the smallest pieces of an Oracle database; they are physically stored on disk. Although the data block in
most systems is 2KB (2,048 bytes), you can change this size for efficiency
depending on your application or operating system.
--------------------------------------------------------------------------------
NOTE: Oracle blocks do not need to be,
and may not be the same as, operating system data blocks. In fact, in most
cases they are not.
--------------------------------------------------------------------------------
The Oracle Instance
The Oracle instance consists of the
Oracle processes and shared memory necessary to access information in the
database. The instance is made up of the user processes, the Oracle background
processes, and the shared memory used by these processes (see Figure 2.3).
The Oracle Memory Structure
New Term: Oracle uses shared memory for
several purposes, including caching of data and indexes as well as storing
shared program code. This shared memory is broken into various pieces, or
memory structures. The basic memory structures associated with Oracle are the
System Global Area (SGA) and the Program Global Area (PGA).
The Oracle instance.
The System Global Area (SGA)
The SGA is a shared memory region that
Oracle uses to store data and control information for one Oracle instance. The
SGA is allocated when the Oracle instance starts and deallocated when the
Oracle instance shuts down. Each Oracle instance that starts has its own SGA.
The information in the SGA consists of the following elements, each of which
has a fixed size and is created at instance startup:
The database buffer cache--This
stores the most recently used data blocks. These blocks can contain modified
data that has not yet been written to disk (sometimes known as dirty blocks),
blocks that have not been modified, or blocks that have been written to disk
since modification (sometimes known as clean blocks). Because the buffer cache
keeps blocks based on a most recently used algorithm, the most active buffers
stay in memory to reduce I/O and improve performance.
The redo log buffer--This
stores redo entries, or a log of changes made to the database. The redo log
buffers are written to the redo log as quickly and efficiently as possible.
Remember that the redo log is used for instance recovery in the event of a
system failure.
The shared pool--This is
the area of the SGA that stores shared memory structures such as shared SQL
areas in the library cache and internal information in the data dictionary. The
shared pool is important because an insufficient amount of memory allocated to
the shared pool can cause performance degradation. The shared pool consists of
the library cache and the data-dictionary cache.
The Library Cache
The library cache is used to store
shared SQL. Here the parse tree and the execution plan for every unique SQL
statement are cached. If multiple applications issue the same SQL statement,
the shared SQL area can be accessed by each to reduce the amount of memory
needed and to reduce the processing time used for parsing and execution
planning.
The Data-Dictionary Cache
The data dictionary contains a set of
tables and views that Oracle uses as a reference to the database. Oracle stores
information here about the logical and physical structure of the database. The
data dictionary contains information such as the following:
User information, such as user privileges
Integrity constraints defined for tables
in the database
Names and data types of all columns in
database tables
Information on space allocated and used
for schema objects
The data dictionary is frequently
accessed by Oracle for the parsing of SQL statements. This access is essential
to the operation of Oracle; performance bottlenecks in the data dictionary
affect all Oracle users. Because of this, you should make sure that the
data-dictionary cache is large enough to cache this data. If you do not have
enough memory for the data-dictionary cache, you see a severe performance
degredation. If you ensure that you have allocated sufficient memory to the
shared pool where the data-dictionary cache resides, you should see no
performance problems.
The Program Global Area (PGA)
The PGA is a memory area that contains
data and control information for the Oracle server processes. The size and
content of the PGA depends on the Oracle server options you have installed.
This area consists of the following components:
Stack space--This is the memory
that holds the session's variables, arrays, and so on.
Session information--If you
are not running the multithreaded server, the session information is stored in
the PGA. If you are running the multithreaded server, the session information
is stored in the SGA.
Private SQL area--This is
an area in the PGA where information such as binding variables and runtime
buffers is kept.
Processes
New Term: In many operating systems,
traditional processes have been replaced by threads or lightweight processes.
The term process is used in this book to describe a thread of execution, or a
mechanism that can execute a set of code; process refers to the mechanism of
execution and can refer to a traditional process or a thread.
The Oracle RDBMS uses two types of
processes: user processes and Oracle processes (also known as background
processes). In some operating systems (such as Windows NT), these processes are
actually threads; for the sake of consistency, I will refer to them as
processes.
User Processes
User, or client, processes are the
user's connections to the RDBMS system. The user process manipulates the user's
input and communicates with the Oracle server process through the Oracle
program interface. The user process is also used to display the information
requested by the user and, if necessary, can process this information into a
more useful form.
Oracle Processes
Oracle processes perform functions for
users. Oracle processes can be split into two groups: server processes (which
perform functions for the invoking process) and background processes (which
perform functions on behalf of the entire RDBMS).
Server Processes (Shadow Processes)
Server processes, also known as shadow
processes, communicate with the user and interact with Oracle to carry out the
user's requests. For example, if the user process requests a piece of data not
already in the SGA, the shadow process is responsible for reading the data
blocks from the datafiles into the SGA. There can be a one-to-one correlation
between user processes and shadow processes (as in a dedicated server
configuration); although one shadow process can connect to multiple user
processes (as in a multithreaded server configuration), doing so reduces the
utilization of system resources.
Background Processes
Background processes are used to perform
various tasks within the RDBMS system. These tasks vary from communicating with
other Oracle instances and performing system maintenance and cleanup to writing
dirty blocks to disk. Following are brief descriptions of the nine Oracle
background processes:
DBWR (Database Writer)--DBWR is
responsible for writing dirty data blocks from the database block buffers to
disk. When a transaction changes data in a data block, that data block need not
be immediately written to disk. Therefore, the DBWR can write this data to disk
in a manner that is more efficient than writing when each transaction
completes. The DBWR usually writes only when the database block buffers are
needed for data to be read. Data is written in a least recently used fashion.
For systems in which asynchronous I/O (AIO) is available, there should be only
one DBWR process. For systems in which AIO is not available, performance can be
greatly enhanced by adding more DBWR processes.
LGWR (Log Writer)--The LGWR process
is responsible for writing data from the log buffer to the redo log.
CKPT (Checkpoint)--The CKPT
process is responsible for signaling the DBWR process to perform a checkpoint
and to update all the datafiles and control files for the database to indicate
the most recent checkpoint. A checkpoint is an event in which all modified
database buffers are written to the datafiles by the DBWR. The CKPT process is
optional. If the CKPT process is not present, the LGWR assumes these
responsibilities.
PMON (Process Monitor)--PMON is
responsible for keeping track of database processes and cleaning up if a
process prematurely dies (PMON cleans up the cache and frees resources that
might still be allocated). PMON is also responsible for restarting any
dispatcher processes that might have failed.
SMON (System Monitor)--SMON
performs instance recovery at instance startup. This includes cleaning
temporary segments and recovering transactions that have died because of a
system crash. The SMON also defragments the database by coalescing free extents
within the database.
RECO (Recovery)--RECO is
used to clean transactions that were pending in a distributed database. RECO is
responsible for committing or rolling back the local portion of the disputed
transactions.
ARCH (Archiver)--ARCH is
responsible for copying the online redo log files to archival storage when they
become full. ARCH is active only when the RDBMS is operated in ARCHIVELOG mode.
When a system is not operated in ARCHIVELOG mode, it might not be possible to
recover after a system failure. It is possible to run in NOARCHIVELOG mode
under certain circumstances, but typically should operate in ARCHIVELOG mode.
LCKn (Parallel Server Lock)--Up to 10
LCK processes are used for interinstance locking when the Oracle Parallel
Server option is used.
Dnnn (Dispatcher)--When the
Multithreaded Server option is used, at least one Dispatcher process is used
for every communications protocol in use. The Dispatcher process is responsible
for routing requests from the user processes to available shared server
processes and back.
How Transactions Work
New Term: To give you a better idea how
Oracle operates, this section analyzes a sample transaction. Throughout this
book, the term transaction is used to describe a logical group of work that can
consist of one or many SQL statements and must end with a commit or a rollback.
Because this example is of a client/server application, SQL*Net is necessary.
The following steps are executed to complete the transaction:
1. The application processes the user
input and creates a connection to the server via SQL*Net.
2. The server picks up the connection
request and creates a server process on behalf of the user.
3. The user executes a SQL statement or
statements. In this example, the user changes the value of a row in a table.
4. The server process checks the shared
pool to see whether there is a shared SQL area that has this identical SQL
statement. If it finds an identical shared SQL area, the server process checks
whether the user has access privileges to the data. If so, the server process
uses the shared SQL area to process the request. If a shared SQL area is not
found, a new shared SQL area is allocated, and the statement is parsed and
executed.
5. The server process finds the data in
the SGA (if it is present there) or reads the data from the datafile into the
SGA.
6. The server process modifies the data
in the SGA. Remember that the server processes can read only from the
datafiles. At some later time, the DBWR process writes the modified blocks to
permanent storage.
7. The user executes either the COMMIT
or ROLLBACK statement. A COMMIT will finalize the transaction, a ROLLBACK will
undo the changes. If the transaction is being committed, the LGWR process
immediately records the transaction in the redo log file.
8. If the transaction is successful, a
completion code is returned across the network to the client process. If a
failure has occurred, an error message is returned.
--------------------------------------------------------------------------------
NOTE: A transaction is not considered
committed until the write to the redo log file is complete. This arrangement
ensures that in the event of a system failure, a committed transaction can be
recovered. If a transaction has been committed, it is set in stone.
--------------------------------------------------------------------------------
While transactions occur, the Oracle
background processes do their jobs, keeping the system running smoothly. While
this process occurs, hundreds of other users might be performing similar tasks.
Oracle's job is to keep the system in a consistent state, to manage contention
and locking, and to perform at the necessary rate.
This overview is intended to give you an
understanding of the complexity and amount of interaction involved in the
Oracle RDBMS. As you look in detail at the tuning of the server processes and
applications later in this book, you can use this overview as a reference to
the basics of how the Oracle RDBMS operates. Because of the differences in
operating systems, minor variances in different environments will be discussed
individually.
No comments:
Post a Comment