Implementing an Automated Standby Database
By Roby Sherman
Automate the maintenance of your standby database, using new features in Oracle8i
Oracle8i contains an enhanced feature that automatically keeps a standby database synchronized with your production database. This new feature, called the Automated
Standby Database (ASD), greatly reduces the amount of manual work database administrators must perform to develop mechanisms for transporting the archived redo logs from the
production environment to the standby database.
Oracle introduced the standby database with Oracle7 as a means to clone a production database for use in the event of a disaster. The idea behind it is that you maintain an
exact replica of your production database: The hardware, database, and other resources of the two environmentsproduction and standbyshould be as close to identical as
possible.
As the production database operates, it creates and archives redo logs. As part of the process of maintaining a standby database, you send a copy of the archived redo logs to
the standby database. There, the system applies the copies of the archived redo logs in a continuous roll-forward fashion, which keeps the two databases in sync.
If your production database experiences a catastrophic failure, you can activate the standby database to serve as your new production environment, thereby providing
near-continuous availability.
Automating the Process with Oracle8i
In releases prior to Oracle8i, the process of transporting and applying archived redo logs to the standby database was not automated unless the DBA created his or her own
automated method. But now, several enhancements in Oracle8i allow you to automatically and efficiently transfer and apply archived redo logs to one or more standby
databases. In addition, you can take the Oracle8i ASD out of recovery mode and use it as a read-only database, allowing users to run queries against it during peak user
hours.
Before you begin to establish an ASD configuration, you must answer many important questions. Although ASD can certainly be used very effectively in some production
environments, there are some scenarios where ASD solutions are not appropriate. For example, if the network connection between the production and standby machines does not provide
the bandwidth necessary for the timely transfer of archive logs, the archive logs may begin to back up or overall network performance can be compromised.
As with any architecture change, performing research and evaluation are the keys to understanding whether or not implementing a standby database is appropriate for your
particular environment.
Setting Up An Automated
Standby Database
The first step in creating a standby environment is to configure the standby server machine to resemble the production machine's configuration as closely as possible. Although
it is not an absolute requirement that the standby's configuration mirror the production environment exactly, the more similar the two configurations are, the better, to maximize
operational smoothness and consistency. This is especially important if you plan to operate your databases by having the production database and the standby database swap roles
each time a failure occurs.
The following steps refer to the servers listed in Table 1. The production instance, dtmprod1, is currently running in ARCHIVELOG mode and resides on server
rs-co1. The standby database will be called dtmback1 and reside on the server rs-co2, which has been configured similarly to the production instance.
Oracle recommends that the file-system and raw-device layouts remain identical on both machines, but I have made them slightly different in order to illustrate the use of the
init.ora parameters DB_FILE_ NAME_CONVERT and LOG_FILE_NAME_ CONVERT in Oracle8i.
Step 1. Evaluate Your Environment. Make sure you have assessed your environment by answering these basic questions before you proceed to enable ASD:
- How often should a log switch occur? If your redo logs are too small and you are getting frequent log switches, you may not have sufficient time to resynchronize your
standby environment manually before new archive logs are generated. For our particular environment, we sized the dtmprod1 production database redo logs at 20MB each, which caused
the database to switch once every 30 minutes, providing enough time to allow the archive log transfer to complete and also allowing time for a manual resynchronization if
necessary. Naturally, these values will be specific to your particular environment.
- Approximately how many transactions are in each redo-log file? If you increase the size of your redo logs to reduce the number of log switches, you will be storing more
transactions in each file. This means that the amount of time required to ship the archived redo log to the remote side will increase, potentially causing a network bottleneck. The
number of transactions per log will also be greater, increasing the amount of potential data loss if the production failure occurs before the system can transmit all archived redo
logs. So, for example, in our case, we didn't know the number of transactions, but we did know that our DML changes were committed almost immediately and that our network could
transfer the archive logs from the production database to the standby database within 3 to 5 minutes. Based on this information, we estimated that in a worst-case scenario, we
would lose 15 to 30 minutes' worth of transactions.
- How will the system forward the application users seamlessly to the standby should the production database fail? When the production database fails, the users need a way
to connect to the standby once it is activated. Depending on your environment, you can create one of many Net8-based solutions (Net8 is the enhanced version of SQL*Net that ships
with Oracle8), using Net8's failover facility, or you can create coding-specific solutions to this problem. The solutions have different advantages and disadvantages, ranging in
complexity from a simple tnsnames.ora modification on the client machines to server IP/DNS reconfigurations. In our situation, to be able to provide near-continuous database
availability, we developed an application-routing API that redirects inbound user requests to the "live" database. We can easily instruct the API to redirect inbound
requests to the standby database, once it has been activated.
- Is there a sufficient number of logfile groups? If you have only a small number of logfile groups, the ARCH process may not have enough time to archive the inactive log
to the standby before the production database requires the file for reuse. If this occurs, the database will hang momentarily until archiving completes, assuming that your
production database is configured to treat the standby as a mandatory destination.
Step 2. Validate Net8 Connectivity. The production database dtmprod1 will communicate with the remote dtmback1 via Net8. This means that you must set up the listener on
the rs-co2 standby system to accept Net8 connections from the database rs-co1. You may also need to place a tnsnames.ora entry for dtmback1 on the rs-co1 production system if you
are not using Oracle Names in your environment. Use SQL*Plus on the production database to verify that you have configured connectivity properly and that the listener is
functioning properly on the remote server.
Step 3. Back up the Production Database. Next, prepare the production database on rs-co1 for transfer to the standby on rs-co2. If user activity might occur during this
process, you should place the database in a restricted mode prior to starting these procedures, to prevent the introduction of changes into the database.
Perform the following actions to archive the current online redo log and to create a standby database control file, using a SYSDBA account such as SYS or SYSTEM with SQL*Plus,
svrmgrl, or another comparable utility:
SVRMGR> ALTER SYSTEM ARCHIVE LOG CURRENT;
SVRMGR> ALTER DATABASE CREATE STANDBY CONTROLFILE AS '/pathname/filename.ctl'
Next, cleanly shut down your database. Then copy the production database's datafiles to equivalent locations on the standby database host. Do not change the names of these files
when moving them between the two platforms.
Make sure you copy
- All tablespace data files
- All redo logs
- All archived redo logs currently online
- The standby control file created in the previous step (be sure to place copies of this file with the appropriate name in the equivalent locations on the standby where the
production control files would have existed, if you had copied them)
However, make sure you do not copy
- The production control files (the standby will use the standby control file you generated previously)
- The production init.ora files (you will copy these later)
Step 4. Modify the init.ora File on the Production Database. Oracle8i includes new and modified parameters for automating the standby-database maintenance process
(see Table 2 for a list of the new init.ora parameters). In our example, the production database, dtmprod1, is configured to archive to a singular file-system
destination, using the log_archive_dest parameter. You must modify it as shown in Listing 1 to automatically ship the archived redo logs to the remote
standby.
Step 5. Copy the Production init.ora File(s) to the Standby Database for Modification. Next, copy the production init.ora file to the appropriate directory on the standby
database server. Once you have copied the file, open it and make these additional changes on the standby server, as shown in Listing 2:
- Update the CONTROL_FILES parameter so that it points to all of the standby control files located on the standby database server.
- Add the STANDBY_ARCHIVE_ DEST parameter and set its value to the location where the standby database should store any archived redo logs received from the production database.
- Make any applicable changes to the LOG_ARCHIVE_DEST_n parameters (or comment them out), to point to appropriate log-archive storage locations to be employed if the standby
database is activated.
Step 6. Start the Standby Database. The standby database is now ready to be put into service. To start it, perform the following actions on it, using a SYSDBA account
such as SYS or SYSTEM with SQL*Plus, svrmgrl, or another comparable utility:
SVRMGR> CONNECT SYS AS SYSDBA
SVRMGR> STARTUP NOMOUNT
SVRMGR> ALTER DATABASE MOUNT STANDBY DATABASE;
SVRMGR> ALTER DATABASE RECOVER MANAGED STANDBY DATABASE;
or
SVRMGR> ALTER DATABASE RECOVER MANAGED STANDBY DATABASE TIMEOUT nn;
(nn = the number of minutes to wait for an archived
redo log before recovery is automatically canceled.)
In Oracle8 Release 8.1.5, the ALTER DATABASE RECOVER MANAGED STANDBY DATABASE command does not return until it is canceled (from another session) or you press CTRL-C. I
recommend that you execute this command on the console for the standby server so that it can remain open indefinitely.
Step 7. Restart the Production Database. Now that the standby database is ready to receive archive logs, you are ready to restart the production database. Perform the
following actions on the production database, using a SYSDBA account such as SYS or SYSTEM with SQL*Plus, svrmgrl, or another comparable utility:
SVRMGR> CONNECT SYS AS SYSDBA
SVRMGR> STARTUP
Step 8. Verify Production to Standby Transfer Operations. Finally, validate that the production database is able to automatically transmit logs to the standby database
and that the standby is ready to receive them. To make sure that the production database is transmitting the logs, execute the following command against the production database:
SQL> ALTER SYSTEM SWITCH LOGFILE;
When sufficient time has passed for the production database to transport the archived redo log, perform this query on the production database:
SQL select * from v$archive_dest;
Find the row in the output that corresponds to your standby database, and verify that the values for BINDING, REOPEN_SECS, and DESTINATION are correct, based on your
requirements. Make sure that the value for STATUS is VALIDif not, check the FAIL_DATE, FAIL_SEQ, and ERROR columns for the error.
Next, verify that the directory specified in standby_ archive_dest init.ora parameter on the standby database contains the appropriate archived redo log that should have been
copied from production. Make a note of the log's sequence number.
Check the last entries of the standby's alert log for information about the log received by the standby database. Verify that the log-sequence number for which the standby was
waiting matches the sequence of the actual archived log in the standby_ archive_dest directory (the entry appears as "Media Recovery Waiting for thread n seq# xyz").
Finally, make sure that that Media Recovery applied the log (the entry appears similar to "Media Recovery Log /pathname/logxyz_ n.arc"). See Listing
3.
With Oracle8 Release 8.1.5, if the above conditions have not been met, you must cancel the standby's automatic recovery and manually transport and apply all missing archived
redo logs.
Once you have applied the logs and restarted the automated standby, the log-sequence number pending on the standby database should match the current log sequence in the
production database.
Activating the Standby Database
You now have an operational, automatically updated standby database that you may activate in the event of a catastrophic production failure. Once activated, the standby database
becomes a production-like instance that cannot be converted back to its previous form. If the standby database is already in ASD mode when a failure occurs, follow these steps to
activate the standby database:
- If possible, archive your current production database logs by issuing ALTER SYSTEM ARCHIVE LOG CURRENT on the production database. Copy the current online redo logalong
with any other remaining archived logs that have not yet been transportedto the standby database server.
- If your standby database has not timed out from your recovery, simply open a new SQL session into the standby database, by using a DBA account, and issue
SQL> ALTER DATABASE RECOVER MANAGED STANDBY DATABASE CANCEL;
- Locate the end of the standby database's alert.log, and identify the last archived log that was applied.
- Manually apply any remaining logs to the standby database:
SQL> ALTER DATABASE RECOVER [FROM 'pathname'] STANDBY DATABASE;
- When you have applied the remaining logs to the standby database, stop the recovery by issuing
SQL> ALTER DATABASE RECOVER CANCEL;
- Convert the standby database to a production environment:
SQL> ALTER DATABASE ACTIVATE STANDBY DATABASE;
- Shut down the standby database. The standby is now a full-fledged production database. Assuming that the standby database is the only working database at this point, it is
highly advisable that you immediately perform a backup before starting the database for production use in the event of another emergency.
- Start the new production database as you would start any other database.
- Initiate any and all necessary processes that will redirect users smoothly to the new production database.
Now that your database users have safely migrated to the standby database, you can begin the process of recovering your production database.
ASD Maintenance Tips
- Although the REOPEN parameter will attempt to reestablish the connection to a failed standby destination, it will (re)transmit the missed archive logs only if the destination
is specified as MANDATORY. If a destination is OPTIONAL, the REUSE parameter will not attempt to transmit a log that hasn't been sent. You must manually transfer and apply these
logs, plus any logs that transmitted since the REOPEN was successful, before ASD can resume. Check the production database alert.log to see if a log sequence failed to transmit to
the standby database.
- Actions such as altering the physical structure, performing direct loads, and executing unrecoverable operations against your production environment can have serious impacts on
the standby database. Although these actions are too numerous for the scope of this article, it is critical that you understand them. For more information, see the Oracle8
Backup and Recovery Handbook.
- Oracle8i provides the facility to open a standby database in read-only mode and perform the operations of a temporary reporting database. Although this functionality may
be useful for your environment, be aware that additional logs transmitted from production to the standby while the standby is operating as a reporting database will not be applied
until the standby is placed back into ASD recovery mode, which can increase the amount of downtime in the event of a production failover. If you want to take advantage of the ASD
and read-only features, it may be more appropriate to create two standby databases.
|
Implementing an Automated Standby Database
|