| File: | 195_Cluster_Install.txt |
| Last Modified: | 2001/09/07 15:38:04 |
| Author: | mtripets |
| Document Title: | Oracle Installation |
|---|
Contents
| 195.1 | General -- |
| 195.2 | Contacts -- |
| 195.3 | Support -- |
| 195.4 | Licences -- |
| 195.5 | Troubleshooting -- |
| 195.6 | Installation -- |
| 195.7 | OS Install |
| 195.8 | OS Modifications |
| 195.9 | Cluster Software Install |
| 195.10 | Reboot all nodes |
| 195.11 | Configure SCI Devices |
| 195.12 | Install ORCLudlm patch |
| 195.13 | Reboot all nodes |
| 195.14 | Install and configure Veritas packages and licenses on all nodes |
| 195.15 | Reboot all the nodes |
| 195.16 | Start primary cluster node |
| 195.17 | Create oracle disk groups and raw volumes for OPS |
| 195.18 | Restart cluster on all nodes |
| 195.19 | Install Oracle with the OPS option on all nodes |
| 195.20 | The following steps are for installing the Cluster software on savage and selway. |
| 195.21 | Starting and stopping the cluster |
| 195.22 | Cluster Troubleshooting |
| 195.23 | SCI interface troubleshooting |
195.1 General --
NOTE: After install, **NEVER** restart a machine without stopping the cluster node
running on it. The system will go down in an unstable state and require a
good fscking before it is happy again.
NOTE: During the install, do not comment out or otherwise secure any of the usual
suspect services such as rsh as they are required for the OPS install.
Any deviations from the directions provided by Sun/Oracle are presented here.
195.2 Contacts --
195.3 Support --
195.4 Licences --
Sun Cluster does not expressly require licence information to install. Details about
Veritas licence information can be found in section 195.14 .
195.5 Troubleshooting --
See the sections starting at 195.22
195.6 Installation --
Overview:
195.7) Install Solaris 8 and patch
195.8) Modify system files and users per instructions below
195.9) Use scinstall to install Sun Cluster 2.2. Specify VxVM/Cluster Volume Manager
and OPS as features.
195.10) Reboot all nodes
195.11) Configure SCI devices
195.12) Install ORCLudlm patch
195.13) Reboot all nodes
195.14) Install Veritas packages and licenses on all nodes.
Run vxinstall on all nodes for boot disk encapsulation
195.15) Reboot all nodes
195.16) Start primary cluster node
195.17) Create oracle disk groups and raw volumes for OPS database
195.18) Restart cluster on all nodes
195.19) Install Oracle with the OPS option on all nodes
195.7 OS Install
Initial software install from 'boot cdrom'
Additional selections as follows (in order of appearance):
Set IP number from DNS
No IPv6
No kerberos
No name resolution (unless at colo)
Select North America/U.S.A for locale
NO 64 bit support
Entire Dist + OEM Support
For boot disk selection, look over the selection of presented disks.
On each of the non-SCSI controllers there will be a large number of
disks offered - do not select any of these. You should see something
like the following to identify the SCSI disks:
x [ ] c1t26d0 (17269 MB) 17269 MB
x [X] c2t0d0 (17269 MB) boot disk 17269 MB
x [ ] c2t1d0 (17269 MB) 17269 MB
x [ ] c4t0d0 (17269 MB) 17269 MB
There will be one controller with two disks attached. In this case it is the c2 device.
Select the first disk on the controller. If you do not know which one to select, stop
now and find someone who knows.
For disk geometry, create a 2 GB root, and a 2 GB /tmp partition. Leave the
remainder of the disk space unused. The displayed space will show 2001 MB
from a rounding error.
Customize Disk: c2t0d0 -------------------------------------------------------
Boot Device: c2t0d0s0
Entry: Recommended: MB Minimum: MB
================================================================================
Slice Mount Point Size (MB)
0 / 2001
1 /tmp 2001
2 overlap 17269
3 0
4 0
5 0
6 0
7 0
================================================================================
Capacity: 17269 MB
Allocated: 4002 MB
Rounding Error: 1 MB
Free: 13266 MB
NOTE: You will need to put in the second cdrom after the initial reboot to complete
the OS install.
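The figures in the layout above can be cross-checked with a little shell arithmetic.
This is only a sanity check built from the sizes shown on the screen; the variable
names are illustrative and not part of the installer.

```shell
# Sizes in MB, copied from the Customize Disk screen above.
capacity=17269
root=2001
tmp=2001
rounding=1

# Allocated space is the sum of the two slices; whatever is left after
# the reported rounding error is the free space.
allocated=$((root + tmp))
free=$((capacity - allocated - rounding))

echo "Allocated: ${allocated} MB"
echo "Free: ${free} MB"
```

If the installer shows different Allocated or Free values, recheck the slice sizes
before continuing.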
After the remaining questions, the first cd will be processed and the machine will reboot.
When asked, choose a 'root' password. Wait a moment for the install process to continue.
Choose the media (Option 1, CD) from which you will install Solaris 8.
1. CD
2. Network File System
3. Skip
Media [1]: 1
You will be prompted to insert the CD for Solaris 8 (SPARC) Software 2.
After you insert the CD, please press Enter.
Enter the number corresponding to the desired selection for more
information, or enter 2 to continue [2]: 2
End of Solaris 8 Software 2 installation.
<Press ENTER to continue>
<Press Return to reboot the system>
Once the machine has rebooted, install the Jumbo Patch. It can be found on host cluster.
195.8 OS Modifications
After the Installation edit the following files:
a) /etc/hosts
# Internet host table
#
127.0.0.1 localhost
192.168.10.32 volga volga.iengineer.com loghost
192.168.10.22 neva
192.168.10.1 gateway
192.168.10.25 cluster
b) /.profile
PATH=$PATH:/usr/local/bin:/usr/ccs/bin:/opt/SUNWcluster/bin:/opt/SUNWpnm/bin:/opt/VRTSvmsa/bin:/etc/vx/bin:/opt/SUNWsci/bin:/opt/SUNWscid/bin:/opt/SUNWsma/bin
export PATH
MANPATH=$MANPATH:/opt/SUNWcluster/man:/opt/VRTSvmman/man:/opt/SUNWsma/man
export MANPATH
stty erase ^H
TERM=vt100; export TERM
c) /etc/hosts.equiv
neva
volga
d) /etc/defaultrouter
gateway
e) /.rhosts
# node 0
204.152.65.33
204.152.65.1
204.152.65.17
# node 1
204.152.65.34
204.152.65.2
204.152.65.18
192.168.10.22
192.168.10.32
f) /etc/system
PLEASE COPY /etc/system to /etc/system.bak before editing the
file. If there is a problem booting, use the "boot -a" option
at the ok prompt and follow the directions.
* Begin Oracle info
set shmsys:shminfo_shmmax=4294967295
set shmsys:shminfo_shmmin=1
set shmsys:shminfo_shmmni=100
set shmsys:shminfo_shmseg=10
set semsys:seminfo_semmni=100
set semsys:seminfo_semmsl=300
set semsys:seminfo_semmns=600
set semsys:seminfo_semopm=100
set semsys:seminfo_semvmx=32767
* End Oracle info *
g) /etc/group
Add the following groups to /etc/group
oinstall::830:
dba::860:oracle
h) Add the oracle user, and place it into the oinstall and dba groups.
oinstall (GID 830) must be the primary group and dba (GID 860) the secondary.
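As a rough sketch, the group entries above can be verified with a small shell
function before continuing. check_groups is a hypothetical helper name, not part of
Solaris; it only reads a file in /etc/group format and changes nothing. The
commands in the leading comment are the usual Solaris way to create the groups and
user, and the home directory shown there is an assumption.

```shell
# Typical creation commands (run as root; home directory is an assumption):
#   groupadd -g 830 oinstall
#   groupadd -g 860 dba
#   useradd -g oinstall -G dba -m -d /export/home/oracle oracle

# check_groups FILE
# Succeeds only if FILE (in /etc/group format) defines oinstall with
# GID 830 and dba with GID 860.
check_groups() {
    awk -F: '
        $1 == "oinstall" && $3 == 830 { o = 1 }
        $1 == "dba"      && $3 == 860 { d = 1 }
        END { exit !(o && d) }
    ' "$1"
}
```

For example, "check_groups /etc/group && echo groups ok" prints "groups ok" only when
both entries are present with the expected GIDs.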
195.9 Cluster Software Install
To install the cluster software, use the Sun Cluster System Cluster Control Panel GUI
located in /opt/SUNWcluster/bin. This tool will allow you to connect to both machines
in parallel.
Before you do this, the cdrom on the primary must be shared with the secondary
in such a way that /cdrom/direct will produce the same output. The following
process is known to work:
1. Edit /etc/dfs/dfstab file
share /cdrom/multi_suncluster_sc_2_2
2. Restart the NFS Server (/etc/rc3.d/S15nfs.server)
3. Type shareall after restarting the nfs.server
4. Create a directory on secondary node mimicking the primary cdrom directory
/cdrom/multi_suncluster_sc_2_2
5. Mount on secondary node
mount neva:/cdrom/multi_suncluster_sc_2_2 /cdrom/multi_suncluster_sc_2_2
Now that the cdrom is shared, start the CCP tool as follows:
./ccp prodwc-db-cluster
NOTE: the prodwc-db-cluster has been pre-defined on the cluster console and will allow
you to connect to the console servers. Do not enter any other value here. There may be
a short delay while the CCP connects.
From now on, the directions located at:
http://docs.sun.com:80/ab2/coll.650.1/CLUSTINSTALL/@Ab2PageView/3864?Ab2Lang=C&Ab2Enc=iso-8859-1
will be used, starting at the *Server Software* install section located approx. 25% along
in the doc. The install directions will be considered verbatim except where noted. Numbers
in the following notes reference the original document.
1)..... Use ./ccp prodwc-db-cluster
2)..... Set TERM=vt100; export TERM here
3-8)... *SKIP THESE* as they have already been done in the 195.6.2 install
9)..... no change
10).... no change
11).... Be sure to use automatic
12).... Select CVM
13).... Cluster name is prodwc-db-cluster
14).... Potential nodes = 2, initially configured nodes = 2
15).... SCI interface; hostname node 0 = neva, hostname node 1 = volga
No ethernet information required
16).... HA data services = yes, logical hosts = no
17-27). SKIP These questions will not be offered
28).... Select the quorum device as the first disk on the shared arrays. In this
case c0t0d0s2 should be the logical choice. This is most important so if
there are questions, find someone who can answer you before continuing
29).... unused
30).... Select OPS (#10) for data services, then 12 to quit. The documented list has
one more entry than the software will offer
31).... quit install
32-33). SKIP
195.10 Reboot all nodes
195.11 Configure SCI Devices
Exhaustive documentation on the SCI cards can be found at:
http://docs.sun.com:80/ab2/@LegacyPageView?toc=SUNWab_83_4:/safedir/space3/coll2/SUNWabha/toc/ETPCLINSTALL:Page_B-323;bt=Enterprise+Cluster+Planning+and+Installation+Manual;ps=ps/SUNWab_83_4/ETPCLINSTALL/B.Configuring_the_SCI_SBus_Card#3
I kid you not about this url, but it is all we have ...
The key to the install (given that the hardware is ok) is the file which defines the
geometry of the cluster. In this case we have no switches, two links/cables, two
hosts, and four adapters. The file link1.sc is presented for this purpose, and can
be found on host cluster in /home/cluster/files/link1.sc. It has been edited to fit
the cluster as defined up until now.
To run the configuration, select the console on neva:
1) copy link1.sc to /opt/SUNWsma/bin
2) run
./sm_config -f link1.sc
This command must be executed ONLY on the primary node.
3) Assuming no pathological errors, there is no need to reboot.
195.12 Install ORCLudlm patch
This patch must be applied on both nodes before Veritas or Oracle software is added.
1) Copy the patch from host cluster located in /home/oracle/ORCLudlm.tar.Z
to /tmp and unpack
2) Install via pkgadd
cd /tmp; pkgadd -d . ORCLudlm
195.13 Reboot all nodes
195.14 Install and configure Veritas packages and licenses on all nodes
Installing the Veritas software packages can most easily be done using the same
CCP gui that was used in the initial cluster install. You will have to change
the mounting point on volga for both servers to see the same path.
Be sure to use the "Veritas Foundation Products 2000-08 for Solaris" copy of
the software.
1) Mount the cdrom on both servers
2) cd to /cdrom/foundation_products_2000_08_sun/Solaris_8/pkgs and run
pkgadd -d .
3) Select all software options. During the install, select default paths and say
yes for the client/server install. PDF documentation should be most portable.
Software licenses are required to initialize the cluster functionality required for
shared disks. These licenses are located on host cluster in /home/cluster/veritas_lic
and are named based on the host they are assigned to. To install on to the requisite host,
just cut and paste the license information into the appropriate terminal.
To make sure which license file is for which host, run hostid on the machines.
In each license file there is a reference to that hostid, can't miss it.
Example:
1) On neva, run "vxlicense -c" and press return
2) Using neva1.htm (cat neva1.htm) cut and paste the key
1860 9194 5635 4989 0189 5453 1966 0
3) Repeat this one more time for this key, and one more time for the neva2.key
4) Repeat for volga
There is no need to reboot at this point
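To match license files to hosts as described above, something like the following
sketch may help. license_for_hostid is a hypothetical helper name; it simply lists
the files in a directory that mention a given hostid, and assumes the hostid appears
in the license file exactly as the hostid command prints it (ignoring case).

```shell
# license_for_hostid DIR HOSTID
# Prints the name of every file in DIR that contains HOSTID (case-insensitive).
# Returns non-zero if no file mentions it.
license_for_hostid() {
    grep -il "$2" "$1"/*
}

# Example use on a node (paths taken from the text above):
#   license_for_hostid /home/cluster/veritas_lic "$(hostid)"
```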
To install the packages, follow these directions:
1) Type vxinstall
2) The installer will provide information about the disk controllers that it sees. You
can expect to see three (two fiber channel, and one SCSI). Hit return to continue
3) Select *Custom Installation*. This is very important...
4) You will be presented with the disks attached to each controller and a menu for what you
want to do with them. What you do will be based on the controller and attached drives
offered to you.
If the install program offers to "Encapsulate Boot Disk" follow the directions in (5)
If a large number of disks are offered up from controllers c0 or c4, follow directions in (6)
5) Encapsulate Boot Disk:
Select 'y' as this will make the boot device a veritas volume.
Encapsulate Boot Disk [y,n,q,?] (default: n) y
Do not take the default name offered. Instead select from the following:
Enter disk name for [<name>,q,?] (default: rootdisk) rne001 (neva)
Enter disk name for [<name>,q,?] (default: rootdisk) rvo001 (volga)
The second disk on the boot controller should be offered up in the following menu:
Installation options for controller c2
Menu: VolumeManager/Install/Custom/c2
1 Install all disks as pre-existing disks. (encapsulate)
2 Install all disks as new disks. (discards data on disks!)
3 Install one disk at a time.
4 Leave these disks alone.
? Display help about menu
?? Display help about the menuing system
q Exit from menus
Select an operation to perform: (2)
Select two, using the name rne002 or rvo002 depending on the host.
6) Skipping the other disks:
When presented with the option from the other two controllers:
Installation options for controller c0
Menu: VolumeManager/Install/Custom/c0
1 Install all disks as pre-existing disks. (encapsulate)
2 Install all disks as new disks. (discards data on disks!)
3 Install one disk at a time.
4 Leave these disks alone.
? Display help about menu
?? Display help about the menuing system
q Exit from menus
Select an operation to perform: (4)
Select *four* for both controllers. This is most important since cluster
will not start with the quorum device encapsulated.
7) You should see something like:
The following is a summary of your choices.
c2t0d0 Encapsulate
c2t1d0 New Disk
Is this correct [y,n,q,?] (default: y)
when the install is complete. If there are a large number of disks slated for encapsulation
be sure to select 'n' and go back to correct the mistake.
8) Select reboot from the offered menu
195.15 Reboot all the nodes
During the boot process, you should see the root partition encapsulation taking place. After this
happens, the machine(s) will reboot automatically.
195.16 Start primary cluster node
The (first) moment of truth!
When the machine has booted cleanly, log in as root and enter the following command on the
*primary* node (ie neva):
# scadmin startcluster neva prodwc-db-cluster
You should see several dozen lines on the console at this point as the cluster reconfigures itself
to the new configuration. The start succeeded if the last lines look similar to:
prodwc-db-cluster node 0 (neva) is a cluster member
prodwc-db-cluster node 1 (volga) is not a cluster member
prodwc-db-cluster cluster reconf #1 finished
Note: there will be date/time/syslog data as headers to the above information. Since the
data is presented as type local0.error, do not be concerned with the word error in this context.
If this is not what you are seeing, or "quorum exited with 1 in startnode" is the last message,
see 195.22 for troubleshooting information.
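If the console output has been captured to a file, the check described above can be
sketched as a shell function. cluster_start_ok is a hypothetical helper name, and
the patterns are taken only from the sample messages quoted in this section; other
Sun Cluster releases may word them differently.

```shell
# cluster_start_ok LOG NODE
# Succeeds if LOG shows NODE as a cluster member and a finished
# reconfiguration, and does not contain the quorum failure message.
cluster_start_ok() {
    log=$1
    node=$2
    if grep "quorum exited with 1 in startnode" "$log" >/dev/null; then
        return 1
    fi
    grep "($node) is a cluster member" "$log" >/dev/null &&
        grep "cluster reconf #[0-9][0-9]* finished" "$log" >/dev/null
}
```

A line such as "(volga) is not a cluster member" does not satisfy the check for
volga, which is the expected state right after the primary node starts.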
195.17 Create oracle disk groups and raw volumes for OPS
195.18 Restart cluster on all nodes
See 195.21 for details
195.19 Install Oracle with the OPS option on all nodes
The media is located in the blue Software Cabinet in
Brent's office.
We are using Oracle 8.1.6
*********************************************
* VERY IMPORTANT INFORMATION !!!!!!!!!*
*********************************************
In order for the Oracle Installer to show the Oracle Parallel Option
during the installation the cluster must be running on both machines.
Also RSH must be enabled on both machines in order to install OPS.
*********************************************
*********************************************
The /etc/system file must be backed up and have these values inserted
* Begin Oracle info *
set shmsys:shminfo_shmmax=4294967295
set shmsys:shminfo_shmmin=1
set shmsys:shminfo_shmmni=100
set shmsys:shminfo_shmseg=10
set semsys:seminfo_semmni=100
set semsys:seminfo_semmsl=300
set semsys:seminfo_semmns=600
set semsys:seminfo_semopm=100
set semsys:seminfo_semvmx=32767
* End Oracle info *
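A minimal sketch for confirming that the values above actually made it into the
file is shown below. missing_oracle_params is a hypothetical helper name; it prints
every required "set" line that is not present verbatim in the given file and does
not modify anything. Back up the file first with "cp -p /etc/system /etc/system.bak".

```shell
# missing_oracle_params FILE
# Prints each required "set" line that does not appear as an exact line
# in FILE. No output means every parameter is present.
missing_oracle_params() {
    file=$1
    while IFS= read -r line; do
        grep -xF "$line" "$file" >/dev/null || echo "$line"
    done <<'EOF'
set shmsys:shminfo_shmmax=4294967295
set shmsys:shminfo_shmmin=1
set shmsys:shminfo_shmmni=100
set shmsys:shminfo_shmseg=10
set semsys:seminfo_semmni=100
set semsys:seminfo_semmsl=300
set semsys:seminfo_semmns=600
set semsys:seminfo_semopm=100
set semsys:seminfo_semvmx=32767
EOF
}
```

Running "missing_oracle_params /etc/system" on each node before the Oracle install
should print nothing.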
Once the media is inserted into the machine, Solaris automatically mounts
the cdrom.
To begin the installation:
Export your DISPLAY variable. If using a Windows machine you must have
an X Windows Server running locally.
cd to /cdrom/oracle8i and type
./runInstaller
Choose the location where you want the software installed.
Choose Oracle8i Enterprise Edition 8.1.6.0.0
Select Unix group name: oinstall
As the root user, execute /tmp/OraInstall/orainstRoot.sh
Continue installation.
From Available Products choose Oracle8i Enterprise Edition 8.1.6.0.0
For Installation Types choose Custom.
From Available Product Components uncheck the following:
1. Oracle Time Series
2. Oracle Visual Information Retrieval.
3. Oracle Advanced Security.
4. Oracle InterMedia.
5. Oracle Names
6. Oracle Connection Manager.
7. Oracle External Naming.
8. All the Enterprise Manager Products
Choose the following:
1. All the JDBC Drivers.
For Component Locations select Java Runtime Environment and press ENTER.
On Cluster Node Selection Screen hit ENTER.
For Privileged Operating System Groups choose oinstall for both.
For question to Create Database say NO !!!
For Summary just say INSTALL and go and have 10 coffees.
When asked, run as root /share/oracle/product/8.1.6/root.sh
Directory to use for client and server information [/files/nsr]? ENTER
Enter the first NetWorker server's name [no more]: ENTER
Start NetWorker daemons at end of install [yes]? no
Do you want to continue with the installation of <ORCLclnt> [y,n,?] n
Do you want to continue with installation [y,n,?] n
Enter the full pathname of the local bin directory: [/usr/local/bin]: ENTER
*********************************
* Raw Volume Creation *
*********************************
To run OPS the database files must reside on raw volumes.
To create the volumes there is a script that generates all
the necessary commands and a config file. This script and file
are located in:
S:\ops_dev\opscore\main\cluster
Or locally on the machine in:
/share/oracle/vm
The main script, vmcmdgen, takes two parameters: a config file
called vm.neva.cfg, which lists all the available disks and specifies the
volume names, which disks to use for which volume, mirroring and striping
information, etc., and the name of the output command file to generate.
To use, type the following:
vmcmdgen vm.neva.cfg vm.neva.cmd
And run the vm.neva.cmd from the command line.
****************************
* DATABASE CREATION *
****************************
Once the software is installed, log in as user ORACLE and run the following scripts,
located in S:\ops_dev\opscore\main\cluster or on the local machine in
/share/oracle/admin/pcmprod/create. These scripts will create an
OPS database. Once these scripts have executed successfully, all you have to do is
start up the secondary instance on the secondary node.
From the command prompt type SVRMGRL. Once you get the "SVRMGR>" prompt, type
connect internal. There is no password.
And run in the following order.
1: @/share/oracle/admin/pcmprod/create/crdbpcmprod.sql
2: @/share/oracle/product/8.1.6/rdbms/admin/catalog.sql
3: @/share/oracle/admin/pcmprod/create/crdb2pcmprod.sql
4: @/share/oracle/product/8.1.6/rdbms/admin/catproc.sql
5: @/share/oracle/product/8.1.6/rdbms/admin/caths.sql
6: @/share/oracle/product/8.1.6/rdbms/admin/otrcsvr.sql
7: @/share/oracle/product/8.1.6/rdbms/admin/utlsampl.sql
then log in to SVRMGRL as user SYSTEM and run
8: @/share/oracle/product/8.1.6/sqlplus/admin/pupbld.sql
9: @/share/oracle/admin/pcmprod/create/crredologs.sql
10: @/share/oracle/admin/pcmprod/create/crrollbacksegs.sql
11: @/share/oracle/product/8.1.6/rdbms/admin/catparr.sql
The scripts in steps 1, 3, 9 and 10 are already modified to create
an OPS database called PCMPROD.
Naturally the *.ORA scripts are instance specific and
should be modified accordingly. Mostly the difference
is the instance name.
ex: PCMPROD1 and PCMPROD2.
Once all the create and admin scripts have run successfully,
edit the following files:
on primary host.....
INITPCMPROD.ORA
INITPCMPROD1.ORA
and on secondary host.....
INITPCMPROD.ORA
INITPCMPROD2.ORA
These files are located in /share/oracle/admin/pcmprod/pfile
with symbolic links in /share/oracle/product/8.1.6/dbs with the same names
pointing back.
The working examples are located in /share/ops_dev/opscore/main/cluster.
To create symbolic links use the following command from this directory:
/share/oracle/product/8.1.6/dbs
ln -s /share/oracle/admin/pcmprod/pfile/initpcmprod.ora initpcmprod.ora
ln -s /share/oracle/admin/pcmprod/pfile/initpcmprod1.ora initpcmprod1.ora
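The two ln commands above can be generalised into a small loop so that the same
step works on the secondary node, where the instance file is initpcmprod2.ora.
This is only a sketch; link_pfiles is a hypothetical helper name and it assumes the
parameter files are named init*.ora in lower case, as in the commands above.

```shell
# link_pfiles PFILE_DIR DBS_DIR
# Creates a symbolic link in DBS_DIR for every init*.ora file in PFILE_DIR,
# keeping the same file name. Existing links are not replaced (ln reports
# an error for them).
link_pfiles() {
    pfile_dir=$1
    dbs_dir=$2
    for f in "$pfile_dir"/init*.ora; do
        [ -f "$f" ] || continue    # nothing matched: the glob stays literal
        ln -s "$f" "$dbs_dir/$(basename "$f")"
    done
}

# Example use with the paths from the text above:
#   link_pfiles /share/oracle/admin/pcmprod/pfile /share/oracle/product/8.1.6/dbs
```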
Once both instances of OPS on both nodes are created,
make sure that the following parameter is in both INITPCMPROD.ORA
files on both machines:
remote_login_passwordfile = exclusive
Make sure that the parameter rollback_segments contains only the
private rollback segments or else the instances will not come up.
Create the ORAPW<db_name> file, use the following command from the
/share/oracle/product/8.1.6/dbs directory.
orapwd file=orapw<db_name> password=<sys user password>
Once these steps are done, you are ready to startup the LISTENER
on both instances.
Start the LISTENER by typing:
lsnrctl start
on both nodes.
On the primary node, log in as user ORACLE and log on to SVRMGR.
Connect internal and type startup. Once the instance is up,
do the same on the secondary node.
********************************************************************
The following steps are for running the database in ARCHIVE LOG MODE
********************************************************************
To start up the instances in ARCHIVE LOG MODE, make sure that the following
parameters are set in the INIT.ORA files on both nodes.
log_archive_start = true
log_archive_dest_1 = "location=/share/oracle/admin/pcmprod/arch"
log_archive_format = arch_%t_%s.arc
Also make sure that the PARALLEL_SERVER parameter is set to FALSE.
Startup the primary instance in exclusive mode by issuing the following command.
SVRMGR> startup mount
ORACLE instance started.
Total System Global Area 210692528 bytes
Fixed Size 51632 bytes
Variable Size 143351808 bytes
Database Buffers 67108864 bytes
Redo Buffers 180224 bytes
Database mounted.
Followed by:
SVRMGR> alter database archivelog;
Statement processed.
SVRMGR> alter database open;
Statement processed.
To verify that the DB is actually in ARCHIVE LOG MODE type the following command.
SVRMGR> archive log list
Database log mode Archive Mode
Automatic archival Enabled
Archive destination /share/oracle/admin/edmprod1/arch
Oldest online log sequence 5
Next log sequence to archive 7
Current log sequence 7
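If the "archive log list" output is saved to a file, the check above can be scripted.
The following is a sketch under that assumption; archive_mode_enabled is a
hypothetical helper name, and the patterns are based only on the sample output shown
here, where a database that is not archiving reports "No Archive Mode".

```shell
# archive_mode_enabled FILE
# Succeeds if FILE (saved "archive log list" output) reports the database
# log mode as "Archive Mode" and not "No Archive Mode".
archive_mode_enabled() {
    awk '
        /^Database log mode/ && /Archive Mode/ && !/No Archive Mode/ { ok = 1 }
        END { exit !ok }
    ' "$1"
}
```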
In order to configure Net8 (previously known as SQL*Net) edit the
LISTENER.ORA and TNSNAMES.ORA to contain proper entries for the
hosts and names of databases and instances.
These files are located in:
/share/oracle/product/8.1.6/network/admin
The working examples
are located in /share/ops_dev/opscore/main/cluster.
195.20 The following steps are for installing the Cluster software on savage and selway.
Also for creating an OPS database on savage and selway.
************************
* CLUSTER INSTALLATION *
************************
1. Install the client on thames by running scinstall.
./scinstall
2. On Software Selection Menu choose option 3. Client.
3. Choose automatic install mode.
4. Choose option 5 to quit client installation.
5. Start the ccp "cluster control panel" on thames located in /opt/SUNWcluster/bin.
./ccp dev-db-cluster (dev-db-cluster is the name of the cluster)
6. cd to /cdrom/multi_suncluster_sc_2_2/Sun_Cluster_2_2/Sol_2.8/Tools and run scinstall.
7. On Main Menu choose install option 1. Install/Upgrade Server Packages.
8. On Software Selection Menu choose option 2 Server.
9. Specify the path to the cdrom. Remember that the cdrom is shared so on one machine you
will have to manually type out the path.
10. For Volume Manager Selection choose Cluster Volume Manager, option 1.
11. Hit ENTER when asked if CVM must be installed before OPS can be started.
12. The name of the cluster is dev-db-cluster.
13. Potential nodes = 2.
14. Initially configured nodes = 2.
15. Choose Ether not SCI.
16. Hostname of node 0 is savage.
17. What is savage's first private network interface [hme0] qfe0
18. What is savage's second private network interface [hme1] qfe1
19. What is savage's ethernet address ? get answer from ifconfig -a
20. Hostname of node 1 is selway.
21. What is selway's first private network interface [hme0] qfe0
22. What is selway's second private network interface [hme1] qfe1
23. What is selway's ethernet address ? get answer from ifconfig -a
24. HA data services = yes.
25. Logical hosts = no.
26. Place the quorum device on disk c2t52d0. Different disk #'s on both machines.
27. Specify the path to the cdrom again.
28. For Data Service choose #10, Sun Cluster for Oracle Parallel Server.
29. Choose automatic install mode.
30. SANITY CHECK !!! Choose #3 to verify the installation.
31. Choose option #5 to quit the installation.
32. Reboot all nodes.
33. Create group oinstall and dba.
34. Install package ORCLudlm located in /home/oracle/cluster
35. Choose yes when asked if you want to continue with the installation.
36. Reboot all nodes.
*************************
* OPS Database Creation *
*************************
1. The OPS Unix servers are savage and selway, savage being the primary node.
2. Log on to savage as user oracle (password in the usual place).
3. Source the oraenv file from the command line like this:
{oracle@savage:155} . oraenv
ORACLE_SID = [oracle] ? opsdev
opsdev is the database SID (system identifier, i.e. the db name).
4. cd to /share/oracle/admin/opsdev/pfile and comment out the following line in initopsdev.ora
#remote_login_passwordfile = exclusive
Then cd to /share/oracle/admin/opsdev/create
5. Log on to SVRMGRL and type "connect internal" like this:
{oracle@savage:156} svrmgrl
Oracle Server Manager Release 3.1.6.0.0 - Production
Copyright (c) 1997, 1999, Oracle Corporation. All Rights Reserved.
Oracle8i Enterprise Edition Release 8.1.6.0.0 - Production
With the Partitioning and Parallel Server options
JServer Release 8.1.6.0.0 - Production
SVRMGR> connect internal
Connected.
SVRMGR>
6. Then execute the db creation scripts in the following order and as the following user.
Note that the scripts are located in different directories.
ORDER IS CRUCIAL FOR SUCCESS!!!
SVRMGR>connect internal
SVRMGR>@opsdev1run.sh
SVRMGR>@/share/oracle/product/8.1.6/rdbms/admin/catalog.sql
SVRMGR>@opsdev1run1.sh
SVRMGR>@/share/oracle/product/8.1.6/rdbms/admin/catproc.sql
SVRMGR>@/share/oracle/product/8.1.6/rdbms/admin/caths.sql
SVRMGR>@/share/oracle/product/8.1.6/rdbms/admin/otrcsvr.sql
SVRMGR>@/share/oracle/product/8.1.6/rdbms/admin/utlsampl.sql
SVRMGR>connect system/manager
SVRMGR>@/share/oracle/product/8.1.6/sqlplus/admin/pupbld.sql
SVRMGR>@opsdev1psrbs.sh
SVRMGR>@opsdev1pslog.sh
SVRMGR>connect internal
SVRMGR>@/share/oracle/product/8.1.6/rdbms/admin/catparr.sql
7. Once all scripts have executed successfully, shut down the database on savage:
SVRMGR>connect internal
SVRMGR>shutdown immediate
8. On BOTH MACHINES cd to $ORACLE_HOME/dbs and type the following from the command line:
orapwd file=orapwopsdev password=change_on_install
This creates the Oracle password file.
Then uncomment the following line in /share/oracle/admin/opsdev/pfile/initopsdev.ora
remote_login_passwordfile = exclusive
9. First, on savage, log in to SVRMGR and type startup. You should see the following:
SVRMGR> startup
ORACLE instance started.
Total System Global Area 215617520 bytes
Fixed Size 69616 bytes
Variable Size 211173376 bytes
Database Buffers 4194304 bytes
Redo Buffers 180224 bytes
Database mounted.
Database opened.
SVRMGR>
Repeat the same startup steps on selway.
10. Test OPS by typing in SVRMGR the following command which creates a test table.
SVRMGR>create table test (id number(1));
Statement processed.
Then on the other machine type:
SVRMGR> desc test
Column Name Null? Type
------------------------------ -------- ----
ID NUMBER(1)
If that is what you see then you're in business.
195.21 Starting and stopping the cluster
The normal startup sequence for the cluster nodes will be primary (neva), then secondary (volga).
Shutting down will be done in reverse of this.
To start:
On neva:
# scadmin startcluster neva prodwc-db-cluster
On volga:
# scadmin startnode
To stop:
On volga:
# scadmin stopnode
On neva:
# scadmin stopnode
195.22 Cluster Troubleshooting
Troubleshooting the cluster can be complex. To begin with we show the expected startup
and shutdown data from the perspective of the local console. Note that some messages
will appear on the master console when a secondary node joins or abandons a cluster group.
Complete examples of logs can be found on cluster in /home/cluster/startuplogs .
Primary cluster node startup messages:
# scadmin startcluster neva prodwc-db-cluster
Node specified is neva
Cluster specified is prodwc-db-cluster
=========================== WARNING =================================
= Creating a new cluster =
=====================================================================
You are attempting to start up the cluster node neva as the
only node in a new cluster. It is important that no other cluster
nodes be active at this time. If this node hears from other cluster
nodes, this node will abort. Other nodes may join only after this
command has completed successfully. Data corruption may occur if
more than one cluster is active.
Do you want to continue? [yes or no] yes
Feb 28 14:00:35 neva ID[SUNWcluster.reconf.1150]: Starting Sun Cluster: node 0 (neva) joining the prodwc-db-cluster cluster
Starting Sun Cluster software - joining the prodwc-db-cluster cluster
Feb 28 14:00:46 neva ID[SUNWcluster.reconf.1200]: Reconfiguration step start started
Feb 28 14:00:46 neva ID[SUNWcluster.sma.smad.1102]: smad: Cluster 'prodwc-db-cluster' monitoring
Feb 28 14:00:46 neva ID[SUNWcluster.reconf.udlm.1000]: prodwc-db-cluster starting the Unix DLM.
Feb 28 14:00:47 neva ID[vxclust]: starting start time: 02/28 14:00:47.442: seq # 0
Feb 28 14:00:48 neva ID[vxclust]: max nodes defined in the cluster=2
Feb 28 14:00:48 neva ID[vxclust]: ending step start time: 02/28 14:00:48.014:
Feb 28 14:00:48 neva ID[SUNWcluster.reconf.1201]: Reconfiguration step start completed
Feb 28 14:00:48 neva ID[SUNWcluster.reconf.1200]: Reconfiguration Step 1 started
Feb 28 14:00:48 neva ID[SUNWcluster.reconf.1120]: prodwc-db-cluster reconfiguration 1 started on neva
Feb 28 14:00:48 neva ID[SUNWcluster.sma.smad.1103]: smad: Cluster 'prodwc-db-cluster' running
Feb 28 14:00:48 neva ID[SUNWcluster.sma.monitor.6011]: net 0 is up
Feb 28 14:00:48 neva ID[SUNWcluster.sma.monitor.6011]: net 1 is up
Feb 28 14:00:48 neva ID[SUNWcluster.sma.smad.1030]: prodwc-db-cluster net 0 (scid0:1) selected
Feb 28 14:00:48 neva ID[SUNWcluster.udlm.1000]: Unix DLM version (2) and SUN unix dlm library version (1): compatible.
Feb 28 14:00:49 neva ID[SUNWcluster.reconf.quorumdev.1000]: prodwc-db-cluster reserving 0041L04LXM as quorum device
Feb 28 14:00:55 neva ID[SUNWcluster.reconf.1201]: Reconfiguration Step 1 completed
Feb 28 14:00:55 neva ID[SUNWcluster.reconf.1200]: Reconfiguration Step 2 started
Feb 28 14:00:56 neva ID[vxclust]: starting step1 time: 02/28 14:00:56.189: seq # 1
Feb 28 14:00:56 neva ID[vxclust]: members 1 joiners 1 leavers 0
Feb 28 14:00:56 neva ID[vxclust]: ending step step1 time: 02/28 14:00:56.191:
Feb 28 14:00:56 neva ID[SUNWcluster.reconf.1201]: Reconfiguration Step 2 completed
Feb 28 14:00:56 neva ID[SUNWcluster.reconf.1200]: Reconfiguration Step 3 started
Feb 28 14:00:56 neva ID[vxclust]: starting step2 time: 02/28 14:00:56.598: seq # 1
Feb 28 14:00:56 neva ID[vxclust]: CVM:MASTER=0 SELF=0
Feb 28 14:00:56 neva ID[vxclust]: ending step step2 time: 02/28 14:00:56.600:
Feb 28 14:00:56 neva ID[SUNWcluster.reconf.ccd.1703]: prodwc-db-cluster starting ccdd.
Feb 28 14:00:56 neva ID[SUNWcluster.reconf.ccd.1705]: prodwc-db-cluster starting ccdd completed.
Feb 28 14:01:00 neva ID[SUNWcluster.reconf.1201]: Reconfiguration Step 3 completed
Feb 28 14:01:00 neva ID[SUNWcluster.reconf.1200]: Reconfiguration Step 4 started
Feb 28 14:01:00 neva ID[SUNWcluster.reconf.ccd.1706]: CCD Step 4 transition CCD_up=0
Feb 28 14:01:01 neva ID[SUNWcluster.reconf.ccd.1707]: CCD Step 4 transition completed.
Feb 28 14:01:02 neva ID[SUNWcluster.reconf.1201]: Reconfiguration Step 4 completed
Feb 28 14:01:02 neva ID[SUNWcluster.reconf.1200]: Reconfiguration Step 5 started
Feb 28 14:01:02 neva ID[SUNWcluster.reconf.1201]: Reconfiguration Step 5 completed
Feb 28 14:01:02 neva ID[SUNWcluster.reconf.1200]: Reconfiguration Step 6 started
Feb 28 14:01:02 neva ID[SUNWcluster.reconf.1201]: Reconfiguration Step 6 completed
Feb 28 14:01:03 neva ID[SUNWcluster.reconf.1200]: Reconfiguration Step 7 started
Feb 28 14:01:03 neva ID[SUNWcluster.reconf.1201]: Reconfiguration Step 7 completed
Feb 28 14:01:03 neva ID[SUNWcluster.reconf.1200]: Reconfiguration Step 8 started
Feb 28 14:01:03 neva ID[vxclust]: starting step3 time: 02/28 14:01:03.901: seq # 1
vxvm:vxconfigd: WARNING: master_send_diskid : diskid send : 982869147.1608.neva
vxvm:vxconfigd: WARNING: master_send_diskid : diskid send : 982868779.1596.neva
vxvm:vxconfigd: WARNING: master_send_diskid : diskid send : 982868227.1589.neva
vxvm:vxconfigd: WARNING: master_send_diskid : diskid send : 982870851.1612.neva
dg dbbdg usetype fsgen: start edmprodctrl01.ctl edmprodctrl02.ctl edmprodctrl03.ctl
edmproddata01.dbf edmprodindex01.dbf edmprodrbs01.dbf edmprodredo01_1.log edmprodredo01_2.log
edmprodredo01_3.log edmprodredo02_1.log edmprodredo02_2.log edmprodredo02_3.log
edmprodsystem01.dbf edmprodtemp01.dbf edmprodtools01.dbf
Feb 28 14:01:39 neva ID[vxclust]: ending step step3 time: 02/28 14:01:39.657:
Feb 28 14:01:39 neva ID[SUNWcluster.pnm.pnmd.2005]: pnmd daemon is shutting down
VERITAS VM Storage Administrator Server terminated.
Stopping VERITAS VM Storage Administrator Server
Feb 28 14:01:42 neva ID[SUNWcluster.pnm.pnmd.2006]: Values for PNM tuneable parameters - inactive_time (5 s), ping_timeout (4 s), rep_test (3 time(s)), slow_network (2 s)
Feb 28 14:01:42 neva ID[SUNWcluster.pnm.pnmd.3001]: cannot open config file
Feb 28 14:01:43 neva ID[SUNWcluster.reconf.1201]: Reconfiguration Step 8 completed
Feb 28 14:01:43 neva ID[SUNWcluster.reconf.1200]: Reconfiguration Step 9 started
Feb 28 14:01:43 neva ID[vxclust]: starting step4 time: 02/28 14:01:43.595: seq # 1
Feb 28 14:01:43 neva ID[vxclust]: ending step step4 time: 02/28 14:01:43.605:
Feb 28 14:01:43 neva ID[SUNWcluster.reconf.1201]: Reconfiguration Step 9 completed
Feb 28 14:01:43 neva ID[SUNWcluster.reconf.1200]: Reconfiguration Step 10 started
Feb 28 14:01:50 neva ID[SUNWcluster.reconf.1201]: Reconfiguration Step 10 completed
Feb 28 14:01:50 neva ID[SUNWcluster.reconf.1200]: Reconfiguration Step 11 started
Feb 28 14:01:50 neva ID[SUNWcluster.cvm.1025]: cluster volume manager shared access mode enabled
Feb 28 14:01:50 neva ID[SUNWcluster.cvm.1010]: - node neva vm_on_node is master
Feb 28 14:01:51 neva ID[SUNWcluster.reconf.1201]: Reconfiguration Step 11 completed
Feb 28 14:01:51 neva ID[SUNWcluster.reconf.1200]: Reconfiguration Step 12 started
Feb 28 14:01:52 neva ID[SUNWcluster.reconf.1201]: Reconfiguration Step 12 completed
Feb 28 14:01:52 neva ID[SUNWcluster.clustd.1920]: prodwc-db-cluster node 0 (neva) is a cluster member
Feb 28 14:01:52 neva ID[SUNWcluster.clustd.1930]: prodwc-db-cluster node 1 (volga) is not a cluster member
Feb 28 14:01:52 neva ID[SUNWcluster.clustd.1940]: prodwc-db-cluster cluster reconf #1 finished
Feb 28 14:01:52 neva netfmd[14589]: Starting up
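To confirm from a saved copy of the startup messages that no reconfiguration step stalled, the
"Step N started" / "Step N completed" pairs can be matched mechanically. This is only a sketch:
the function name check_reconf_steps and the idea of saving the console output to a file are
assumptions, not part of the Sun Cluster tooling. It assumes a POSIX shell and a POSIX awk
(on Solaris, nawk or /usr/xpg4/bin/awk rather than the old /usr/bin/awk).

```shell
# check_reconf_steps FILE
# Prints every numbered reconfiguration step that has a "started" line
# but no matching "completed" line. Returns non-zero if any are found.
check_reconf_steps() {
    awk '
        # The step number is the field that follows the word "Step".
        /Reconfiguration Step [0-9]+ started/ {
            for (i = 1; i < NF; i++) if ($i == "Step") started[$(i + 1)] = 1
        }
        /Reconfiguration Step [0-9]+ completed/ {
            for (i = 1; i < NF; i++) if ($i == "Step") finished[$(i + 1)] = 1
        }
        END {
            bad = 0
            for (s in started) {
                if (!(s in finished)) {
                    print "Step " s " did not complete"
                    bad = 1
                }
            }
            exit bad
        }
    ' "$1"
}
```

Run against a saved log (the path is a placeholder), for example
check_reconf_steps /var/tmp/startnode.log && echo "all steps completed".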
Primary cluster node shutdown message:
# scadmin stopnode
Assuming a default cluster name of prodwc-db-cluster
Stopping the Sun Cluster software - leaving the prodwc-db-cluster cluster
Feb 28 13:59:39 neva ID[SUNWcluster.pnm.pnmd.2005]: pnmd daemon is shutting down
Feb 28 13:59:41 neva ID[SUNWcluster.pnm.pnmd.3001]: cannot open config file
Feb 28 13:59:42 neva ID[vxclust]: starting stop time: 02/28 13:59:42.004: seq # 3
Feb 28 13:59:42 neva ID[vxclust]: stop: successfully finished
Feb 28 13:59:42 neva ID[SUNWcluster.reconf.1200]: Reconfiguration step return started
Feb 28 13:59:42 neva ID[vxclust]: starting return time: 02/28 13:59:42.873: seq # 3
Feb 28 13:59:42 neva ID[vxclust]: ending step return time: 02/28 13:59:42.874:
Feb 28 13:59:42 neva ID[SUNWcluster.reconf.ccd.1708]: prodwc-db-cluster returning ccdd.
Feb 28 13:59:43 neva ID[SUNWcluster.reconf.ccd.1709]: prodwc-db-cluster returning ccdd completed.
Feb 28 13:59:43 neva ID[SUNWcluster.sma.smad.1104]: smad: Cluster 'prodwc-db-cluster' returning
Feb 28 13:59:43 neva ID[SUNWcluster.reconf.1200]: Reconfiguration step return completed
Feb 28 13:59:43 neva ID[SUNWcluster.reconf.1200]: Reconfiguration step abort started
Feb 28 13:59:44 neva ID[SUNWcluster.pnm.pnmd.2005]: pnmd daemon is shutting down
Feb 28 13:59:46 neva ID[SUNWcluster.pnm.pnmd.3001]: cannot open config file
Feb 28 13:59:46 neva ID[vxclust]: starting abort time: 02/28 13:59:46.657: seq # 3
Feb 28 13:59:46 neva ID[vxclust]: calling volcvm_abort time: 02/28 13:59:46.658:
Feb 28 13:59:46 neva ID[vxclust]: volcvm_abort finished time: 02/28 13:59:46.659:
Feb 28 13:59:46 neva ID[vxclust]: ending step abort time: 02/28 13:59:46.659:
Feb 28 13:59:46 neva ID[SUNWcluster.reconf.ccd.1701]: prodwc-db-cluster aborting ccdd.
Feb 28 13:59:47 neva ID[SUNWcluster.reconf.ccd.1702]: prodwc-db-cluster aborting ccdd completed.
Feb 28 13:59:47 neva ID[SUNWcluster.sma.smad.1105]: smad: Cluster 'prodwc-db-cluster' no longer running
Feb 28 13:59:47 neva ID[SUNWcluster.sma.smad.5010]: prodwc-db-cluster net 0 (scid0:1) de-selected
Feb 28 13:59:47 neva ID[SUNWcluster.reconf.1201]: Reconfiguration step abort completed
The prodwc-db-cluster cluster has no active hosts.
Feb 28 13:59:48 neva ID[SUNWcluster.clustd.transition.4011]: cluster stopped on this node (neva)
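Since a machine must never be restarted while its cluster node is still running (see 195.1), it
can help to check for the final "cluster stopped on this node" message before rebooting. A minimal
sketch, assuming the shutdown messages were saved to a file; node_stopped is a hypothetical
helper name, not a Sun Cluster command.

```shell
# node_stopped FILE NODE
# Succeeds only if FILE contains the final stop message for NODE.
# In a basic regular expression the parentheses match literally.
node_stopped() {
    grep "cluster stopped on this node ($2)" "$1" > /dev/null
}
```

For example, node_stopped /var/tmp/stopnode.log neva && echo "safe to reboot neva", where the log
path is a placeholder.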
195.23 SCI interface troubleshooting
The SCI interfaces are used as an inter-host communication device for cluster communications.
The SCI host device uses the concept of rings, in which two or more nodes (SCI SBus cards) are
mutually connected. In a given ring, one host device is designated as the scrubber. A card is
assigned as the scrubber by setting its onboard scrubber jumper.
Potential problems:
1) Bad SCI board
A hardware error on an SCI interface will most likely betray itself in the
boot sequence. The expected messages on the console should look something
like:
SCI Driver : version Dolphin IRM 1.9.8.1 ( 1999-10-22 ) initializing
SCI Driver : 32 bit mode Compiled Feb 29 2000 : 15:42:05
SCI Instance 0 : Awaiting configuration (Serial number is 17709 (0x452d))
SCI Instance 0 : adapter attached
SCI Instance 1 : Awaiting configuration (Serial number is 17609 (0x44c9))
SCI Instance 1 : adapter attached
.
.
SCI Instance 0 : Initializing as adapter number 0 (Serial number is 17709 (0x452d))
SCI Adapter 0 : NodeId is 8 ( 0x8 ) Serial no : 17709 (0x452d)
SCI Adapter 0 : The SCI link is not operational.
SCI Instance 1 : Initializing as adapter number 1 (Serial number is 17609 (0x44c9))
SCI Adapter 1 : NodeId is 12 ( 0xc ) Serial no : 17609 (0x44c9)
SCI Adapter 1 : The SCI link is not operational
SCI Adapter 0 : No switch found
SCI Adapter 0 : The SCI link is operational.
ID[SUNWcluster.sma.smak.1001]: SCI Adapter 0: Card operational
ID[SUNWcluster.sma.smak.1051]: SCI Adapter 0: Link operational
ID[SUNWcluster.sma.smak.1052]: SCI Adapter 0: Switch Serial Number: Not Available
SCI Adapter 1 : No switch found
SCI Adapter 1 : The SCI link is operational.
ID[SUNWcluster.sma.smak.1001]: SCI Adapter 1: Card operational
ID[SUNWcluster.sma.smak.1051]: SCI Adapter 1: Link operational
ID[SUNWcluster.sma.smak.1052]: SCI Adapter 1: Switch Serial Number: Not Available
ID[SUNWcluster.sma.smak.3001]: SCI Adapter 0 (0008): Session to 0048 active
ID[SUNWcluster.sma.smak.3001]: SCI Adapter 1 (000c): Session to 004c active
Here the machine booted, found the devices and attached to the other host without
problem. Note both adapters are listed as operational.
Other indicators at run time should be found in /var/adm/messages.
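In the boot messages each adapter first reports its link as not operational and only later as
operational, so the last report per adapter is the one that matters. The sketch below extracts
that final state from a saved copy of the messages. sci_link_state is a hypothetical helper, and
it is an assumption that the same "SCI Adapter N : The SCI link is ..." lines also appear in
/var/adm/messages on a given system.

```shell
# sci_link_state FILE
# Prints "<adapter> up" or "<adapter> down" for each SCI adapter,
# based on the LAST "The SCI link is ..." line seen for that adapter.
sci_link_state() {
    awk '
        /SCI Adapter [0-9]+ : The SCI link is/ {
            # Field 3 is the adapter number; later lines overwrite earlier ones.
            if ($0 ~ /not operational/) state[$3] = "down"
            else state[$3] = "up"
        }
        END { for (a in state) print a, state[a] }
    ' "$1" | sort -n
}
```

An adapter that stays "down" after boot is a candidate for the bad-board case described above.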
2) Bad interconnect
A bad interconnect can be tested like any other networking problem.
The methodology here also applies to all the remaining problems.
Since the operating system treats the interconnect like a network interface,
many of the usual tools can be used to locate broken connections.
neva:/> ifconfig -a
lo0: flags=1000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4> mtu 8232 index 1
inet 127.0.0.1 netmask ff000000
hme0: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 2
inet 192.168.10.22 netmask ffffff00 broadcast 192.168.10.255
scid0: flags=10080c1<UP,RUNNING,NOARP,PRIVATE,IPv4> mtu 16321 index 3
inet 204.152.65.1 netmask fffffff0
scid1: flags=10080c1<UP,RUNNING,NOARP,PRIVATE,IPv4> mtu 16321 index 4
inet 204.152.65.17 netmask fffffff0
neva:/> netstat -r
Routing Table: IPv4
Destination Gateway Flags Ref Use Interface
-------------------- -------------------- ----- ----- ------ ---------
204.152.65.16 204.152.65.17 U 1 0 scid1
204.152.65.0 204.152.65.1 U 1 0 scid0
192.168.10.0 neva U 1 1 hme0
224.0.0.0 neva U 1 0 hme0
default gateway UG 1 0
localhost localhost UH 2 6 lo0
neva:/> netstat -i
Name Mtu Net/Dest Address Ipkts Ierrs Opkts Oerrs Collis Queue
lo0 8232 loopback localhost 489 0 489 0 0 0
hme0 1500 neva neva 666 0 413 0 0 0
scid0 16321 204.152.65.0 204.152.65.1 0 0 103 0 0 0
scid1 16321 204.152.65.16 204.152.65.17 0 0 103 0 0 0
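The routing table should agree with the interface configuration: the destination listed for each
scid interface is its address ANDed with its netmask (for scid1, 204.152.65.17 with fffffff0
gives 204.152.65.16). The following sketch does that arithmetic. net_addr is a hypothetical
helper, and it assumes a shell whose arithmetic expansion accepts 0x hex constants, such as bash,
rather than the old Bourne /bin/sh.

```shell
# net_addr IP HEXMASK
# Prints the network address for a dotted-quad IP and a hex netmask
# written the way ifconfig shows it (for example fffffff0).
net_addr() {
    ip=$1
    m=$((0x$2))
    # Split the dotted quad into $1..$4.
    oldifs=$IFS; IFS=.; set -- $ip; IFS=$oldifs
    # AND each octet with the matching byte of the mask.
    echo "$(( $1 & (m >> 24 & 255) )).$(( $2 & (m >> 16 & 255) )).$(( $3 & (m >> 8 & 255) )).$(( $4 & (m & 255) ))"
}
```

If the value printed for an interface does not match its Destination entry in netstat -r, the
plumbing or routing for that interface is worth a closer look.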
3) Interface plumbing error
4) Routing error
5) Scrubber error
-------------------------------------------------------------------------------------
Veritas Configuration
Export the DISPLAY variable.
Type the following commands on the cluster console.
# xhost + neva
neva being added to access control list
# xhost + volga
volga being added to access control list
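The DISPLAY step can be sketched as follows, to be run on each node after the xhost commands on
the console. This is an illustration only: the console host name passed in is a placeholder
(this document does not name the cluster console), display number 0 is an assumption, and
set_display is a hypothetical helper.

```shell
# set_display HOST
# Points X clients started from this shell at display 0 on HOST.
set_display() {
    DISPLAY="$1:0"
    export DISPLAY
}
```

For example, set_display consolehost, where consolehost stands in for the real console name.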