Troubleshooting the QFS file system

 

Summary of Troubleshooting

 

There are five sources of problems with QFS:

 

1) Poor design resulting in bad performance

 

2) Micromanaging the file system or archiving

 

3) Bugs

 

4) Hardware problems

 

5) Networking issues

 

Always start your troubleshooting by checking /var/adm/messages and by reseating all connections. That will solve 90% of your problems.

 

Poor Design

 

Basic data transfer issues: If you configure data transfers larger than 128 Kbytes, you must also increase the value of maxphys. A large transfer size results either from initializing file systems with large DAUs or from multiplying a DAU by a large QFS stripe width. It is very easy to forget to increase maxphys when you are experimenting with stripe widths, since you can simply unmount and remount the file system with the new stripe width value.
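
The arithmetic behind this rule can be sketched as a small shell helper. This is only an illustration: transfer_kb and needs_maxphys are hypothetical names, not QFS commands, and the sketch assumes the transfer size is simply the DAU multiplied by the stripe width, as described above.

```shell
#!/bin/sh
# Sketch only: estimate the size of a single transfer to one device and
# compare it with the 128-Kbyte threshold described above.
# transfer_kb and needs_maxphys are hypothetical helper names.

# transfer_kb DAU_KB STRIPE_WIDTH -> prints DAU x stripe width, in Kbytes
transfer_kb() {
    echo $(( $1 * $2 ))
}

# needs_maxphys DAU_KB STRIPE_WIDTH -> prints "yes" if the transfer is
# larger than 128 Kbytes, otherwise "no"
needs_maxphys() {
    if [ "$(transfer_kb "$1" "$2")" -gt 128 ]; then
        echo yes
    else
        echo no
    fi
}

needs_maxphys 64 2    # 64k x 2 = 128k, prints "no"
needs_maxphys 64 4    # 64k x 4 = 256k, prints "yes"
```

On Solaris, maxphys is normally raised with a line of the form set maxphys=<bytes> in /etc/system, followed by a reboot. Check the tunable parameters documentation for your release before choosing a value.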

 

Problem: Data alignment has not been properly performed

 

The DAU size cannot be changed once a file system has been initialized, because it is written into the superblock. If you find that your performance eventually starts to suffer, perhaps because you upgraded to a new version of your application, you can increase the stripe width so that you write a larger block of data to your devices.

 

Example: Your DAU is 64k and your stripe width=1. To write a larger block (128k) to each device, set stripe=2.

Do NOT change from striping to round-robin allocation of data.

 

Problem: Incorrect disk configuration causes poor performance

 

Example: All QFS metadata devices for multiple file systems are placed on the same physical disk. The separation of metadata and data allows parallel reads of data and of metadata. Put four metadata devices on the same physical disk and your metadata access speed drops by a factor of four.

 

Example: Put QFS metadata and data on the same physical disk. You now have to jump between the metadata and data on the disk. Your read heads are going to be constantly repositioning.

 

Problem: Poor understanding of metadata leads to configuration problems

 

Example: You set up a shared QFS file system and initialize the disk. You then realize you put an incorrect IP address in /etc/opt/SUNWsamfs/hosts.<file system name>. So you correct the IP address and run samd config.

 

This will not work. The list of client hosts is written into the superblock when the file system is initialized. Forcing sam-fsd to reread a configuration file changes the behavior of the daemon processes; it does not change control blocks, such as the superblock, that are already written on the disk. You must use samsharefs -u <file system name>, which does write to the superblock.

 

System Micromanagement

 

QFS is a very simple volume manager. Unlike Veritas Volume Manager, it is not designed for micromanagement. If you experiment with your configuration by moving disk devices around, changing the stripe width, and so on, you can break it disastrously. The number one problem escalated to QFS support: someone changed a configuration file and broke the setup. There are some things you can check to make sure your configuration is still good:

 

The mcf file:

 

# sam-fsd

 

This command will tell you if sam-fsd will accept the contents of the mcf file and what shared file system daemons and tracing daemons it will start. Also check the modification times of all files in /etc/opt/SUNWsamfs to see if any has been changed recently. If one has, you know where to start looking for the problem.
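
The modification-time check can be done with find. The following is a minimal sketch; recent_changes is a hypothetical helper name, and the seven-day window is an arbitrary default.

```shell
#!/bin/sh
# Sketch only: list regular files under a directory that were modified
# within the last N days (default 7).
# recent_changes is a hypothetical helper name.
recent_changes() {
    dir=$1
    days=${2:-7}
    find "$dir" -type f -mtime "-$days" -print
}

# Typical call on a SAM-QFS host (directory taken from the text above):
# recent_changes /etc/opt/SUNWsamfs 7
```

Any file the helper prints is a candidate for the change that broke the setup.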

 

The archiver.cmd file:

 

# archiver -lv

 

Bugs

 

1. Check bug reports to see if you are experiencing a known bug

 

2. Check the archives at the SAM-managers listserv:

 

http://lists.ee.ethz.ch

 

Hardware issues

 

Do not rely on samu to tell you if a device is broken.

 

If SAM cannot talk to a device, it falls back on the mcf file and reports what is configured there, not the actual state of the device.

 

Check cables, the electrical supply, and so on, and always reseat all connections.

 

If you make any changes that affect mass storage hardware, such as adding a tape library, setting up a library catalog, or configuring SAM-Remote, you must stop and restart the archive media library daemon sam-amld and its subsidiary daemons:

 

# samd stop

# samd config

# samd start

 

This sequence of commands has changed with new releases, and some documentation may be out of date.

 

Networking issues

 

There are no QFS-specific networking problems for the shared QFS file system, because QFS functions only at the application layer and uses TCP on well-known port 7105. All fragmentation of data, handshaking, data control, and interface traffic issues are handled by the Solaris networking protocol stack. If you have networking problems with the QFS file system, look at your network, using snoop, ping, ifconfig, netstat, and so on to diagnose the problem. Firewalls can block the shared QFS file system; always check for one if you are setting up an existing system as a QFS client.
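
One quick check is whether port 7105 appears at all in the output of netstat -an. The filter below is a sketch under assumptions: has_qfs_port is a hypothetical helper name, and the pattern expects the port to follow a "." (Solaris style) or a ":" separator. It uses grep -E, which on Solaris requires /usr/xpg4/bin/grep rather than /usr/bin/grep.

```shell
#!/bin/sh
# Sketch only: read "netstat -an" output on stdin and succeed if any line
# mentions port 7105. has_qfs_port is a hypothetical helper name.
has_qfs_port() {
    grep -E '[.:]7105([[:space:]]|$)' >/dev/null
}

# Typical call:
# netstat -an | has_qfs_port && echo "port 7105 is listening or connected"
```

If the port shows up on the metadata server but a client cannot connect to it, suspect a firewall or routing problem between the two hosts.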

 

Trace files (discussed below) for sam-sharefsd can provide information on whether a shared QFS file system problem is in the network or in QFS.

 

General tools for troubleshooting:

 

samexplorer - provides a diagnostic report.

 

The command samfsck

 

QFS file systems are not prone to corruption and do not require regular fsck checks, but corruption may occasionally occur. Look for error messages in /var/adm/messages or /var/adm/sam-log that complain of corruption. Also look for EDOM and ENOSPC errors, which may also indicate that you are trying to write files too large for the disk cache. Unmount the file system and run the following command to repair any problems.

 

# samfsck -V -F qfs1
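
Before running the repair, the log search described above can be scripted. This is a sketch only: scan_sam_errors is a hypothetical helper name, and the search strings are assumptions, since the exact message text varies by release. It uses grep -E, which on Solaris requires /usr/xpg4/bin/grep.

```shell
#!/bin/sh
# Sketch only: print log lines that mention corruption, EDOM, or ENOSPC.
# scan_sam_errors is a hypothetical helper name; the patterns are guesses
# at the message text, not strings taken from the SAM-QFS sources.
scan_sam_errors() {
    grep -i -E 'corrupt|EDOM|ENOSPC' "$@"
}

# Typical call (log paths taken from the text above):
# scan_sam_errors /var/adm/messages /var/adm/sam-log
```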

 

Log files

 

Always look in /var/adm/messages.

 

The SAM log file is configured in /etc/syslog.conf. You can log messages from severity level “debug” and above. The detail of these messages can be increased by setting the debug parameter in the defaults.conf file. See the defaults.conf(4) man page for details.

 

Look in /var/opt/SUNWsamfs/devlog if you have configuration problems with mass storage devices. It contains a log for each device, named after the device’s equipment ordinal.

 

Trace files

 

Daemon trace files are very useful for troubleshooting. They record daemon actions in detail, and tracing is enabled by default for all daemons. Each action performed by a daemon is written to a trace file in /var/opt/SUNWsamfs/trace that is named for the daemon. For example, the actions of the shared QFS file system daemon sam-sharefsd are written to /var/opt/SUNWsamfs/trace/sam-sharefsd. Don’t leave tracing on all the time, because it adds overhead.

 

To enable tracing for sam-sharefsd and disable it for all other daemons, copy /opt/SUNWsamfs/examples/defaults.conf to /etc/opt/SUNWsamfs/defaults.conf, and change the lines between the keywords “trace” and “endtrace” to the following.

 

trace
sam-sharefsd = on
all.size = 10M
endtrace
 
To turn off daemon tracing for all daemons, change the lines between the keywords “trace” and “endtrace” in /etc/opt/SUNWsamfs/defaults.conf to the following:
 
trace
all = off
all.size = 10M
endtrace
 
The option all.size = 10M lets each trace file grow until it reaches 10 Mbytes, after which SAM-FS/QFS rolls the file over. Other options are documented in the man page for defaults.conf(4).
 
Re-read defaults.conf:
# samd config
   
Run sam-fsd to confirm trace is on for the daemons specified.
 # sam-fsd
 
 
  Would start sam-sharefsd(qfs3)
  Trace file controls:
 
  sam-sharefsd  /var/opt/SUNWsamfs/trace/sam-sharefsd
              cust err fatal misc proc date
              size    0    age 0
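
When many daemons are configured, the sam-fsd output can be long. The filter below is a sketch that prints only the daemons whose trace file is listed; traced_daemons is a hypothetical helper name, and it assumes the output layout shown above, with the daemon name in the first column and the trace file path in the second.

```shell
#!/bin/sh
# Sketch only: read sam-fsd output on stdin and print "daemon trace-file"
# for each line whose second field is a path under the trace directory.
# traced_daemons is a hypothetical helper name.
traced_daemons() {
    awk '$2 ~ /^\/var\/opt\/SUNWsamfs\/trace\// { print $1, $2 }'
}

# Typical call:
# sam-fsd | traced_daemons
```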
 
 

SAM-FS filesystem tracing

Don't confuse daemon tracing and SAM-FS filesystem tracing. Daemon tracing is probably the most useful troubleshooting tool in SAM-QFS. File system tracing provides a lot of obscure information that is useful only to Sun Support.


To enable filesystem tracing, use the -o trace option to mount. For more details, refer to the following document, which is not generally available to the public:

Document ID: 80394
Title: Sun StorEdge[tm] SAM-FS: How to turn file system tracing on or off
 
 
SNMP
 
Most of the events monitored by the MIB relate to the hierarchical storage management software and report the condition of tape libraries and tape drives. For complete information about the MIB, see /opt/SUNWsamfs/mibs/SUN-SAM-MIB.mib. You configure remote notification in /etc/opt/SUNWsamfs/scripts/sendtrap. Remote notification is enabled by default; you need only configure the destination of traps and the community string. See the man page for details.

 
