SAM-FS, SAM-QFS, and QFS File System Planning

 

The SAM-FS/QFS software provides two separate values: archiving and file system performance. In this section we will focus on those aspects of file system performance relevant to file system configuration.

 

Why you need to know about performance

 

The QFS file system is used when data must be read and written very rapidly. It is fundamentally more efficient than a UFS file system, and it is also highly configurable; it can provide extremely high performance when correctly configured. The converse is also true. If the QFS file system is improperly configured, it can waste an enormous amount of disk space or it can provide mediocre write performance. Setting up the QFS file system for best performance is largely the task of the system architect, and much of this discussion is therefore aimed at Sun, Oracle and other architects who implement solutions including QFS.

 

Those of you who manage and administer production systems on which QFS is implemented must also understand QFS performance. You will be providing the information the architect needs to produce a solution. That solution will be only as good as the information on which it is based.  You may be handling the system shakedown, and will need to understand the results. You will be monitoring the system for problems after it goes into production, so you will need to know how the QFS file system should work so you can tell if it isn’t performing optimally. You will be managing changes in the use of the system including the creation of new file systems. If you do not understand the QFS file system and what affects its performance, you cannot do these tasks well.

 

Understanding the performance of any file system requires: 1) knowledge of the file system metadata and of the read/write characteristics of the file system; 2) knowledge of the intended uses of the file system; and 3) an understanding of how the file system characteristics interact with the intended use of the file system. In this paper, we will look at the file system metadata and the most basic read/write characteristics of the file system. In the QFS paper, we will look at more complex issues of performance tuning.

 

The File System

 

All disk-based file systems employ kernel modules that are implemented as part of the operating system kernel. The job of a file-system kernel module is to read from and write to the disk devices used by the file system. These kernel modules expect specific disk layouts of data and of metadata. Metadata is commonly defined as “data about data” and may be held in files such as directories and symbolic links, or in control structures such as inodes and superblocks.

 

The Sun StorageTek SAM and QFS software includes a file-system kernel module called samfs. It can be viewed in the output of the command modinfo. This kernel module is able to communicate with the disk layouts for two file systems: the basic file system (FS) and the high-performance file system (QFS). The basic file system is used in the SAM-FS configuration, and the high-performance file system is used with the SAM-QFS and QFS configurations. The disk layout to be placed on hardware and the type of communication between the kernel module samfs and the disk devices are specified in the master configuration file /etc/opt/SUNWsamfs/mcf and nowhere else. This file contains an administrator-selected name for each file system, a list of disk devices included in each, directives for using those disk devices, and a specification called an equipment type identifier, which indicates whether the file system is to be set up as a QFS file system or a SAM-FS file system.

 

The disk-based file system used by SAM-QFS is the same as that used by QFS. The SAM-QFS file system configuration applies the archiving software also used in SAM-FS to a QFS file system.

 

The use of archiving is the only difference between a SAM-QFS file system and a QFS file system. A QFS file system configured on a host system not running the archiving software (that is, SUNWqfsr and SUNWqfsu are installed) does not archive by default. A QFS file system configured on a host running archiving software (that is, SUNWsamfsr and SUNWsamfsu are installed) automatically archives, releases, and stages files unless the "nosam" option to mount is specified when the file system is mounted. Such a QFS file system can be transformed into a SAM-QFS file system by unmounting it and remounting it without the "nosam" option. Archiving alone can be disabled on a SAM-QFS file system by unmounting it and remounting it with the "noarscan" mount option. In that case staging and releasing will continue to work.

 

It is possible to set up a SAM-FS file system without the "SAM," as the archiving software is separate from the file system kernel module. This is a very expensive way to set up a file system with few more features than the UFS file system that comes free with the Solaris OS, and it is not supported.

 

Metadata

 

The SAM-FS, SAM-QFS, and QFS file systems differ from the familiar UFS file systems in the type of metadata collected and stored, the locations in which the metadata is stored, and the way writes occur to the file system.

 

The Known: UFS

 

In a UFS file system, metadata consists of the contents of inodes, superblocks, cylinder group blocks, and directories. It also includes symbolic links. UFS file system metadata includes information about the location of the file system on the physical disk and the amount of disk space allocated to the file system, and the usage and location of data blocks. UFS metadata is always held on the same logical device as the file data, in portions of the disk device reserved for metadata when the file system is created. UFS metadata are interspersed with data blocks on the disk.

 

The inode metadata attached to each file and directory in the traditional UFS file system includes: 1) file size; 2) modification (ls -l), access (ls -lu), and inode change (ls -lc) times; 3) file ownership information; 4) permissions; and 5) pointers to data blocks containing actual file data. Much of this information can be displayed with the various options of the UNIX command ls. UFS inodes are 128 bytes each and are grouped together in control blocks set up at the time the file system is created. As a result, when you run out of inodes on a UFS file system, you cannot add more files. UFS file systems write to the disk in "data blocks" of 8 Kbytes each. Data blocks are the minimum amount of disk space normally allocated to a write, and are therefore the minimum amount of disk space a file occupies. Data blocks are the UFS version of a Disk Allocation Unit, or DAU. A Disk Allocation Unit is the size of an I/O write to a file system.

 

UFS file systems are not particularly efficient, especially for large writes. They can be configured to improve performance, but the basic control structures associated with UFS cylinder groups, namely the backup superblocks, cylinder group blocks and inode table, are scattered throughout the disk, so the disk's write heads must be repositioned multiple times if a write extends past a single cylinder group. The need to update at least three control structures with every disk write also slows down the write process, and these updates cannot be done in parallel with data writes, since metadata shares disk space with data. If you want good performance for large writes and reads, it makes sense to use a file system with a simpler disk layout than that of a UFS file system, and even to segregate metadata from data so that metadata and data can be updated in parallel. The 8-Kbyte write size of the UFS file system made sense at one time, but for writes of the large files commonly used now, it incurs an unnecessarily large amount of processing overhead.

 

The New: SAM-FS/QFS File Systems

 

The basic and high performance disk-based file systems available with the SAM-FS product have simplified metadata. The only major file system control structures for the disk-based file system are superblocks, located at the beginning of the disk, and inodes, held in a file called .inodes and located under the file system mount point. They also have DAUs of configurable size, which allow you to tune the performance of your file system writes to your file characteristics.

 

Like UFS file systems, SAM-FS and QFS file systems record disk space usage and other metadata in a primary superblock. The SAM-FS/QFS primary superblock is located at the physical beginning of the primary disk device, which is the first disk device listed in the mcf file for that file system. This superblock contains such information as the file system type (basic or high performance), and specifies the hardware that makes up the file system. It also holds the DAU size and may contain information about shared file systems (discussed later). Disk devices other than the primary disk device have a secondary superblock that contains only a small subset of the information in the primary superblock.

 

The high-performance (QFS) file system separates data and metadata onto different disk devices. It places the .inodes file, directories, symbolic links, bitmaps and other metadata on a separate disk device from the data. This separation of data and metadata is the primary performance enhancement of the QFS file system. The only metadata on QFS data disks is a secondary superblock at the beginning of the disk so nothing interferes with continuous, efficient writes to the disk.

 

As noted earlier, the inodes on SAM-FS, SAM-QFS, and QFS file systems are held in a file called ".inodes", which is stored in ordinary data blocks like any other file. The size of this file is therefore effectively unlimited. As long as there is space on the disk, space in the file for new inodes is dynamically allocated as needed. Each SAM-FS or QFS inode occupies 512 bytes in this file, in contrast with UFS inodes, which occupy 128 bytes in an inode table. The added space is used to hold the SAM-specific file attributes used by the SAM-FS and SAM-QFS file system implementations. The .inodes file can hold a theoretical limit of 2^32 inodes (about 4 billion), but Sun Support recommends you plan to have no more than 10 million files per file system.
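
The inode size given above lends itself to a quick planning estimate. The following is a minimal sketch, assuming only the 512-byte inode size stated in the text; the function name is invented for illustration, and the result covers the .inodes file alone, not directories or other metadata.

```python
INODE_BYTES = 512  # size of one SAM-FS/QFS inode, as stated in the text


def inodes_file_bytes(file_count: int) -> int:
    """Approximate size of the .inodes file for a given number of inodes.

    Hypothetical helper for planning purposes only; it ignores directories,
    symbolic links, and any other metadata.
    """
    if file_count < 0:
        raise ValueError("file_count must not be negative")
    return file_count * INODE_BYTES


# Compare the recommended ceiling with the theoretical limit.
for label, count in (("recommended ceiling", 10_000_000),
                     ("theoretical limit", 2**32)):
    size = inodes_file_bytes(count)
    print(f"{label}: {count:,} inodes -> {size / 2**30:,.2f} GiB")
```

At the recommended ceiling of 10 million files, the .inodes file alone comes to roughly 4.8 GiB; at the theoretical limit it would reach 2 TiB.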

 

On the SAM-FS file system, the .inodes file resides on the same disk devices as data. In contrast, on the QFS and SAM-QFS file systems the .inodes file is held on the metadata disk device where directories and other metadata also reside.  Placing metadata on a separate metadata device allows parallel access to metadata and data, which improves file read and write performance, especially in read-intensive file systems where metadata is frequently accessed. The separation of metadata and data is the primary performance advantage of the QFS file system.

 

A metadata disk device can be a LUN, a VERITAS® volume, or a Solaris Volume Manager™ (Solstice DiskSuite™) volume (metadevice). The type of device used for metadata in a QFS file system need not be the same as that used for data; for example, a typical QFS configuration places metadata on a mirror, while data is on a hardware RAID-5.

 

Inodes in SAM-FS and SAM-QFS file systems contain all the information included in UFS inodes and the additional file attributes used by SAM-FS, SAM-QFS, and QFS. When all configured archive copies of a file have been made, the archdone flag is set in the inode, and the locations, creation times and dates of the file’s archive copies are recorded there as well. The SAM-FS, SAM-QFS, and QFS inodes also store information about the current residence of the file, that is, whether it is still on disk or has been released. In addition to the access, modification, and inode change times included in UFS inodes, SAM-FS inodes contain the time when the file was created, when SAM-FS, SAM-QFS, or QFS attributes were last changed, and when the file was last released or staged. This last value is called the "residence time." If attributes related to archiving, releasing, and staging have been set on the file, that information is stored in the inode. For SAM-FS, SAM-QFS, and QFS files, the command ls displays the same inode information as for UFS inodes, but SAM-FS attributes in the inode must be viewed with the command sls.

 

The sls Command

 

The sls command is the Sun extension to the GNU version of the ls command with added features to support the SAM-FS and QFS software. For the most part, the sls command has the same syntax and behavior as the ls command. For example, the option -d works the same way with sls as it does with ls.

 

The most important additional option to sls is the -D option that lists a detailed description of the metadata for each file. The output includes the regular Unix metadata such as link count, and also SAM file attributes such as file creation time and archive copy information in this format:

 

# sls -D maryann
maryann:
  mode: -rw-rw----  links:   1  admin:   0  owner: gregor  group: sam     (1)
  length:   8254959  inode:       12                                      (2)
  offline;  archdone;                                                     (3)
  copy 1: ----- Dec 19 17:08     64c7b.1    lt CFX123                     (4)
  copy 2: ----- Dec 19 17:17     3c04a.1    lt CFX124                     (4)
  access:      Dec 19 16:54  modification: Dec 19 16:54                   (5)
  changed:     Dec 19 16:54  attributes:   Dec 19 17:17                   (5)
  creation:    Dec 19 16:54  residence:    Dec 19 17:17                   (5)

 

This example lists the metadata for a single file, maryann.

 

1. The mode, link count, owner, and group owner of the file are listed under the file’s name in the second line of output. The admin entry refers to a SAM administrative group.

2. The size of the file in bytes and the file’s inode number are listed in the third line of output.

3. Two keywords in the output, archdone and offline, indicate that this file is completely archived (archdone) and has been released (offline). Other keywords discussed later may also appear in this line of output.

4. The file shown in this example has two archive copies. The output of sls -D provides the time and date when these copies were made (Dec 19, 17:08 and 17:17), a notation indicating the location of the archive copy on the archive media (64c7b.1 and 3c04a.1; details discussed later), the equipment type identifier of the media type on which the copy is stored (lt, also discussed later), and the VSNs (volume serial names, here CFX123 and CFX124) of the media on which the archive copies reside. The Disaster Recovery module discusses the use of the request command with an archive copy location such as those included in the output of sls -D to read the archive copy of a file directly from storage media.

5.  The last three lines of output list the times when the file was last accessed and last modified, and when the Unix attributes stored in the file inode last changed. These are the same values included in UFS inodes. SAM inodes also include when the file was created, the last time that SAM-FS attributes on the file changed, and when the file last changed residence—in this case, when it was released.
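
If you need to check archive status from a script, the fields described above can be pulled out of the sls -D text. The following is a rough sketch only: it assumes the output looks like the single example shown in this section (without the numbered annotations), and the function and field names are invented for illustration. Real output may differ between releases, so treat the pattern as a starting point rather than a specification.

```python
import re

# Sample text modeled on the example in this section (annotations removed).
SAMPLE = """\
maryann:
mode: -rw-rw----  links: 1  admin: 0  owner: gregor  group: sam
length: 8254959  inode: 12
offline; archdone;
copy 1: ----- Dec 19 17:08  64c7b.1  lt  CFX123
copy 2: ----- Dec 19 17:17  3c04a.1  lt  CFX124
access: Dec 19 16:54  modification: Dec 19 16:54
changed: Dec 19 16:54  attributes: Dec 19 17:17
creation: Dec 19 16:54  residence: Dec 19 17:17
"""

# Assumed layout of an archive copy line:
# copy number, flags, date and time, position, media type, VSN.
COPY_RE = re.compile(
    r"^copy\s+(?P<number>\d+):\s+(?P<flags>\S+)\s+"
    r"(?P<when>\w{3}\s+\d{1,2}\s+\d{2}:\d{2})\s+"
    r"(?P<position>\S+)\s+(?P<media>\S+)\s+(?P<vsn>\S+)"
)


def parse_sls_detail(text):
    """Extract state keywords and archive copy records from sls -D style text."""
    info = {"states": [], "copies": []}
    for raw in text.splitlines():
        line = raw.strip()
        match = COPY_RE.match(line)
        if match:
            copy = match.groupdict()
            copy["number"] = int(copy["number"])
            info["copies"].append(copy)
        elif line.endswith(";"):
            # Keyword line such as "offline; archdone;"
            info["states"] = [w.strip() for w in line.split(";") if w.strip()]
    return info


info = parse_sls_detail(SAMPLE)
print(info["states"])
for copy in info["copies"]:
    print(copy["number"], copy["when"], copy["position"], copy["media"], copy["vsn"])
```

A script built this way could, for example, confirm that a file reported as offline has the expected number of archive copies before any further releasing policy is applied.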

 

The attribute change time of the file changes every time the file is released or staged. It also changes if archive -n, stage -n, release -n, or any other SAM-FS attribute is set or reset on the file. It is not affected by archiving or rearchiving.

 

File System Read/Write Characteristics

 

So far we have discussed the layout and metadata of the SAM-FS/QFS file system. The file system’s performance will also be determined by the write and read behavior of the file system module samfs. In the next section we will discuss the way that the SAM-FS/QFS file systems are read and written. SAM-FS/QFS writes and reads depend primarily on three factors: 1) whether the file system is basic (FS) or high-performance (QFS); 2) the disk device type identifier assigned to each device in the file system by the administrator; and 3) the DAU assigned to the file system. All these factors are configurable, and the selected configuration of each is written to the superblock.

 

The DAU is assigned when the file system is initialized. The type of file system and the disk device identifiers are assigned in the mcf file (/etc/opt/SUNWsamfs/mcf) using an equipment type identifier. The equipment type identifiers for disk devices determine how the SAM-FS/QFS file system will write to the devices. These must be thoughtfully selected to support the planned usage of the file system.

 

Equipment Type Identifiers

 

The allocation scheme and allowable DAU sizes of a particular SAM-FS/QFS file system depend on the equipment type identifier assigned in the mcf file to the disk devices used by the file system. An equipment type identifier is a two to four letter specifier assigned to each file system, disk device and mass storage device in the file /etc/opt/SUNWsamfs/mcf. Every file system has one of two possible equipment type identifiers; every disk device has one of four possible equipment type identifiers.

 

Each disk device in a SAM-FS or QFS file system is normally assigned the equipment type identifier mm, md, or mr. The mm devices hold the metadata for a QFS (not SAM-FS) file system. Each QFS file system must have one or more mm devices. In addition to the mm device, a single QFS file system can use exclusively md or exclusively mr devices for data. A single file system cannot combine data disk device types. File systems may also use a type of disk device called a “stripe group.” These devices are named gXXX, where XXX is replaced by an integer. Stripe groups are discussed in the QFS Performance Tuning paper.

 

Each file system is assigned an equipment type identifier in the mcf file. This identifier indicates how the disk should be read and written by the samfs kernel module: as the basic file system (ms) or as the high-performance file system (ma).

 

 

 

Equipment Type Identifier    File System Type                File System Name
ms                           Basic file system               SAM-FS
ma                           High-performance file system    SAM-QFS or QFS

 

For disk devices, the administrator sets the equipment type as discussed below based on how the equipment will be used. The type of disk, the disk vendor or the type of volume (disk slice, SVM volume, VERITAS volume, LUN) is irrelevant to the selection of the disk equipment type identifier. Only the intended usage of the device determines its equipment type.
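
The device rules described so far can be checked mechanically before a file system is built. The following is a minimal sketch, not a replacement for the software's own validation. It assumes the usual mcf column order (equipment identifier, equipment ordinal, equipment type, family set, device state), which you should confirm against the mcf man page for your release. The file system name and device paths are hypothetical, the helper names are invented, and stripe groups are ignored for simplicity.

```python
SAMPLE_MCF = """\
# Hypothetical example; names and device paths are illustrative only.
qfs1               10  ma  qfs1  on
/dev/dsk/c1t0d0s0  11  mm  qfs1  on
/dev/dsk/c2t0d0s0  12  mr  qfs1  on
/dev/dsk/c2t1d0s0  13  mr  qfs1  on
"""


def parse_mcf(text):
    """Return {family_set: {"fs_type": str, "devices": [equipment types]}}."""
    file_systems = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        fields = line.split()
        if len(fields) < 4:
            continue
        eq_type, family = fields[2], fields[3]
        if eq_type in ("ms", "ma"):
            # A file system line starts a new family set.
            file_systems[family] = {"fs_type": eq_type, "devices": []}
        elif family in file_systems:
            file_systems[family]["devices"].append(eq_type)
    return file_systems


def check_file_system(fs):
    """Return a list of rule violations (empty if none were found)."""
    problems = []
    devices = fs["devices"]
    if fs["fs_type"] == "ma":
        if "mm" not in devices:
            problems.append("ma file system needs at least one mm device")
        if "md" in devices and "mr" in devices:
            problems.append("data devices must be all md or all mr, not both")
    elif "mm" in devices:
        problems.append("mm devices belong only in ma (QFS) file systems")
    return problems


for name, fs in parse_mcf(SAMPLE_MCF).items():
    print(name, fs["fs_type"], fs["devices"], check_file_system(fs) or "ok")
```

Running a check like this against a draft mcf file catches the most common planning mistakes, such as a QFS file system with no metadata device or a mix of md and mr data devices.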

 

For mass storage hardware devices, such as tape drives or robots, the equipment type identifier is based on the manufacturer and model of the hardware. For example, DLT tape drives are always designated type lt. See the man page for the mcf file for a complete list of equipment type identifiers.

 

Disk Allocation Units

 

In order to set our disk device types for best performance, we first have to understand how the file system will write to disks based on the disk device type and on the type of file system (SAM-FS or QFS).

 

File systems write data to disk in blocks of a pre-determined size. Every file therefore takes up an amount of space on the disk that is an integral multiple of the size of the blocks in which the file system writes. If the size of the blocks is 16 kilobytes (Kbytes), a file of 17 Kbytes takes up 32 Kbytes of disk space, because two 16-Kbyte blocks are required to store that file. Similarly, on a file system that writes in 512 Kbyte blocks, a 17 Kbyte file takes up 512 Kbytes, wasting 495 Kbytes.
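
The arithmetic in this paragraph can be reproduced with a few lines of code. This is a simple sketch of whole-block allocation only; the function names are invented for illustration, and sizes are in Kbytes.

```python
def allocated_kbytes(file_kbytes: int, block_kbytes: int) -> int:
    """Disk space consumed when a file is stored in whole blocks."""
    if file_kbytes < 0 or block_kbytes <= 0:
        raise ValueError("sizes must be positive")
    blocks = -(-file_kbytes // block_kbytes)  # ceiling division
    return blocks * block_kbytes


def wasted_kbytes(file_kbytes: int, block_kbytes: int) -> int:
    """Allocated space that holds no file data."""
    return allocated_kbytes(file_kbytes, block_kbytes) - file_kbytes


# The 17-Kbyte file from the text, on 16-Kbyte and 512-Kbyte blocks.
for block in (16, 512):
    print(block, allocated_kbytes(17, block), wasted_kbytes(17, block))
```

For the 17-Kbyte file, this gives 32 Kbytes allocated on 16-Kbyte blocks and 512 Kbytes allocated (495 wasted) on 512-Kbyte blocks, matching the figures above.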

 

SAM-FS, SAM-QFS and QFS file systems write to the disk in blocks called DAUs. The size of SAM-FS and QFS DAUs can be configured to one of a set of permitted values at the time the file system is initialized, unlike the equivalent "disk block" in the UFS file system, which is always 8 Kbytes. Configurable DAUs allow the administrator to tune file system performance to match application I/O and disk configuration.

 

SAM-FS, SAM-QFS and QFS file systems can also implement a dual allocation scheme, in which the first eight writes of a file go to a DAU fragment of 4 Kbytes, for a total of 32 Kbytes, and the remainder is written to whole DAUs. The dual allocation scheme prevents small files from wasting disk space and large files from wasting device overhead. Small files are written to small blocks, and large files are written in large blocks. SAM-QFS and QFS file systems can alternatively implement a single allocation scheme in which only whole DAUs are written. This scheme is optimal for a file system of large files of relatively consistent size.
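
To see what the two schemes mean for disk usage, the description above can be turned into a small model. This is a simplified sketch based only on the text (eight 4-Kbyte fragments, then whole DAUs); it is not taken from the file system code, the function names are invented, and actual on-disk allocation may differ in detail.

```python
FRAGMENT_KB = 4          # size of one small allocation, per the text
FRAGMENTS_PER_FILE = 8   # the first eight allocations use fragments


def ceil_div(a: int, b: int) -> int:
    return -(-a // b)


def dual_allocation_kbytes(file_kb: int, dau_kb: int) -> int:
    """Space used when the first 32 Kbytes go to 4-Kbyte fragments."""
    small_region = FRAGMENT_KB * FRAGMENTS_PER_FILE  # 32 Kbytes
    if file_kb <= small_region:
        return ceil_div(file_kb, FRAGMENT_KB) * FRAGMENT_KB
    remainder = file_kb - small_region
    return small_region + ceil_div(remainder, dau_kb) * dau_kb


def single_allocation_kbytes(file_kb: int, dau_kb: int) -> int:
    """Space used when every allocation is a whole DAU."""
    return ceil_div(file_kb, dau_kb) * dau_kb


# Compare both schemes for a small and a larger file with a 64-Kbyte DAU.
for size in (17, 100):
    print(size,
          dual_allocation_kbytes(size, 64),
          single_allocation_kbytes(size, 64))
```

Under this model a 17-Kbyte file occupies 20 Kbytes with dual allocation but a full 64 Kbytes with single allocation, which is why the single scheme suits file systems where nearly all files are large.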

 

The mm Devices

 

SAM-QFS and QFS metadata is always held on a device designated type mm in the mcf file. Each SAM-QFS or QFS file system must have at least one mm device for metadata storage. The mm devices implement the dual allocation scheme, so the first 32 Kbytes of a file are written to eight DAU fragments of 4 Kbytes each, while additional data is written in whole DAUs of 16 Kbytes. There is no configuration of the DAUs on mm devices. They are always 16 Kbytes in size.

 

The md Devices

 

SAM-FS file systems store metadata and file data on the same device, so the .inodes file, directories, and symbolic links share disk space with data files. The SAM-FS file system can therefore only write to md disk devices, which allow mixed data and metadata storage. For most releases of the software, the default DAU for SAM-FS file systems is 16 Kbytes, but, based on anticipated file system usage, it may be configured to 32 or 64 Kbytes for all devices in a file system when the file system is initialized. (In release 4.5 and later of the software, the default DAU for SAM-FS file systems is 64 Kbytes.) The md devices use a dual DAU, where the first 32 Kbytes of a file are written in 4-Kbyte fragments of a DAU, and the rest of the file is written in whole DAUs.

 

SAM-QFS and QFS file systems may also use md devices to store file data but metadata for SAM-QFS and QFS file systems is always held on a separate mm device. The default DAU for shared SAM-QFS and QFS file systems on md devices is 64 Kbytes, but they can also be configured with a DAU of 16 or 32 Kbytes. The md devices cannot be configured for DAUs of greater than 64 Kbytes, and they always use a dual allocation scheme for writes. These characteristics are optimal for file systems used to store smaller files or files of widely varying sizes.

 

The mr Devices

 

SAM-QFS and QFS file systems may also use mr disk devices for data storage. The administrator can freely configure the DAU for mr devices between 16 and 65,528 Kbytes in 8 Kbyte increments. The default DAU is 64 Kbytes. Like md devices, the size of the DAU written to mr devices is configured for all devices in a file system when the file system is initialized. The mr devices use the single allocation scheme, so each file written to the device occupies a minimum of one DAU and all writes to a file system using mr devices are performed exclusively in increments of one DAU. The large and variable DAUs and single allocation scheme used by mr devices allow the administrator to customize file system behavior to the size of writes produced by applications. This type of disk device is most useful in a file system storing large files of uniform size. If you are using hardware RAID-5 for disk cache, the DAU size must be set equal to an integral multiple of the single stripe write size of the RAID or write performance will suffer.
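
The limits on mr DAU sizes and the RAID-5 alignment rule can be combined into a quick check. The following is a minimal sketch, assuming the range and increment stated above; the function names and the 256-Kbyte stripe write size are hypothetical values chosen for illustration.

```python
MR_DAU_MIN_KB = 16      # smallest mr DAU, per the text
MR_DAU_MAX_KB = 65528   # largest mr DAU, per the text
MR_DAU_STEP_KB = 8      # DAU sizes move in 8-Kbyte increments


def is_valid_mr_dau(dau_kb: int) -> bool:
    """True if the size is within range and on an 8-Kbyte boundary."""
    return (MR_DAU_MIN_KB <= dau_kb <= MR_DAU_MAX_KB
            and dau_kb % MR_DAU_STEP_KB == 0)


def candidate_daus(stripe_write_kb: int, limit: int = 4):
    """First few valid mr DAU sizes that are whole multiples of the
    RAID stripe write size."""
    if stripe_write_kb <= 0:
        raise ValueError("stripe_write_kb must be positive")
    result = []
    dau = stripe_write_kb
    while dau <= MR_DAU_MAX_KB and len(result) < limit:
        if is_valid_mr_dau(dau):
            result.append(dau)
        dau += stripe_write_kb
    return result


# Hypothetical RAID-5 volume with a 256-Kbyte single stripe write.
print(candidate_daus(256))
```

For the hypothetical 256-Kbyte stripe write, the candidates are 256, 512, 768, and 1024 Kbytes; which one to choose depends on the size of writes your applications produce.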

 

Appendix: Superblock Versions

 

Two versions of the superblock are currently in use. Version 1 was used in releases of SAM-FS prior to 4.0. Version 2 is used by Release 4.0 and higher of the software and supports advanced features such as Access Control Lists (ACLs). File systems created on SAM-FS 3.x releases (that is, those with Version 1 superblocks) may be mounted on systems running Release 4.2 and earlier of the software, but Release 4.3 of the software is only compatible with Version 2 of the superblock. File systems created on Release 3.x of SAM-FS therefore may not be mounted on a system running Release 4.3 or later of the software. Such file systems must be rebuilt if they are to be used with Release 4.3 or later of the software.

 

Release                                3.x         4.0     4.1     4.2     4.3 and later
Year Released                          Pre-2002    2002    2004    2004    2005
Superblock Version                     1           2       2       2       2
Compatible with Superblock Versions    1           1,2     1,2     1,2     2
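
The compatibility rules in this appendix can be expressed as a small lookup, which may be handy when planning an upgrade. This is an illustrative sketch only; the function name is invented, and releases are given as (major, minor) tuples.

```python
def can_mount(release, superblock_version):
    """Return True if the given software release can mount a file system
    with the given superblock version, per the appendix table.

    release is a (major, minor) tuple, for example (4, 2).
    """
    if release < (4, 0):
        supported = {1}        # 3.x releases: Version 1 only
    elif release < (4, 3):
        supported = {1, 2}     # 4.0 through 4.2: both versions
    else:
        supported = {2}        # 4.3 and later: Version 2 only
    return superblock_version in supported


# A file system built under 3.x (Version 1 superblock) on newer releases.
for release in ((4, 0), (4, 2), (4, 3)):
    print(release, can_mount(release, 1))
```

A Version 1 file system mounts under Releases 4.0 through 4.2 but not under 4.3, so the rebuild described above has to happen before the upgrade to 4.3 or later.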

 

 
