The SAM-FS/QFS software provides two separate benefits: archiving and file system
performance. This section focuses on those aspects of file system
performance that are relevant to file system configuration.
Why you need to know about performance
The QFS file system is used when data must be read and
written very rapidly. It is fundamentally more efficient than a UFS file
system, and it is also highly configurable; it can provide extremely high
performance when correctly configured. The converse is also true. If the QFS
file system is improperly configured, it can waste an enormous amount of disk
space or it can provide mediocre write performance. Setting up the QFS file
system for best performance is largely the task of the system architect, and
much of this discussion is therefore aimed at Sun, Oracle and other architects
who implement solutions including QFS.
Those
of you who manage and administer production systems on which QFS is implemented
must also understand QFS performance. You will be providing the information the
architect needs to produce a solution. That solution will be only as good as
the information on which it is based.
You may be handling the system shakedown, and will need to understand
the results. You will be monitoring the system for problems after it goes into
production, so you will need to know how the QFS file system should work so you
can tell if it isn’t performing optimally. You will be managing changes in the
use of the system including the creation of new file systems. If you do not
understand the QFS file system and what affects its performance, you cannot do
these tasks well.
Understanding the performance of any file system requires:
1. Knowledge of the file system metadata and of the read/write
characteristics of the file system
2. Knowledge of the intended uses of the file system
3. An understanding of how the file system characteristics interact with the
intended use of the file system
In this paper, we will look at the file system metadata and the most basic
read/write characteristics of the file system. In the QFS paper, we will look
at more complex issues of performance tuning.
The File System
All disk-based file systems employ kernel modules that are
implemented as part of the operating system kernel. The job of a file-system
kernel module is to read from and write to the disk devices used by the file
system. These kernel modules expect specific disk layouts of data and of metadata. Metadata is commonly defined
as “data about data” and may be held in files such as directories and symbolic
links, or in control structures such as inodes and superblocks.
The Sun StorageTek SAM and QFS software includes a
file-system kernel module called samfs.
It can be viewed in the output of the command modinfo. This kernel module can read and write the disk
layouts of two file systems: the basic file system (FS) and the high-performance file
system (QFS). The basic file system is used in the SAM-FS configuration, and
the high-performance file system is used with the SAM-QFS and QFS
configurations. The disk layout to be placed on hardware and the type of
communication between the kernel module samfs and the disk devices is specified
in the master configuration file /etc/opt/SUNWsamfs/mcf and nowhere else.
This file contains an administrator-selected name for each file system, a
list of disk devices included in each, directives for using those disk devices,
and a specification called an equipment type identifier which indicates whether the file system
is to be set up as a QFS file system or a SAM-FS file system.
The disk-based file system used by SAM-QFS is the same as
that used by QFS. The SAM-QFS file system configuration applies the archiving
software also used in SAM-FS to a QFS file system.
The use of archiving is the only difference between a
SAM-QFS file system and a QFS file system. A QFS file system configured on a
host system not running the archiving software (that is, SUNWqfsr and
SUNWqfsu are installed) does not archive by default. A QFS file system
configured on a host running archiving software (that is, SUNWsamfsr and
SUNWsamfsu are installed) automatically archives, releases, and stages files unless the
"nosam" option to mount is specified when the file system is mounted. Such a
QFS file system can be transformed into a SAM-QFS file system by unmounting it
and remounting it without the "nosam" option. Archiving alone can be
disabled on a SAM-QFS file system by unmounting it and remounting it with the
“noarscan” mount option. In that case staging and releasing will continue to
work.
It is possible to set up a SAM-FS file system
without the "SAM," as the archiving software is separate from the
file system kernel module. This is a very expensive way to set up a file system
with few more features than the UFS file system that comes free with the
Solaris OS, and it is not supported.
The SAM-FS, SAM-QFS, and QFS file systems differ from the
familiar UFS file systems in the type of metadata collected and stored, the
locations in which the metadata is stored, and the way writes occur to the file
system.
The Known: UFS
In a UFS file system, metadata consists of the contents of
inodes, superblocks, cylinder group blocks, and directories. It also includes
symbolic links. UFS file system metadata includes information about the
location of the file system on the physical disk, the amount of disk space
allocated to the file system, and the usage and location of data blocks. UFS metadata
is always held on the same logical device as the file data, in portions of the
disk device reserved for metadata when the file system is created. UFS metadata
is therefore interspersed with data blocks on the disk.
The inode metadata attached to each file and directory in
the traditional UFS file system includes:
1. File size
2. Modification (ls -l), access (ls -lu), and inode change (ls -lc) times
3. File ownership information
4. Permissions
5. Pointers to data blocks containing actual file data
Much of this information can be displayed with the various options of the
UNIX command ls. UFS inodes are 128 bytes each and are grouped together in
control blocks set up at the time the file system is created. As a result, when
you run out of inodes on a UFS file system, you cannot add more files. UFS file
systems write to the disk in “data blocks” of 8 Kbytes each. A data block is the
minimum amount of disk space normally allocated to a write, and is therefore
the minimum amount of disk space a file occupies. Data blocks are the UFS
version of a Disk Allocation Unit (DAU). A Disk Allocation Unit is the basic
unit in which a file system writes to disk.
UFS file systems are not
particularly efficient, especially for large writes. They can be configured to
improve performance, but the basic control structures associated with UFS
cylinder groups, namely the backup superblocks, cylinder group blocks and inode
table, are scattered throughout the disk, so the disk's write heads must be
repositioned multiple times if a write extends past a single cylinder group.
The need to update at least three control structures with every disk write also
slows down the write process, and these updates cannot be done in parallel with
data writes, since metadata shares disk space with data. If you want good
performance for large writes and reads, it makes sense to use a file system
with a simpler disk layout than that of a UFS file system, and even to
segregate metadata from data so that metadata and data can be updated in
parallel. The 8-Kbyte write size of the UFS file system made sense at one time, but
for writes of the large files commonly used now, it incurs an unnecessarily
large amount of processing overhead.
The New: SAM-FS/QFS File Systems
The basic and high performance
disk-based file systems available with the SAM-FS product have simplified
metadata. The only major file system control structures for the disk-based file
system are superblocks, located at the beginning of the disk, and inodes, held
in a file called .inodes and located under the file system mount point. They
also have DAUs of configurable size, which allow you to tune the performance of
your file system writes to your file characteristics.
Like UFS file systems, SAM-FS and QFS file systems record
disk space usage and other metadata in a primary superblock. The SAM-FS/QFS
primary superblock is located at the physical beginning of the primary disk
device, which is the first disk device listed in the mcf file for that file
system. This superblock contains such information as the file system type
(basic or high performance), and specifies the hardware that makes up the file
system. It also holds the DAU size and may also contain information about
shared file systems (discussed later). Disk devices other than the primary disk
device have a secondary superblock that contains only a small subset of the
information of the primary superblock.
The high-performance (QFS) file
system separates data and metadata onto different disk devices. It places the
.inodes file, directories, symbolic links, bitmaps and other metadata on a
separate disk device from the data. This
separation of data and metadata is the primary performance enhancement of the
QFS file system. The only metadata on QFS data disks is a secondary superblock
at the beginning of the disk so nothing interferes with continuous, efficient
writes to the disk.
As noted earlier, the inodes on SAM-FS, SAM-QFS, and QFS
file systems are held in a file called ".inodes" which is held in
ordinary data blocks like any other file. The size of this file is therefore
effectively unlimited. As long as there is space on the disk, space in the file
for new inodes is dynamically allocated as needed. Each SAM-FS or QFS inode is
512 bytes in this file, in contrast with UFS inodes that are 128 bytes in an inode
table. The added space is used to hold SAM-specific file attributes used by the SAM-FS and
SAM-QFS file system implementations. The .inodes file can hold a theoretical
limit of 2^32 inodes (about 4 billion), but Sun Support recommends
that you plan to have no more than 10 million files per file system.
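The inode figures above lend themselves to a quick sizing estimate for the .inodes file. The following is a minimal sketch only: the function name is hypothetical, and it assumes one 512-byte inode per file with no other overhead, so directories, symbolic links, and bitmaps are not counted.

```python
INODE_SIZE_BYTES = 512   # size of one SAM-FS/QFS inode, as stated in the text
MAX_INODES = 2 ** 32     # theoretical limit of the .inodes file


def inodes_file_bytes(file_count):
    """Approximate size of .inodes for file_count files.

    Assumption: exactly one inode per file and no additional overhead.
    """
    if not 0 <= file_count <= MAX_INODES:
        raise ValueError("file count is outside the theoretical inode range")
    return file_count * INODE_SIZE_BYTES


if __name__ == "__main__":
    recommended_max = 10_000_000  # planning limit quoted in the text
    size = inodes_file_bytes(recommended_max)
    print(f"{recommended_max} files need about {size / 10**9:.2f} GB of inodes")
```

An estimate of this kind is only a lower bound when it is used to plan the space that holds the .inodes file.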
On the SAM-FS file system, the .inodes file resides on the
same disk devices as data. In contrast, on the QFS and SAM-QFS file systems the
.inodes file is held on the metadata disk device where directories and other
metadata also reside. Placing metadata on a separate metadata device allows
parallel access to metadata and data, which improves file read and write
performance, especially in read-intensive file systems where metadata is
frequently accessed. The separation of
metadata and data is the primary performance advantage of the QFS file system.
A metadata disk device can be a LUN, a VERITAS®
volume, or a Solaris Volume Manager™ (Solstice DiskSuite™) volume (metadevice).
The type of device used for metadata in a QFS file system need not be the same
as that used for data; for example, a typical QFS configuration places metadata
on a mirror, while data is on a hardware RAID-5.
Inodes in SAM-FS and SAM-QFS file systems contain all the
information included in UFS inodes and the additional file attributes used by
SAM-FS, SAM-QFS, and QFS. When all configured archive copies of a file are made,
the archdone flag is set in the inode, as are the locations, creation times, and
dates of the file’s archive copies. The SAM-FS, SAM-QFS, and QFS inodes also
store information about the current residence of the file - whether it is still
on disk or has been released. In addition to the access, modification, and
inode change times included in UFS inodes, SAM-FS inodes contain the time when
the file was created, when SAM-FS, SAM-QFS, or QFS attributes were last
changed, and when the file was last released or staged. This last value is
called the "residence time."
If attributes related to archiving, releasing, and staging have been set
on the file, that information is stored in the inode. For SAM-FS, SAM-QFS, and
QFS files, the command ls displays
the same inode information as for UFS inodes, but SAM-FS attributes in the
inode must be viewed with the command sls.
The sls Command
The sls command
is the Sun extension to the GNU version of the ls command with added features to support the SAM-FS and QFS
software. For the most part, the sls
command has the same syntax and behavior as the ls command. For example, the option -d works the same way with sls
as it does with ls.
The most important additional option to sls is the -D option that lists a detailed description of the metadata for
each file. The output includes the regular Unix metadata such as link count,
and also SAM file attributes such as file creation time and archive copy
information in this format:
# sls -D maryann
maryann:
  mode: -rw-rw----  links: 1  admin: 0  owner: gregor  group: sam    (1)
  length: 8254959  inode: 12                                         (2)
  offline; archdone;                                                 (3)
  copy 1: ----- Dec 19 17:08  64c7b.1  lt CFX123                     (4)
  copy 2: ----- Dec 19 17:17  3c04a.1  lt CFX124                     (4)
  access: Dec 19 16:54  modification: Dec 19 16:54                   (5)
  changed: Dec 19 16:54  attributes: Dec 19 17:17                    (5)
  creation: Dec 19 16:54  residence: Dec 19 17:17                    (5)
This example lists the metadata for a single file, maryann.
1. The mode, link count, owner, and group owner of the file
are listed under the file’s name in the second line of output. The admin entry
refers to a SAM administrative group.
2. The size of the file in bytes and the file’s inode
number are listed in the third line of output.
3. Two keywords in the output, archdone and offline,
indicate that this file is completely archived (archdone) and has been released
(offline). Other keywords discussed later may also appear in this line of
output.
4. The file shown in this example has two archive copies.
The output of sls -D provides the
time and date when these copies were made (Dec 19, 17:08 and 17:17), a notation
indicating the location of the archive copy on the archive media (64c7b.1 and
3c04a.1 - details discussed later), the equipment type identifier of the media
type on which the copy is stored (lt, also discussed later), and the VSNs (volume
serial names, here CFX123 and CFX124) of the media on which the archive copies
reside. The Disaster Recovery module discusses the use of the request
command with an archive copy location such as those included in the output of sls -D to read the archive copy of a
file directly from storage media.
5. The last three
lines of output list the times when the file was last accessed and last
modified, and when the Unix attributes stored in the file inode last changed.
These are the same values included in UFS inodes. SAM inodes also include when
the file was created, the last time that SAM-FS attributes on the file changed,
and when the file last changed residence—in this case, when it was released.
The attribute change time of the file changes
every time the file is released or staged. It also changes if archive -n, stage
-n, release -n, or any other SAM-FS attribute is set or reset on the file. It
is not affected by archiving or rearchiving.
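Administrators who script around sls -D often want the status keywords and archive copy details in a structured form. The following sketch parses text laid out like the example in this section. The layout of the sample text, the regular expression, and the field names (position, media, vsn) are assumptions made for illustration; they are not a documented output format, and the parser should be checked against the output of your own release.

```python
import re

# Sample text using the values from the example in this section.
SAMPLE = """\
maryann:
  mode: -rw-rw----  links: 1  admin: 0  owner: gregor  group: sam
  length: 8254959  inode: 12
  offline; archdone;
  copy 1: ----- Dec 19 17:08  64c7b.1  lt CFX123
  copy 2: ----- Dec 19 17:17  3c04a.1  lt CFX124
  access: Dec 19 16:54  modification: Dec 19 16:54
  changed: Dec 19 16:54  attributes: Dec 19 17:17
  creation: Dec 19 16:54  residence: Dec 19 17:17
"""

# One archive copy line: copy number, flags, date and time, position,
# media type, and VSN (field names are hypothetical).
COPY_RE = re.compile(
    r"^\s*copy\s+(?P<n>\d+):\s+(?P<flags>\S+)\s+"
    r"(?P<when>\w{3}\s+\d+\s+\d+:\d+)\s+"
    r"(?P<position>\S+)\s+(?P<media>\S+)\s+(?P<vsn>\S+)\s*$"
)


def parse_sls_detail(text):
    """Extract status keywords and archive copies from sls -D style text."""
    keywords, copies = [], []
    for line in text.splitlines():
        match = COPY_RE.match(line)
        if match:
            copies.append(match.groupdict())
            continue
        stripped = line.strip()
        # The status line is a run of keywords, each terminated by ';'.
        if stripped.endswith(";") and ":" not in stripped:
            keywords.extend(k.strip() for k in stripped.split(";") if k.strip())
    return {"keywords": keywords, "copies": copies}


if __name__ == "__main__":
    info = parse_sls_detail(SAMPLE)
    print(info["keywords"])
    for copy in info["copies"]:
        print(copy["n"], copy["when"], copy["media"], copy["vsn"])
```

A file is released and fully archived when both offline and archdone appear in the keyword list.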
File System Read/Write Characteristics
So far we have discussed the layout and
metadata of the SAM-FS/QFS file system. The file system’s performance will also
be determined by the write and read behavior of the file system module samfs.
In the next section we will discuss the way that the SAM-FS/QFS file systems
are read and written. SAM-FS/QFS writes and reads depend primarily on three factors:
1. Whether the file system is basic (FS) or high-performance (QFS)
2. The disk device type identifier assigned to each device in the file system
by the administrator
3. The DAU assigned to the file system
All these factors are configurable, and the selected configuration
of each is written to the superblock.
The DAU is assigned when the file system is initialized. The type of file system and the disk device identifiers are assigned in the mcf file (/etc/opt/SUNWsamfs/mcf) using an equipment type identifier. The equipment type identifiers for disk devices determine how the SAM-FS/QFS file system will write to the devices. These must be thoughtfully selected to support the planned usage of the file system.
The allocation scheme and allowable DAU sizes of a
particular SAM-FS/QFS file system depend on the equipment type identifier
assigned in the mcf file to the disk devices used by the file system. An
equipment type identifier is a two to four letter specifier assigned to each
file system, disk device and mass storage device in the file
/etc/opt/SUNWsamfs/mcf. Every file system has one of two possible equipment
type identifiers; every disk device has one of four possible equipment type
identifiers.
Each disk device in a SAM-FS or QFS file system is normally
assigned the equipment type identifier mm, md, or mr. The mm devices hold the
metadata for a QFS (not SAM-FS) file system. Each QFS file system must have one
or more mm devices. In addition to the mm device, a single QFS file system can
use exclusively md or exclusively mr
devices for data. A single file system cannot combine data disk device types.
File systems may also use a type of disk device called a “stripe group.” These
devices are named gXXX, where XXX is replaced by an integer. Stripe groups are
discussed in the QFS Performance Tuning paper.
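The device rules in the paragraph above can be expressed as a small consistency check. This is a sketch under stated assumptions: the function name is hypothetical, it checks only the rules given for a QFS file system (at least one mm device, and data devices that are exclusively md or exclusively mr), and it reports stripe groups (gXXX) as not modeled rather than validating them.

```python
def check_qfs_devices(device_types):
    """Return a list of problems for the disk devices of one QFS file system.

    device_types: equipment type identifiers of the file system's disk
    devices, for example ["mm", "mr", "mr"].
    """
    problems = []
    # Each QFS file system must have one or more mm (metadata) devices.
    if "mm" not in device_types:
        problems.append("no mm metadata device")
    # Data devices must be exclusively md or exclusively mr.
    data_types = {t for t in device_types if t in ("md", "mr")}
    if len(data_types) > 1:
        problems.append("md and mr data devices are mixed")
    # Anything else (such as stripe groups) is outside this sketch.
    unknown = sorted({t for t in device_types if t not in ("mm", "md", "mr")})
    if unknown:
        problems.append("not modeled here: " + ", ".join(unknown))
    return problems


if __name__ == "__main__":
    print(check_qfs_devices(["mm", "mr", "mr"]))
    print(check_qfs_devices(["mm", "md", "mr"]))
```

An empty list means the device types are consistent with the rules stated here; it says nothing about whether the choice suits the workload.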
Each file system is assigned an equipment type identifier
in the mcf file. This identifier indicates how the disk should be read and
written by the samfs kernel module: as the basic file system (ms) or as the
high-performance file system (ma).
| Equipment Type Identifier | File System Type | File System Name |
|---|---|---|
| ms | Basic file system | SAM-FS |
| ma | High-performance file system | SAM-QFS or QFS |
For disk devices, the administrator sets the equipment type
as discussed below based on how the equipment will be used. The type of disk,
the disk vendor or the type of volume (disk slice, SVM volume, VERITAS volume,
LUN) is irrelevant to the selection of the disk equipment type identifier. Only
the intended usage of the device determines its equipment type.
For mass storage hardware devices, such as tape drives or
robots, the equipment type identifier is based on the manufacturer and model of
the hardware. For example, DLT tape drives are always designated type lt. See
the man page for the mcf file for a complete list of equipment type
identifiers.
In order to set our disk device types for best performance,
we first have to understand how the file system will write to disks based on
the disk device type and on the type of file system (SAM-FS or QFS).
File systems write data to disk in blocks of a
pre-determined size. Every file therefore takes up an amount of space on the
disk that is an integral multiple of the size of the blocks in which the file
system writes. If the size of the blocks is 16 kilobytes (Kbytes), a file of 17
Kbytes takes up 32 Kbytes of disk space, because two 16-Kbyte blocks are
required to store that file. Similarly, on a file system that writes in 512
Kbyte blocks, a 17 Kbyte file takes up 512 Kbytes, wasting 495 Kbytes.
SAM-FS, SAM-QFS and QFS file systems write to the disk in
blocks called DAUs. The size of SAM-FS and QFS DAUs can be configured to one of
a set of permitted values at the time the file system is initialized, unlike
the equivalent “disk block” in the UFS file system, which is always 8 Kbytes.
Such variable DAUs allow the administrator to tune file system performance
to match application I/O and disk configuration.
SAM-FS, SAM-QFS and QFS file systems can also implement a
dual allocation scheme, in which the first eight writes of a file go to a DAU
fragment of 4 Kbytes, for a total of 32 Kbytes, and the remainder is written to
whole DAUs. The dual allocation scheme prevents small files from wasting disk
space and large files from wasting device overhead. Small files are written to small blocks, and large files are
written in large blocks. SAM-QFS and
QFS file systems can alternatively implement a single allocation scheme in
which only whole DAUs are written. This scheme is optimal for a file system of
large files of relatively consistent size.
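To see how the two schemes differ in disk space used, the description above can be turned into a simplified model. This is a sketch, not the allocator itself: the function names are hypothetical, sizes are in whole Kbytes, and the model assumes exactly eight 4-Kbyte fragments followed by whole DAUs, as described in this section.

```python
FRAGMENT_KB = 4      # size of one small allocation
FRAGMENT_COUNT = 8   # small allocations made before whole DAUs are used


def _ceil_div(a, b):
    return -(-a // b)


def dual_allocation_kbytes(file_kbytes, dau_kbytes):
    """Space used when the first 32 Kbytes go to 4-Kbyte fragments."""
    small_area = FRAGMENT_KB * FRAGMENT_COUNT  # 32 Kbytes
    if file_kbytes <= small_area:
        return _ceil_div(file_kbytes, FRAGMENT_KB) * FRAGMENT_KB
    rest = file_kbytes - small_area
    return small_area + _ceil_div(rest, dau_kbytes) * dau_kbytes


def single_allocation_kbytes(file_kbytes, dau_kbytes):
    """Space used when only whole DAUs are written."""
    return _ceil_div(file_kbytes, dau_kbytes) * dau_kbytes


if __name__ == "__main__":
    for size in (17, 100):
        print(size,
              dual_allocation_kbytes(size, 64),
              single_allocation_kbytes(size, 64))
```

With a 64-Kbyte DAU, this model gives 20 Kbytes for a 17-Kbyte file under dual allocation and 64 Kbytes under single allocation, which illustrates why the dual scheme suits small files.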
SAM-QFS and QFS metadata is always held on a device
designated type mm in the mcf file. Each SAM-QFS or QFS file system must have
at least one mm device for metadata storage. The mm devices implement the dual
allocation scheme, so the first 32 Kbytes of a file are written to eight DAU
fragments of 4 Kbytes each, while additional data is written in whole DAUs of
16 Kbytes. There is no configuration of the DAUs on mm devices. They are always
16 Kbytes in size.
SAM-FS file systems store metadata and file data on the
same device, so the .inodes file, directories, and symbolic links share disk
space with data files. The SAM-FS file system can therefore only write to md
disk devices, which allow mixed data and metadata storage. For most releases of
the software, the default DAU for SAM-FS file systems is 16 Kbytes, but, based
on anticipated file system usage, may be configured to 32 or 64 Kbytes for all
devices in a file system when the file system is initialized. (In release 4.5 and
later of the software, the default DAU for SAM-FS file systems is 64 Kbytes.) The md
devices use a dual DAU, where the first 32 Kbytes of a file are written in 4
Kbyte fragments of a DAU, and the rest of the file is written in whole DAUs.
SAM-QFS and QFS file systems may also use md devices to
store file data but metadata for SAM-QFS and QFS file systems is always held on
a separate mm device. The default DAU for shared SAM-QFS and QFS file systems
on md devices is 64 Kbytes, but they can also be configured with a DAU of 16 or
32 Kbytes. The md devices cannot be configured for DAUs of greater than 64
Kbytes, and they always use a dual allocation scheme for writes. These
characteristics are optimal for file systems used to store smaller files or
files of widely varying sizes.
SAM-QFS
and QFS file systems may also use mr disk devices for data storage. The
administrator can freely configure the DAU for mr devices between 16 and 65,528
Kbytes in 8 Kbyte increments. The default DAU is 64 Kbytes. Like md devices,
the size of the DAU written to mr devices is configured for all devices in a
file system when the file system is initialized. The mr devices use the single
allocation scheme, so each file written to the device occupies a minimum of one
DAU and all writes to a file system using mr devices are performed exclusively
in increments of one DAU. The large and variable DAUs and single allocation
scheme used by mr devices allow the administrator to customize file system
behavior to the size of writes produced by applications. This type of disk
device is most useful in a file system storing large files of uniform size. If you are using hardware RAID-5 for disk
cache, the DAU size must be set equal to an integral multiple of the single
stripe write size of the RAID or write performance will suffer.
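The constraints on mr DAUs can be checked before a file system is initialized. The following sketch validates a candidate DAU against the range given above and finds the smallest valid DAU that is a whole multiple of a RAID stripe write size. The function names and the stripe_kbytes parameter are hypothetical; the stripe write size itself must come from your RAID configuration.

```python
MR_DAU_MIN_KB = 16
MR_DAU_MAX_KB = 65528
MR_DAU_STEP_KB = 8


def is_valid_mr_dau(dau_kbytes):
    """True if the DAU is 16 to 65,528 Kbytes in 8-Kbyte increments."""
    return (MR_DAU_MIN_KB <= dau_kbytes <= MR_DAU_MAX_KB
            and dau_kbytes % MR_DAU_STEP_KB == 0)


def smallest_dau_for_stripe(stripe_kbytes, at_least_kbytes=MR_DAU_MIN_KB):
    """Smallest valid mr DAU that is a whole multiple of the stripe size.

    Returns None if no multiple of stripe_kbytes is a valid DAU.
    """
    if stripe_kbytes <= 0:
        raise ValueError("stripe size must be positive")
    dau = stripe_kbytes
    while dau <= MR_DAU_MAX_KB:
        if dau >= at_least_kbytes and is_valid_mr_dau(dau):
            return dau
        dau += stripe_kbytes
    return None


if __name__ == "__main__":
    # Hypothetical RAID-5 with a 256-Kbyte single stripe write size.
    print(smallest_dau_for_stripe(256))
    print(smallest_dau_for_stripe(256, at_least_kbytes=300))
```

The at_least_kbytes argument lets you ask for a DAU no smaller than the typical application write while keeping it aligned with the stripe.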
Appendix: Superblock Versions
Two versions of the superblock are currently in use.
Version 1 was used in releases of SAM-FS prior to 4.0. Version 2 is used by
Release 4.0 and higher of the software and supports advanced features such as
Access Control Lists (ACLs). File systems created on SAM-FS 3.x releases (i.e.
which have Version 1 superblocks) may be mounted on systems running Release 4.2
and earlier of the software, but Release 4.3 of the software is only compatible
with Version 2 of the superblock. File systems created on Release 3.x of SAM-FS
therefore cannot be mounted on a system running Release 4.3 or later of the
software. Such file systems must be rebuilt if they are to be used with Release
4.3 or later of the software.
| Release | 3.x | 4.0 | 4.1 | 4.2 | 4.3- |
|---|---|---|---|---|---|
| Year Released | Pre-2002 | 2002 | 2004 | 2004 | 2005 |
| Superblock Version | 1 | 2 | 2 | 2 | 2 |
| Compatible with Superblock Versions | 1 | 1,2 | 1,2 | 1,2 | 2 |
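The compatibility rules in this appendix can be captured as a lookup, which may be convenient in an upgrade checklist script. This is a sketch only: the names are hypothetical, and the "4.3" key stands for Release 4.3 and later, following the text above.

```python
# Superblock versions each release can mount, taken from this appendix.
COMPATIBLE_SUPERBLOCKS = {
    "3.x": {1},
    "4.0": {1, 2},
    "4.1": {1, 2},
    "4.2": {1, 2},
    "4.3": {2},  # stands for Release 4.3 and later
}


def can_mount(release, superblock_version):
    """True if the release can mount a file system with that superblock."""
    return superblock_version in COMPATIBLE_SUPERBLOCKS[release]


if __name__ == "__main__":
    # A file system created on 3.x has a Version 1 superblock.
    print(can_mount("4.2", 1))
    print(can_mount("4.3", 1))
```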