Introduction to Archiving Concepts

 

Archiving is the process of copying a file from a SAM-FS or SAM-QFS file system to archive media. When files on SAM-FS and SAM-QFS file systems are created or modified, the archiving process identifies those files, makes one to four copies of them and sends them to tape, magneto-optical disk or an online disk archive.

 

This section discusses the archive set and other archiving concepts that underpin the archiving process. It also covers background information including the archive command, archiving daemons, VSN associations, the timing of archiving, issues with directory archiving, and archiving log files.

 

The Underlying Issues of Archiving

 

In SAM-QFS, file systems should be used to organize files by write characteristics, as discussed in the document SAM-FS and QFS File System Basics. For disk efficiency, large, sequentially written files will be written to different file systems than small, randomly written files. When files are archived to tape, the performance need is not for fast writes, but for efficient staging, so files that were written to different file systems may well be archived to the same tape cartridge. For example, a large data file will probably be accompanied by smaller files containing descriptions of the data, location information or analyses. These files will certainly be written to different file systems, but it would make sense to archive them to the same tape. That way when the data is analyzed, all the information needed by the analyst will be on one tape, and can be staged with a single load operation.

 

Obviously many files should be archived using an organizing principle based on staging requirements. This organizing principle is the archive set. An archive set is a group of files designated by location in their respective file systems, and optionally by size, by name or by ownership. Files in a single archive set will typically be selected from multiple file systems but will be archived to one piece or to a limited set of media. Files that should be archived together will go into the same archive set, regardless of the file system to which they were written. By configuration of the archive set, and manipulation of the media in the library, you can control which files are archived together. Every file will be archived with one and only one archive set.

 

Archiving is highly configurable through the file /etc/opt/SUNWsamfs/archiver.cmd. In this file you can specify archive set membership, the number of archive copies (up to four) made of the files in each archive set, the media to which those copies should be sent, which files should be archived together, and when files are archived. If no archiver.cmd file exists, the archiver places each file in a default archive set and makes one copy of each newly created or modified file to any removable media.

 

The configuration of your archive sets will largely determine whether your SAM-FS/SAM-QFS system functions well or poorly. In particular, your configuration must address the following three major issues:

 

 

On an active production system with tens of millions of files, it is not possible to control exactly when and where and how any individual archive copy will be staged or written to media. You can only control the way your system behaves overall by manipulating the probabilities of individual files being archived in particular ways. This allows you to ensure that archiving occurs promptly, that you have as many archive copies as you need, and that drives are available as they are needed.

 

The archive Command

 

The archive command allows you to initiate archiving of files that otherwise would not be archived until a later time. It is also used to set archive attributes on files. Used without options, the archive command forces archiving of any archive copies of the specified files that have not yet been made. Used with the -r <dirname> option, it forces archiving recursively on files in the specified directory. You cannot choose which archive copy number is archived; archiving will commence on any unarchived copies.

 

To set archive attributes on files or to archive one or more files, use the following syntax and options:

archive [-d] [-n] filename...

archive [-d] [-n] -r dirname...

 

The archive command used with the -n option sets the no archive flag on the file. This attribute can be set recursively on all files in a directory by using the -r and -n options and specifying one or more directories. The attribute is inherited by any files subsequently added to the directory. The option -d clears all archiving attributes, and may also be used recursively on a directory if accompanied by the -r <dirname> option.
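The option combinations described above can be sketched as a small helper that only builds the argument list for the archive command. This is a minimal illustration, not part of the SAM software: the function name archive_argv and the paths are hypothetical, and the helper covers only the -d, -n and -r options named in this section.

```python
from typing import List, Sequence


def archive_argv(targets: Sequence[str], recursive: bool = False,
                 no_archive: bool = False, clear: bool = False) -> List[str]:
    """Build an argument list following: archive [-d] [-n] [-r] name..."""
    if not targets:
        raise ValueError("at least one file or directory is required")
    argv = ["archive"]
    if clear:
        argv.append("-d")   # clear all archiving attributes
    if no_archive:
        argv.append("-n")   # set the no archive flag
    if recursive:
        argv.append("-r")   # apply recursively to the named directories
    argv.extend(targets)
    return argv


# Hypothetical paths, for illustration only.
print(" ".join(archive_argv(["/sam1/scratch"], recursive=True, no_archive=True)))
# prints: archive -n -r /sam1/scratch
```

On a host where the archive command is installed, the returned list could be passed to subprocess.run; the sketch itself never executes anything.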

 

Archive Sets

 

Your SAM-FS and SAM-QFS file systems probably contain a mix of files. Some may need to be archived as soon as any change is made, while others should never be backed up. You may want to keep just one archive copy of unimportant or frequently changed files while irreplaceable data might require four archive copies sent to different media. Files in archiving file systems are organized into groups called archive sets whose archive properties can then be individually set. Thus files that need to be archived once, immediately, to any available tape can be placed into one archive set, while files that must be archived four times to four different tapes can be placed in another.  The files in one archive set may come from any of the file systems on the host system and can have different selection criteria in different file systems. For example, files beginning with 27jan and ending in .bmp on one file system can be assigned to the same archive set as files that begin with 27jan and end with .jpg on another. Related files that were separated into different file systems because of their disk write characteristics can thus be reunited on tape.

 

Archive sets are configured by the administrator in the file /etc/opt/SUNWsamfs/archiver.cmd.

 

Archive sets therefore help to organize the contents of write operations. This property of archive sets can be used to improve performance when files need to be staged from tape. Files that are usually accessed at the same time should go into the same archive set. Files in that archive set can then be directed to a specified piece or pieces of media. This reduces the need to load and position tapes or magneto-optical disks in the drives when files are staged. 

 

The Default Archive Set

 

By default all files in each file system, including directories and symbolic links, belong to an archive set named for that file system. The files and directories in two SAM-FS file systems named joe and maryann are automatically included in two archive sets also named joe and maryann when the file systems are initialized. The default archive set always exists. It does not have to be configured and cannot be unconfigured. It can only include files, symbolic links and directories from one file system. Each file system has its own default archive set.

 

If you create an archiver.cmd file, you can instead assign some or all of your regular files to individual archive sets of your choice, but the default archive set still includes all directories and symbolic links and it also contains any file not assigned to a different archive set in the archiver.cmd file. No directory or symbolic link can be assigned to any other archive set; directories can only be archived with the default archive set. The default archive set is configured for one archive copy by default, but may be configured for up to four archive copies in the archiver.cmd file, like any other archive set.

 

Configured Archive Sets

 

In the /etc/opt/SUNWsamfs/archiver.cmd file you can configure archive sets. Configured archive sets contain files grouped together by their location in the directory tree. They can be further grouped by size, by file name or by ownership, if desired. Archive properties such as the number, timing, and destination media of archive copies can then be configured in the archiver.cmd file for each archive set, including the default archive set. Configured archiving properties are used by the archiver in place of the default properties.

 

Archive sets configured in the archiver.cmd file must include an administrator-selected name and the criteria (location, size, etc.) that define the archive set. Each archive set can include as many or as few regular files as you want. The members of an archive set can be restricted to one file system or include files from multiple file systems, and can have different selection criteria in different file systems (though that is usually not sensible). Directories cannot be included in archive sets configured by the administrator in the archiver.cmd file. They are archived only with the default archive set.

 

Archive sets can be declared per-file-system or globally. Archive sets declared globally apply to and have the same selection criteria for all file systems on a host. Those declared per-file-system apply only to specified file systems. Since a single archive set may be declared per-file-system for multiple file systems and will apply to all those for which it is declared, it is never necessary to declare global archive sets, and they generally decrease the legibility of the archiver.cmd file.

 

Special Archive Sets

 

There are two special archive sets: the no_archive archive set and the allsets archive set. Files that should not be archived may be assigned to the no_archive set, which can be configured in any number of file systems, just like any other archive set. The allsets keyword refers to all archive sets, and is used to set specialized configuration parameters that control the behavior of archiving on particular copies of files. For example, if you plan to send all tapes containing copy 3 of your files to offsite storage, you may want to prevent those tapes from being used for any other archive set copies. That configuration can be set for copy 3 of all archive sets using the allsets keyword. The allsets keyword is used only to set these parameters and is not used in the assignment of files to archive sets.

 

Archive Set Copies

 

By default one copy will be made of each file in each archive set when the file is created or modified, but each archive set can be configured for one to four archive set copies instead. These archive set copies are named with the integers 1, 2, 3 and 4. These values only name the copy; they do not indicate how many archive copies will be made. An archive set with an archive copy "4" configured does not necessarily have four archive copies configured - it may have only copy 4.  Normally you would configure archive copies in order, so an archive set needing only one archive copy made of each file in the set would be assigned archive copy 1, but doing so is not required. Each archive set copy of the files in an archive set can be assigned to a specific media type or to an online disk archive, a specific VSN or a group of VSNs. Other properties can also be set on an archive set copy. Each archive set copy can also be assigned an archive age (discussed later), which partly determines when that archive copy will be made, and an unarchive age, which determines when that archive copy's information will be removed from file inodes.

 

Archive requests are composed of the same archive set copies from the same archive set. For example, if the archive set “memos” is configured for copy 1 and 2 in the file systems samfs1 and samfs2, copy 1 of all members of the archive set “memos,” whether from samfs1 or samfs2, will be archived in the same archive request. Copy 2 of these files will be archived in a separate archive request. Thus there may be multiple active archive requests at the same time for the same archive set - one for each archive copy currently being archived.

 

Archive Set Assignment Priority

 

SAM requires every file to belong to exactly one archive set. However, it is common for a file to meet the criteria of more than one archive set. For example, the default archive set for a file system includes all files in that file system. Any file assigned to a configured archive set in the archiver.cmd file therefore automatically meets the criteria for a minimum of two archive sets.

 

A file is assigned to an archive set using the following rules:

1. It is assigned to the first configured archive set for its file system in the archiver.cmd file whose criteria it meets.

2. If the file does not meet the criteria for any archive set configured for its file system, it is assigned to the first globally configured archive set whose criteria it meets.

3. If the file does not meet the criteria for any configured archive set, it is assigned to the default archive set named for the file system.

4. Directories are always assigned to the default archive set named for the file system.
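The four rules above can be sketched as a short function. This is only a model of the rules as stated here, not the archiver's actual implementation; the names FileInfo, Rule and assign_archive_set, and the sample criteria, are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class FileInfo:
    fs: str              # file system name, e.g. "samfs1"
    path: str            # path relative to the mount point
    size: int            # size in bytes
    is_dir: bool = False


@dataclass
class Rule:
    archive_set: str
    fs: Optional[str]    # None marks a globally declared archive set
    matches: Callable[[FileInfo], bool]


def assign_archive_set(f: FileInfo, rules: List[Rule]) -> str:
    # Rule 4: directories always go to the default set named for the file system.
    if f.is_dir:
        return f.fs
    # Rule 1: first matching set declared for this file system, in file order.
    for r in rules:
        if r.fs == f.fs and r.matches(f):
            return r.archive_set
    # Rule 2: otherwise, first matching globally declared set.
    for r in rules:
        if r.fs is None and r.matches(f):
            return r.archive_set
    # Rule 3: otherwise, the default archive set named for the file system.
    return f.fs


# Hypothetical rules: one per-file-system set and one global set.
rules = [
    Rule("big", None, lambda f: f.size > 1_000_000),
    Rule("programs", "samfs1", lambda f: f.path.startswith("development/")),
]
print(assign_archive_set(FileInfo("samfs1", "development/big.bin", 5_000_000), rules))
# prints: programs
```

Note that the per-file-system set wins even though the global set appears first in the list, because rule 1 is applied before rule 2.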

 

Configuring the “Catch-all” Archive Set *PERFORMANCE ISSUE*

 

The default archive set always contains directories, but only contains those regular files not assigned to another archive set. It is good practice, even for unimportant files, to assign every regular file to a configured archive set in the archiver.cmd file. That way the default archive set will contain only directories, and every file will be archived.

 

Unfortunately it is easy to overlook files when setting up archive sets. Those files will end up archiving with the default archive set. You should therefore configure a “catch-all” archive set that archives any files that would otherwise be archived with the default archive set. Such archive sets are commonly given the name “all”.

 

Files that should not be archived at all may be assigned to an archive set called "no_archive". This archive set is most commonly used for scratch directories containing temporary files (often called temp, tmp or scratch) and may be configured in multiple file systems by file size, name or ownership as well as location, like any other archive set.

 

Archive Set Assignment Example

 

On a server with two SAM-FS file systems named samfs1 and samfs2, you might set up the example archive set assignments shown below, although this is not a particularly well designed archive set assignment (why not? There's one major reason). Configured archive sets are listed in the order of their priority in the archiver.cmd file. The default archive sets appear last because they are always last priority for file assignment.

 

Sample Archive Set Assignments

 

Archive Set   Files Included                        File Systems Included
programs      All files in /sam1/development        samfs1
data          Files of more than 1 Mbyte in size    samfs1, samfs2
all           Any files in samfs1                   samfs1
samfs1        All directories and files in samfs1   samfs1
samfs2        All directories and files in samfs2   samfs2

 

The table of Sample Archive Sets shows:

 

1. An archive set called "programs" which contains all the files in /sam1/development, which is located in the samfs1 file system.

2. An archive set called "data" which contains files of more than 1 Mbyte in size in both samfs1 and samfs2 that are not assigned to "programs" (because "data" follows "programs" in the archiver.cmd file, and therefore can include only files not already included in "programs.")

3. A catch-all archive set called "all" which contains any files not assigned to the "programs" and "data" archive sets in the file system samfs1 (because it follows those two archive sets in the archiver.cmd file.)

4. All directories in file system samfs1 are archived with the default file system archive set samfs1. All regular files in samfs1 have already been assigned to other archive sets including the catch-all archive set “all.”

5. All directories and all the files not assigned to the "data" archive set in samfs2 are archived with the default file system archive set samfs2. This is poor practice. Files not assigned to another archive set should be assigned to a catch-all archive set.

 

As the table shows, directories cannot be assigned to an archive set. They are always archived as part of the default archive set named for the file system. The details of declaration of archive sets are discussed later.
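The sample assignments can be checked by encoding them as ordered lists and asking where individual files land. This is a sketch under stated assumptions: 1 Mbyte is taken to mean 2**20 bytes, the mount point /sam2 for samfs2 and all file names are hypothetical, and the function archive_set_for is not part of the SAM software.

```python
MBYTE = 1024 * 1024  # assumption: "1 Mbyte" means 2**20 bytes

# Ordered (archive set, predicate) pairs per file system, mirroring the table.
SAMPLE = {
    "samfs1": [
        ("programs", lambda path, size: path.startswith("/sam1/development/")),
        ("data", lambda path, size: size > MBYTE),
        ("all", lambda path, size: True),
    ],
    "samfs2": [
        ("data", lambda path, size: size > MBYTE),
    ],
}


def archive_set_for(fs, path, size, is_dir=False):
    """Return the archive set a file or directory would be assigned to."""
    if is_dir:
        return fs                      # directories: default archive set only
    for name, matches in SAMPLE[fs]:
        if matches(path, size):
            return name                # first matching configured set wins
    return fs                          # nothing matched: default archive set


print(archive_set_for("samfs1", "/sam1/notes.txt", 100))   # prints: all
print(archive_set_for("samfs2", "/sam2/notes.txt", 100))   # prints: samfs2
```

The second call shows the weakness in the sample design: a small file in samfs2 has no catch-all archive set and falls through to the default archive set samfs2.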

 

VSN Associations

 

If no archiver.cmd file is configured, by default archive copies are sent to any available media, which means that you have no control over the eventual destination of your archive copies. All files are archived to the same VSNs and multiple archive copies of the same file may end up on the same piece of media, which defeats the purpose of having multiple archive copies. Associating archive set copies with one piece of media or a limited set of media is a major function of the archiver.cmd file. If you configure an archiver.cmd file at all, you must specify a VSN association in it for every archive copy of every archive set, including those assigned to the default archive set. That means that you must designate a destination - one or more tapes or magneto-optical disks or an online disk archive - for each archive copy of each archive set, including the default archive set. The only exception is the no_archive archive set.
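The requirement that every archive copy of every archive set (other than no_archive) has a VSN association can be expressed as a simple consistency check. The sketch below works on plain Python dictionaries, not on a real archiver.cmd file; the function name, the data layout and the VSN names are all hypothetical.

```python
def missing_vsn_associations(copies, vsn_map):
    """Return (archive_set, copy_number) pairs that lack a VSN association.

    copies:  dict mapping archive set name -> iterable of copy numbers (1-4)
    vsn_map: dict mapping (archive set name, copy number) -> list of
             destinations (VSNs or a disk archive name)
    """
    missing = []
    for archive_set, numbers in copies.items():
        if archive_set == "no_archive":
            continue                   # the only exception named in the text
        for n in sorted(numbers):
            if not vsn_map.get((archive_set, n)):
                missing.append((archive_set, n))
    return missing


# Hypothetical configuration: copy 2 of "data" has no destination.
copies = {"data": [1, 2], "samfs1": [1], "no_archive": []}
vsn_map = {("data", 1): ["disk01"], ("samfs1", 1): ["NOTAPE"]}
print(missing_vsn_associations(copies, vsn_map))
# prints: [('data', 2)]
```

The check does not model the case discussed later, where archivemeta=off removes the need for a VSN association on the default archive set.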

 

You may associate an archive set copy with a non-existent VSN, and this is very commonly done to reduce changes to the archiver.cmd file.

 

*PERFORMANCE ISSUE* The VSN associations allow you to send each archive copy of an archive set to a different piece of media. The first archive set copy should almost always go to online disk archive. Disk archiving occurs almost instantaneously, so one secure backup of a file is made immediately when copy 1 is sent to disk. This fulfills the requirement that backups of files are made in a timely manner. In addition, staging is performed from the first archive copy, so having that copy on an online disk allows much faster staging. The disks used for archive copy 1 do not need to be of particularly high quality as they will only hold the sole backup copy until the second archive copy is made to removable media. 

 

In most installations the second archive set copy goes to removable media, where it will usually remain as an online backup of the file. The third copy goes to a different piece of removable media, usually for eventual relocation to offsite storage. If an additional online backup or offsite backup copy is desired, the fourth copy can be used for that purpose. Placing copies of files on different media allows you to diversify your backups onto different media and into different locations, so that files are secure.

 

VSN associations also allow you to send archive copies to the most appropriate library if you have more than one. Large files or files in frequently written directories can be sent to the fastest library with the highest capacity tapes to help avoid archiving backlogs.

 

Archive Set Design Guidelines *PERFORMANCE ISSUE*

 

It is possible to set up as many archive sets as you want in the archiver.cmd file. It is not a good idea to have too many archive sets, however, because that can seriously affect the performance of the system. As a rule of thumb, the number of active archive sets should never exceed the number of drives in the online library. So if you have four drives in the library, you should limit your SAM-FS or SAM-QFS system to four active archive sets at most. Archiving can be significantly slowed by the need to unload a drive, reload it with the correct VSN and position tape. The more archive sets you have, the more archive requests will be created, and the more likely it is that the drives will be loading and positioning media rather than writing or reading. Reducing the number of archive sets also increases the probability that the system can leave media in the drive for repeated use by the same archive set, which decreases usage of the drives. The drives are then more likely to be available for archiving or staging which is a major goal of configuring archiving. You do not need to limit the number of archive sets that contain files that are almost never changed or that are never archived. They have no significant effect on drive performance.

 

Directory Archiving and Recovery - *PERFORMANCE ISSUE*

 

In the case that a disk is lost, SAM-FS and SAM-QFS directories may be restored to the disk using a dump file produced by the samfsdump command, which backs up all metadata including directories. The samfsdump command is the ideal way to back up SAM-FS or SAM-QFS directories. Directories may also be restored from archive copies of the default archive set, using the request and star commands. Once the directories are restored, the request and star commands may then be used to restore files into the directory structure.

 

Restoring by using request and star is extremely slow, and so labor intensive that it may be practically impossible to restore a large file system this way. If it is at all possible, you should run regular backups of your archiving file systems using samfsdump.

 

Every time you add a file to or delete a file from a directory, that directory is modified, and must be archived. The default archive set is therefore very active and will have to be counted against the total number of active archive sets you can configure on your system. Archiving directories can therefore reduce your options and waste a considerable amount of processing and media resources. In addition if a tape contains a stale archive copy (one for a file that has been modified but not yet archived again), that tape cannot be recycled. The archives of directories are very frequently stale because of their activity, and so archiving directories greatly increases the complexity of recycling.

 

If you are using samfsdump to back up your file system metadata there is no reason to archive directories and good reason to avoid archiving them. Prevent directory archiving by:

 

1. Making sure all regular files are included in a configured archive set, usually by configuring a “catch-all” archive set. You will then not need the default archive set as a catch-all for files not otherwise assigned to an archive set.

2. Globally setting the variable archivemeta=off in the archiver.cmd file.

 

If archivemeta=off is set, you need not declare a VSN association for your default archive set.

 

In Release 4.0 of SAM-FS, the archivemeta parameter is not available. To prevent directory archiving in Release 4.0, set the archive age for copy one of the default archive set to a very large value such as 10 years. Do not configure any additional copies. This will have the same effect as setting the archivemeta parameter to off. Then declare a VSN association for a non-existent VSN, such as "NOTAPE" or "FAKE". This will satisfy the requirement that each archive set copy have a VSN association without tying up an actual piece of media.

 

The Timing of Archiving

The Archive Age

 

The user or process writing to a file may repeatedly modify the file over a period of time. Archiving such a file every time it is written to disk would result in many interim archive copies being made, which would use large amounts of space on media. SAM provides a variable called the archive age that is the minimum elapsed time between creation or modification of a file and the creation of a particular archive copy of that file. You can configure up to four archive copies of files in each archive set so each file may have as many as four archive ages associated with it—one for each archive copy. If you know the pattern of use of files on your file system, you can set the archive age on archive copies to a value that prevents excessive archiving, but that also provides adequate backup.

 

By default, the archive age of an archive set copy is four minutes. Therefore, modifications to a file that will be archived with the defaults must be at least four minutes old before an archive copy of that file can be made. That does not mean that the file is archived when those modifications are exactly four minutes old. They may be much older, but they cannot be less than four minutes old.
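Because each archive copy has its own archive age, the set of copies that may be made grows as the file's last modification gets older. The following sketch shows that relationship only; it does not decide when a copy is actually written. The function name eligible_copies and the ages for copies 2 and 3 are illustrative assumptions (only the four-minute default comes from the text).

```python
from datetime import datetime, timedelta


def eligible_copies(last_modified, now, archive_ages):
    """Return the copy numbers whose archive age has been reached.

    archive_ages: dict mapping copy number (1-4) -> timedelta
    """
    elapsed = now - last_modified
    return sorted(copy for copy, age in archive_ages.items() if elapsed >= age)


modified = datetime(2024, 1, 1, 12, 0, 0)
ages = {
    1: timedelta(minutes=4),   # the default archive age
    2: timedelta(hours=1),     # hypothetical
    3: timedelta(days=1),      # hypothetical, e.g. an offsite copy
}
print(eligible_copies(modified, modified + timedelta(minutes=30), ages))
# prints: [1]
```

Reaching the archive age makes a copy eligible; it is a lower bound on the delay, not the moment of archiving.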

 

*PERFORMANCE ISSUE* Because newly created files are often quite active, four minutes is likely to be much too short an archive age. Excessively short archive ages lead to archiving partially completed files repeatedly, which can cause a bottleneck at the drives and which uses up tape rapidly. If an incomplete file must be backed up shortly after creation for security reasons, set the archive age on the first copy to an age when the file will probably have significant data, and archive that first archive copy to an online disk. The second copy can then be archived to removable media after modifications to the file are completed. Otherwise the first and second copies should be made at the same archive age, when the file is complete. The third archive age, for copies destined for offsite storage, should be set to a day or more, depending on how often you ship media to offsite storage.

 

To keep the online disk from being overrun with unused files, recycle it frequently, and set archive copies sent to online disk to unarchive if they have not been accessed recently (details covered later).

 

The Archive Interval

 

If each file was sent to tape as soon as it reached its archive age, the drives would be overwhelmed with loads and unloads of tapes for different archive set copies. To keep the use of the drives at a minimum, files needing archiving are accumulated prior to being sent to tape in a few large writes. The period of time during which files are accumulated is called the archive interval. Long archive intervals mean more efficient use of the drives, but longer times between creating and archiving a file. Long archive intervals improve performance but hurt availability.

 

The archive interval can only be set globally or on an entire file system, which is not very granular. As of Release 4.4 of the SAM-QFS software, it is also possible to accumulate a specific number of files (-startcount) or a total amount of data (-startsize) before sending files to tape, completely ignoring the archive interval. These parameters are set per archive set copy rather than per file system. It is also possible to override the archive interval directly (-startage) for an archive set copy.

 

Release 4.1 and later of SAM-FS/QFS implement a default mode of archiver function called continuous archiving, although other modes of archiving are available (see Appendix A for description of scan mode archiving). In continuous archiving, a dedicated instance of a sam-arfind daemon is started when a file system is mounted. That sam-arfind daemon continuously monitors its file system. The file system informs the sam-arfind daemon when a file in its file system has been created or modified, and will therefore require archiving. The sam-arfind daemon then records the directory containing the modified file in the scanlist, along with the time when the newly modified file will reach the earliest archive age associated with one of its archive copies, and the numbers of the archive copies that will need to be made. The scanlist can be viewed with the command showqueue -v.

 

When the time specified in the scanlist is reached for a directory, sam-arfind scans that directory, and opens an archive request for the archive set copy whose archive age has been reached. An archive request is a list of all files that will be archived together for a particular archive set copy. Each archive request contains files in

 

 

Any other files in the directory that require archiving are added to their respective archive set copies' archive requests, and the scan time is updated so it reflects the earliest time at which any archive set copy of a file modified and still not archived will reach its archive age.  With continuous archiving, if no files in a directory require archiving, or if a directory is marked for "no archiving" that directory is not scanned, greatly increasing the efficiency of archiving. Archive requests are also viewed with showqueue -v.

 

Once an archive request has been created, sam-arfind continues to add files to the archive request as they reach their archive ages for the duration of the archive interval. At the conclusion of the archive interval sam-arfind steps through the completed archive request composing tar files. The sam-arfind daemon then passes the lists of tar files to sam-archiverd, which starts an instance of sam-arcopy to write the tar files. In continuous archiving, an archive copy of a file may be created as soon as the file reaches the archive age, or as late as the archive age plus the archive interval (see Appendix A for an example).
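The bounds in the last sentence can be written down as a small calculation: the earliest time is the modification time plus the archive age, and the latest is that time plus the archive interval. The function name archive_window and the example values are assumptions chosen for illustration.

```python
from datetime import datetime, timedelta


def archive_window(last_modified, archive_age, archive_interval):
    """Return (earliest, latest) times at which the copy may be created."""
    earliest = last_modified + archive_age
    latest = earliest + archive_interval
    return earliest, latest


modified = datetime(2024, 1, 1, 12, 0, 0)
earliest, latest = archive_window(
    modified,
    archive_age=timedelta(minutes=4),        # the default archive age
    archive_interval=timedelta(minutes=10),  # example interval
)
print(earliest.time(), latest.time())
# prints: 12:04:00 12:14:00
```

The sketch ignores queueing delays at the drives, so on a busy system the actual write can fall later than the computed upper bound.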

 

Individual archive set copies can also be configured so that the archive request is sent to media when a specified number of files have been added to the archive request (using the -startcount value parameter), when the files in the archive request reach a specified total size (using the -startsize value parameter) or when the first file in the archive request has been there for a specified period of time (using the -startage value parameter). This last value is used to override the archive interval for the archive request for a specific archive set copy, but otherwise acts exactly like an archive interval. If any of these three values is set, the archive interval is ignored.
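The decision described above can be sketched as follows. The sketch assumes that when more than one of the three parameters is set, whichever condition is met first sends the request to media, and it approximates the archive interval by the time the first file has spent in the request. The class and function names are hypothetical, not part of the SAM software.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Optional


@dataclass
class StartParams:
    startcount: Optional[int] = None        # number of files
    startsize: Optional[int] = None         # total bytes
    startage: Optional[timedelta] = None    # time since first file was added


def should_dispatch(params, file_count, total_bytes, first_file_wait,
                    archive_interval):
    """Decide whether an archive request should be sent to media now."""
    triggers = []
    if params.startcount is not None:
        triggers.append(file_count >= params.startcount)
    if params.startsize is not None:
        triggers.append(total_bytes >= params.startsize)
    if params.startage is not None:
        triggers.append(first_file_wait >= params.startage)
    if triggers:
        # At least one start parameter is set: the archive interval is ignored.
        return any(triggers)
    return first_file_wait >= archive_interval


p = StartParams(startcount=1000)
print(should_dispatch(p, 999, 0, timedelta(hours=5), timedelta(minutes=10)))
# prints: False
```

The example shows the consequence of ignoring the interval: with only -startcount set, a request holding 999 files waits regardless of how long it has been open, which is why -startage is often set alongside the other two.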

 

In Release 4.5 and later of the software the value of the archive interval is automatically set to 10 minutes and can only be reset by specifying the -startcount, -startage or -startsize parameters. This behavior is not well documented.

 

All archive requests for a file system can be removed with the command samcmd arrmarchreq <filesystem>.* (for example, samcmd arrmarchreq samfs1.*). A single archive request can be removed with samcmd arrmarchreq <filesystem>.<archive_request_name> (for example, samcmd arrmarchreq samfs1.samfs1.1.1). It is also possible to delete an archive request manually. Sun Support would prefer that you not do this, but it can sometimes be the only way to get rid of an archive request stuck in the archiver queue. Go to /var/opt/SUNWsamfs/archiver/<file_system>/ArchReq, which contains the archive requests as binary files read by showqueue, and delete the request you want removed using the rm command.

 

If a file is modified before the archive request closes, the original entry in the archive request for that file will be removed, and the process of archiving will begin again with the countdown to the archive age.

 

Scans

 

During normal operations, the sam-arfind daemon performs partial directory scans. In a partial directory scan directories containing modified files – and only those directories - are scanned for files requiring archiving. If the daemon is stopped and restarted, it has no information about the file system and no idea where modified files might be located. In that case it must perform a full directory scan.  In a full directory scan, every directory in the file system is scanned from the mount point down. It identifies the inode number of each file or directory in the file system from the contents of directory data blocks, locates the inode in the .inodes file, and determines if the file requires archiving. Any time a file system is mounted, or the archiving daemons receive a HUP signal, an archiver run with a full directory scan is performed. Directory scans of extremely large file systems may take a very long time to complete. It is therefore important to avoid restarts of the archiving daemons and file system unmounts for large file systems, as all these actions will trigger a full directory scan. For this reason you should not “HUP” the SAM daemons, and you should avoid unmounting SAM-QFS file systems unless it is absolutely necessary.

 

Unarchiving

 

Online disk archives are used to provide immediate backup of files, prior to archiving to removable media, and to provide quick staging of files. Once you have one or more copies of a file on removable media, you can dispose of its online disk archive copy by unarchiving it. When an archive copy of a file is unarchived, that archive copy is marked unarchived in its inode and the file system releases its control of that archive copy. The file system then no longer recognizes the existence of that archive copy. The space used by the archive copy in the online disk archive can then be reclaimed through the process of disk recycling, discussed later. 

 

Unarchiving is set on individual archive set copies. In the archiver.cmd file, each archive set copy declaration will normally be followed by an archive age. A second time value, the unarchive age, may then follow the archive age. It indicates the length of time the archive copy should be maintained after the original file (not the tar file containing the archive copy) is last accessed. Scripts employing the “unarchive” command may not be substituted for configured unarchiving, as the command does not work the same way. Once the original file has not been accessed for the duration of the unarchive age, the inode is modified and the archive set copy value replaced with a notation of the unarchiving action, so that the archive copy will not be remade. The archdone flag is removed. If the file is modified thereafter, the unarchived copy will be remade and the process of counting the length of time to unarchiving will begin again. An unarchived archive copy will not be remade if the “archive” command is issued, or for any reason other than modification of the file. Unarchiving does not reclaim disk space; only recycling does that. Unarchiving must therefore be combined with regular recycling of the online disk.

 

Drive Performance - Performance Issue Summary

 

If there are too many files or too many archive sets for the number of drives, backlogs in archiving can slow down archiving significantly. If you have backlogs (use showqueue -v to view them), you must increase the number of drives, decrease the number of archive copies you make to removable media, or improve the efficiency of your archiving. Otherwise, you are not adequately backing up your files, and your entire system will slow down. You can improve the efficiency of your archiving by:

1) Increasing archive ages of archive copies sent to removable media

2) Increasing archive intervals (or using -startcount or -startsize to increase the size of archive requests)

3) Decreasing the number of archive sets. In particular, avoid archiving the default archive set

4) Decreasing the number of archive set copies for each archive set ONLY if you have excess

5) Archiving the first archive copy to an online disk. Archiving the first archive copy to an online disk so greatly decreases the security risks accompanying drive bottlenecks that Sun considers it a Best Practice.

 

Archiver Log Files

 

By default, no logs of archiver activity are kept. Archiving logs can be configured for each file system or for all file systems  in the archiver.cmd file. These logs are extremely valuable, because they specify the location of every archive copy made. That information can allow you to recover an archive copy of a file from media using the request and star commands even when you have no other backup of the file system. Without the archiver log files, locating individual files is extremely difficult. One log file in /var/opt/SUNWsamfs or another location in a UFS file system should be maintained for each SAM-managed file system and backed up regularly.

 

A record in this file contains fourteen fields describing an archive copy made of a file. An example entry is listed below the table.

 

Archiver log file output fields

 

Field   Entry

1       Archive activity, as follows:
        • A for archived
        • R for rearchived
        • U for unarchived

2       Date of archive action in yyyy/mm/dd format

3       Time of archive action in hh:mm:ss format

4       Equipment Type identifier of archive media. For information on media types, see the mcf(4) man page

5       VSN to which archive copy was sent

6       Archive set and copy number, separated by a period

7       Physical position of start of archive tar file on media, a period, then the file offset within the archive file, in hexadecimal. The physical position of the start of the archive tar file is used with the request command to recover files from tape

8       Name of file system on which the file resides

9       Inode number of the file

10      Length of the file

11      Path and name of file relative to the file system’s mount point

12      Type of file, as follows:
        • d for directory
        • f for regular file
        • l for symbolic link

13      Segment number. If an extremely large file must be broken up into segments and spread across multiple tar files, the number of the segment is listed here. Otherwise this value is zero

14      Equipment Ordinal of the drive on which the file was archived, as defined in the mcf file

 

A 2003/11/22 18:29:29 lt 100003 all.1 9.1 samfs1 1088.1 9292 ucb/bg f 0 100

 

In the example given above, the regular file (f) ucb/bg under the file system mount point for the file system samfs1 was archived on November 22, 2003 at 18:29:29. It was sent to a DLT (lt) tape with the VSN 100003, using the drive with the Equipment Ordinal of 100. It was archived with copy 1 of the archive set all (all.1). The hexadecimal position on the tape of the tar file containing this archive copy is 0x9, and the archive copy is the first file in the tar file (9.1). The file’s inode number is 1088.
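
Because each record is a whitespace-separated line with a fixed number of fields, archiver logs are easy to process with a script when you need to locate an archive copy. The following is a minimal sketch, not a SAM-FS utility: the function name and the dictionary keys are hypothetical, and it assumes that the path in field 11 contains no whitespace and that both halves of field 7 are hexadecimal.

```python
from typing import Dict, Union

# Hypothetical key names for the fourteen fields, in the order listed in the table.
FIELD_NAMES = [
    "activity", "date", "time", "media_type", "vsn", "set_and_copy",
    "position_and_offset", "file_system", "inode", "length",
    "path", "file_type", "segment", "drive_eq",
]


def parse_archiver_log_record(line: str) -> Dict[str, Union[str, int]]:
    """Split one archiver log record into its fourteen fields.

    Assumption: the path (field 11) contains no whitespace.
    """
    parts = line.split()
    if len(parts) != len(FIELD_NAMES):
        raise ValueError(f"expected {len(FIELD_NAMES)} fields, got {len(parts)}")
    record: Dict[str, Union[str, int]] = dict(zip(FIELD_NAMES, parts))

    # Field 6: archive set name and copy number, split at the last period.
    archive_set, copy_number = parts[5].rsplit(".", 1)
    record["archive_set"] = archive_set
    record["copy_number"] = int(copy_number)

    # Field 7: tar file position and file offset (both assumed hexadecimal).
    position, offset = parts[6].split(".", 1)
    record["tar_position"] = int(position, 16)
    record["file_offset"] = int(offset, 16)

    # Numeric fields; field 9 (inode) is left as the raw string.
    record["length"] = int(parts[9])
    record["segment"] = int(parts[12])
    record["drive_eq"] = int(parts[13])
    return record


example = ("A 2003/11/22 18:29:29 lt 100003 all.1 9.1 samfs1 "
           "1088.1 9292 ucb/bg f 0 100")
record = parse_archiver_log_record(example)
print(record["vsn"], record["archive_set"], record["copy_number"],
      hex(record["tar_position"]), record["path"])
```

Applied to the example record, the sketch yields VSN 100003, archive set all, copy 1 and tar file position 0x9, which are the values you would need in order to find the tar file on the tape.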

 

Summary of Archiving Defaults

 

Defaults for archiving parameters discussed in this module are contained in the table below.

 

Term

Definition

archive interval

In scan mode archiving, the time period between the end of an archiving run by the sam-arfind daemon and the beginning of the next run. In continuous archiving, the time period between the creation of an archive request and the time files in the request are sent to media. Default: 10 minutes

archive age

The minimum elapsed time between creation or modification of a file and creation of an archive copy of that file. Each archive copy of an archive set is associated with an archive age. Default: 4 minutes

VSN association

The assignment of archive copies to a particular tape, disk, or set of media. Default: any available media.

archive set

A grouping of files with common archiving properties. They are written to media together as one or more tar files. Default: the archive set named for the file system.

archive copy

A copy of the files in an archive set. Default: one archive copy is made of every file or directory in an archive set.

archiver log

A log file containing a record of archive copy activity. The archiver log can be used to recover files from archive copy even in the absence of a backup of file system metadata. Default: no log

 

 

Appendix A

 

In Release 4.0 of SAM-FS/QFS, the archiver only functions in scan mode. In this mode each sam-arfind daemon sleeps for the archive interval, then wakes and performs an archiving run. In an archiving run, the archiver reads the .inodes file and identifies files that have met or exceeded the archive age for one or more of the archive copies configured on the archive set to which they belong. These files make up the archive request. As soon as the run is complete, the archive request closes and its files will be archived as soon as possible thereafter. By default the archive interval is 10 minutes, but it may be set to any other time value in the archiver.cmd file. Scan mode can be used in releases 4.1 and later of the software by setting the parameter examine=scan in the archiver.cmd file. The default mode is examine=noscan. Occasionally the scan method of archiving may provide better performance, particularly if you have a very deep directory structure, or if you are using SAM to archive files created by Samba. Otherwise the continuous (noscan) method of archiving is almost always preferable.

 

In the scan mode of archiving you do not have precise control over the timing of archiving for specific files, because the archiver waits for a specified period between runs. An archive copy of a file may be made as soon as the file reaches its archive age, but only if that coincides with a run of the archiver. In contrast, in release 4.1 and later, the continuous archiving mode allows overrides of the archive interval, so that an archive copy can be made using criteria other than the time since the last archiver run, using the -startcount, -startage and -startsize parameters.

 

In scan mode, the archiver runs whether or not there are files to be archived. An archiver run has an effect on performance, and for very large file systems may take hours. If few files have been created or modified in that file system, the processing capacity used to perform the archiver run is wasted.

 

Scan mode: After a file has reached its archive age, it is eligible for archiving, and is added to an archive request during the next sam-arfind daemon run. In scan mode archiving, that run occurs when the archive interval for that daemon has elapsed. The file will subsequently be archived as soon as possible after the archiver run. A file may therefore be archived as soon as its archive age allows, if the archiver daemon runs right after the file reaches the earliest archive age associated with its archive copies. If the last sam-arfind run occurred just before the archive age was reached, the file may not be archived for a considerable time. If the archive interval is 30 minutes and the archive age of an archive copy of a file is 15 minutes, that copy may therefore be made anywhere from 15 to 45 minutes after the file is created or modified. The following example shows how an archive copy of a file with an archive age of 15 minutes might be created on a system with an archive interval of 30 minutes:

 

9:50:00 – The sam-arfind daemon completes an archiving run.

10:05:00 – File is created.

10:20:00 – File reaches its archive age and becomes eligible for archiving.

10:20:01 – Archiver runs and identifies the file as a candidate for archiving. The file is subsequently archived.

 

In the preceding example the file is archived almost as soon as it reaches its archive age, because that event coincides with a run of the archiving daemon. Another possibility:

 

9:49:00 – The sam-arfind daemon completes an archiving run.

10:05:00 – File is created.

10:19:00 – Archiver runs.

10:20:00 – File reaches its archive age and becomes eligible for archiving.

10:49:00 – Archiver runs and identifies the file as a candidate for archiving. The file is subsequently archived.

 

This example uses the same archive age and the same archive interval as the previous example, but it takes about three times as long for the file to archive (44 minutes after creation rather than 15). This interaction between the archive interval and the archive age, and the variability it introduces into archive timing, must be considered when you configure archiving.
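
The two timelines above can be reproduced with a small calculation. The sketch below is a simplified model, not the archiver's own logic: the function name is hypothetical, it assumes that each archiving run takes negligible time (so runs are exactly one archive interval apart), and it treats a file as found by any run that starts at or after the moment the file reaches its archive age.

```python
from datetime import datetime, timedelta


def scan_mode_pickup(last_run_end: datetime, file_changed: datetime,
                     archive_age: timedelta,
                     archive_interval: timedelta) -> datetime:
    """Return the time of the first archiver run that finds the file eligible.

    Assumption: runs take negligible time, so they recur every archive_interval.
    """
    eligible_at = file_changed + archive_age
    next_run = last_run_end + archive_interval
    while next_run < eligible_at:
        next_run += archive_interval
    return next_run


age = timedelta(minutes=15)
interval = timedelta(minutes=30)
created = datetime(2003, 11, 22, 10, 5)

# First timeline: the previous run ended at 9:50.
best = scan_mode_pickup(datetime(2003, 11, 22, 9, 50), created, age, interval)
# Second timeline: the previous run ended at 9:49.
worst = scan_mode_pickup(datetime(2003, 11, 22, 9, 49), created, age, interval)

print(best - created)   # delay when a run coincides with the archive age
print(worst - created)  # delay when a run just misses the archive age
```

Under these assumptions the file is picked up 15 minutes after creation in the first case and 44 minutes after creation in the second, which matches the range described above.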

 

On very large, very active file systems, the assumption made above - that the archiving run itself takes negligible time - may not be correct. As your file systems grow, archiving runs will take increasing amounts of time. File systems with millions of files can take days to completely scan.

 

Once an archiving run is finished, the sam-arfind daemon passes lists of files to be archived to sam-archiverd, then sleeps for the archive interval. The sam-archiverd daemon starts an instance of sam-arcopy, to which it passes the lists of files to write. The length of time required to actually copy files to tape has no effect on the archive interval or on the timing discussed above. However, if the library must load and position tapes repeatedly in order to perform archiving, delays in actual archiving can be introduced into this process.

 

Continuous mode: In continuous archiving, the archiver creates an archive request with the first file to reach its archive age, then waits for the archive interval to elapse, all the while adding files to the archive request.

 

A file may therefore be archived as soon as its archive age allows, if the archive interval elapses right after it is added to an archive request. If it is the first file added to an archive request, the file may not be archived for the duration of the archive interval plus its archive age (unless -startage, -startsize or -startcount is set). If the archive interval is 30 minutes and the archive age of an archive copy of a file is 15 minutes, that copy may therefore be made anywhere from 15 to 45 minutes after the file is created or modified, exactly as it was in scan mode archiving. The following example shows how an archive copy of a file with an archive age of 15 minutes might be created on a system with an archive interval of 30 minutes:

 

9:50:00 – Archiver runs and creates an archive request

10:05:00 – File is created.

10:20:00 – File reaches its archive age and is added to the archive request

10:20:01 – The archive interval elapses and the file is archived.

 

In the preceding example the file is archived almost as soon as it reaches its archive age, because that event coincides with the end of the archive interval. Another possibility:

 

9:49:00 – Archiver runs and creates an archive request

10:05:00 – File is created.

10:19:00 – The archive interval elapses

10:20:00 – File reaches its archive age and is added to a newly created archive request.

10:50:00 – The archive interval elapses and the file is archived.

 

In this example the file was archived 30 minutes later than in the preceding example, as it was in scan mode archiving.
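
The continuous mode timelines can be modelled in the same way. This is again a simplified sketch with a hypothetical function name: it assumes a single file, ignores -startage, -startsize and -startcount, and assumes that a file reaching its archive age after the open archive request has closed starts a new request whose archive interval begins at that moment.

```python
from datetime import datetime, timedelta
from typing import Optional


def continuous_mode_archive_time(request_opened: Optional[datetime],
                                 file_changed: datetime,
                                 archive_age: timedelta,
                                 archive_interval: timedelta) -> datetime:
    """Return the time at which the request containing the file closes.

    Assumptions: no -startage, -startsize or -startcount overrides, and the
    file reaches its archive age no earlier than request_opened.
    """
    eligible_at = file_changed + archive_age
    if request_opened is not None:
        request_closes = request_opened + archive_interval
        if eligible_at <= request_closes:
            # The file joins the request that is already open.
            return request_closes
    # No request is open any more: the file starts a new one.
    return eligible_at + archive_interval


age = timedelta(minutes=15)
interval = timedelta(minutes=30)
created = datetime(2003, 11, 22, 10, 5)

# First timeline: a request was opened at 9:50 and is still open at 10:20.
early = continuous_mode_archive_time(datetime(2003, 11, 22, 9, 50),
                                     created, age, interval)
# Second timeline: the request opened at 9:49 closed one minute too soon.
late = continuous_mode_archive_time(datetime(2003, 11, 22, 9, 49),
                                    created, age, interval)

print(early - created)  # delay when the file joins the open request
print(late - created)   # delay when the file starts a new request
```

Under these assumptions the delays are 15 and 45 minutes, the two ends of the range given for continuous archiving.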

 

Archiving Daemons

 

Archiving is controlled by three daemons:

- The sam-archiverd daemon – This master archiving daemon reads the archiver.cmd file when it is started or when the command samd config is issued. It passes archiving parameters to the other archiving daemons and passes information between archiving daemons. The sam-archiverd daemon runs continuously. If it is killed, the sam-fsd daemon starts it again automatically. The sam-archiverd daemon also updates the inode with archiving status information including the archdone flag, which is set when all archive copies of a file have been completed. This flag is displayed in the output of sls -D.

 

- The sam-arfind daemons – These daemons are often referred to as the archivers. One instance of this daemon is started by sam-archiverd for each file system when that file system is mounted, unless the file system is mounted with the nosam or noarchive (in some releases) mount options. It monitors file system activity and creates lists of files to be archived, called archive requests. It also composes tar files, which consist of a list of the files in an archive request that will be sent to tape in a single write operation. Once the archive request is written, the inode is updated with the date and time when the file finished archiving (credit Rodney Lindner-SunAustralia who figured this out).

 

- The sam-arcopy daemon – This daemon is started by sam-archiverd when sam-arfind finishes composing the lists of files to be sent to tape. It copies files to tape or other media using the star utility. It then confirms to sam-archiverd that copies were made and exits.

 
