ZFS - The Zettabyte File System
http://www.opensolaris.org/os/community/zfs/demos/basics - An excellent demo on creating ZFS file systems
The Zettabyte file system (ZFS)
ZFS file systems are placed on zpools, which are storage volumes composed of partitions or entire disks. A zpool can be formed of a single disk or partition, or it can be a simple stripe without parity (RAID-0), a mirror (RAID-1) or a variation on RAID-5 called RAIDZ. RAIDZ is a distributed parity stripe, but the size of the chunk written to each disk varies between 512 bytes and 128k, so that every write is a full-stripe write. RAIDZ never performs partial-stripe writes and therefore eliminates read-modify-write operations. If you have 24k to write, 24k are striped across the disks in the pool, and parity is then written. If you have 512k to write, 512k are striped and parity is then written.
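The full-stripe idea can be sketched with a little arithmetic. The following is a simplified model, not the actual RAIDZ allocator: the function name raidz_stripe, the 4-disk pool and the single parity device are assumptions made for illustration, and padding and the real on-disk layout are ignored.

```python
import math

SECTOR = 512  # smallest chunk written to one disk, per the text above


def raidz_stripe(write_bytes, disks, parity=1):
    """Return (data_sectors, parity_sectors) for one variable-width stripe.

    Simplified model: every write is laid out as its own full stripe,
    so no existing stripe has to be read, modified and rewritten.
    """
    if disks <= parity:
        raise ValueError("need more disks than parity devices")
    data_sectors = math.ceil(write_bytes / SECTOR)
    # Each row of the stripe holds (disks - parity) data sectors
    # and `parity` parity sectors.
    rows = math.ceil(data_sectors / (disks - parity))
    return data_sectors, rows * parity


print(raidz_stripe(24 * 1024, disks=4))  # (48, 16)
print(raidz_stripe(512, disks=4))        # (1, 1)
```

In this model a 512-byte write still gets its own parity sector, which is why a small write never forces a read of neighbouring data.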
The size of a ZFS file system is limited only by the size of the zpool, which may be grown. Normally multiple ZFS file systems are placed on a zpool in a hierarchical structure beneath the original file system; they will automatically grow and shrink as needed.
Working with ZFS
Creating zpools and file systems
The ZFS file system allows the superuser to create a managed pool with a top-level file system and mount it in one action. A standalone disk can become a file system with a single command, which (1) labels the disk c1t1d0, (2) creates a zpool called fs10 containing that disk, (3) places a file system called fs10 on it, (4) creates the mount point /fs10, and (5) mounts the file system fs10 on that mount point:
# zpool create fs10 c1t1d0
Or on a partition:
# zpool create fs10 c1t1d0s0
If multiple partitions or disks are specified without a keyword specifying the storage configuration, data is striped to all disks without parity (RAID-0).
# zpool create fs10 c1t1d0s0 c1t1d0s1
If another file system has previously been present on the device, the -f option to zpool create must be used to force an overwrite of that file system, for example:
# zpool create -f fs10 c1t1d0
ZFS places an EFI label on the disk in which all space except 8 MB is placed in partition 0, and 8 MB are placed in partition 8 for use as a private area. If partitions are used to create the zpool, they must be at least 64 MB.
The EFI label may be removed from the disk using the command format -e, then choosing “SMI” when asked for the type of label.
The single zpool create command shown above creates a zpool named fs10 with one disk, c1t1d0, places the file system fs10 on it, creates the mount point /fs10 and mounts the file system. The names of the pool and of subsidiary file systems are chosen by the administrator, but the name of the pool, of the top-level file system and of the default mount point will all be the same. This first file system is the base file system of the pool. Other file systems may be placed on the pool, but they will always be subsidiary to fs10 and, by default, mounted on mount points within /fs10:
# zfs create fs10/fs100
Entire hierarchies of file systems may be created this way. Each has access to all the space in the devices of the pool, although a quota may be specified for each file system, which it cannot exceed.
All pools may be listed:
# zpool list
File systems can also be listed:
# zfs list
Their status also appears in the output of df -h.
For information on the management and components of a pool:
# zpool status -v fs10
A ZFS file system may be unmounted:
# zfs unmount fs10/fs100
and remounted:
# zfs mount fs10/fs100
Removing a subsidiary ZFS file system is done like this:
# zfs destroy fs10/fs100
A pool and all its file systems can be destroyed like this:
# zpool destroy fs10
This operation destroys the pool, removes ALL file systems and removes the top-level mount point.
A mirror can be created using
# zpool create fs10 mirror c1t1d0s0 c1t1d0s1
Submirrors can be of different sizes; if they are, however, space is wasted on the larger submirror. The zpool command must be forced (-f) to accept a mirror with unequal submirrors. The resulting mirror will be the same size as the smallest submirror.
RAIDZ devices are created using the raidz keyword:
# zpool create fs10 raidz c1t1d0s0 c1t1d0s1
A RAID-1+0 array is created by combining stripes and mirrors:
# zpool create fs10 mirror c1t1d0 c2t1d0 mirror c1t2d0 c2t2d0
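To compare the layouts above, here is a rough capacity model. It is only a sketch: the function names are hypothetical, RAIDZ is assumed to use single parity, and labels, metadata and other overhead are ignored, so real pools will report somewhat less space.

```python
def vdev_capacity(kind, sizes):
    """Approximate usable size of one top-level device group.

    `sizes` lists the member sizes, all in the same unit (for example GB).
    """
    if kind == "disk":
        (size,) = sizes
        return size
    if kind == "mirror":
        # A mirror is only as large as its smallest submirror.
        return min(sizes)
    if kind == "raidz":
        # Single parity: one member's worth of space goes to parity.
        return (len(sizes) - 1) * min(sizes)
    raise ValueError(f"unknown kind: {kind}")


def pool_capacity(vdevs):
    """Data is striped across all groups, so their capacities add up."""
    return sum(vdev_capacity(kind, sizes) for kind, sizes in vdevs)


def mirror_waste(sizes):
    """Space left unused on the larger submirrors of an unequal mirror."""
    return sum(size - min(sizes) for size in sizes)


# Two striped two-way mirrors (RAID-1+0) built from four 100 GB disks.
print(pool_capacity([("mirror", [100, 100]), ("mirror", [100, 100])]))  # 200
# An unequal mirror of a 100 GB and a 150 GB device wastes 50 GB.
print(mirror_waste([100, 150]))  # 50
```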
Setting Properties
You can view and set properties on a ZFS file system:
# zfs get all fs10
Once you have created a ZFS file system, it will automatically be mounted at boot without entries in the vfstab. You can cause a ZFS file system to be unmounted and prevent it from being mounted thereafter by setting the mountpoint property of the file system to “none.”
# zfs set mountpoint=none fs10
This command unmounts the file system and prevents it from being mounted at all. Setting mountpoint=legacy allows the file system to be managed in the traditional way in the file /etc/vfstab. Unless mountpoint=legacy is set, you cannot mount and unmount ZFS file systems using the mount and umount commands. You must use the commands:
# zfs unmount fs10
and
# zfs mount fs10
The mountpoint property can also be set to a path other than the default:
# zfs set mountpoint=/export/fs10 fs10
This immediately unmounts the file system and remounts it at the specified mountpoint. Like other properties, the new mountpoint is persistent across reboots.
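The way mount points follow the mountpoint property can be modelled in a few lines. This is a sketch with a hypothetical function name, effective_mountpoint; it assumes that a subsidiary file system with no setting of its own takes its parent's mount point with its own name appended, and that "none" and "legacy" are passed down unchanged.

```python
def effective_mountpoint(dataset, local):
    """Resolve where `dataset` would be mounted.

    `local` maps file system names to mountpoint values set explicitly
    with `zfs set mountpoint=...`.
    """
    parts = dataset.split("/")
    # Walk from the file system itself up towards the top of the pool,
    # stopping at the first explicitly set mountpoint.
    for depth in range(len(parts), 0, -1):
        ancestor = "/".join(parts[:depth])
        if ancestor in local:
            value = local[ancestor]
            if value in ("none", "legacy"):
                return value
            suffix = "/".join(parts[depth:])
            return value + ("/" + suffix if suffix else "")
    # Nothing set anywhere: the default is the file system name itself.
    return "/" + dataset


print(effective_mountpoint("fs10/fs100", {}))                        # /fs10/fs100
print(effective_mountpoint("fs10/fs100", {"fs10": "/export/fs10"}))  # /export/fs10/fs100
print(effective_mountpoint("fs10", {"fs10": "none"}))                # none
```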
# zfs set sharenfs=on fs10
Setting the sharenfs property to on immediately shares the ZFS file system over NFS, and the setting persists across reboots. No entry in the dfstab is required.
Another useful property is compression, which may be set to “on” or to the default, “off.” Once it is on, all data subsequently added to the file system will be compressed; existing files will not be compressed.
# zfs set compression=on fs10
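The point that only newly written data is compressed can be shown with a toy model. The class CompressingStore below is hypothetical, and Python's zlib stands in for whatever algorithm ZFS actually applies; only the before/after behavior is meant to match the description above.

```python
import zlib


class CompressingStore:
    """Toy file store with a switchable compression setting."""

    def __init__(self):
        self.compression = False
        self.stored = {}  # file name -> (is_compressed, bytes kept on "disk")

    def write(self, name, data):
        # The setting is consulted only at write time.
        if self.compression:
            self.stored[name] = (True, zlib.compress(data))
        else:
            self.stored[name] = (False, data)

    def read(self, name):
        compressed, blob = self.stored[name]
        return zlib.decompress(blob) if compressed else blob


store = CompressingStore()
payload = b"zfs " * 1000
store.write("old.txt", payload)   # written before compression is enabled
store.compression = True          # analogous to: zfs set compression=on
store.write("new.txt", payload)   # written afterwards

print(store.stored["old.txt"][0])  # False: the existing file is left as it was
print(store.stored["new.txt"][0])  # True: only the new file is compressed
```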
Quotas are set as a property:
# zfs set quota=50m fs10/fs100
Once a quota has been set, the file system size is limited to the size of the quota. Check the file system size with:
# df -h
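As a rough illustration of how a quota caps the size a file system reports, the sketch below computes the space still available to it. The function name available is hypothetical, and the model ignores reservations, snapshots and quotas set on parent file systems.

```python
MB = 1024 * 1024


def available(pool_free, used, quota=None):
    """Space a file system can still consume, in bytes.

    Without a quota the file system may use all free space in the pool;
    with one, it is limited to whatever remains under the quota.
    """
    if quota is None:
        return pool_free
    return max(0, min(pool_free, quota - used))


# fs10/fs100 with quota=50m, 20 MB already used, in a pool with 10 GB free.
print(available(10 * 1024 * MB, 20 * MB, quota=50 * MB) // MB)  # 30
# The same file system with no quota can use everything that is free.
print(available(10 * 1024 * MB, 20 * MB) // MB)                 # 10240
```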
Inheritance of properties
When a property on a ZFS file system is set by the system administrator, the property's source changes from the original value of "default" to "local". The source can be seen in the last column of the output of:
# zfs get quota fs10/fs100
A property set on a higher-level file system will be inherited by all lower-level file systems. The source for such properties will show up as "inherited from…" followed by the file system it came from.
The inherit subcommand can also be used to change the value of the property in a lower-level file system back to that of the parent. The following series of commands resets the quota on fs10/fs100 to the default:
# zfs inherit quota fs10/fs100
# zfs get quota fs10/fs100
If the property to be inherited is the default, the output of zfs get all will show "default" in the last field, as it does in the example above. If the property has been set on the file system from which it is inherited, that is, the inherited property is a local property there, the source will be shown as inherited from the parent file system.
To clear properties in the top level file system:
# zfs inherit quota fs10
# zfs get quota fs10
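The three possible sources ("local", "inherited from…" and "default") can be modelled as a lookup that walks up the hierarchy. This is an illustrative sketch, not ZFS code: get_property, inherit and the DEFAULTS table are hypothetical names, and compression is used as the example property.

```python
DEFAULTS = {"compression": "off"}


def get_property(dataset, prop, local):
    """Return (value, source) for `prop` on `dataset`.

    `local` maps (file system, property) pairs to locally set values.
    """
    if (dataset, prop) in local:
        return local[(dataset, prop)], "local"
    parent = dataset
    while "/" in parent:
        parent = parent.rsplit("/", 1)[0]
        if (parent, prop) in local:
            return local[(parent, prop)], f"inherited from {parent}"
    return DEFAULTS[prop], "default"


def inherit(dataset, prop, local):
    """Model of `zfs inherit`: drop the local setting, if there is one."""
    local.pop((dataset, prop), None)


settings = {("fs10", "compression"): "on",
            ("fs10/fs100", "compression"): "off"}
print(get_property("fs10/fs100", "compression", settings))  # ('off', 'local')
inherit("fs10/fs100", "compression", settings)
print(get_property("fs10/fs100", "compression", settings))  # ('on', 'inherited from fs10')
inherit("fs10", "compression", settings)
print(get_property("fs10/fs100", "compression", settings))  # ('off', 'default')
```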
Snapshots
ZFS has a feature called "snapshots", which allows you to create an instantaneous backup of a file system and also provides a form of versioning. Create a snapshot with:
# zfs snapshot fs10/fs100@24jun
The snapshot will be called 24jun, and will be located in /fs10/fs100/.zfs/snapshot (assuming /fs10/fs100 is the mountpoint for the file system).
A snapshot is a read-only view of the file system, kept in the zpool, which holds pointers to the blocks of the files as they were when the snapshot was taken. Taking the snapshot copies no data. If a file present at the time of the snapshot is later removed from or changed in the original file system, its old blocks are not freed; they stay in the pool and their space is charged to the snapshot. The use of space by the snapshot is visible in the output of:
# zfs list -t snapshot
More space is used after a file is removed from the original file system.
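Why a snapshot's space use grows only after files are removed can be seen in a toy block-sharing model. It assumes copy-on-write block sharing between the snapshot and the live file system and tracks a single snapshot; the class ToyDataset and its methods are hypothetical.

```python
class ToyDataset:
    """Toy model of a file system whose snapshots share blocks with it."""

    def __init__(self):
        self.live = {}        # file name -> set of block ids
        self.snapshots = {}   # snapshot name -> {file name: set of block ids}
        self._next_block = 0

    def write(self, name, nblocks):
        # New data always lands in freshly allocated blocks.
        blocks = set(range(self._next_block, self._next_block + nblocks))
        self._next_block += nblocks
        self.live[name] = blocks

    def remove(self, name):
        del self.live[name]

    def snapshot(self, snap):
        # Only the pointers are recorded; no block is copied.
        self.snapshots[snap] = {n: set(b) for n, b in self.live.items()}

    def snapshot_used(self, snap):
        """Blocks kept alive by the snapshot alone."""
        snap_blocks = set().union(*self.snapshots[snap].values())
        live_blocks = set().union(*self.live.values())
        return len(snap_blocks - live_blocks)


ds = ToyDataset()
ds.write("a", 3)
ds.write("b", 2)
ds.snapshot("24jun")
print(ds.snapshot_used("24jun"))  # 0: everything is still shared
ds.remove("a")
print(ds.snapshot_used("24jun"))  # 3: only the snapshot references a's blocks now
```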
A snapshot is dependent on its file system and is always attached to it. It cannot have properties set on it.
A file system can be rolled back to an earlier snapshot with the command:
# zfs rollback fs10/fs100@24jun
When a file system is rolled back, any files added to the file system after the snapshot was created will be lost. Any files deleted from the file system after the snapshot was created will be returned to it.
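The effect of a rollback on added and deleted files can be illustrated with a small model. The class RollbackModel is hypothetical, and it copies the file table when a snapshot is taken purely to keep the example short; ZFS itself does not copy data to take a snapshot.

```python
class RollbackModel:
    """Toy file system that can return to the state saved in a snapshot."""

    def __init__(self):
        self.files = {}       # file name -> contents
        self.snapshots = {}   # snapshot name -> saved copy of the file table

    def snapshot(self, name):
        self.snapshots[name] = dict(self.files)

    def rollback(self, name):
        # Everything done since the snapshot is discarded.
        self.files = dict(self.snapshots[name])


fs = RollbackModel()
fs.files = {"report.txt": "v1", "notes.txt": "keep me"}
fs.snapshot("24jun")

fs.files["new.txt"] = "added after the snapshot"  # will be lost
del fs.files["notes.txt"]                         # will come back
fs.files["report.txt"] = "v2"                     # will revert to v1

fs.rollback("24jun")
print(sorted(fs.files))        # ['notes.txt', 'report.txt']
print(fs.files["report.txt"])  # v1
```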
A clone file system named fs10/fs200, mounted by default at /fs10/fs200, can be created from the snapshot with:
# zfs clone fs10/fs100@24jun fs10/fs200
The new file system can have properties set on it, is readable and writeable, and files can be deleted from it and added to it. It uses the same backing store as the snapshot, so it takes no duplicate space. Because some of the files in the clone are dependent on the pointers in the snapshot, all clones must be deleted before a snapshot can be deleted. The clone allows you to have multiple, parallel file systems all based on the original file system.
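The rule that all clones must be deleted before their snapshot can be deleted amounts to simple dependency tracking. The sketch below is a hypothetical model (the class ToyPool and its methods are invented for illustration) and says nothing about how ZFS records the dependency internally.

```python
class ToyPool:
    """Tracks which clones depend on which snapshot."""

    def __init__(self):
        self.clones_of = {}  # snapshot name -> list of dependent clone names

    def snapshot(self, snap):
        self.clones_of[snap] = []

    def clone(self, snap, clone_name):
        self.clones_of[snap].append(clone_name)

    def destroy_clone(self, snap, clone_name):
        self.clones_of[snap].remove(clone_name)

    def destroy_snapshot(self, snap):
        # A snapshot with dependent clones cannot be removed.
        if self.clones_of[snap]:
            raise RuntimeError(f"{snap} has dependent clones: {self.clones_of[snap]}")
        del self.clones_of[snap]


pool = ToyPool()
pool.snapshot("fs10/fs100@24jun")
pool.clone("fs10/fs100@24jun", "fs10/fs200")
try:
    pool.destroy_snapshot("fs10/fs100@24jun")
except RuntimeError as err:
    print("refused:", err)

pool.destroy_clone("fs10/fs100@24jun", "fs10/fs200")
pool.destroy_snapshot("fs10/fs100@24jun")  # succeeds once no clone remains
```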
Clones may be promoted with zfs promote, which reverses the dependency between the clone and the file system it was cloned from. When the clone of a snapshot is promoted, the snapshot is detached from the original file system and attached to the clone: it moves into .zfs/snapshot in the clone, and the space it holds is charged to the clone. No file data is copied. The original file system becomes dependent on the snapshot in turn, so it can be destroyed without affecting the promoted clone. Once nothing depends on the snapshot any longer, the snapshot can be destroyed as well:
# zfs destroy fs10/fs200@24jun