A Samba VFS Tutorial


This is a tutorial on writing simple "virtual filesystems" using Tim Potter's Samba VFS switch. As an example, we'll add a commonly requested feature to regular Samba filesystems: our new filesystem will log who has which files open. This is a useful feature for administrators, and the project is big enough to demonstrate the use of a VFS.

1. Introduction

1.1. What's a VFS?

A virtual filesystem is either an existing filesystem with some additional abilities added, like cachefs, or a whole new filesystem, like Linux ext2fs.

In Samba's case, a VFS is normally one with extended capabilities, implemented on top of the local Unix filesystem.

1.2. Why Might You Want One?

Well, you might want to add something to Samba that your underlying Unix filesystem lacks. For example, you might want a Mac-like "trashcan" where deleted files go, to give you a chance to recover from accidental deletions. Or you might want to translate between Unix and DOS line-ending conventions in recognized kinds of text files. Or you might just want to put quota support in, but in a better way than filling the Samba sources with "#ifdef QUOTA".

In general, using a VFS is an elegant way to add features to a filesystem, without having to know everything about the filesystem implementation.

It's highly extensible: you can do all sorts of things with it, not just the examples listed above. If you don't mind the performance hit, you could even arrange to run a different script on every file operation. Andy Bakun has already done something similar, with a patch implementing an "on close" processing option, so he could automatically rasterize any PostScript files placed in a particular share.

1.3. Other People's VFSs

The first VFS switch was written by Sun to give them a place to attach what was at the time a new idea: NFS.

It's used heavily on Linux, to allow for different kinds of regular filesystems as well as special filesystems like /proc, and on Solaris to provide cachefs (local caching of NFS-mounted files).

While it was intended to allow entire filesystems to be attached to a virtual-operation layer (like a virtual-memory layer), it was found handy for "layering" or "interposing" additional functionality. This is the sort of VFS that we're going to discuss. Writing a new filesystem for Samba is the subject for quite a different paper!

A short list of historical and Linux references is included as Appendix A.

2. How to Design A VFS for Samba

The hard part's already been done, by Tim Potter. He built a VFS switch into Samba proper. What you need to do is figure out what you'd want to add, and then decide how to put it into each of the standard filesystem calls.

2.1. What's There?

The VFS switch provides access to all of the filesystem calls listed in Table 1:

Table 1: Operations in the Samba VFS.

Operation         Purpose                  Prototype

Virtual-Disk Operations
connect           Connect to a share       int skel_connect(struct vfs_connection_struct *conn, char *svc, char *user)
disconnect        Disconnect from a share  void skel_disconnect(void)
disk_free_space   Report free space        SMB_BIG_UINT skel_disk_free(char *path, SMB_BIG_UINT *bsize, SMB_BIG_UINT *dfree, SMB_BIG_UINT *dsize)

Directory Operations
opendir                                    DIR *skel_opendir(char *fname)
readdir                                    struct dirent *skel_readdir(DIR *dirp)
mkdir                                      int skel_mkdir(char *path, mode_t mode)
rmdir                                      int skel_rmdir(char *path)
closedir                                   int skel_closedir(DIR *dir)

File Operations
open                                       int skel_open(char *fname, int flags, mode_t mode)
close                                      int skel_close(int fd)
read                                       ssize_t skel_read(int fd, char *data, size_t n)
write                                      ssize_t skel_write(int fd, char *data, size_t n)
lseek                                      SMB_OFF_T skel_lseek(int filedes, SMB_OFF_T offset, int whence)
rename                                     int skel_rename(char *old, char *new)
sync                                       void skel_sync(int fd)
stat                                       int skel_stat(char *fname, SMB_STRUCT_STAT *sbuf)
fstat                                      int skel_fstat(int fd, SMB_STRUCT_STAT *sbuf)
lstat                                      int skel_lstat(char *path, SMB_STRUCT_STAT *sbuf)
fcntl_lock                                 BOOL skel_lock(int fd, int op, SMB_OFF_T offset, SMB_OFF_T count, int type)
unlink                                     int skel_unlink(char *path)
chmod                                      int skel_chmod(char *path, mode_t mode)
utime                                      int skel_utime(char *path, struct utimbuf *times)

To add functionality to "open", for example, you write a function for the switch to call, which does something extra and then calls the underlying Unix filesystem's open.

An example of a do-nothing "interceptor" for write is shown in Figure 1:

Figure 1: Write Interceptor.
ssize_t skel_write(int fd, char *data, size_t n)
{
    return underlying.write(fd, data, n);
}
See Appendix B for the complete do-nothing interceptor.

This function is called by Samba whenever it wants to write to the Unix filesystem. It just passes the call on to the underlying function, whose address is in the "underlying" table. If this is the only interceptor, underlying.write holds the address of the standard write system call.

The indirect call is really to the "next lower" VFS: this is to allow you to stack VFSs on top of each other.
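To make the stacking idea concrete, here is a self-contained sketch that imitates the mechanism outside Samba. It assumes a hypothetical one-operation vector, `struct ops`, rather than Samba's real `struct vfs_ops`, and the layer and helper names are likewise made up for illustration. Each layer saves the vector it was stacked on and calls through that saved copy, so a write made at the top passes down through every layer before reaching write(2).

```c
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical, cut-down stand-in for Samba's struct vfs_ops:
 * one operation is enough to show the mechanism. */
struct ops {
    ssize_t (*write)(int fd, const char *data, size_t n);
};

/* Bottom of the stack: the real write(2) system call. */
static ssize_t base_write(int fd, const char *data, size_t n)
{
    return write(fd, data, n);
}

/* Each layer keeps its own copy of the vector it was stacked on,
 * playing the role of the "underlying" table in Figure 1. */
static struct ops counter_below;
static struct ops tracer_below;

static size_t counter_bytes = 0;  /* bytes seen by the lower layer */
static int tracer_calls = 0;      /* calls seen by the upper layer */

/* Lower layer: count bytes, then pass the call down. */
static ssize_t counter_write(int fd, const char *data, size_t n)
{
    counter_bytes += n;
    return counter_below.write(fd, data, n);
}

/* Upper layer: count calls, then pass the call down. */
static ssize_t tracer_write(int fd, const char *data, size_t n)
{
    tracer_calls++;
    return tracer_below.write(fd, data, n);
}

static struct ops base_ops = { base_write };
static struct ops counter_ops = { counter_write };
static struct ops tracer_ops = { tracer_write };

/* Stack one layer: save the former vector, return the new one. */
static struct ops *stack_layer(struct ops *saved, const struct ops *former,
                               struct ops *mine)
{
    memcpy(saved, former, sizeof *saved);
    return mine;
}

/* Build tracer -> counter -> write(2) and return the top vector,
 * which is the one Samba would call through. */
struct ops *build_stack(void)
{
    struct ops *top = &base_ops;

    top = stack_layer(&counter_below, top, &counter_ops);
    top = stack_layer(&tracer_below, top, &tracer_ops);
    return top;
}
```

After `build_stack()`, a call through the returned vector runs tracer_write(), then counter_write(), then write(2). Neither layer knows how many others sit above or below it, which is what lets VFSs stack.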

2.2. Select a Task

What's a feature you wished Samba had? If it can be done as part of a filesystem, chances are it can be done in a VFS.

Let's assume you want a way to find out what files a particular user has open, or what users have open files. Perhaps you're about to reboot, and want to warn them to save their work. This is definitely filesystem-related, so you can use the VFS switch in implementing it.

Let's also say we're lazy, so we'd prefer to just log opens and closes and leave the analysis to an awk or Perl script. The script can take the list of users reported by smbstatus and then look at the logs to see what files, if any, they still have open.

2.3. Validate the Task

That sounds reasonable, but let's check that it really makes sense. It would be silly to start a project without being sure we can finish it:

Is the data we'd need available to us? smbstatus reports the user name and the pid of the Samba subprocess, and the open call knows the filename, the uid and its own pid, so that part can be logged. But close doesn't know the filename, only the fd. So we'll need to log the fd, and let the awk script check whether user 56 has closed file 3 yet.

Can we do the "list of open files" (lsof) processing easily enough in a script? Yes: using grep and sort, then discarding files that have been closed.

Finally, is it easy and fast? Sure looks like it!
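The matching the script has to do is simple enough to sketch. The tutorial suggests awk or Perl; for consistency with the rest of the code here it is written in C. This is a hypothetical sketch: it assumes each syslog line has already been reduced to `open <pid> <fd> <filename>` or `close <pid> <fd>` (timestamp and prefix stripped, no spaces in filenames), and the function names are made up. Because file descriptors are numbered per process, it pairs records by pid and fd rather than by fd alone.

```c
#include <stdio.h>
#include <string.h>

#define MAX_OPEN 256
#define NAME_LEN 256

/* One file we believe is still open.  File descriptors are numbered
 * per process, so (pid, fd) is what identifies an open file. */
struct open_file {
    int pid;
    int fd;
    char name[NAME_LEN];
};

static struct open_file table[MAX_OPEN];
static int n_open = 0;

/* Apply one simplified log record, assumed to look like
 *   "open <pid> <fd> <filename>"  or  "close <pid> <fd>".
 * Returns 0 if the record was used, -1 if it was not understood
 * or the table is full. */
int apply_record(const char *line)
{
    int pid, fd;
    char name[NAME_LEN];

    if (sscanf(line, "open %d %d %255s", &pid, &fd, name) == 3) {
        if (n_open == MAX_OPEN)
            return -1;
        table[n_open].pid = pid;
        table[n_open].fd = fd;
        strcpy(table[n_open].name, name);   /* %255s leaves room for NUL */
        n_open++;
        return 0;
    }
    if (sscanf(line, "close %d %d", &pid, &fd) == 2) {
        /* A (pid, fd) pair is open at most once at a time. */
        for (int i = 0; i < n_open; i++) {
            if (table[i].pid == pid && table[i].fd == fd) {
                table[i] = table[--n_open];  /* swap in the last entry */
                break;
            }
        }
        return 0;   /* a close with no logged open is simply ignored */
    }
    return -1;
}

/* Does process pid still have name open, per the records so far? */
int still_open(int pid, const char *name)
{
    for (int i = 0; i < n_open; i++)
        if (table[i].pid == pid && strcmp(table[i].name, name) == 0)
            return 1;
    return 0;
}
```

Feeding it the log in order and then asking about the pids that smbstatus reports for a user gives that user's open files; in awk the table would be an associative array indexed by pid and fd.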

2.4. Design an Object

Virtual filesystems are objects, with a single start operation (open), a single stop operation (close), and a bunch of operations on open files (read, write, seek). As they're old objects, they also have some odd and warty operations (creat, lseek).

To make a VFS, you design an object that "subclasses" the filesystem and adds the features you want.

If we're logging via syslog, this is particularly easy, as syslog is an object with a similar structure: one entrance (openlog), one exit (closelog), and a few operations in the middle (syslog). We chose syslog in part because of this similar structure.

Our object is going to look like Table 2:

Table 2: Intercepted Operations for LSOF VFS.

Name          Function                              Notes
connect       initialize syslog                     optionally set uid and pid
create, open  log filename, uid, pid, fd, "open"
close         log uid, fd, "close"                  uid and fd allow us to match records

The only hard part is making sure we've initialized syslog the first time we write to it. We'll use a flag to make sure we initialize it just the once.
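The flag can be a static variable checked before each log call. In this sketch `ensure_syslog_open()` and `log_event()` are hypothetical helper names, the "smbd_audit" ident is borrowed from Figure 7, and the LOG_USER facility is an assumption chosen to match the user.info entry in Figure 8. The flag is not thread-safe, which is assumed to be acceptable for a single-threaded smbd process.

```c
#include <syslog.h>

/* The flag: set the first time any interceptor logs something. */
static int syslog_ready = 0;
static int openlog_calls = 0;   /* illustration only: counts openlog() calls */

/* Open the syslog connection on first use, and never again. */
static void ensure_syslog_open(void)
{
    if (!syslog_ready) {
        /* LOG_PID adds the smbd process id to every message. */
        openlog("smbd_audit", LOG_PID, LOG_USER);
        openlog_calls++;
        syslog_ready = 1;
    }
}

/* Hypothetical helper the open and close interceptors would call. */
void log_event(const char *what, const char *fname, int fd)
{
    ensure_syslog_open();
    syslog(LOG_INFO, "%s %s fd %d", what, fname, fd);
}
```

The close interceptor, which doesn't know the filename, could simply pass a placeholder such as "-".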

2.5. Sanity-Check It

Before we actually start implementing, we need to ask ourselves a few questions, to make sure we've not missed something obvious.

  1. What's missing?

    Looking at the list of operations in Table 1, one jumps out at us: what if the user renames a file with "Save As" from a PC program? Do we care?

    In this case, no: we still have the original file open, and have now opened a new one as well. When the program exits, it'll close both. If we were tracking use of files for security, we might care; for "lsof" we don't.

  2. What do we do on error?

    Do we fail the open if we can't log? No: the user would be justified in throwing rocks at us if logging problems prevented their saving their work!

  3. Could we ever hang, or delay the user?

    Not in this case: syslog calls normally return as soon as the message has been handed to the syslog daemon, before the data is on disk. We'd need to worry more about that if we were writing log files directly, though. (Again, we've chosen syslog in part because of this error-tolerant behavior.) We are going to add some overhead, but only about the equivalent of one fprintf per open or close.

3. Implementing our VFS

This is the easiest part: we'll take a template VFS and hack in the new functionality.

3.1. Load the Skeleton

The skeleton is illustrated in Figures 2 and 3 below, and lives in the file <vfsdir>/skel.c.

Figure 2: Open Interceptor
int skel_open(char *fname, int flags, mode_t mode)
{
    return underlying.open(fname, flags, mode);
}
Figure 3: Setup and Teardown
/* The next-lower VFS's operations, used by every interceptor */
static struct vfs_ops underlying;

/*
 * Init -- save previous vfs operations, return new vector
 *      for Samba to use.
 */
struct vfs_ops *vfs_init(struct vfs_ops *former)
{
    memcpy(&underlying, former, sizeof(underlying));
    return &skel_ops;
}

/*
 * Teardown -- return the saved operations vector
 */
struct vfs_ops *vfs_teardown(void)
{
    return &underlying;
}

The interceptors are all variations on the ones we saw before; the startup and teardown functions are the mechanism that allows us to "stack" (subclass) VFSs.

Initially, the write entry in Samba's operations vector is the plain write(): vfs_init() saves it in underlying.write, and the vector it returns replaces vfs->write with lsof_write(). lsof_write() in turn calls underlying.write, the former value of vfs->write. Teardown is just the inverse.
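The save-and-restore sequence can be tried outside Samba with a sketch like the one below. It assumes a hypothetical one-entry vector, `struct vfs_ops_sketch`, in place of Samba's `struct vfs_ops`, and hypothetical `sketch_init()` and `sketch_teardown()` functions standing in for vfs_init() and vfs_teardown(). The vector smbd would pass in is modelled by `default_ops`, whose write entry is the plain write(2).

```c
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical one-entry stand-in for Samba's struct vfs_ops. */
struct vfs_ops_sketch {
    ssize_t (*write)(int fd, const void *data, size_t n);
};

/* Models the vector smbd hands in: its write entry is plain write(2). */
static struct vfs_ops_sketch default_ops = { write };

/* The former vector: saved by init, handed back by teardown. */
static struct vfs_ops_sketch underlying;

/* Do-nothing interceptor: pass the call to the next lower layer. */
static ssize_t lsof_write(int fd, const void *data, size_t n)
{
    return underlying.write(fd, data, n);
}

/* The vector we give Samba instead. */
static struct vfs_ops_sketch lsof_ops = { lsof_write };

/* Init: save the former vector, return ours for Samba to call. */
struct vfs_ops_sketch *sketch_init(struct vfs_ops_sketch *former)
{
    memcpy(&underlying, former, sizeof underlying);
    return &lsof_ops;
}

/* Teardown: the inverse, hand back the saved vector. */
struct vfs_ops_sketch *sketch_teardown(void)
{
    return &underlying;
}
```

After sketch_init(&default_ops), a write through the returned vector runs lsof_write(), which reaches write(2) through underlying; sketch_teardown() hands back a vector whose write entry is write(2) again.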

To try out the dummy/template interceptors:

  1. Copy skel.c to lsof.c.

  2. Rename everything prefixed "skel_" (the functions and the skel_ops vector) to "lsof_".

  3. Change the Makefile to suit.

  4. Add the path to the Samba source tree to the Makefile. For example, Tim's is SAMBA_SRC=/home/tpot/work/samba/source.

  5. Run "make".

  6. If you get errors like "cannot find include file: config.h", you need to run configure in the Samba source tree first.

  7. Add "vfs object = /full/path/to/lsof.so" to your smb.conf (as in Figure 4).

Figure 4: VFS Test Share.
[testvfs]
	vfs object = /home/work/testvfs/lsof.so
	path = /var/tmp

3.2. Make the Minimal Tests

You'll need a few scripts to see if the VFS actually works; we use ones like Figure 5:
Figure 5: Trivial Test Script
smbclient //server/testvfs -c "ls"

3.3. Add What You Need

Now we add the syslog code, as in Figure 6:
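Figure 6: Open Interceptor.
#include <syslog.h>
#include <unistd.h>	/* geteuid() */

int lsof_open(char *fname, int flags, mode_t mode)
{
	int fd;

	if ((fd = underlying.open(fname, flags, mode)) != -1) {
		/* smbd switches its effective uid to the connected user */
		syslog(LOG_INFO, "opened %s by uid %d as fd %d",
		       fname, (int)geteuid(), fd);
	}
	return fd;
}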
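Table 2 also calls for a close interceptor. The sketch below pairs an open and a close interceptor so it can be compiled and run outside Samba; as an assumption of the sketch, there is no "underlying" vector here, so it calls open(2) and close(2) directly where a real module would call underlying.open and underlying.close. The message formats are also assumptions. Both functions follow the rule from section 2.5: logging happens only after the real call succeeds, and a logging problem never changes the return value.

```c
#include <fcntl.h>
#include <syslog.h>
#include <sys/types.h>
#include <unistd.h>

/* Open interceptor (sketch): in a real VFS, open() would be
 * underlying.open(). */
int lsof_open(char *fname, int flags, mode_t mode)
{
    int fd = open(fname, flags, mode);

    /* Log only successful opens; never fail the open because of logging. */
    if (fd != -1)
        syslog(LOG_INFO, "open %s by uid %d as fd %d",
               fname, (int)geteuid(), fd);
    return fd;
}

/* Close interceptor (sketch): in a real VFS, close() would be
 * underlying.close(). */
int lsof_close(int fd)
{
    int rc = close(fd);

    /* close() only knows the fd; with LOG_PID the pid and fd are
     * what a script uses to pair this record with its open. */
    if (rc == 0)
        syslog(LOG_INFO, "close by uid %d of fd %d", (int)geteuid(), fd);
    return rc;
}
```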

The initialization code is just:

Figure 7: Syslog Initialization.

/* VFS initialization function.  Open syslog, save the former operations
   vector and return our initialized vfs_ops structure back to SAMBA. */

struct vfs_ops *vfs_init(struct vfs_ops *former)
{
    openlog("smbd_audit", LOG_PID, SYSLOG_FACILITY);
    memcpy(&underlying, former, sizeof(underlying));
    return(&lsof_ops);
}

A word of caution: you can't just use fprintf(stderr, ...) in the test VFS for debugging, because smbd is a daemon and its stderr goes to /dev/null. You'll have to use either syslog (which is silly, as that's what we're debugging) or the Samba DEBUG macros, such as DEBUG(2, ("something broke\n")), with log level = 2 in your smb.conf.

3.4. Try to Break It

Try it and see what goes wrong.

The first problem is that we're logging at a priority that syslog isn't configured to record: you'll need to change syslog.conf as shown in Figure 8:

Figure 8: Syslog.conf Additions.
...
user.alert              `root, operator'
user.emerg              *
)

# For Samba
user.info		/var/log/syslog

You could also send it to /usr/local/samba/var/lsof.log, at the expense of having to age that log at regular intervals.

3.5. How about Something Harder?

You could try using the Samba log functions to write logs instead of syslog. This would give you automagic log aging, and remove the dependency on your vendor's syslog.

You would have to work through the sanity check again, as Samba logs require checks for failure, lest you try to write to an unopened file descriptor!

A larger project might be implementing a "trashcan": a VFS which moves files to a /trashcan directory instead of deleting them. Of course, this is Unix, so you'd only want to do so if the link count in the inode would fall to zero; if other hard links remain, nothing is really being deleted.
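The core of such a trashcan is an unlink interceptor. A hypothetical sketch follows: `trash_unlink` and its `trash_dir` parameter are made-up names, it uses lstat() to read the link count, and it assumes the trash directory is on the same filesystem, since rename() cannot move a file across filesystems. A real module would also need to handle name collisions in the trash directory, which this sketch does not.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical unlink interceptor for a trashcan VFS: move the file
 * into trash_dir instead of deleting it, but only when this name is
 * the file's last link. */
int trash_unlink(const char *path, const char *trash_dir)
{
    struct stat sb;
    char dest[4096];
    const char *base;

    if (lstat(path, &sb) != 0)
        return -1;                      /* errno set by lstat() */

    /* Other hard links remain, so removing this name destroys no
     * data: just unlink it as usual. */
    if (sb.st_nlink > 1)
        return unlink(path);

    /* Use the last path component as the name inside the trash. */
    base = strrchr(path, '/');
    base = base ? base + 1 : path;

    if (snprintf(dest, sizeof dest, "%s/%s", trash_dir, base)
            >= (int)sizeof dest) {
        errno = ENAMETOOLONG;
        return -1;
    }
    /* rename() fails with EXDEV if trash_dir is on another filesystem,
     * and silently replaces any earlier file of the same name. */
    return rename(path, dest);
}
```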

A harder case is quotas: on startup, call quotactl to see what the soft and hard limits are, and warn the user when they exceed the soft limit. How to warn them? That's the hard part: perhaps via winpopup.

Or you could write a VFS that calls a specified shell script before or after any operation: if the smb.conf file says "vfs postopen = foo", call foo after any open, and fail the open if foo reports an error. That would require you to know a fair bit about smb.conf parsing, and might be rather slow, but it would be fine for prototyping new VFSs.

Actually, Tim has already done the last suggestion.
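For the script-calling idea, the core is just running a program and checking its exit status. A minimal sketch follows, assuming the script path ("foo" above) has already been read from smb.conf, which is not shown, and that a non-zero exit status means "fail the operation". `run_hook` is a hypothetical name. It uses fork(), execv() and waitpid() rather than system(), so the filename is passed as a single argument and never interpreted by a shell.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical hook runner: execute "script fname" and wait for it.
 * Returns 0 if the script exited with status 0, and -1 on any failure:
 * fork or exec error, non-zero exit, or death by a signal. */
int run_hook(const char *script, const char *fname)
{
    int status;
    pid_t pid = fork();

    if (pid == -1)
        return -1;
    if (pid == 0) {
        /* Child: argv[0] is the script, argv[1] the file just opened.
         * execv() runs it directly; no shell sees the filename. */
        char *const argv[] = { (char *)script, (char *)fname, NULL };
        execv(script, argv);
        _exit(127);                     /* only reached if exec failed */
    }
    /* Parent: wait for the child, retrying if a signal interrupts us. */
    while (waitpid(pid, &status, 0) == -1) {
        if (errno != EINTR)
            return -1;
    }
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}
```

Forking a process on every open is exactly the slowness mentioned above, which is why this approach suits prototyping rather than production use.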

4. Congratulations

You're now a VFS guru. Next week you can write ext4fs for Linux.

Appendix A: Other People's Work

S.R. Kleiman, "Vnodes: An Architecture for Multiple File System Types in Sun UNIX", Summer 1986 USENIX Conference Proceedings, Atlanta, GA, pp. 238-247.

David S.H. Rosenthal, "Evolving the Vnode Interface", Summer 1990 USENIX Conference Proceedings, Anaheim, CA, pp. 107-118.

Glenn C. Skinner and Thomas K. Wong, "Stacking Vnodes: A Progress Report", Summer 1993 USENIX Conference Proceedings, Cincinnati, OH, pp. 161-174, http://www.usenix.org/publications/library/proceedings/cinci93/skinner.html

Michael J. Karels and Marshall Kirk McKusick, "Toward a Compatible Filesystem Interface", Conference of the European Users' Group, September 1986, http://docs.FreeBSD.org/44doc/papers/fsinterface.html

Richard Gooch, "Overview of the Virtual File System", http://www.atnf.csiro.au/~rgooch/linux/vfs.txt

Erez Zadok and Ion Badulescu, "A Stackable File System Interface For Linux", http://www.cs.columbia.edu/~ezk/research/linux

David A Rusling, "The Linux Kernel", Version 0.8-3, http://www.fokus.gmd.de/linux/LDP/tlk-0.8-3.html/tlk-toc.html

Appendix B: The Complete Do-Nothing VFS

/* 
 * Skeleton VFS module.
 *
 * Copyright (C) Tim Potter, 1999
 *
 * This program is free software; you can redistribute it and/or modify
 * it under the terms of the GNU General Public License as published by
 * the Free Software Foundation; either version 2 of the License, or
 * (at your option) any later version.
 *  
 * This program is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 * GNU General Public License for more details.
 *  
 * You should have received a copy of the GNU General Public License
 * along with this program; if not, write to the Free Software
 * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
 *
 * $Id: vfs_tutorial.html,v 1.4 2001/07/23 15:33:19 davecb Exp $
 */

#include "config.h"

#include <stdio.h>
#include <sys/stat.h>
#ifdef HAVE_UTIME_H
#include <utime.h>
#endif
#ifdef HAVE_DIRENT_H
#include <dirent.h>
#endif
#ifdef HAVE_FCNTL_H
#include <fcntl.h>
#endif
#include <errno.h>
#include <string.h>

#include <vfs.h>

/* Function prototypes */

int skel_connect(struct vfs_connection_struct *conn, char *svc, char *user);
void skel_disconnect(void);
SMB_BIG_UINT skel_disk_free(char *path, SMB_BIG_UINT *bsize, 
			    SMB_BIG_UINT *dfree, SMB_BIG_UINT *dsize);

DIR *skel_opendir(char *fname);
struct dirent *skel_readdir(DIR *dirp);
int skel_mkdir(char *path, mode_t mode);
int skel_rmdir(char *path);
int skel_closedir(DIR *dir);

int skel_open(char *fname, int flags, mode_t mode);
int skel_close(int fd);
ssize_t skel_read(int fd, char *data, size_t n);
ssize_t skel_write(int fd, char *data, size_t n);
SMB_OFF_T skel_lseek(int filedes, SMB_OFF_T offset, int whence);
int skel_rename(char *old, char *new);
void skel_sync(int fd);
int skel_stat(char *fname, SMB_STRUCT_STAT *sbuf);
int skel_fstat(int fd, SMB_STRUCT_STAT *sbuf);
int skel_lstat(char *path, SMB_STRUCT_STAT *sbuf);
BOOL skel_lock(int fd, int op, SMB_OFF_T offset, SMB_OFF_T count, int type);
int skel_unlink(char *path);
int skel_chmod(char *path, mode_t mode);
int skel_utime(char *path, struct utimbuf *times);

/* VFS operations structure */

struct vfs_ops skel_ops = {

    /* Disk operations */

    skel_connect,
    skel_disconnect,
    skel_disk_free,

    /* Directory operations */

    skel_opendir,
    skel_readdir,
    skel_mkdir,
    skel_rmdir,
    skel_closedir,

    /* File operations */

    skel_open,
    skel_close,
    skel_read,
    skel_write,
    skel_lseek,
    skel_rename,
    skel_sync,
    skel_stat,
    skel_fstat,
    skel_lstat,
    skel_lock,
    skel_unlink,
    skel_chmod,
    skel_utime
};

/* VFS initialization - return vfs_ops function pointer structure */


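/* Operations of the next-lower VFS, saved by vfs_init() and used by
   every skel_ function below to pass its call through */
static struct vfs_ops underlying;

struct vfs_ops *vfs_init(struct vfs_ops *former)
{
    memcpy(&underlying, former, sizeof(underlying));
    return(&skel_ops);
}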



/* Implementation of VFS functions.  Insert your useful stuff here */


int skel_connect(struct vfs_connection_struct *conn, char *svc, char *user)
{
    return underlying.connect(conn, svc, user);
}

void skel_disconnect(void)
{
    underlying.disconnect();
}

SMB_BIG_UINT skel_disk_free(char *path, SMB_BIG_UINT *bsize, 
			    SMB_BIG_UINT *dfree, SMB_BIG_UINT *dsize)
{
    return underlying.disk_free(path, bsize, dfree, dsize);
}

DIR *skel_opendir(char *fname)
{
    return underlying.opendir(fname);
}

struct dirent *skel_readdir(DIR *dirp)
{
    return underlying.readdir(dirp);
}

int skel_mkdir(char *path, mode_t mode)
{
    return underlying.mkdir(path, mode);
}

int skel_rmdir(char *path)
{
    return underlying.rmdir(path);
}

int skel_closedir(DIR *dir)
{
    return underlying.closedir(dir);
}

int skel_open(char *fname, int flags, mode_t mode)
{
    return underlying.open(fname, flags, mode);
}

int skel_close(int fd)
{
    return underlying.close(fd);
}

ssize_t skel_read(int fd, char *data, size_t n)
{
    return underlying.read(fd, data, n);
}

ssize_t skel_write(int fd, char *data, size_t n)
{
    return underlying.write(fd, data, n);
}

SMB_OFF_T skel_lseek(int filedes, SMB_OFF_T offset, int whence)
{
    return underlying.lseek(filedes, offset, whence);
}

int skel_rename(char *old, char *new)
{
    return underlying.rename(old, new);
}

void skel_sync(int fd)
{
    underlying.sync(fd);
}

int skel_stat(char *fname, SMB_STRUCT_STAT *sbuf)
{
    return underlying.stat(fname, sbuf);
}

int skel_fstat(int fd, SMB_STRUCT_STAT *sbuf)
{
    return underlying.fstat(fd, sbuf);
}

int skel_lstat(char *path, SMB_STRUCT_STAT *sbuf)
{
    return underlying.lstat(path, sbuf);
}

BOOL skel_lock(int fd, int op, SMB_OFF_T offset, SMB_OFF_T count, int type)
{
    return underlying.lock(fd, op, offset, count, type);
}

int skel_unlink(char *path)
{
    return underlying.unlink(path);
}

int skel_chmod(char *path, mode_t mode)
{
    return underlying.chmod(path, mode);
}

int skel_utime(char *path, struct utimbuf *times)
{
    return underlying.utime(path, times);
}

Dave Collier-Brown