
Writing a Simple File System

Author: Ravi Kiran UVS

1. Objective

We will write a file system with very basic functionality, in order to understand how certain code paths of the kernel's VFS work. Our file system contains only one file, 'hello.txt'.

2. Introduction

We have to register our file system with the kernel. While doing so, we register certain callbacks, which the kernel will call later depending on the system call being made. The read_super callback is called first, i.e., when the user mounts the file system. In it, we have to fill the super block with the dentry of the root of our file system (which means creating an inode and attaching it to that dentry), and we have to fill the corresponding operation tables with the proper callbacks.

3. Data Structures

These are the data structures used in implementing our file system. This section gives a brief introduction to each of them; for more information, please refer to the kernel documentation or the source code.

a. File System Type (struct file_system_type)
Definition found in include/linux/fs.h
This structure is used to register the file system with the kernel, and the kernel uses it at the time of mounting a file system. We have to fill the 'name' field with the name of our file system (for example "rkfs") and the other important field, 'read_super', which is a callback that is expected to fill the super block.

b. Super Block (struct super_block)
Definition found in include/linux/fs.h
This stores the information about the mounted file system. The important fields to be filled are the operation table (s_op field) and the root dentry (s_root). At the time of mounting a file system, the kernel identifies the correct file_system_type object based on the file system name, allocates a new super block object, and calls the read_super callback to fill it. So, we have to fill these fields in our read_super implementation.

c. Inode (struct inode)
Definition found in include/linux/fs.h
The inode object is the kernel's representation of a low-level file. When we return the dentry of the root of our file system, we also have to attach a proper inode to it.
This structure has two operation tables, i_op and i_fop, i.e., the inode operations and the file operations respectively. We will implement only one inode operation: lookup.
Lookup is called when the kernel is resolving a path. The kernel starts from the top and obtains the dentry (and the inode) of each component of the path from its parent, by calling inode_operations.lookup on the inode of the parent. For example, when resolving /a/b, the kernel starts from the dentry of '/' (already available from the super block). It creates a new dentry, sets its name to 'a', and calls lookup on the inode of '/' to attach an inode to that dentry. If this succeeds, i.e., an inode for 'a' is attached, it proceeds to look up 'b' under 'a'.
So, it is important for us to implement the lookup callback.

d. Inode Operations (struct inode_operations)
Definition found in include/linux/fs.h
This is the inode operations table with each field corresponding to a function pointer to handle the task. It has fields like mkdir, lookup etc. We are interested in lookup.

e. DEntry (struct dentry)
Definition found in include/linux/dcache.h
The kernel uses dentries to represent the file system hierarchy. Each dentry points to an inode object and has pointers that record the parent-child relationship between files. Inodes and file objects do not store any information about the hierarchy.

f. File (struct file)
Definition found in include/linux/fs.h
The file object is used to store the process's information about an open file. We don't have to fill any fields of the file object directly; the kernel takes care of that, but we have to implement the file operation callbacks. We register the file operation table when we return an inode object (in read_super for the root directory, in lookup for hello.txt). When the file is opened, the kernel copies the table from the i_fop field of the inode to the file object.
We will implement readdir for directories and read/write for regular files, so while returning an inode we have to set its file operation table based on the type of the file. We will therefore have two file operation tables, one for directories and the other for regular files.

The relationship between files, dentries and inodes is like this:

File ---> DEntry ---> Inode

g. File Operations (struct file_operations)
Definition found in include/linux/fs.h
This is the file operations table with each field corresponding to a function pointer to handle the task. It has fields like read, write, readdir, llseek etc.

The super block, inode, dentry and file structures have fields used by the kernel to maintain internal data structures like lists and hash tables. So, we cannot use local or global objects for them. The kernel allocates these objects and passes them to our functions so that we can fill in the required fields. If we have to allocate one of them ourselves, we must use the corresponding allocator function (for example, iget for inodes and d_alloc_root for the root dentry).

3. Implementation

We will register our file system with the kernel during module initialization. After this, the kernel will call us to perform the required tasks: we will be called when the file system is mounted (read_super), when a path inside it is resolved (lookup), when a directory is listed (readdir), and when a file is read or written (read, write).
The following table shows the fields we need to fill in the above data structures.

Structure           Fields to fill
File System Type    name, read_super
Super Block         s_op, s_root
File Operations     read, write, readdir
Inode Operations    lookup
Inode               i_ino, i_mode, i_op, i_fop
DEntry              d_inode

3.1 file_system_type
This is the structure registered with the kernel. It can be statically allocated. We need to fill the following fields:
     * name
     * read_super

4. Entry points

a. init_module
This is called when the module is loaded. We have to register our file system here: fill the file_system_type structure with the name and read_super fields and call register_filesystem with it. In the code below, the DECLARE_FSTYPE macro fills in the structure for us. For example,

static struct super_operations rkfs_sops = {
    read_inode: rkfs_s_readinode,   // called by iget when it creates a new inode
    statfs:     rkfs_s_statfs       // called for statfs(2), e.g. by df
    //  put_inode:    rkfs_s_putinode,     (not implemented)
    //  delete_inode: rkfs_s_deleteinode,  (not implemented)
};

...

DECLARE_FSTYPE( rkfs, "rkfs", rkfs_read_super, 0 );

int init_module(void) {
    int err;
    err = register_filesystem( &rkfs );
    return err;
}

b. read_super
This is called when our file system is mounted, and it has to fill the super block object. The most important field is s_root, the dentry of the root object of this file system. We allocate the root inode with iget, which also sets i_ino to the inode number we pass (1 here). We then fill i_mode with the mode (here a directory, hence S_IFDIR, with read/write/execute permission for the owner), i_op with the inode operations, and i_fop with the file operations for directories. After this, we allocate the root dentry (named '/') with d_alloc_root and store it in the s_root field of the super block. The kernel starts from this dentry when it looks up files in this file system.

static struct super_block *rkfs_read_super( struct super_block *sb, void *buf, int size ) {
    sb->s_blocksize = 1024;
    sb->s_blocksize_bits = 10;
    sb->s_magic = RKFS_MAGIC;
    sb->s_op = &rkfs_sops; // super block operations
    sb->s_type = &rkfs; // file_system_type

    rkfs_root_inode = iget( sb, 1 ); // allocate an inode (iget sets i_ino to 1)
    if(!rkfs_root_inode)
        return NULL;
    rkfs_root_inode->i_op = &rkfs_iops; // set the inode ops
    rkfs_root_inode->i_mode = S_IFDIR|S_IRWXU;
    rkfs_root_inode->i_fop = &rkfs_dir_fops;

    if(!(sb->s_root = d_alloc_root(rkfs_root_inode))) {
        iput(rkfs_root_inode);
        return NULL;
    }

    printk( "rkfs: read_super returning a valid super_block\n" );
    return sb;
}

Let us assume that our file system is mounted under /mnt/rkfs.

c. readdir
This is the file_operations.readdir field. It is called when the kernel wants the contents of a directory. For example, after mounting our file system, we type "ls /mnt/rkfs". The kernel resolves the name and gets the dentry for the directory (here, the root dentry of our file system, set up in read_super). It creates a file object with this dentry and, for the readdir operation, calls the readdir field of the file_operations table of that inode. We have registered our file operation table with the inode, so our readdir callback is called. In it, we have to fill in the directory contents using a callback the kernel passes to us, filldir, which we call once for each entry. Since readdir can be called repeatedly for the same open directory, we use file->f_pos to remember which entries have already been returned.

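int rkfs_f_readdir( struct file *file, void *dirent, filldir_t filldir ) {
    struct dentry *de = file->f_dentry;

    printk( "rkfs: file_operations.readdir called\n" );
    // f_pos counts the entries already returned: 0 = ".", 1 = "..", 2 = "hello.txt".
    // It is advanced only after filldir accepts an entry, so a call that stops
    // because the user buffer is full resumes at the right entry next time.
    switch(file->f_pos) {
    case 0:
        if(filldir(dirent, ".", 1, file->f_pos, de->d_inode->i_ino, DT_DIR) < 0)
            return 0;
        file->f_pos++;
        /* fall through */
    case 1:
        if(filldir(dirent, "..", 2, file->f_pos, de->d_parent->d_inode->i_ino, DT_DIR) < 0)
            return 0;
        file->f_pos++;
        /* fall through */
    case 2:
        if(filldir(dirent, "hello.txt", 9, file->f_pos, FILE_INODE_NUMBER, DT_REG) < 0)
            return 0;
        file->f_pos++;
    }
    return 0;
}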


In our file system, we support only one file, hello.txt. So, the result of ls -a /mnt/rkfs will be
. .. hello.txt
(plain ls hides names starting with a dot, so it shows only hello.txt).

d. lookup
This is the inode_operations.lookup field. It is called when the kernel is resolving a path. For example, if we type "ls -l /mnt/rkfs/hello.txt", the kernel has to get the inode for this path. It already has the inode of '/mnt/rkfs', so it calls the lookup operation of that inode to get the inode for the name hello.txt, passing it a new dentry object. It is the job of the callback to attach a proper inode to the dentry if a file with that name exists. We check for the name 'hello.txt' and, if it matches, attach the file's inode to the dentry with d_add and return NULL (meaning the dentry we were given is the one to use); otherwise we return an ENOENT error.

struct dentry *rkfs_i_lookup( struct inode *parent_inode, struct dentry *dentry ) {
  struct inode *file_inode;
  if( parent_inode->i_ino != rkfs_root_inode->i_ino || strlen("hello.txt") != dentry->d_name.len || strcmp(dentry->d_name.name, "hello.txt"))
      return ERR_PTR(-ENOENT);
  // allocate an inode object
  if(!(file_inode = iget( parent_inode->i_sb, FILE_INODE_NUMBER )))
      return ERR_PTR(-ENOMEM);
  file_inode->i_size = file_size;
  file_inode->i_mode = S_IFREG|S_IRUSR|S_IWUSR|S_IRGRP|S_IROTH;
  file_inode->i_fop = &rkfs_file_fops;
  //  add the inode to the dentry object
  d_add(dentry, file_inode);
  printk( "rkfs: inode_operations.lookup called with dentry %s. size = %d\n", dentry->d_name.name, file_size );
  return NULL;
}

e. read
This is the file_operations.read field. It is called when the kernel gets a read request for a file in our file system. For example, when we type "cat /mnt/rkfs/hello.txt", cat first calls open on the path. Our lookup callback is called to get the inode for this file, a file object is constructed with this dentry, and a file descriptor (fd) is returned. cat then calls read on the fd. The kernel hands the request over to the read handler in the file_operations table of the file object; we registered our file_operations table in the inode, so our handler is called. We have to copy the content into the user's buffer, whose maximum size is also passed to us, and advance *offset; read is called repeatedly until it returns 0, which signals end of file. Remember that this is a user-space buffer and we are in kernel space, so we cannot write to it directly. We have to use copy_to_user (declared in <asm/uaccess.h>; on i386 it expands to __generic_copy_to_user for non-constant sizes) to copy the content.

char file_buf[1024] = "Hello World\n";
int file_size = 12;
...
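ssize_t rkfs_f_read( struct file *file, char *buf, size_t max, loff_t *offset ) {
    size_t buflen;

    printk( "rkfs: file_operations.read called, max=%lu, offset=%ld\n",
            (unsigned long)max, (long)*offset );
    if(*offset >= file_size)   // nothing left after the current position: end of file
        return 0;
    buflen = file_size - *offset;   // bytes remaining after the offset
    if(buflen > max)
        buflen = max;
    // buf is a user-space pointer, so it must be written with copy_to_user
    if(copy_to_user(buf, file_buf + *offset, buflen))
        return -EFAULT;
    *offset += buflen; // advance the offset
    return buflen;
}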

f. write
This is the file_operations.write field. It is similar to the read handler and is called when the user writes something into the file. Here we have to copy the content from user space into kernel space, so we use copy_from_user (on i386 it expands to __generic_copy_from_user for non-constant sizes). Our file lives in the fixed 1024-byte file_buf, so a write can store at most that many bytes.

ssize_t rkfs_f_write (struct file *file, const char *buf, size_t maxlen, loff_t *offset) {
    size_t count;

    printk( "rkfs: file_operations.write called with maxlen=%lu, off=%ld\n",
            (unsigned long)maxlen, (long)*offset );
    if(*offset >= sizeof(file_buf))   // the file must fit in file_buf
        return -ENOSPC;
    count = sizeof(file_buf) - *offset;   // space left after the offset
    if(count > maxlen)
        count = maxlen;
    // buf is a user-space pointer, so it must be read with copy_from_user
    if(copy_from_user(file_buf + *offset, buf, count))
        return -EFAULT;
    *offset += count;
    if(*offset > file_size)
        file_size = *offset;
    file->f_dentry->d_inode->i_size = file_size; // keep the size seen by ls -l in sync
    return count;
}


g. cleanup_module
This is called when the module is removed, and we have to unregister our file system here. We do not need to check whether our file system is still in use: while it is mounted, the kernel holds a reference on our module (DECLARE_FSTYPE sets the owner field of file_system_type to THIS_MODULE), so the module cannot be removed until the file system is unmounted.

5. Code

/**
 * Notes:
 * Implementing a small filesystem having one file
 *
 * -> What happens when we mount a file system?
 * -> What do we need to provide to the kernel so that we are mountable?
 * -> What inode, dentry and file operations do we have to support?
 */

#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/version.h>
/*
  #if CONFIG_MODVERSIONS==1
  #define MODVERSIONS
  #include <linux/modversions.h>
  #endif
*/
#include <linux/fs.h>
#include <linux/sched.h>
#include <asm/uaccess.h>   // copy_to_user, copy_from_user

#define RKFS_MAGIC 0xabcd
#define FILE_INODE_NUMBER 2

static struct super_block *rkfs_read_super(struct super_block *, void *, int);
void rkfs_s_readinode( struct inode *inode );
int rkfs_s_statfs( struct super_block *sb, struct statfs *buf );
struct dentry *rkfs_i_lookup( struct inode *parent_inode, struct dentry *dentry );
ssize_t rkfs_f_read( struct file *file, char *buf, size_t max, loff_t *len );
int rkfs_f_readdir( struct file *file, void *dirent, filldir_t filldir );
ssize_t rkfs_f_write (struct file *, const char *, size_t, loff_t *);
int rkfs_f_release (struct inode *, struct file *);

/*
 * Data declarations
 */

static struct super_operations rkfs_sops = {
    read_inode: rkfs_s_readinode,   // called by iget when it creates a new inode
    statfs:     rkfs_s_statfs       // called for statfs(2), e.g. by df
    //  put_inode:    rkfs_s_putinode,     (not implemented)
    //  delete_inode: rkfs_s_deleteinode,  (not implemented)
};

struct inode_operations rkfs_iops = {
    lookup: rkfs_i_lookup
};

struct file_operations rkfs_dir_fops = {
    read   : generic_read_dir,
    readdir: &rkfs_f_readdir
};

struct file_operations rkfs_file_fops = {
    read : &rkfs_f_read,
    write: &rkfs_f_write
    //    release: &rkfs_f_release
};

// use this macro to declare the filesystem structure
DECLARE_FSTYPE( rkfs, "rkfs", rkfs_read_super, 0 );
struct inode *rkfs_root_inode;

char file_buf[1024] = "Hello World\n";
int file_size = 12;

/*
 * File-System Operations
 */
/*
 * This will be called when the kernel is attempting to mount something. It creates a
 * super_block structure and calls our callback/function to fill it.
 */
static struct super_block *rkfs_read_super( struct super_block *sb, void *buf, int size ) {
    sb->s_blocksize = 1024;
    sb->s_blocksize_bits = 10;
    sb->s_magic = RKFS_MAGIC;
    sb->s_op = &rkfs_sops; // super block operations
    sb->s_type = &rkfs; // file_system_type

    rkfs_root_inode = iget( sb, 1 ); // allocate an inode (iget sets i_ino to 1)
    if(!rkfs_root_inode)
        return NULL;
    rkfs_root_inode->i_op = &rkfs_iops; // set the inode ops
    rkfs_root_inode->i_mode = S_IFDIR|S_IRWXU;
    rkfs_root_inode->i_fop = &rkfs_dir_fops;

    if(!(sb->s_root = d_alloc_root(rkfs_root_inode))) {
        iput(rkfs_root_inode);
        return NULL;
    }

    printk( "rkfs: read_super returning a valid super_block\n" );
    return sb;
}

/*
 * Super-Block Operations
 */

void rkfs_s_readinode( struct inode *inode ) {
    inode->i_mtime = inode->i_atime = inode->i_ctime = CURRENT_TIME;
    printk( "rkfs: super_operations.readinode called\n" );
}

/*
 * This will be called to get the filesystem information like size etc.
 */
int rkfs_s_statfs( struct super_block *sb, struct statfs *buf ) {
    buf->f_type = RKFS_MAGIC;
    buf->f_bsize = PAGE_SIZE/sizeof(long);
    buf->f_bfree = 0;
    buf->f_bavail = 0;
    buf->f_ffree = 0;
    buf->f_namelen = NAME_MAX;
    printk( "rkfs: super_operations.statfs called\n" );
    return 0;
}

/*
 * Inode Operations
 */

struct dentry *rkfs_i_lookup( struct inode *parent_inode, struct dentry *dentry ) {
  struct inode *file_inode;
  if( parent_inode->i_ino != rkfs_root_inode->i_ino || strlen("hello.txt") != dentry->d_name.len || strcmp(dentry->d_name.name, "hello.txt"))
      return ERR_PTR(-ENOENT);
  // allocate an inode object
  if(!(file_inode = iget( parent_inode->i_sb, FILE_INODE_NUMBER )))
      return ERR_PTR(-ENOMEM);
  file_inode->i_size = file_size;
  file_inode->i_mode = S_IFREG|S_IRUSR|S_IWUSR|S_IRGRP|S_IROTH;
  file_inode->i_fop = &rkfs_file_fops;
  //  add the inode to the dentry object
  d_add(dentry, file_inode);
  printk( "rkfs: inode_operations.lookup called with dentry %s. size = %d\n", dentry->d_name.name, file_size );
  return NULL;
}

/*
 * File Operations
 */

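ssize_t rkfs_f_read( struct file *file, char *buf, size_t max, loff_t *offset ) {
    size_t buflen;

    printk( "rkfs: file_operations.read called, max=%lu, offset=%ld\n",
            (unsigned long)max, (long)*offset );
    if(*offset >= file_size)   // nothing left after the current position: end of file
        return 0;
    buflen = file_size - *offset;   // bytes remaining after the offset
    if(buflen > max)
        buflen = max;
    // buf is a user-space pointer, so it must be written with copy_to_user
    if(copy_to_user(buf, file_buf + *offset, buflen))
        return -EFAULT;
    *offset += buflen; // advance the offset
    return buflen;
}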

ssize_t rkfs_f_write (struct file *file, const char *buf, size_t maxlen, loff_t *offset) {
    size_t count;

    printk( "rkfs: file_operations.write called with maxlen=%lu, off=%ld\n",
            (unsigned long)maxlen, (long)*offset );
    if(*offset >= sizeof(file_buf))   // the file must fit in file_buf
        return -ENOSPC;
    count = sizeof(file_buf) - *offset;   // space left after the offset
    if(count > maxlen)
        count = maxlen;
    // buf is a user-space pointer, so it must be read with copy_from_user
    if(copy_from_user(file_buf + *offset, buf, count))
        return -EFAULT;
    *offset += count;
    if(*offset > file_size)
        file_size = *offset;
    file->f_dentry->d_inode->i_size = file_size; // keep the size seen by ls -l in sync
    return count;
}

/*
int rkfs_f_release (struct inode *ino, struct file *file) {
    printk( "rkfs: file_operations.release called\n" );
    return 0;
}
*/

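int rkfs_f_readdir( struct file *file, void *dirent, filldir_t filldir ) {
    struct dentry *de = file->f_dentry;

    printk( "rkfs: file_operations.readdir called\n" );
    // f_pos counts the entries already returned: 0 = ".", 1 = "..", 2 = "hello.txt".
    // It is advanced only after filldir accepts an entry, so a call that stops
    // because the user buffer is full resumes at the right entry next time.
    switch(file->f_pos) {
    case 0:
        if(filldir(dirent, ".", 1, file->f_pos, de->d_inode->i_ino, DT_DIR) < 0)
            return 0;
        file->f_pos++;
        /* fall through */
    case 1:
        if(filldir(dirent, "..", 2, file->f_pos, de->d_parent->d_inode->i_ino, DT_DIR) < 0)
            return 0;
        file->f_pos++;
        /* fall through */
    case 2:
        if(filldir(dirent, "hello.txt", 9, file->f_pos, FILE_INODE_NUMBER, DT_REG) < 0)
            return 0;
        file->f_pos++;
    }
    return 0;
}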

int init_module(void) {
    int err;
    err = register_filesystem( &rkfs );
    return err;
}

void cleanup_module(void) {
    unregister_filesystem( &rkfs );
}

MODULE_LICENSE("GPL");

6. Instructions to use the code

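Compile using the following command (the 2.4 kernel headers rely on inline functions, so optimization must be enabled with -O2):
gcc -O2 -Wall -D__KERNEL__ -DMODULE -I/lib/modules/`uname -r`/build/include -c rkfs.c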

This generates a file rkfs.o. Load the module as root using
insmod rkfs.o

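Create the mount point (if it does not exist) and mount the file system using
mkdir -p /mnt/rkfs
mount -t rkfs rkfs /mnt/rkfs
(the second "rkfs" is a dummy device name, since rkfs does not use a block device)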

Unmount using
umount /mnt/rkfs

Unload the module using
rmmod rkfs


