Introduction and copyright
This is a tutorial on writing FUSE
(Filesystem in UserSpacE for Linux and other systems)
servers speaking the
FUSE protocol and the wire format, as used on
the /dev/fuse device for communication between the Linux 2.6 kernel
(FUSE client) and the userspace filesystem implementation (FUSE server).
This tutorial is useful for writing a FUSE server without using
libfuse. This tutorial is not a reference: it doesn't contain
everything, but you should be able to figure out the rest yourself.
This document has been written by Péter Szabó on
2009-11-15. It can be used and redistributed under a Creative Commons
Attribution-Share Alike 2.5 Switzerland License.
This tutorial was based on versions 7.5 and 7.8 of the FUSE protocol
(FUSE_KERNEL_VERSION . FUSE_KERNEL_MINOR_VERSION), but it should work
with newer versions in the 7 major series.
This tutorial doesn't give all the details. For those, see the
sample Python source code for a FUSE server.
Further reading
There is no complete and up-to-date reference manual for the FUSE
protocol. The best documents and sources are:
Requirements
- Linux 2.6 (even though FUSE has been ported to many operating systems,
this tutorial focuses on the default Linux implementation);
- the fuse kernel module 7.5 or later compiled and loaded;
- familiarity with writing a FUSE server (with libfuse or
one of the Perl, Python, Ruby or other scripting language bindings);
- support for running external programs (like with system(3)) in
the programming language;
- support for creating socketpairs (socketpair(2)) in the
programming language;
- support for receiving filehandles (with recvmsg(2)) in the
programming language (this is tricky, see below).
Overview of the operation of a FUSE server
This overview assumes that the FUSE server is single-threaded.
- Fetch the mount point and the mount options from the command-line.
- Optionally, create the directory $MOUNT_POINT (libfuse
doesn't do this).
- Optionally, do a fusermount -u $MOUNT_POINT to clean up an
existing or stale FUSE filesystem in the mount point directory.
(libfuse doesn't do this).
- Run fusermount(1) to mount a filesystem.
- Receive the /dev/fuse filehandle from fusermount(1).
- Receive, process and reply to the FUSE_INIT message.
- In an infinite loop:
- Receive a message (with a buffer of ≥ 8192 bytes,
recommended 65536 + 100 bytes)
from the FUSE client on the file descriptor.
- If ENODEV is the result, or FUSE_DESTROY is received, break from the
loop.
- Process the message.
- Send the reply on the file descriptor, except for FUSE_FORGET.
- Clean up so your backend filesystem remains in a consistent state.
- Exit from the FUSE server process.
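The loop in the overview can be sketched in Python roughly as follows. This is a minimal single-threaded sketch, not code taken from the sample server: serve and process are hypothetical names, the header layouts are the ones described in the wire format section, and host byte order is assumed for all integers.

```python
import errno
import os
import struct

FUSE_FORGET = 2
FUSE_DESTROY = 38
IN_HEADER_SIZE = 40       # size of the fixed-length input header
BUF_SIZE = 65536 + 100    # buffer size recommended in the overview


def serve(fd, process):
    """Run the request loop on fd until unmount or FUSE_DESTROY.

    process(opcode, unique, nodeid, body) is a hypothetical callback that
    returns (errno_value, payload); errno_value 0 means success.
    """
    while True:
        try:
            msg = os.read(fd, BUF_SIZE)   # one read returns one whole message
        except OSError as e:
            if e.errno == errno.ENODEV:   # the filesystem has been unmounted
                break
            raise
        if len(msg) < IN_HEADER_SIZE:     # assumption: a short read ends the loop
            break
        size, opcode, unique, nodeid = struct.unpack_from("=IIQQ", msg)
        if opcode == FUSE_DESTROY:
            break
        error, payload = process(opcode, unique, nodeid,
                                 msg[IN_HEADER_SIZE:size])
        if opcode == FUSE_FORGET:         # FUSE_FORGET must not be answered
            continue
        if error:
            payload = b""                 # on failure only the header is sent
        os.write(fd, struct.pack("=IiQ", 16 + len(payload), -error, unique)
                 + payload)
```

A callback that always returns (errno.ENOSYS, b"") is enough to watch which opcodes the kernel sends, although a real server must at least answer FUSE_INIT properly before entering the loop.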
fusermount(1), when called above, does the following:
- Uses its setuid bit to run as root.
- Opens the character device /dev/fuse.
- Mounts the filesystem with the mount(2) system call, passing the
file descriptor of /dev/fuse to it.
Steps of running fusermount(1) and obtaining the
/dev/fuse file descriptor:
- Create a socketpair (AF_UNIX, SOCK_STREAM) with fd0 and fd1 as
file descriptors.
- Run (with system(3)):
export _FUSE_COMMFD=$FD0; fusermount -o $OPTS $MOUNT_POINT .
Example: export _FUSE_COMMFD=3; fusermount -o ro /tmp/foo .
- Receive the /dev/fuse file descriptor from fd1. This is tricky.
See the receive_fd function in the sample receive_fd.c for this.
The sample fuse0.py contains a Python implementation (using the ctypes
or the dl module to call C code).
- Close fd0 and fd1.
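In a Python version that provides socket.recvmsg (Python 3.3 or later), the steps above can be sketched without ctypes. This is only an illustration under that assumption, not the code of the sample fuse0.py; mount is a hypothetical helper name, and the fusermount command line is the one shown in the steps above.

```python
import array
import os
import socket
import subprocess


def receive_fd(sock):
    """Receive one file descriptor passed as SCM_RIGHTS ancillary data."""
    fds = array.array("i")
    _, ancdata, _, _ = sock.recvmsg(1, socket.CMSG_SPACE(fds.itemsize))
    for level, ctype, data in ancdata:
        if (level == socket.SOL_SOCKET and ctype == socket.SCM_RIGHTS
                and len(data) >= fds.itemsize):
            fds.frombytes(data[:fds.itemsize])
            return fds[0]
    raise OSError("no file descriptor received")


def mount(mount_point, opts="ro"):
    """Hypothetical helper: run fusermount and return the /dev/fuse fd."""
    fd0, fd1 = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        env = dict(os.environ, _FUSE_COMMFD=str(fd0.fileno()))
        # pass_fds keeps fd0 open, under the same number, in the child.
        subprocess.check_call(["fusermount", "-o", opts, mount_point],
                              env=env, pass_fds=(fd0.fileno(),))
        return receive_fd(fd1)
    finally:
        fd0.close()
        fd1.close()
```

The descriptor returned by mount is a plain integer suitable for os.read and os.write; both ends of the socketpair are closed once it has been received.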
Wire format and communication
Once you have received the /dev/fuse file descriptor, do a
read(dev_fuse_fd, buf, 8192) on it to read the FUSE_INIT message,
and send your reply. After that, keep reading
messages, and reply to all of them in sequence (except for FUSE_DESTROY and
FUSE_FORGET messages, which don't require a reply).
All input (FUSE client → server) message types share the same
fixed-length header format, but the message may contain optional, possibly
variable-length parts as well, depending on the message type (opcode).
Nevertheless, the whole message must be read in a single read(2), so
you have to preallocate a buffer for that (at least 8192 bytes; it may need
to be larger based on FUSE_INIT negotiation, so preallocate 65536 + 100 bytes
to be safe). All integers in messages are
unsigned (except for the negated errno value in the reply header).
The input message header is:
- uint32 size; size of the message in bytes, including the header;
- uint32 opcode; one of the FUSE_* constants describing the
message type and the interpretation of the rest of the header;
- uint64 unique; unique identifier of the message, must be
repeated in the reply;
- uint64 nodeid; nodeid (describing a file or directory) this
message applies to (can be FUSE_ROOT_ID == 1, or a larger number, what
you have returned in a previous FUSE_LOOKUP reply);
- uint32 uid; the fsuid (user ID) of the process initiating the
operation (use this for access control checks if needed);
- uint32 gid; the fsgid (group ID) of the process initiating the
operation (use this for access control checks if needed);
- uint32 pid; the PID (process ID) of the process initiating the
operation;
- uint32 padding; zeroes to pad up to 64-bits.
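The header fields listed above add up to 40 bytes and can be decoded with the struct module. This is a small sketch: InHeader and parse_in_header are illustrative names, and host byte order without extra alignment is assumed (the "=" prefix in the format string).

```python
import collections
import struct

# size, opcode: uint32; unique, nodeid: uint64; uid, gid, pid, padding: uint32
FUSE_IN_HEADER = struct.Struct("=IIQQIIII")

InHeader = collections.namedtuple(
    "InHeader", "size opcode unique nodeid uid gid pid padding")


def parse_in_header(buf):
    """Split a raw message into its header and opcode-specific body."""
    header = InHeader(*FUSE_IN_HEADER.unpack_from(buf, 0))
    body = buf[FUSE_IN_HEADER.size:header.size]
    return header, body
```

For FUSE_LOOKUP, for instance, the body returned here would be the '\0'-terminated filename.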
The interpretation of the rest of the input message depends on the opcode.
The most common input message types are:
- FUSE_LOOKUP = 1: input is a '\0'-terminated filename
without slashes (relative to nodeid), output is
struct fuse_entry_out;
- FUSE_FORGET = 2: input is a struct fuse_forget_in, there is
no output message;
- FUSE_GETATTR = 3: input is empty, output is
struct fuse_attr_out;
- FUSE_OPEN = 14: input is struct fuse_open_in, output is
struct fuse_open_out;
- FUSE_READ = 15: input is struct fuse_read_in, output is the
byte sequence read;
- FUSE_RELEASE = 18: input is struct fuse_release_in, output is
empty;
- FUSE_INIT = 26: input is struct
fuse_init_in, output is
struct fuse_init_out;
- FUSE_OPENDIR = 27: input is struct fuse_open_in, output is
struct fuse_open_out;
- FUSE_READDIR = 28: input is struct fuse_read_in, output is the
byte sequence read (serialized as FUSE-specific dirents);
- FUSE_RELEASEDIR = 29: input is struct fuse_release_in, output is
empty;
- FUSE_DESTROY = 38: input is empty; there is no output message.
For a read-only filesystem with some files and directories, it is enough to
implement only the opcodes above. See more
opcodes and their corresponding C structs in the table. The linked
document contains more details about some of the message fields. The
complete up-to-date opcodes and message structs can be found in
fuse_kernel.h.
Each reply output message (FUSE server → client) starts with this
header:
- uint32 size; size of the message in bytes, including the header;
- int32 error; zero for successful completion, a negative errno
value (such as -EIO or -ENOENT) on failure; upon failure, only the reply
header is sent;
- uint64 unique; unique identifier copied from the input message.
Please note that you have to write the whole reply at once (one
write(2) call). Using any kind of buffered IO (such as
stdio.h or C++ streams) can lead to problems, so don't do that.
Feel free to experiment: whatever junk you write as a reply, it won't
make the kernel crash, but you'll get an EINVAL errno for the write(2)
call.
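A reply can be assembled in memory and then written with a single os.write call, which maps to one write(2). The sketch below assumes the 16-byte reply header described above and host byte order; make_reply and send_reply are hypothetical helper names.

```python
import os
import struct

FUSE_OUT_HEADER = struct.Struct("=IiQ")   # size, error (signed), unique


def make_reply(unique, payload=b"", error=0):
    """Build a complete reply; error is a positive errno value or 0."""
    if error:
        payload = b""    # upon failure, only the reply header is sent
    size = FUSE_OUT_HEADER.size + len(payload)
    return FUSE_OUT_HEADER.pack(size, -error, unique) + payload


def send_reply(fd, unique, payload=b"", error=0):
    """Write the whole reply at once, bypassing any buffered I/O layer."""
    os.write(fd, make_reply(unique, payload, error))
```

For example, send_reply(fd, unique, error=errno.ENOENT) would report a missing file, given import errno.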
Your FUSE server doesn't have to implement all possible operations
(opcodes). By default, you can just return ENOSYS as errno for any operation
(except for FUSE_INIT, FUSE_DESTROY and FUSE_FORGET) you don't want to
implement.
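One way to get this default is a table of handlers keyed by opcode, with ENOSYS as the fallback. The following is a sketch under assumptions: handle and the handler signature are hypothetical, handlers signal failures by raising OSError, and only a few of the opcode constants from the list above are spelled out.

```python
import errno
import struct

FUSE_LOOKUP = 1
FUSE_FORGET = 2
FUSE_GETATTR = 3
FUSE_OPEN = 14
FUSE_DESTROY = 38

NO_REPLY = frozenset([FUSE_FORGET, FUSE_DESTROY])


def handle(handlers, opcode, unique, nodeid, body):
    """Return the reply bytes, or None if no reply must be sent.

    handlers maps an opcode to a function (nodeid, body) -> payload bytes.
    """
    handler = handlers.get(opcode)
    if opcode in NO_REPLY:
        if handler is not None:
            handler(nodeid, body)
        return None
    if handler is None:               # operation not implemented
        return struct.pack("=IiQ", 16, -errno.ENOSYS, unique)
    try:
        payload = handler(nodeid, body)
    except OSError as e:              # handler reported a failure
        return struct.pack("=IiQ", 16, -(e.errno or errno.EIO), unique)
    return struct.pack("=IiQ", 16 + len(payload), 0, unique) + payload
```

With this design, adding support for an operation only means registering one more function in the handlers dictionary.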
Common errno values the FUSE server can return:
- ENOSYS: The operation (opcode) is not implemented.
- EIO: Generic I/O error, if other errno values are not appropriate.
- EACCES: Permission denied.
- EPERM: Operation not permitted. Most of the time you need EACCES
instead.
- ENOENT: No such file or directory.
- ENOTDIR: Not a directory. Return it if a directory operation was
attempted on a nodeid which is not a directory.
The format of struct fuse_init_in used in FUSE_INIT:
- uint32 init_major; the FUSE_KERNEL_VERSION in the kernel; must
be exactly the same your code supports;
- uint32 init_minor; the FUSE_KERNEL_MINOR_VERSION in the kernel;
must be at least what your code supports;
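- uint32 init_readahead; the maximum readahead size (in bytes) proposed
by the kernel;
- uint32 init_flags; bitmask of the optional protocol features the kernel
offers.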
The format of struct fuse_init_out reply used in FUSE_INIT:
- uint32 major; the same as FUSE_KERNEL_VERSION in the input;
- uint32 minor; at most FUSE_KERNEL_MINOR_VERSION (init_minor) in
the input, feel free to set it to less if you don't support the newest
version;
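- uint32 max_readahead; the maximum readahead size in bytes; set it to 65536;
- uint32 flags; bitmask of the optional protocol features to enable; set it
to 0;
- uint32 unused; set it to 0;
- uint32 max_write; the maximum size of a write request in bytes; set it to
65536.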
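The version negotiation described by the two structs can be sketched as below. Assumptions: make_init_payload is a hypothetical name, the server supports protocol 7.8, host byte order is used, and only the first two fields of the input are read so that a shorter fuse_init_in is also accepted.

```python
import struct

FUSE_KERNEL_VERSION = 7
FUSE_KERNEL_MINOR_VERSION = 8   # newest minor version this sketch supports


def make_init_payload(body):
    """Build the fuse_init_out payload from the fuse_init_in body."""
    init_major, init_minor = struct.unpack_from("=II", body)
    if init_major != FUSE_KERNEL_VERSION:
        raise ValueError("unsupported FUSE major version %d" % init_major)
    # Never claim a newer minor version than the kernel announced.
    minor = min(init_minor, FUSE_KERNEL_MINOR_VERSION)
    # major, minor, max_readahead, flags, unused, max_write
    return struct.pack("=IIIIII", FUSE_KERNEL_VERSION, minor,
                       65536, 0, 0, 65536)
```

The returned payload still has to be prefixed with the reply header carrying the unique value of the FUSE_INIT request.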
You have to implement FUSE_GETATTR so that the user can do an ls -l
(or stat(2)) on the mount point. It will be called with
nodeid FUSE_ROOT_ID (== 1) for the mount point.
The format of struct fuse_attr_out reply used in FUSE_GETATTR:
- uint64 attr_value; number of seconds the kernel is allowed to
cache the attributes returned, without issuing a FUSE_GETATTR call again;
a zero value is OK; for non-networking filesystems you can set a very high
value, since nobody else would change the attributes anyway;
- uint32 attr_value_ns; number of nanoseconds to add to attr_value;
- uint32 padding; to 64 bits;
- struct fuse_attr attr; node attributes (permissions, owners etc.).
The format of struct fuse_attr reply used in FUSE_GETATTR and
FUSE_LOOKUP:
- uint64 ino; inode number copied to st_ino; can be any
positive integer, the kernel doesn't depend on its uniqueness; it has no
relation to nodeids used in FUSE (except for the name);
- uint64 size; file size in bytes (or 0 for devices); make sure
you set it correctly, because the kernel would truncate reads at this size
even if your FUSE_READ returns more; be aware of the size being cached
(using attr_value);
- uint64 blocks; number of 512-byte blocks occupied on disk; you
can safely set it to zero or any arbitrary value;
- uint64 atime; the last access (read) time, in seconds since the
Unix epoch;
- uint64 mtime; the last content modification (write) time, in seconds
since the Unix epoch;
- uint64 ctime; the last attribute (inode) change time, in
seconds since the Unix epoch;
- uint32 atime_ns; nanoseconds part of atime;
- uint32 mtime_ns; nanoseconds part of mtime;
- uint32 ctime_ns; nanoseconds part of ctime;
- uint32 mode; file type and permissions; example file:
S_IFREG | 0644; example directory: S_IFDIR | 0755;
- uint32 nlink; total number of hard links; set it to 1 for both
files and directories by default; for directories, you can speed up some
listing operations (such as find(1)) by setting it to 2 + the
number of subdirectories;
- uint32 uid; user ID of the owner;
- uint32 gid; group ID of the owner;
- uint32 rdev; device major and minor number for
character devices (mode & S_IFCHR) and block devices
(mode & S_IFBLK).
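Following the field order listed above, a FUSE_GETATTR reply payload can be packed like this. It is a sketch under assumptions: the layout is the 80-byte struct fuse_attr described in this tutorial (newer protocol minor versions may append fields), host byte order is used, and pack_attr and pack_attr_out are hypothetical names.

```python
import stat
import struct

# ino, size, blocks, atime, mtime, ctime: uint64;
# atime_ns, mtime_ns, ctime_ns, mode, nlink, uid, gid, rdev: uint32
FUSE_ATTR = struct.Struct("=QQQQQQIIIIIIII")
FUSE_ATTR_OUT_PREFIX = struct.Struct("=QII")   # cache seconds, ns, padding


def pack_attr(ino, size, mode, nlink=1, uid=0, gid=0, mtime=0, rdev=0):
    """Pack a struct fuse_attr; all three timestamps are set to mtime."""
    blocks = (size + 511) // 512               # number of 512-byte blocks
    return FUSE_ATTR.pack(ino, size, blocks, mtime, mtime, mtime,
                          0, 0, 0, mode, nlink, uid, gid, rdev)


def pack_attr_out(attr, cache_seconds=1):
    """Prefix a packed fuse_attr with the attribute cache timeout."""
    return FUSE_ATTR_OUT_PREFIX.pack(cache_seconds, 0, 0) + attr


# Attributes of the mount point itself (nodeid FUSE_ROOT_ID).
root_attr_out = pack_attr_out(pack_attr(1, 0, stat.S_IFDIR | 0o755, nlink=2))
```

A regular 5-byte file would use pack_attr(ino, 5, stat.S_IFREG | 0o644) instead.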
Nodeid and generation number rules
In FUSE_LOOKUP you should return entry_nodeid and generation numbers. If I
understand correctly, the following rules hold:
- When a (nodeid, name) pair is looked up which you have never returned
before, you can return any entry nodeid and generation number (except
for those which are in use, see below). These
two numbers uniquely identify the node for the kernel.
- When called again for the same (nodeid, name) pair, you must return
the same entry_nodeid and generation numbers. (So you must remember what
numbers you have returned previously).
- You should count the number of FUSE_LOOKUP requests on the same (nodeid,
name). When you receive a FUSE_FORGET request for the specified entry
nodeid, you must decrement the counter by the nlookups field of the
FUSE_FORGET request. Once the counter is 0, you may safely forget about
the entry nodeid (so it is no longer considered to
be in use), and next time you may return the
same or a different nodeid at your choice for the same (nodeid, name) --
but with an increased generation number.
- You must never return the same nodeid with the same generation number
again for a different inode, even after FUSE_FORGET dropped the
reference counter to 0. That is: nodeids that have
been released by the kernel may be recycled with a different
generation number (but not with the same one!).
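One simple bookkeeping scheme that appears to satisfy these rules is to never recycle nodeids at all, so a constant generation number is enough. The sketch below is only one possible design under that assumption; NodeTable and its method names are hypothetical, and the rules themselves are stated above with some uncertainty.

```python
FUSE_ROOT_ID = 1
GENERATION = 0   # constant, which is fine because nodeids are never reused


class NodeTable:
    """Tracks entry nodeids handed out in FUSE_LOOKUP replies."""

    def __init__(self):
        self.by_key = {}                      # (parent nodeid, name) -> nodeid
        self.by_id = {}                       # nodeid -> [key, lookup count]
        self.next_nodeid = FUSE_ROOT_ID + 1

    def lookup(self, parent, name):
        """Return (entry_nodeid, generation) and count the lookup."""
        key = (parent, name)
        nodeid = self.by_key.get(key)
        if nodeid is None:                    # first lookup, or forgotten
            nodeid = self.next_nodeid
            self.next_nodeid += 1
            self.by_key[key] = nodeid
            self.by_id[nodeid] = [key, 0]
        self.by_id[nodeid][1] += 1
        return nodeid, GENERATION

    def forget(self, nodeid, nlookup):
        """Apply a FUSE_FORGET: drop the entry once its count reaches 0."""
        entry = self.by_id.get(nodeid)
        if entry is None:
            return
        entry[1] -= nlookup
        if entry[1] <= 0:
            del self.by_id[nodeid]
            del self.by_key[entry[0]]
```

A server that wants to recycle nodeids would instead have to keep a per-nodeid generation counter and increase it on every reuse.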
How to list the entries in a directory
TODO
How to read the contents of a file
TODO
Other TODO
This tutorial is work in progress. In the meantime, please see
the
sample Python source code for a FUSE server.
Open questions
- How do nodeid and generation allocation and deallocation work?
- How to run a multithreaded FUSE server?