Kernel Log: What's new in 2.6.29 - Part 5: Filesystems Btrfs, SquashFS, Ext4 without journaling
If we are to believe the statements made by Linus Torvalds when presenting the seventh release candidate of 2.6.29, and assuming release candidates come weekly, it will be at least another week or two before Linux kernel 2.6.29 becomes available. The Kernel Log will, therefore, continue its report about the new features scheduled for 2.6.29 with what's new in terms of file systems.
Butter file system
As already mentioned during the merge window of 2.6.29, the kernel developers have integrated the experimental Btrfs file system. This doesn't imply that Btrfs is now complete; rather, it means that the kernel hackers intend to further develop and mature it within the Linux kernel framework, as they did with Ext4. Ext4 was integrated into Linux 2.6.19 in autumn 2006 and recently completed its main development phase in Linux 2.6.28.
Btrfs (short for B-tree FS, but generally referred to as butter FS) is a "copy-on-write" file system originally created by kernel developer Chris Mason, who works at Oracle and had previously spent some time handling ReiserFS for Suse. After the first announcement of Btrfs on the Linux Kernel Mailing List (LKML) in July 2007, Mason quickly found support from other developers: the comments to the commit were authored by developers employed, for example, by HP, Intel, Novell/Suse and Red Hat.
As Theodore Ts'o (also known as Ted Tso or tytso), the developer of Ext4 (and Ext2 and Ext3), revealed in an email several months ago, a number of key Linux file system developers agreed at a meeting in autumn 2007 that Btrfs should become the "next generation file system for Linux". Ts'o says that he and other developers will nevertheless continue to develop Ext4: its tried and tested Ext3 basis and its more advanced state of development make it suitable as a bridge until Btrfs has matured enough to earn the trust of enterprise users.
The Btrfs wiki offers an overview of the most important features of this file system, which was specifically designed for Linux from scratch:
- Extent-based file storage (maximum file size 2^64 bytes)
- Space efficient packing of small files and indexed directories to minimise storage requirements
- Dynamic inode allocation
- Writable snapshots
- Subvolumes
- Checksums on data and metadata
- Compression
- Integrated multiple device support for combining several devices into one volume with several RAID algorithms
- Online file system check and defragmentation
- Very fast offline file system check
- Efficient incremental backup and file system mirroring
- Optional SSD optimisation
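To make the "copy-on-write" and "writable snapshots" items in this list more concrete, here is a toy Python model. It is only a sketch under simplifying assumptions: the class `ToyCowVolume` and its methods are hypothetical names invented for this illustration, and a flat block map stands in for the B-trees that Btrfs actually uses. The point is merely that a snapshot copies references rather than data, and that a later write replaces a reference instead of overwriting shared data in place.

```python
class ToyCowVolume:
    """Toy model (not Btrfs code): a volume maps block numbers to immutable bytes."""

    def __init__(self, blocks=None):
        # The dict is private to this volume; the bytes objects it refers to
        # may be shared with any number of snapshots.
        self._blocks = dict(blocks or {})

    def write(self, block_no, data):
        # Never modify data in place: point this volume at a new bytes object.
        self._blocks[block_no] = bytes(data)

    def read(self, block_no):
        return self._blocks[block_no]

    def snapshot(self):
        # Copies only the block map, not the block contents.
        return ToyCowVolume(self._blocks)

    def shares_block_with(self, other, block_no):
        # True if both volumes still refer to the very same stored block.
        return self._blocks[block_no] is other._blocks[block_no]


origin = ToyCowVolume()
origin.write(0, b"hello")
origin.write(1, b"world")

snap = origin.snapshot()   # a writable snapshot of the origin
snap.write(1, b"btrfs")    # the snapshot diverges for block 1 only
```

After the write, `origin` still reads `b"world"` from block 1, while block 0 remains shared between the two volumes.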
The development timeline explains a few things that are still on the developers' to-do list, while the changelog offers a good overview of their achievements so far. The file system's structure and operation are explained in another wiki document, and frequently asked questions are answered in the FAQ. While the "on disk" layout of the file system underwent several modifications during the development of Ext4, making it necessary to reformat when switching to a new kernel, the developers of Btrfs plan to spare users this hassle from now on, although further changes to the on-disk format of Btrfs can't be ruled out completely, of course.
Btrfs was integrated with its entire development history, which adds up to a total of more than 900 minor and major commits in the Linux source code management system. After its integration in early January this year, the kernel hackers extended the file system to include new features like the support of SELinux. Further changes for 2.6.30 are already being prepared, some of them designed to further improve performance.
While some of the Linux distributions are now considering Ext4 as their standard file system, it will probably be quite some time before the same is true for Btrfs. The Fedora developers, however, have already extended their development branch installer and kernel to include Btrfs support.
Squash file system
While the integration of the experimental Btrfs is initially unlikely to affect the majority of users and distribution developers, the addition of SquashFS should have a more immediate impact. SquashFS is a compressed read-only file system that various Linux distributions already use on their installation and live media (USB, CD or DVD) to minimise storage requirements. For the same reason, SquashFS is often used in the embedded area as an alternative to Cramfs. The kernel documentation for SquashFS offers detailed explanations of the differences between Cramfs and SquashFS and discusses the new file system's operation.
Several times in the past few years, the developers of SquashFS have tried to get their file system integrated into the Linux kernel, but they didn't manage to comply with the kernel hackers' high quality standards. Although they have worked to improve the criticised code segments in the now integrated version 4.0, the integration of SquashFS still came as quite a surprise. Following a long discussion about the pros and cons, as well as several problems in the current code, Linus Torvalds said that it wouldn't make sense not to integrate it if everyone uses it anyway ("if this is really in use by everybody, then not merging it is kind of pointless."); shortly afterwards, he merged the SquashFS patches into the main development branch.
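The idea behind a compressed read-only file system can be illustrated with a short Python sketch. This is not the SquashFS on-disk format: the functions `pack` and `read_at` and the 4096-byte block size are assumptions chosen for the illustration. It only shows the general technique of compressing data in independent blocks, so that a read has to decompress just the blocks it touches rather than the whole image.

```python
import zlib

BLOCK_SIZE = 4096  # illustrative value; not claimed to match any SquashFS default


def pack(data, block_size=BLOCK_SIZE):
    """Compress data block by block and return the list of compressed blocks."""
    return [
        zlib.compress(data[off:off + block_size])
        for off in range(0, len(data), block_size)
    ]


def read_at(blocks, offset, length, block_size=BLOCK_SIZE):
    """Read length bytes at offset, decompressing only the blocks involved."""
    out = bytearray()
    first = offset // block_size
    last = (offset + length - 1) // block_size
    for index in range(first, min(last, len(blocks) - 1) + 1):
        out += zlib.decompress(blocks[index])
    start = offset - first * block_size
    return bytes(out[start:start + length])


# Highly repetitive sample data, as is typical for text on live media.
image = b"".join(b"line %05d of a live-media file\n" % i for i in range(2000))
packed = pack(image)
```

A call such as `read_at(packed, 4090, 20)` spans two blocks and returns the same bytes as `image[4090:4110]`, without decompressing the rest of the image.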
Even more about file systems
There are also a considerable number of changes to the kernel's long-standing file systems. The kernel is now capable of a temporary file system freeze (1, 2, 3, LWN article), which is relevant, for example, for container virtualisation and for backup solutions. eCryptfs can now encrypt file names (for example 1, 2, 3, 4), and the developers incorporated numerous major changes to the Btree algorithm in XFS. They also made major changes to the OCFS2 cluster file system, which now supports ACLs, security attributes, quotas and metadata checksums.
The developers also improved, tidied up and corrected Ext4 in many minor ways; some of the changes have even been integrated into the 2.6.27 and 2.6.28 stable kernel series. In addition, the Ext4 developers adapted the documentation for activating write barriers, which had recently sparked prolonged discussions (see also the related LWN article). Several changes were made to the Fsync algorithm to marginally improve performance.
Thanks to several changes introduced by Google developers, Ext4 file systems can now be run without journaling to further improve their speed. Until now, some users still stuck with Ext2 to avoid the journaling overhead. A recent blog entry by Theodore Ts'o contains a few test results detailing the impact journaling has on performance and some thoughts about using Ext file systems on SSDs. Those who are interested in Ext4 can find a lot of additional background information in a recently published article on IBM's developerWorks.
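From user space, the freeze feature is reached through the `FIFREEZE` and `FITHAW` ioctls defined in `<linux/fs.h>`. The following Python sketch shows how a backup tool might wrap them. The helper names `_iowr` and `with_frozen_fs` are hypothetical, the numeric encoding assumes the ioctl layout used on x86, ARM and similar architectures, and actually calling the function needs root privileges and a file system that supports freezing.

```python
import fcntl
import os


def _iowr(type_char, nr, size):
    # Mirrors the _IOWR() macro for the common Linux ioctl encoding
    # (an assumption: a few architectures lay out the bits differently).
    return (3 << 30) | (size << 16) | (ord(type_char) << 8) | nr


# <linux/fs.h>: FIFREEZE = _IOWR('X', 119, int), FITHAW = _IOWR('X', 120, int)
FIFREEZE = _iowr("X", 119, 4)
FITHAW = _iowr("X", 120, 4)


def with_frozen_fs(mount_point, action):
    """Freeze the file system at mount_point, run action(), then thaw it.

    Sketch only: needs CAP_SYS_ADMIN, and writers to the file system block
    for as long as it stays frozen.
    """
    fd = os.open(mount_point, os.O_RDONLY)
    try:
        fcntl.ioctl(fd, FIFREEZE, 0)
        try:
            return action()
        finally:
            # Always thaw, even if the action failed.
            fcntl.ioctl(fd, FITHAW, 0)
    finally:
        os.close(fd)
```

A backup script could pass a function that triggers a block-level snapshot as `action`. Freezing the file system that the script itself needs to write to would stall it, so the mount point should be chosen with care.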
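Whether an Ext file system carries a journal is recorded as the `has_journal` feature flag in its superblock; a journal-less Ext4 file system is one without that flag, for example one created with `mke2fs -t ext4 -O ^has_journal`. As a minimal sketch, the following Python code checks the flag in a raw image. The offsets follow the documented Ext2/3/4 superblock layout, while the function names `has_journal` and `fake_image` and the synthetic test image are assumptions made for this illustration.

```python
import struct

SUPERBLOCK_OFFSET = 1024     # the superblock starts 1 KiB into the device
EXT_MAGIC = 0xEF53           # s_magic: little-endian 16-bit value at offset 0x38
COMPAT_HAS_JOURNAL = 0x0004  # bit in s_feature_compat (32-bit value at 0x5C)


def has_journal(image):
    """Return True if the Ext2/3/4 image (bytes) has the has_journal flag set."""
    sb = image[SUPERBLOCK_OFFSET:SUPERBLOCK_OFFSET + 1024]
    if len(sb) < 0x60:
        raise ValueError("image too short to contain a superblock")
    (magic,) = struct.unpack_from("<H", sb, 0x38)
    if magic != EXT_MAGIC:
        raise ValueError("not an Ext2/3/4 superblock")
    (feature_compat,) = struct.unpack_from("<I", sb, 0x5C)
    return bool(feature_compat & COMPAT_HAS_JOURNAL)


def fake_image(feature_compat):
    """Build a synthetic image with just enough superblock for the demo."""
    image = bytearray(2048)
    struct.pack_into("<H", image, SUPERBLOCK_OFFSET + 0x38, EXT_MAGIC)
    struct.pack_into("<I", image, SUPERBLOCK_OFFSET + 0x5C, feature_compat)
    return bytes(image)
```

On a real system the same check could be run on the first 2 KiB read from a block device or image file; the synthetic image only keeps the example self-contained.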
The support of online defragmentation ("online defrag") in Ext4 has not yet been integrated in 2.6.29; at the end of January, a new version was released which also incorporates changes to some of the previously criticised code segments (see also the related LWN article).
Many other changes
As well as the changes we've already discussed, 2.6.29 also brings many other notable file system changes:
Btrfs
- There are too many commits to list, but these Git pull requests offer an overview (1, 2, 3, 4, 5, 6) as well as the Git web interface at kernel.org.
CIFS
Ext[234]
- ext3: Add support for non-native signed/unsigned htree hash algorithms
- ext4: Add markers for better debuggability
- ext4: Add mount option to set kjournald's I/O priority
- ext4: Add sanity checks for the superblock before mounting the filesystem
- ext4: Add support for non-native signed/unsigned htree hash algorithms
- ext4: Remove code to create the journal inode
- ext4: Remove "extents" mount option
- Update Documentation/filesystems/ext4.txt
Fuse
- fuse: implement ioctl support
- fuse: implement poll support
- fuse: implement unsolicited notification
- fuse: update interface version
OCFS2
- ocfs2: add mount option and Kconfig option for acl
- ocfs2: add POSIX ACL API
- ocfs2: Add quota calls for allocation and freeing of inodes and space
- ocfs2: add security xattr API
- ocfs2: Add the on-disk structures for metadata checksums.
- ocfs2: Add the underlying blockcheck code.
- ocfs2: Assign feature bits and system inodes to quota feature and quota files
- ocfs2: Enable quota accounting on mount, disable on umount
- ocfs2: Implementation of local and global quota file handling
- ocfs2: Implement quota recovery
- ocfs2: Periodic quota syncing
- ocfs2: Remove JBD compatibility layer
- ocfs2/xattr: Merge xattr set transaction.
SquashFS
- MAINTAINERS: squashfs entry
- Squashfs: block operations
- Squashfs: cache operations
- Squashfs: directory lookup operations
- Squashfs: directory readdir operations
- Squashfs: documentation
- Squashfs: export operations
- Squashfs: fragment block operations
- Squashfs: header files
- Squashfs: initrd support
- Squashfs: inode operations
- Squashfs: Kconfig entry
- Squashfs: Makefiles
- Squashfs: regular file operations
- Squashfs: super block operations
- Squashfs: symlink operations
- Squashfs: uid/gid lookup operations
UBIFS
- UBIFS: add debugfs support
- UBIFS: introduce compression mount options
- UBIFS: remove fast unmounting
- UBIFS: separate debugging fields out
- UBIFS: slight compression optimisation
- UBIFS: use bit-fields to store compression type
XFS
- XFS: add generic btree types
- XFS: add new btree statistics
- XFS: implement generic xfs_btree_decrement
- XFS: implement generic xfs_btree_delete/delrec
- XFS: implement generic xfs_btree_increment
- XFS: implement generic xfs_btree_insert/insrec
- XFS: implement generic xfs_btree_lookup
- XFS: implement generic xfs_btree_rshift
- XFS: implement generic xfs_btree_split
- XFS: make btree root in inode support generic
- XFS: make btree tracing generic
- XFS: refactor xfs_btree_readahead
- XFS: Sync up kernel and user-space headers
- XFS: Update maintainers
VFS, other file systems
- fs: use menuconfig to control the Misc. file systems menu
- filesystem notification: create fs/notify to contain all fs notification
- fix f_count description in Documentation/filesystems/files.txt
- introduce new LSM hooks where vfsmount is available.
- kill ->dir_notify()
- nfsd: document new filehandle fsid types
- NFSD: Add documenting comments for nfsctl interface
- GFS2: Support for FIEMAP ioctl
- GFS2: Kill two daemons with one patch
- quota: Allow to separately enable quota accounting and enforcing limits
- poll: allow f_op->poll to sleep
Further background and information about developments in the Linux kernel and its environment can also be found in previous issues of the Kernel Log at The H Open Source:
- Kernel Log: Morton questions acceptance of Xen Dom0 code; file systems for SSDs
- Kernel Log: Stable series development is speeding up, X Server 1.6 available soon
- Kernel Log: What's new in 2.6.29 - Part 4: ACPI, PCI, PM - notebooks and power saving improvements
- Kernel Log: New stable kernels, AMD 3D documentation and Mesa 7.3 released
- Kernel Log: What's new in 2.6.29 - Part 3: Kernel controlled graphics modes
- Kernel Log: main development phase for 2.6.29 ends, new X.org drivers
- Kernel Log: What's new in 2.6.29 - Part 2: WiMax
- Kernel Log: What's new in 2.6.29 - Part 1: Dodgy Wifi drivers and AP support
Older Kernel Logs can be found in the archives or by using the search function at The H Open Source. (thl/c't)
(djwm)