Kernel Log – Coming in 3.6 (Part 1): Filesystems and storage
by Thorsten Leemhuis
Linux 3.6 introduces quota and backup functions for Btrfs as well as security enhancements for temp directories. A new interface lets userspace programs tell the kernel when the size of a partition in use changes.
Last Friday, Linus Torvalds released the second pre-release version of Linux 3.6. Torvalds' summer holiday meant that this was released two weeks, rather than the usual one, after the first release candidate. The volume of changes that has since found its way into the main development tree remains at the normal level.
As usual, Torvalds and his fellow developers merged all of the major new features intended for Linux 3.6 at the beginning of its development cycle; it is rare for the kernel developers to add, or revert, any major changes or new features during the stabilisation phase.
The Kernel Log can, therefore, already provide a comprehensive overview of the key changes and most important new features of Linux 3.6, which is expected to arrive in the second half of September. As usual, the Kernel Log overview will be presented in a series of articles that will cover the various kernel areas. The first article below describes the major changes in the kernel's filesystems and storage support; subsequent articles will look at the kernel's graphics drivers, network support, architecture code and other hardware drivers.
Btrfs
The still experimental Btrfs filesystem now supports quotas for subvolumes (separate areas within the filesystem), setting out how much space they are permitted to occupy. A further new feature in Btrfs is "send/receive". This enables userspace programs to determine the difference between two snapshots, to save these differences to a file and to restore these backups as required. This is particularly useful for incremental, atomic backups. A more detailed explanation of this function, which is also available in ZFS, can be found in an article on LWN.net. The userspace tool for making use of this currently remains confined to the development tree of the Btrfs filesystem tools. Details about this and other changes relating to Btrfs are discussed by Btrfs maintainer Chris Mason in his main git pull request.
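The idea behind send/receive can be illustrated with a toy model. The sketch below is not the Btrfs stream format and calls no Btrfs interface: it assumes a snapshot is simply a Python dict mapping paths to file contents, and the names make_stream and apply_stream are made up for this illustration.

```python
# Toy model only: a "snapshot" is a dict of path -> file contents.
# Real Btrfs send streams are generated by the kernel in a binary format.

def make_stream(parent, child):
    """Return the operations that turn snapshot `parent` into `child`."""
    ops = []
    # Files that exist in the parent but not in the child were deleted.
    for path in sorted(parent.keys() - child.keys()):
        ops.append(("unlink", path, None))
    # Files that are new or whose contents differ must be (re)written.
    for path in sorted(child):
        if parent.get(path) != child[path]:
            ops.append(("write", path, child[path]))
    return ops


def apply_stream(base, ops):
    """Replay a stream on top of a copy of `base` and return the result."""
    restored = dict(base)
    for op, path, data in ops:
        if op == "unlink":
            restored.pop(path, None)
        else:
            restored[path] = data
    return restored
```

Only the differences travel in the stream, which is what makes incremental backups cheap: the receiving side needs the parent snapshot plus the stream to rebuild the child.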
Ext4
According to the commit comments, the ext4 code no longer stores quota information in visible files, but instead stores it in the form of hidden inodes in the metadata. The result is that quota support has been upgraded to a "first class supported feature". A further change to the ext4 code improves performance when overwriting files, as explained by Theodore Ts'o.
Do not follow!
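One of the new functions implemented in Linux 3.6 is based on an idea that dates back to 1996: the kernel can now be configured to restrict the handling of links in order to block a class of attacks on directories such as /tmp/. With the softlink protection enabled, a softlink in a world-writable directory with a set "sticky" bit is only followed if the link belongs to the process following it or to the owner of the directory. With the hardlink protection enabled, users can only create hardlinks to files that they own or can already read and write. As an article on LWN.net explains, this feature, which can be activated via sysctl, puts a stop to a common trick that attackers use to escalate their privileges by exploiting background services running as root.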
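The following is a minimal sketch of the softlink rule as documented for the fs.protected_symlinks sysctl; the matching switch for hardlinks is fs.protected_hardlinks. The function names and parameters are made up for this illustration, and the real check is carried out inside the kernel, not in userspace.

```python
import os
import stat


def may_follow_symlink(follower_uid, link_uid, dir_uid, dir_mode, protected=True):
    """Model the decision taken when fs.protected_symlinks is enabled."""
    if not protected:
        return True                      # sysctl switched off: old behaviour
    if follower_uid == link_uid:
        return True                      # following your own link is fine
    restricted = bool(dir_mode & stat.S_ISVTX) and bool(dir_mode & stat.S_IWOTH)
    if not restricted:
        return True                      # not a sticky, world-writable directory
    if dir_uid == link_uid:
        return True                      # link was created by the directory owner
    return False


def read_fs_sysctl(name, root="/proc/sys/fs"):
    """Return the integer value of a sysctl below fs, or None if unavailable."""
    try:
        with open(os.path.join(root, name)) as handle:
            return int(handle.read().strip())
    except (OSError, ValueError):
        return None
```

On a kernel that lacks the feature, read_fs_sysctl("protected_symlinks") returns None because the file does not exist. The typical attack is covered by the first assertion-style case: a root service (uid 0) following a link that uid 1000 planted in a root-owned /tmp with mode 1777 is refused.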
Resizing
A new interface allows userspace programs to notify the kernel when the size of a partition they are using changes. The kernel thus learns of size changes to mounted or otherwise used partitions at runtime and can act accordingly. The program resizepart, which makes use of this new interface, is included in the recently released second pre-release version of util-linux 2.22 (2.22-rc2).
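As a rough sketch of what such a notification looks like from userspace, the example below assumes that the interface in question is the BLKPG_RESIZE_PARTITION operation of the BLKPG ioctl. The constants and structure layouts are transcribed from memory of linux/blkpg.h and should be checked against the header; the helper names are made up for this illustration. The call only updates the kernel's view of the partition and does not rewrite the partition table on disk.

```python
import ctypes
import fcntl
import os

BLKPG = 0x1269                  # assumed value of _IO(0x12, 105)
BLKPG_RESIZE_PARTITION = 3      # assumed op code from linux/blkpg.h


class BlkpgPartition(ctypes.Structure):
    """Assumed layout of struct blkpg_partition."""
    _fields_ = [
        ("start", ctypes.c_longlong),     # start offset in bytes
        ("length", ctypes.c_longlong),    # new length in bytes
        ("pno", ctypes.c_int),            # partition number
        ("devname", ctypes.c_char * 64),
        ("volname", ctypes.c_char * 64),
    ]


class BlkpgIoctlArg(ctypes.Structure):
    """Assumed layout of struct blkpg_ioctl_arg."""
    _fields_ = [
        ("op", ctypes.c_int),
        ("flags", ctypes.c_int),
        ("datalen", ctypes.c_int),
        ("data", ctypes.c_void_p),
    ]


def build_resize_request(partno, start_bytes, length_bytes):
    """Build the ioctl argument; the caller must keep `part` alive."""
    part = BlkpgPartition(start=start_bytes, length=length_bytes, pno=partno)
    arg = BlkpgIoctlArg(op=BLKPG_RESIZE_PARTITION, flags=0,
                        datalen=ctypes.sizeof(part),
                        data=ctypes.addressof(part))
    return arg, part


def notify_resize(disk_path, partno, start_bytes, length_bytes):
    """Tell the kernel about the new size (needs root and a real disk)."""
    arg, part = build_resize_request(partno, start_bytes, length_bytes)
    fd = os.open(disk_path, os.O_RDONLY)
    try:
        fcntl.ioctl(fd, BLKPG, arg)
    finally:
        os.close(fd)
```

A call such as notify_resize("/dev/sdX", 2, start, new_length) would be issued against the whole-disk device, with /dev/sdX standing in for a real device name. In practice resizepart is the tool to use; the sketch only shows the shape of the request.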
Miscellaneous
- Changes to the software RAID code in the MD subsystem should improve the performance of RAID arrays in which some or all of the storage devices are SSDs.
- Device mapper is now able to utilise the RAID 10 functionality provided by the MD subsystem.
- Following several years of development, a large patch collection has now been merged into the memory management and filesystem code in Linux 3.6, with the result that it should now be possible to reliably place swap files on NFS shares. This is useful in areas such as thin clients with no local storage.
- The virtio-scsi driver, merged into the kernel in Linux 3.4, now supports hotplugging, allowing disks to be added to, or removed from, virtualised systems at runtime.
- More than one week after the merge window had closed, Linus Torvalds merged the fabric driver tcm_vhost. The code is classified as staging, but does not reside in the staging area of the kernel. It allows SCSI devices on a host system to be used with minimum overhead by guest systems virtualised using KVM. Nicholas A. Bellinger has published some benchmarking results obtained using the new driver.
- Aacraid, a driver for Adaptec storage adaptors (among others), now supports the "async (performance)" mode offered by series 7 models.
- Restructuring work on the VFS (virtual filesystem) and on the filesystem code based on it has enabled developers to remove the kernel thread sync_supers, which previously triggered a write to the superblock every five seconds when changes were made to data stored there. This regular wake-up was unhelpful from an energy-saving point of view.
- The XFS status update for July 2012 mentions a number of changes to XFS which have been merged into Linux 3.6, including performance improvements for the inode allocator.