Kernel Log: Coming in 3.10 (Part 2)
File systems and storage
by Thorsten Leemhuis
Bcache has been added to the kernel as a second framework for SSD caching. More compact metadata should speed up btrfs. Checksums help XFS prevent data errors in its file system structures.
Linux 3.10 will include the "block-layer cache" bcache, which can be used to configure one disk as a cache for another disk; a fast SSD, for example, could be used as a cache for a slower hard drive with more capacity. This kind of SSD cache can speed up access to frequently read data and take on write requests until a quieter moment when they can be written to the slower disk.
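The caching behaviour described above can be illustrated with a toy model. This is a minimal sketch for illustration only, not bcache's actual design: the class `WriteBackCache`, its methods and the use of plain dictionaries as "disks" are all hypothetical. It shows repeated reads being served from the fast device and buffered writes being flushed to the slow device later, sorted by sector.

```python
class WriteBackCache:
    """Toy model of a fast cache device in front of a slow backing device.

    All names here are hypothetical; this is not how bcache is implemented.
    """

    def __init__(self, backing):
        self.backing = backing    # sector -> data on the slow disk
        self.cache = {}           # sector -> data held on the fast device
        self.dirty = set()        # sectors not yet written to the slow disk
        self.backing_reads = 0    # number of reads that hit the slow disk

    def read(self, sector):
        # Only the first read of a sector has to go to the slow disk.
        if sector not in self.cache:
            self.backing_reads += 1
            self.cache[sector] = self.backing.get(sector)
        return self.cache[sector]

    def write(self, sector, data):
        # Writes are absorbed by the cache and marked dirty.
        self.cache[sector] = data
        self.dirty.add(sector)

    def flush(self):
        """Write dirty sectors back in ascending order; return that order."""
        order = sorted(self.dirty)
        for sector in order:
            self.backing[sector] = self.cache[sector]
        self.dirty.clear()
        return order


disk = {7: b"old"}
cache = WriteBackCache(disk)
cache.read(7)
cache.read(7)                      # second read is served from the cache
for sector in (90, 3, 41):         # scattered small writes
    cache.write(sector, b"new")
flushed = cache.flush()            # written back in sector order
```

In this sketch the scattered writes reach the slow disk as one ordered pass, which is the kind of access pattern a rotating hard drive handles best.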
Bcache is the work of Kent Overstreet of Google, which has been using the tool to improve productivity for some time now. After dm-cache, which was integrated into Linux 3.9, it is the second cache framework of this kind to be added to the Linux kernel. As device mapper maintainer Alasdair Kergon pointed out at LinuxTag a month ago, the two solutions work in somewhat different ways, so which one is the better choice depends on the situation.
Bcache is designed to be better for situations with several small write operations that can then be transferred to the hard drive in a more orderly fashion. A few developers have tried to benchmark the caching solutions recently (1, 2 and others), often including the SSD caching software EnhanceIO, which has not yet been integrated into the Linux kernel. The benchmarks, however, did not produce clear results, and there was some criticism of the methodologies. The developers' findings and notes make it clear that each solution works well in some situations and unexpectedly badly in others; all of them still have room for improvement.
File systems
Current state of development
Linus Torvalds released the sixth release candidate for Linux 3.10 on Sunday. He made no mention of when the final version of 3.10 would be out, but he did say he was pleased that he didn't have to curse at subsystem maintainers very much, despite having threatened to do so in the release mail for RC5 after developers had sent him a large number of changes. In an LKML discussion on RC6 about whether cursing is really necessary, Torvalds and other kernel developers defended their strong words, although it's quite possible that some of the statements were made in jest.
The still experimental file system btrfs can store extent metadata in a more compact way and therefore slightly increase its speed. Older kernels, however, don't understand the new file system structures, so the new format is not used automatically; users have to enable it with 'btrfstune -x' (1, 2).
The new metadata checksums in XFS are likewise experimental and must be activated explicitly. Once enabled, the file system adds checksums to a variety of metadata and can thereby detect inconsistencies. More details can be found in the kernel documentation for the new feature.
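How a checksum lets a file system notice damaged metadata can be shown in a few lines. The following is a minimal sketch under stated assumptions: the helper names `seal` and `verify` and the four-byte header layout are hypothetical and do not reflect the XFS on-disk format, and Python's standard `zlib.crc32` stands in for the checksum (XFS is generally documented as using the related CRC32c variant).

```python
import zlib


def seal(payload: bytes) -> bytes:
    """Prefix a metadata block with a 4-byte checksum (hypothetical layout)."""
    return zlib.crc32(payload).to_bytes(4, "big") + payload


def verify(block: bytes) -> bool:
    """Recompute the checksum and compare it with the stored one."""
    stored = int.from_bytes(block[:4], "big")
    return zlib.crc32(block[4:]) == stored


block = seal(b"inode 128: size=4096")

# Simulate a single flipped bit in the last byte, as a faulty disk might cause.
corrupted = block[:-1] + bytes([block[-1] ^ 0x01])

intact_ok = verify(block)          # checksum matches
corrupted_ok = verify(corrupted)   # mismatch reveals the damage
```

A checksum of this kind only detects the inconsistency; repairing the damaged structure is a separate step.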
Ext4 now includes a reserved area that is protected against accidental changes; boot loaders can store code there that other boot loader code tries to load from specific sectors at boot up.
FUSE (Filesystem in Userspace) now supports asynchronous Direct I/O (1, 2) and includes a userspace interface for asynchronous I/O; both updates are particularly interesting for GlusterFS.
Storage
Block and SCSI layers can now use storage hardware's runtime power management features (1, 2).
The RADOS block device (RBD), which is used by the cluster file system Ceph but can also be used independently, now supports layering. This should be particularly useful for quickly cloning images that virtual machines use as disks: a newly created image can build on the previous one, with data being copied only when it is written (copy-on-write).
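The layering idea can be sketched with a small model. This is an illustration only, not the RBD implementation or its API: the class `LayeredImage` and its methods are hypothetical, and the parent image is assumed to stay unchanged after cloning. It shows that a clone starts out empty, reads fall through to the parent, and a write lands only in the clone.

```python
class LayeredImage:
    """Toy copy-on-write image; hypothetical, not the RBD API."""

    def __init__(self, parent=None):
        self.parent = parent   # image this one is layered on, if any
        self.blocks = {}       # block number -> data stored in this layer

    def read(self, n):
        # Use our own copy if we have one, otherwise ask the parent.
        if n in self.blocks:
            return self.blocks[n]
        if self.parent is not None:
            return self.parent.read(n)
        return b"\0"           # block that was never written

    def write(self, n, data):
        # Writes never touch the parent; they create a private copy here.
        self.blocks[n] = data

    def clone(self):
        # Cloning copies no data. The parent is assumed to be read-only
        # from this point on.
        return LayeredImage(parent=self)


base = LayeredImage()
base.write(0, b"operating system")
base.write(1, b"default config")

vm_disk = base.clone()
blocks_after_clone = len(vm_disk.blocks)   # nothing copied yet
shared = vm_disk.read(0)                   # served from the base image

vm_disk.write(1, b"custom config")         # only this block is stored locally
```

Because the clone holds only the blocks that differ from its parent, creating many virtual machine disks from one base image costs little time or space in this model.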
The new fabric module isert can be used to set up an LIO iSCSI target that other computers can communicate with via iSCSI Extensions for RDMA (iSER).
The NVM Express (NVMe) driver can now handle discard and understands SCSI commands, including unmap.