Kernel Log - Coming in 3.8 (Part 1)
Filesystems and storage
by Thorsten Leemhuis
Linux now supports F2fs, a filesystem that is specially designed for storage media with flash chips. The developers say that Btrfs is now faster to complete certain tasks and that Ext4 is more efficient when handling small files.
On Friday, Linus Torvalds made available the fourth release candidate of Linux 3.8. Torvalds called on developers to test the RC and was happy to report that development appears to have calmed down. As usual, Torvalds and his fellow developers incorporated all the major new features at the beginning of the Linux 3.8 development cycle. As further major changes are rarely integrated during this current stage of the stabilisation phase, the Kernel Log can already provide a comprehensive overview of the most important new features of the Linux version that is expected to arrive in mid-February.
The overview of 3.8 will be presented in a series of articles that will successively cover the various kernel areas. This first part discusses the most important new features in filesystems, network storage technologies and storage and network hardware drivers; subsequent articles will cover the Linux system's graphics drivers, kernel infrastructure, network support, processor/platform support and other hardware drivers.
F2fs flash filesystem
Linux now supports F2fs (Flash-Friendly File System), a filesystem that was introduced by Samsung developers in October. It is designed for flash storage media that use a more basic Flash Translation Layer (FTL) than SSDs for desktop PCs and servers (for example USB flash drives, memory cards and the storage media that are used in cameras, tablets and smartphones).
F2fs is a Log-structured File System (LFS) and progressively fills up storage media from the beginning; only once it has reached the end will it return to the beginning and use any areas that may have been deallocated in the meantime. Similar mechanisms are used by the Flash Translation Layer of flash-based storage devices to ensure that flash chips, which only tolerate a limited number of writes, are used in a uniform way; in addition, FTLs make sure that SSDs can be addressed like a hard disk.
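The fill-then-wrap behaviour described above can be sketched in a few lines of Python. This is a toy model only, not F2fs code: the `LogAllocator` class and its method names are made up for illustration, and real log-structured filesystems work on segments and need a cleaner to reclaim space.

```python
class LogAllocator:
    """Toy log-structured allocator: hands out blocks front to back, then wraps."""

    def __init__(self, num_blocks):
        self.num_blocks = num_blocks
        self.in_use = [False] * num_blocks
        self.head = 0  # next position the log will try

    def allocate(self):
        """Return the next free block at or after the head, wrapping around once."""
        for step in range(self.num_blocks):
            block = (self.head + step) % self.num_blocks
            if not self.in_use[block]:
                self.in_use[block] = True
                self.head = (block + 1) % self.num_blocks
                return block
        raise RuntimeError("no free blocks")

    def free(self, block):
        """Mark a block as deallocated; it is only reused after the log wraps."""
        self.in_use[block] = False


log = LogAllocator(4)
a = log.allocate()   # block 0
b = log.allocate()   # block 1
log.free(a)          # block 0 is free again, but the log keeps moving forward
c = log.allocate()   # block 2, not the freed block 0
d = log.allocate()   # block 3, the end of the device is reached
e = log.allocate()   # only now does the log wrap and reuse block 0
```

Note that the freed block is not touched until the head has passed the end of the device, which spreads writes across all blocks.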
Like Btrfs, F2fs uses Copy-on-Write (COW) to sequentially fill storage devices. This provides a certain robustness as the old data will still be available when new data is incomplete as a result of a system crash during the write operation. As a consequence, F2fs doesn't require a journalling feature like Ext4's. Unlike Btrfs and Ext4, F2fs does not attempt to prevent data fragmentation; very short access times mean that fragmentation is not an issue with flash storage media.
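Why Copy-on-Write can make a journal unnecessary is easiest to see in a small model. The following Python sketch is purely illustrative: `CowStore` and the `crash_before_commit` flag are hypothetical names, and the on-disk structures of F2fs and Btrfs are far more involved.

```python
class CowStore:
    """Toy copy-on-write store: updates never overwrite live blocks in place."""

    def __init__(self):
        self.blocks = []   # append-only list of written blocks
        self.index = {}    # file name -> position of its current block

    def write(self, name, data, crash_before_commit=False):
        self.blocks.append(data)                  # step 1: new copy goes to fresh space
        if crash_before_commit:
            return                                # simulated crash: index is untouched
        self.index[name] = len(self.blocks) - 1   # step 2: switch the pointer

    def read(self, name):
        return self.blocks[self.index[name]]


store = CowStore()
store.write("note", b"version 1")
store.write("note", b"version 2 (partial)", crash_before_commit=True)
after_crash = store.read("note")    # still the old, complete data
store.write("note", b"version 2")
after_commit = store.read("note")   # the new data, once the pointer was switched
```

Because the old block is never overwritten, an interrupted write leaves the previous version reachable.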
Its operating principles should allow F2fs to harmonise better with simple FTLs than Ext4 or other LFS filesystems; the F2fs developers have tried to avoid known LFS problems by implementing various clever design ideas. Details of F2fs can be found in the appropriate kernel documentation and in an article on LWN.net that was written by mdadm maintainer Neil Brown. The userspace tools for formatting F2fs drives are available at kernel.org.
Btrfs and Ext4
Btrfs, which continues to be classified as experimental, now includes a "replace" feature that can transfer data from one drive to another faster than before, for example ahead of replacing a disk. Other Btrfs changes include some that are designed to reduce latencies and CPU loads when calling fsync or writing data via O_DIRECT; further patches allow Btrfs to better distribute loads across multiple CPUs, which is said to improve performance.
The new Inline Data Support feature allows Ext4 to store files that only consist of a few bytes together with the inode to save storage space and accelerate access. Ext4 now also supports the SEEK_DATA and SEEK_HOLE lseek options that were introduced in Linux 3.1 and allow programs such as backup or copy tools to detect, and omit, empty areas in sparse files.
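How a copy tool might use these lseek options can be sketched in Python, which exposes them as `os.SEEK_DATA` and `os.SEEK_HOLE` on platforms that provide them. The `data_extents` helper is a made-up name for this example. On filesystems without hole detection the kernel may simply report the whole file as one data region, so the ranges returned depend on the filesystem in use.

```python
import errno
import os
import tempfile


def data_extents(path):
    """Return (start, end) byte ranges that the filesystem reports as data."""
    extents = []
    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.fstat(fd).st_size
        offset = 0
        while offset < size:
            try:
                start = os.lseek(fd, offset, os.SEEK_DATA)
            except OSError as err:
                if err.errno == errno.ENXIO:
                    break  # nothing but a hole between offset and end of file
                raise
            # Every file has an implicit hole at its end, so this always succeeds.
            end = os.lseek(fd, start, os.SEEK_HOLE)
            extents.append((start, end))
            offset = end
    finally:
        os.close(fd)
    return extents


# Create a sparse file: a 1 MiB hole followed by a few bytes of data.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.seek(1024 * 1024)
    tmp.write(b"payload")
    sparse_path = tmp.name

extents = data_extents(sparse_path)
file_size = os.path.getsize(sparse_path)
os.unlink(sparse_path)
```

A backup tool would read only the returned ranges and skip everything in between.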
Tmpfs now implements these lseek options as well. The XFS changes include features that detect metadata corruptions caused by write or read errors. Among the NFS modifications are some that allow servers and clients to coordinate cache sizes.
Storage
The RAID6 library that is used with software RAIDs can now use Advanced Vector Extensions 2 (AVX2) for certain calculations; it is expected that Intel's Haswell processors, due out in a few months, will be able to handle these x86 instructions.
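The calculations in question are the two RAID 6 parity blocks, usually called P and Q. The byte-wise Python sketch below shows the arithmetic, assuming the commonly documented scheme with the Galois field polynomial 0x11d and generator 2. The function names are made up for this example, and the sketch is not the kernel's code: the SIMD variants apply the same per-byte operations to many bytes at once in wide registers.

```python
def gf_mul2(x):
    """Multiply a byte by 2 in GF(2^8), reducing with the polynomial 0x11d."""
    x <<= 1
    if x & 0x100:
        x ^= 0x11D
    return x & 0xFF


def raid6_syndromes(data_blocks):
    """Compute the P (XOR parity) and Q (weighted parity) blocks byte by byte."""
    length = len(data_blocks[0])
    p = bytearray(length)
    q = bytearray(length)
    # Horner's scheme: walk the disks from last to first so that the block
    # of disk i ends up multiplied by 2**i in the Q syndrome.
    for block in reversed(data_blocks):
        for i, byte in enumerate(block):
            p[i] ^= byte
            q[i] = gf_mul2(q[i]) ^ byte
    return bytes(p), bytes(q)


# Three data disks holding one byte each.
p_block, q_block = raid6_syndromes([b"\x01", b"\x02", b"\x04"])
```

With P and Q stored on two additional disks, the contents of any two failed disks can be recalculated.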
Another kernel addition is the mpt3sas "LSI MPT Fusion SAS 3.0 Device Driver" that supports 12Gb/s SAS chips by LSI; it shares some features with the mpt2sas driver that has been put into maintenance mode. The hptiop driver can now address HighPoint RR4520 and RR4522 controllers.
On MBR-partitioned storage devices, the root partition can now be specified by entering a term such as "root=PARTUUID=0002dd75-01". The kernel will then look for the device that has a 32-bit UUID of 0002:dd75 (often referred to as the "NT disk signature") and attempt to mount its first partition, as selected by the "-01" suffix, as the root device.
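The lookup can be illustrated with a short Python sketch that works on an MBR built in memory rather than on a real disk. The helper names are invented for this example. The sketch assumes that the signature is stored as a little-endian 32-bit value at offset 0x1B8 of the first sector, and that both parts of the PARTUUID string are hexadecimal.

```python
import struct

MBR_SIGNATURE_OFFSET = 0x1B8  # position of the 4-byte disk signature in sector 0


def parse_partuuid(value):
    """Split a string such as '0002dd75-01' into (disk signature, partition number)."""
    signature_hex, partition_hex = value.split("-")
    return int(signature_hex, 16), int(partition_hex, 16)


def mbr_disk_signature(sector0):
    """Read the little-endian 32-bit disk signature from a 512-byte MBR sector."""
    (signature,) = struct.unpack_from("<I", sector0, MBR_SIGNATURE_OFFSET)
    return signature


# Build a fake MBR in memory instead of reading a real disk.
fake_mbr = bytearray(512)
struct.pack_into("<I", fake_mbr, MBR_SIGNATURE_OFFSET, 0x0002DD75)
fake_mbr[510:512] = b"\x55\xaa"  # boot signature at the end of the sector

wanted_signature, wanted_partition = parse_partuuid("0002dd75-01")
matches = mbr_disk_signature(bytes(fake_mbr)) == wanted_signature
```

A tool could use the same comparison to check which disk a given PARTUUID refers to before the boot parameter is changed.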
The DRBD kernel code has now reached the level of DRBD 8.4.2. However, the block subsystem maintainer criticised the DRBD developers for making such comprehensive changes instead of improving their code gradually over time, and said that he won't tolerate such an approach in future.