Fragmentation
When you ask Linux enthusiasts about a defragmenting tool for Linux, the responses are typically "It's not needed", "Ext3 doesn't fragment" and "Only Windows needs defragmenting". But filesystem fragmentation affects all filesystems in some way.
When looking at fragmentation in Ext3, it is necessary to differentiate between two very different concepts: The internal fragmentation we previously mentioned, and external fragmentation. The latter refers to files whose data blocks are not located consecutively on the disk but scattered across it. This causes more time-consuming head movements than necessary during read and write access. Internal fragmentation "only" takes up disk space, while external fragmentation decreases performance.
Ext3 is quite good at preventing external fragmentation and minimising head movements - for example, it tries to preallocate a run of eight contiguous blocks for newly created files. The write cache also ensures that data which applications write in stages is flushed to disk in one go and, ideally, ends up in one contiguous area on the disk. Of course, the write cache doesn't help if files grow slowly - for example directories which fill up over time. Fragmentation is also inevitable if files of different sizes keep getting created and deleted, leaving small gaps of free blocks behind.
The battle against fragmentation also interferes with another optimising strategy: the locality of data and metadata which block groups aim to achieve. Since Ext3 tries to store the files of one directory within the same block group, fragmentation can occur even though plenty of contiguous free space may still be available elsewhere on the disk. The claim that Ext3 only starts to fragment once the file system is 80 or 90 percent full is therefore not always true: depending on usage patterns, a file system with ample free space may still become fragmented.
For example, we found heavily fragmented free areas on an intensively used IMAP server which stores all its emails in individual files - although more than 900 GB of the total disk space of 1.4 TB were still available.
Checking it out
The degree of fragmentation can be tested with the dumpe2fs tool which, if called without options, will return the fragmentation of the free space for every block group. Ideally, there should be one contiguous area of free blocks within every block group; the more small areas of free blocks, the higher the risk that newly created files in this area will be split up into several fragments. In the worst case, individual free blocks will be scattered across the block group. Dumpe2fs can safely be used with mounted file systems, although this may cause inconsistencies - the statistical summaries initially returned for each block group may, for example, display a different number of blocks than the subsequent list of individual blocks.
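Even without additional tools you can get a first impression by looking at the "Free blocks" lines which dumpe2fs prints for every block group (the device name below is just a placeholder):

  # list the free block ranges of every block group (needs root)
  dumpe2fs /dev/sdb1 | grep 'Free blocks:'

Long, unbroken ranges indicate healthy free space; many short ranges or scattered single blocks point to fragmented free space.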
Since the output of dumpe2fs can become very confusing with hundreds or thousands of block groups, we created the eval_dumpe2fs Perl script, which reads the output of dumpe2fs and returns a statistical summary.
For each block group, the script evaluates how many free blocks are located in areas with at least two blocks ("chunks") and how many are individual blocks, and determines the average size of the free areas ("avg. chunk size"). Critical values are shown in bold. A statistical overview summarises how many block groups are fragmented to what degree.
Dumpe2fs also shows how many inodes remain available in each block group. If a block group contains a lot of small files, its inodes may be used up before all the blocks have been allocated. While this doesn't affect performance initially, Ext3 can no longer guarantee data and metadata locality if too many block groups run out of inodes. And if all the inodes within a file system have been used, no more files can be stored in it regardless of the number of free blocks that may still be available.
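Whether inodes are running short can be checked with dumpe2fs as well, or file-system-wide with df (the device name is again only a placeholder):

  # free inodes remaining in each block group (needs root)
  dumpe2fs /dev/sdb1 | grep 'Free inodes:'
  # overall inode usage of all mounted file systems
  df -i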
Measuring
Of course, we don't want to stop at knowing how fragmented the free areas are - that only tells us how well Ext3 will be able to avoid fragmenting newly created files. How fragmented the files already stored on the file system are is just as interesting.
After checking the file system, e2fsck returns the percentage of non-contiguous files. For file systems which were unmounted properly, the check needs to be forced with -f. Even if a file system is quite full, this typically returns a surprisingly low single-digit percentage. The reason: e2fsck also counts files which don't contain any data blocks (empty files, fast symbolic links, device files). In addition, on a typical Linux system many files are smaller than 4 KB and only occupy one block (with a block size of 4 KB); nothing can be fragmented here. Even with maximum fragmentation, the proportion of fragmented files can therefore never be higher than the proportion of files that occupy more than one block.
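Such a check might look like this; -n makes e2fsck open the file system read-only and answer "no" to all repair questions (the device name is a placeholder):

  # force a check of a cleanly unmounted file system
  e2fsck -fn /dev/sdb1

The summary line of the output contains the percentage of non-contiguous files.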
Our tool ext2_frag takes a different approach and doesn't count empty and device files. It groups files into different classes according to the number of blocks they occupy. Its statistical overview also returns the proportion of fragmented files among the files which occupy more than one block - this number is usually two to three times higher than the one which relates to all the files returned by e2fsck.
Merely counting the number of fragmented files, however, doesn't say anything about the degree to which a file is fragmented - it only differentiates between fragmented and unfragmented files. Ext2_frag therefore calculates the average number of fragments per file for every size class.
Total fragmentation is represented more accurately if we relate the number of non-contiguous blocks to the number of possible jumps - that is, for each file the number of allocated data blocks minus one, added up across all files. Ext2_frag calls this the "fragmentation index". This value is 0 when all files are stored in one piece, and 100 for maximum fragmentation.
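Expressed as a formula - our own notation, a sketch of the description above rather than ext2_frag's exact code - with $b_f$ the number of data blocks of file $f$ and $k_f$ the number of fragments it is stored in:

\[
\text{fragmentation index} = 100 \cdot \frac{\sum_f (k_f - 1)}{\sum_f (b_f - 1)}
\]

A file stored in one piece contributes nothing to the numerator; a file whose blocks are all scattered contributes $b_f - 1$, so the index runs from 0 to 100.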
The e2fsprogs package also contains the filefrag tool, which returns the number of fragments ("extents") a file consists of. The tool also states the minimum number of fragments a file has to be split into: with 4 KB blocks, files larger than 128 MB will not fit into one block group and therefore have to be stored in several fragments.
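For a single file this looks roughly as follows (the file name is just an example):

  # show how many extents the file occupies; -v lists every extent
  filefrag -v largefile.iso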
Slightly more convenient than filefrag is the fragments program, which searches entire directories and can recurse through the tree using the -r option. It can return both statistical directory overviews (option -d) and details about the fragmentation of individual files (option -f).
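Assuming the tool is installed in the search path, a recursive overview of a heavily used directory might be obtained like this (the directory is only an example, and the exact invocation may differ):

  # statistical overview, recursing through all subdirectories
  fragments -r -d /var/spool/imap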
While ext2_frag accesses the file system at a low level and analyses the content of each inode to determine the blocks in use, filefrag and fragments use a special ioctl which returns the data blocks occupied by a file. However, this ioctl only works with regular files, not with directories. The tools may therefore return slightly different results for the same file system. See "Measuring fragmentation" for more details.
Defragmentation
Fragmentation decreases I/O performance since sequential read and write operations are slowed down unnecessarily by head movements. For many applications, however, this doesn't matter much: In most cases, several applications access the disk simultaneously, resulting in several files being processed at the same time. Linux minimises head movements by rearranging write and read accesses and most files are accessed from cache. Due to file readahead, data has often already been read when it is requested by an application.
However, if I/O load is already high, for example if a large amount of data constantly needs to be read and written, the effects of fragmentation on the system performance can become quite noticeable. We experienced this with our Cyrus IMAP server, which processes a huge amount of email traffic and stores each email in a separate file. Despite ample free disk space, a quarter of all files larger than one block were fragmented - even some of the small files of up to 48 KB which only use direct blocks (see table below). In cases like these it is very likely that defragmentation will improve the situation.
number of blocks |   files | fragmented | percent | fragments/file
1                | 5722303 |          0 |    0.00 |           1.00
<= 12            | 3492714 |     761372 |   21.80 |           1.37
<= 524           |  513126 |     154964 |   30.20 |           9.30
<= 1036          |   26247 |       9233 |   35.18 |          64.78
<= 4108          |   21673 |       9670 |   44.62 |         148.23
> 4108           |    2462 |       1518 |   61.66 |         380.00
all files        | 9778525 |     936757 |    9.58 |           2.16
files > 1 block  | 4056222 |     936757 |   23.09 |           3.80
Fragmentation index: 8.80 percent
Only a third of the file system is in use, yet Ext3 has fragmented the files.
However, there is no defragmenter for Ext3 - ext2_defrag, written many years ago and no longer maintained, cannot handle current versions of Ext2 and Ext3. The only way of defragmenting a file system is to copy all the files to a newly created file system, or to pack them into a tar archive (ideally located on a different file system), delete the original files and then unpack the archive, as sketched below. This has the pleasant side effect that the directories are also recreated and afterwards only contain entries for the files actually stored in them.
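A minimal sketch of the archive-and-restore approach; the paths are placeholders, and /mnt/other stands for a different file system with enough free space:

  # pack the directory tree into an archive on another file system
  cd /data && tar cf /mnt/other/data.tar .
  # only continue once the archive has been verified as readable
  tar tf /mnt/other/data.tar > /dev/null
  # remove the original files (including hidden ones), then unpack
  find /data -mindepth 1 -delete
  tar xf /mnt/other/data.tar -C /data

The files are rewritten in one pass, which gives Ext3 the chance to allocate them contiguously again.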
Using the already mentioned fragments tool you can determine whether fragmentation mainly affects selected individual directories (for example those frequently used by an application to create and delete files) and can limit your copying to those directories. In the case of our IMAP server, copying did make quite a difference: I/O peaks in times of high volumes of mail traffic have dropped noticeably. (odi)