Bug in Linux kernel can damage RAID arrays
Some versions of the Linux kernel contain a bug which, under certain conditions, can destroy some metadata when a system is shut down, including RAID information such as the level, chunk size and the number of devices in the array. Without this data, it is no longer possible to assemble a software RAID array; this means that payload data, while still present, is not accessible using standard methods.
The bug is only triggered when the RAID array is partially assembled, but not yet started, when the system is shut down. According to a detailed blog posting on the bug by Neil Brown, who maintains the mdadm utility and the software RAID code administered by it, it's pretty unusual for a software RAID array to be in this state when a system is shut down. He adds that it is not, however, impossible, and notes that some users have indeed encountered this problem. Most users reporting the problem are using Ubuntu. Brown says that this may be because it happened to make a release with a vulnerable kernel or that it does something at shutdown which increases the likelihood that the bug will be triggered.
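As a rough illustration of how the risky state might be spotted before a shutdown, the following Python sketch scans the text of /proc/mdstat for arrays listed as "inactive". Treating "inactive" as a stand-in for "assembled but not started" is an assumption made here, not something taken from Brown's posting, and the function name inactive_md_arrays is hypothetical.

```python
import re


def inactive_md_arrays(mdstat_text: str) -> list[str]:
    """Return the names of md arrays that /proc/mdstat reports as inactive.

    Assumption: an "inactive" entry corresponds to an array that has been
    assembled (possibly only partially) but not started.
    """
    names = []
    for line in mdstat_text.splitlines():
        # Array lines look like "md0 : active raid1 ..." or "md1 : inactive ..."
        match = re.match(r"^(md\S*)\s*:\s*inactive\b", line)
        if match:
            names.append(match.group(1))
    return names


if __name__ == "__main__":
    # On a live system the text would come from /proc/mdstat.
    with open("/proc/mdstat") as handle:
        print(inactive_md_arrays(handle.read()))
```

An empty list suggests that no array is sitting in the half-assembled state at the moment the check runs; it says nothing about whether the running kernel contains the bug.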
The cause of the bug was merged into the 3.2.0-22.35 kernel used in Ubuntu 12.04; a fix followed in version 3.2.0-24.38, which was released on 21 May. Pre-release versions of Linux kernels 3.4 up to RC5 (released in late April), long term kernels 3.2.14 to 3.2.16, and stable kernels 3.3.1, 3.3.2 and 3.3.3 are also affected. The problem should not be present in any of the 3.2, 3.3 or 3.4 kernel.org kernels released within the last six weeks.
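The version ranges above can be turned into a simple check. The following Python sketch is only an illustration: the function names are hypothetical, the kernel.org ranges are copied from this article, and the Ubuntu check assumes that package versions between 3.2.0-22.35 and 3.2.0-24.38 sort by their two trailing numbers. Distribution kernels with backported patches may not follow these ranges.

```python
import re

# Affected kernel.org point releases as listed in the article (inclusive).
AFFECTED_RANGES = [
    ((3, 2, 14), (3, 2, 16)),
    ((3, 3, 1), (3, 3, 3)),
]


def is_affected_kernel_org(version: str) -> bool:
    """Check a kernel.org version such as "3.3.2" or "3.4-rc5"."""
    match = re.fullmatch(r"(\d+)\.(\d+)(?:\.(\d+))?(?:-rc(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    major, minor, patch, rc = match.groups()
    if rc is not None:
        # Only the 3.4 release candidates up to and including rc5 are listed.
        return (int(major), int(minor)) == (3, 4) and patch is None and 1 <= int(rc) <= 5
    triple = (int(major), int(minor), int(patch or 0))
    return any(low <= triple <= high for low, high in AFFECTED_RANGES)


def is_affected_ubuntu_3_2(package_version: str) -> bool:
    """Check an Ubuntu 12.04 kernel package version such as "3.2.0-23.36".

    Assumption: the bug is present from 3.2.0-22.35 up to, but not
    including, the fixed version 3.2.0-24.38.
    """
    match = re.fullmatch(r"3\.2\.0-(\d+)\.(\d+)", package_version)
    if match is None:
        raise ValueError(f"unrecognised package version: {package_version!r}")
    abi_and_upload = (int(match.group(1)), int(match.group(2)))
    return (22, 35) <= abi_and_upload < (24, 38)
```

A version string for the first function would typically be derived from the output of uname -r with any distribution suffix removed.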
The bug only affects RAID code administered via mdadm. Other kernel RAID implementations, such as the device mapper or btrfs implementations, are not affected. Users running an affected kernel and using a software RAID array are advised to temporarily disable mdadm to prevent the bug from striking when the system is shut down. Details on how to do so and more information on the bug can be found in Neil Brown's blog posting.
(crve)