KDE narrowly avoids disaster
KDE contributor Jeff Mitchell has written a blog post about what he says "almost became The Great KDE Disaster Of 2013": the near loss of all of the project's Git repositories. The KDE Project hosts 1500 code repositories for open source applications affiliated with the desktop environment, and all of them were nearly wiped out last week by a combination of a software fault and a problematic mirroring setup. The KDE developers are now looking into ways to prevent a similar incident in the future.
KDE uses a main Git server at git.kde.org and a number of mirror servers that provide Git access for anonymous users. The developers had been treating these mirrors as backups of the main repositories, which turned out to be a mistake. When the developers shut down the virtual machines on the main server to apply security updates to the host, something in the process corrupted the filesystems inside the VMs and left the Git repositories stored there unusable. Because the mirrors were configured to sync with the main server and did not check for corruption, they went on to synchronise their repositories with the damaged ones on the main server.
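The missing step was an integrity check before mirroring. As a rough illustration only (not KDE's actual tooling), a mirror could run `git fsck` on a repository and refuse to copy it if the check fails. The helper name `repo_is_healthy` and the 600-second timeout are assumptions made for this sketch.

```python
import subprocess


def repo_is_healthy(path: str) -> bool:
    """Return True only if `git fsck --full` reports no errors for the repository at `path`.

    Hypothetical helper: a mirror could call this on an upstream repository
    before replacing its own copy with it.
    """
    try:
        result = subprocess.run(
            ["git", "-C", path, "fsck", "--full"],
            capture_output=True,
            text=True,
            timeout=600,  # assumed upper bound; large repositories may need more
        )
    except (OSError, subprocess.TimeoutExpired):
        # git is not installed or the check hung: treat the repository as unhealthy.
        return False
    # git fsck exits non-zero when it finds corruption or cannot open the repository.
    return result.returncode == 0
```

A mirror script would skip (and report) any repository for which this returns `False` instead of overwriting its last known good copy.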
By the time the KDE maintainers discovered the problem, it was already too late. The mirrors had already deleted most of the affected repositories: because of the corruption, most of them were no longer listed in the projects file, so the synchronisation process assumed they had been deleted legitimately. Mitchell describes this situation as "too perfect a mirror", since the mirrors had faithfully duplicated the destruction on the main system. At that point, the KDE developers believed they had lost all of their Git repositories.
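To make the "too perfect a mirror" failure concrete, here is a minimal Python sketch of naive mirroring logic that simply makes the local set of repositories match the upstream projects list. It is an illustration, not KDE's actual script; the function name, the repository names and the 20-of-1500 split are hypothetical.

```python
def plan_naive_sync(local_repos: set[str], upstream_projects: list[str]) -> tuple[set[str], set[str]]:
    """Hypothetical 'perfect mirror' logic: mirror exactly what upstream lists.

    Returns (to_update, to_delete). Nothing here asks *why* a repository has
    vanished from the upstream list, so a damaged list becomes mass deletion.
    """
    upstream = set(upstream_projects)
    to_update = upstream                # fetch everything upstream lists
    to_delete = local_repos - upstream  # drop everything upstream no longer lists
    return to_update, to_delete


# 1500 repositories mirrored locally, but a damaged projects file lists only 20 of them.
local = {f"repo{i}" for i in range(1500)}
damaged_list = [f"repo{i}" for i in range(20)]
_, doomed = plan_naive_sync(local, damaged_list)
print(len(doomed))  # 1480
```

The logic is "correct" in the narrow sense that the mirror ends up identical to the master, which is exactly why it offers no protection as a backup.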
The developers got lucky, however: a new server they had set up for a data centre migration turned out to have retained a pristine copy of all of the project's repositories. This server had not synced with the corrupted Git data because its synchronisation window fell within the period when the main server was down for the security update. That coincidence, together with a completely unrelated data centre migration, is what saved all of KDE's Git repositories.
In a follow-up post, Mitchell explains that the developers did have other backups in place, specifically tarballs of the source code of all of the hosted projects, but that these do not retain the Git metadata and other important information. He also details the changes the project will make to its backup strategy, which mostly involve adding sanity checks to the processes that clone the Git repositories to the mirror servers. With those checks in place, corrupted data will not be duplicated in future, and repositories missing from the projects file will no longer be dropped automatically when they should not be. As part of this, the new main server will also store the Git data on a ZFS filesystem, whose internal snapshotting mechanism makes it possible to restore earlier versions of the data.
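The kind of sanity checks described above can be sketched in a few lines of Python. This is an assumed policy for illustration, not the checks KDE actually implemented: the function and exception names are hypothetical, the 5% deletion threshold is arbitrary, and `is_healthy` is any caller-supplied integrity test (in practice it might wrap a `git fsck` run).

```python
from typing import Callable


class SyncAborted(Exception):
    """Raised when the upstream state looks too suspicious to mirror automatically."""


def plan_guarded_sync(
    local_repos: set[str],
    upstream_projects: list[str],
    is_healthy: Callable[[str], bool],
    max_delete_fraction: float = 0.05,  # assumed threshold, not KDE's value
) -> tuple[set[str], set[str]]:
    """Return (to_update, to_delete) only if the upstream state passes sanity checks.

    * Abort if the projects list is empty, or if it would delete more than
      `max_delete_fraction` of the local repositories in a single pass.
    * Never refresh a local copy from an upstream repository that fails `is_healthy`.
    """
    upstream = set(upstream_projects)
    if not upstream:
        raise SyncAborted("upstream projects list is empty")

    to_delete = local_repos - upstream
    if local_repos and len(to_delete) / len(local_repos) > max_delete_fraction:
        raise SyncAborted(
            f"refusing to delete {len(to_delete)} of {len(local_repos)} repositories"
        )

    # Unhealthy upstream repositories are skipped, so the mirror keeps its
    # previous copy of them (a real script should also raise an alert).
    to_update = {name for name in upstream if is_healthy(name)}
    return to_update, to_delete
```

With this guard, a projects file that suddenly lists only a handful of repositories stops the sync for a human to inspect, instead of triggering mass deletion; a small, plausible change such as two retired repositories still goes through.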
(fab)