Analysis of assembly code in Linux packages published
Former Debian Project Leader and Linaro developer Steve McIntyre has surveyed a large number of Ubuntu and Fedora packages as part of a detailed study on the use of assembly code in Linux applications. This work was undertaken to identify packages that need porting to the new AArch64 architecture for 64-bit ARM processors. McIntyre generated a list of packages and then checked those that use assembly as part of their code to see what that assembly code was actually used for.
In his report, he splits his findings into three Sergio Leone-inspired categories: assembly code used in places and situations where it is reasonable (The Good), assembly code used where it really should not be used at all (The Bad), and use of assembly that is so wrong and bizarre that it amused him (The Ugly / Comical). McIntyre says that many developers use assembly code in appropriate circumstances, mostly to improve performance or as fallback code for when their software is compiled with an obsolete compiler. He has, however, also noticed what he calls "cargo-culted assembly code". According to McIntyre, some developers "have clearly seen code used elsewhere and copied it in" regardless of its actual usefulness in the software in question. He says that in many such cases the code is also buggy, which makes reworking it even more of a priority.
In general, the study seems to suggest that many developers are moving away from using inline assembly code, which also helps with porting efforts like the one for AArch64. McIntyre also found some bad examples, however. The report's Ugly category mentions ten packages that still include assembly code for DEC VAX systems – code that is clearly superfluous nowadays.
In the packages surveyed, assembly code is mostly used to speed up byte-swapping and bitwise operations, to identify hardware, for timing-critical operations and floating point control, and for atomic operations that are safe on multi-threaded systems. Assembly code also often arrives by way of embedded library copies, where a package bundles large chunks of code from other projects. These libraries include gettext, gnulib, libjpeg, libgc, sqlite and others.
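To illustrate why two of these uses often no longer need assembly, here is a minimal sketch in portable C. It is not taken from McIntyre's report: the names `swap32`, `event_count` and `count_event` are hypothetical, and it assumes a compiler with C11 `<stdatomic.h>` support. The byte swap is written with plain shifts and masks, and the counter uses a standard atomic operation in place of architecture-specific instructions.

```c
#include <stdint.h>
#include <stdatomic.h>

/* Portable 32-bit byte swap written with shifts and masks.
   Optimising compilers can often reduce this pattern to a single
   byte-reverse instruction on targets that have one. */
static inline uint32_t swap32(uint32_t x)
{
    return ((x & 0x000000FFu) << 24) |
           ((x & 0x0000FF00u) << 8)  |
           ((x & 0x00FF0000u) >> 8)  |
           ((x & 0xFF000000u) >> 24);
}

/* Hypothetical shared counter. Static storage means it starts at zero. */
static atomic_uint_fast32_t event_count;

/* Increment the counter atomically and return the new value.
   atomic_fetch_add returns the value held before the addition. */
static inline uint_fast32_t count_event(void)
{
    return atomic_fetch_add(&event_count, 1) + 1;
}
```

Because nothing here depends on a particular instruction set, code written this way should build unchanged on AArch64 as well as on x86, which is the kind of porting burden the study set out to measure.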
More information on assembly use in the surveyed packages is available from the report in the Linaro wiki. It also includes a list of packages that the Linaro working group, of which McIntyre is a member, has identified as needing porting work because of their use of assembly. The report further suggests that the porting work should be used to improve the general quality of the code wherever assembly has been used inappropriately, and that this should be done hand in hand with the original developers.
(fab)