Kernel Log: What's new in 2.6.29 - Part 8: Faster start-up and other behind the scenes changes
Following the eighth pre-release version of 2.6.29, development of the next major kernel revision looks to have hit the home straight. A glance at the changed files and code makes it clear how hard the kernel hackers have been working on 2.6.29, with more new lines of code added over the current development cycle than ever before.
The bulk of the extra code is, as usual, made up of drivers and the sub-systems that surround them. The number of changes which improve behind-the-scenes aspects of the kernel infrastructure is, however, far from small and these are the subject of this Kernel Log, which concludes the "What's coming in 2.6.29" series; indeed, it can't be long before Linus Torvalds releases Linux version 2.6.29.
Fastboot
Following on from his "Fastboot" enhancements introduced in 2.6.28, which accelerate the kernel initialisation phase and, by extension, system start-up, Arjan van de Ven has made further improvements in this area for 2.6.29. The kernel can now initialise some mutually independent subsystems asynchronously: one subsystem can be initialised while code from another is still waiting, for example, for a response from the hardware it deals with. This kind of wait is especially common when initialising the Libata and SCSI subsystems, which at some points wait for tenths of a second or longer for drives to respond when querying and setting up media.
This trick allows the boot phase between starting the kernel and transferring to userspace (/sbin/init) to become measurably and tangibly faster, although the magnitude of the benefit is strongly system-dependent. Due to some problems, however, parallel initialisation will be largely deactivated in 2.6.29 unless the fastboot code is explicitly activated using the "fastboot" kernel parameter. Van de Ven is planning to take a second shot at the problem in 2.6.30 to allow more users to benefit from these changes. More details on how fastboot works can be found in an article on LWN.net.
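The effect of overlapping hardware waits can be illustrated with a small userspace analogy. The following is only a sketch, not kernel code: the probe names, the wait times and the functions `boot_serial()` and `boot_async()` are assumptions made for illustration, and threads stand in for the kernel's own asynchronous machinery.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical subsystems and the time each spends waiting on hardware (seconds).
PROBES = {"libata": 0.05, "scsi": 0.05, "usb": 0.05}


def probe(name, wait):
    """Stand-in for a driver probe that mostly waits for a device to answer."""
    time.sleep(wait)
    return name


def boot_serial():
    """Initialise one subsystem after another; the waits add up."""
    start = time.monotonic()
    done = [probe(name, wait) for name, wait in PROBES.items()]
    return done, time.monotonic() - start


def boot_async():
    """Start all independent probes at once, then wait for every one to finish."""
    start = time.monotonic()
    with ThreadPoolExecutor(max_workers=len(PROBES)) as pool:
        futures = [pool.submit(probe, name, wait) for name, wait in PROBES.items()]
        # Synchronisation point: nothing proceeds until all probes are done,
        # much as userspace should not start before the drives are known.
        done = [f.result() for f in futures]
    return done, time.monotonic() - start


if __name__ == "__main__":
    _, serial = boot_serial()
    _, parallel = boot_async()
    print(f"serial: {serial:.2f}s, async: {parallel:.2f}s")
```

In the serial case the total is roughly the sum of all waits; in the asynchronous case it approaches the longest single wait, which is why the gain depends so strongly on the hardware present.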
Virtualisation and security
KVM developers have once again been busy, introducing around 150 patches, which include enhancements to NMI, MSI and Kdump support and allow faster access to the MMU. Changes to the IOMMU subsystem improve support for assigning PCI/PCIe devices to guest systems. A number of patches also improve support for container virtualisation. The new Xenfs should improve the interaction between userspace and Xen.
The task credentials patches, which restructure the way the kernel deals with process-specific information (user and group ID, privileges, etc.), have been adopted in this kernel version (LWN.net article, kernel documentation). The LSM (Linux Security Modules) now offer hooks for security frameworks such as AppArmor and Tomoyo, which may use file names to recognise programs; SELinux and Smack, by contrast, rely on extended attributes. From 2.6.29, the kernel's cryptographic code will be able to deal with shash.
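The commit list further down describes the new credentials as copy-on-write. The idea can be sketched in a few lines, assuming that credentials are treated as immutable objects which tasks share until one of them needs a change. The class and method names below are illustrative assumptions, not the kernel's data structures.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Cred:
    """Immutable set of credentials (a small, hypothetical subset)."""
    uid: int
    gid: int


class Task:
    def __init__(self, cred):
        self.cred = cred  # shared reference; never modified in place

    def prepare_creds(self, **changes):
        """Return a modified private copy; the live credentials stay untouched."""
        return replace(self.cred, **changes)

    def commit_creds(self, new):
        """Publish the new credentials by swapping a single reference."""
        self.cred = new


parent = Task(Cred(uid=0, gid=0))
child = Task(parent.cred)            # both tasks share one Cred object

new = child.prepare_creds(uid=1000)  # the copy happens only on write
child.commit_creds(new)
```

After the commit, the child sees the new user ID while the parent still holds the original object. A reader that fetched the old credentials beforehand keeps a consistent, unchanging view.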
Behind the scenes
The Cpumask changes, driven primarily by Rusty Russell, now allow distributors to compile their kernels with support for up to 4096 CPU cores without incurring major performance penalties on dual or quad core systems. Russell gives a rough overview of the background to the changes and how the code works on his blog.
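Some back-of-the-envelope arithmetic shows why the configured CPU limit matters even on small machines. A CPU mask holds one bit per possible CPU, so its size follows the compile-time limit rather than the number of cores actually fitted. The sketch below assumes 64-bit machine words; the function name is hypothetical and the calculation is background, not taken from Russell's post.

```python
def cpumask_bytes(nr_cpus, bits_per_long=64):
    """Size of a one-bit-per-CPU mask, rounded up to whole machine words."""
    words = (nr_cpus + bits_per_long - 1) // bits_per_long
    return words * (bits_per_long // 8)


for nr_cpus in (4, 64, 4096):
    print(nr_cpus, "CPUs ->", cpumask_bytes(nr_cpus), "bytes per mask")
```

With a limit of 4096 CPUs, every mask occupies 512 bytes instead of 8, which is expensive when masks are copied or placed on the stack. Options such as CONFIG_CPUMASK_OFFSTACK in the commit list below address exactly this kind of cost.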
The Tree RCU patches should allow the kernel to better scale to systems with "a few hundred processors". Interested users can find a detailed explanation of how the new code works in an LWN.net article written by a patch author working for IBM.
There are also a whole heap of changes relating to the ftrace tracing infrastructure. The kernel hackers have also included patches to allow the poll() function to sleep and introduced functions which can be used to protect exclusive access to I/O memory (LWN.net article). The latter allow drivers to protect I/O memory space used for communicating with hardware from unwanted write access by userspace programs, which can, in the worst case, make the hardware unusable. This is what occurred with the "e1000e problem", where an internal kernel error during 2.6.27 development disabled some Intel network cards.
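The principle behind exclusive I/O memory can be modelled in a few lines. This is a toy model under stated assumptions: the class, its methods, the addresses and the second driver name are all hypothetical, and the real kernel interface is not reproduced here. It only shows the opt-in nature of the protection.

```python
class IoMemRegistry:
    """Toy model of I/O memory regions claimed by drivers (names hypothetical)."""

    def __init__(self):
        self._regions = []  # list of (start, end, owner, exclusive)

    def claim(self, start, end, owner, exclusive=False):
        """Record that a driver owns [start, end]; refuse overlapping claims."""
        for s, e, other, _ in self._regions:
            if start <= e and s <= end:
                raise ValueError(f"range overlaps region held by {other}")
        self._regions.append((start, end, owner, exclusive))

    def userspace_may_map(self, addr):
        """Userspace access is denied only inside exclusively claimed regions."""
        for s, e, _, exclusive in self._regions:
            if s <= addr <= e and exclusive:
                return False
        return True


iomem = IoMemRegistry()
iomem.claim(0xF000_0000, 0xF001_FFFF, "e1000e", exclusive=True)
iomem.claim(0xF100_0000, 0xF100_0FFF, "legacy-driver")

print(iomem.userspace_may_map(0xF000_1000))  # False: protected register space
print(iomem.userspace_may_map(0xF100_0010))  # True: driver did not opt in
```

Drivers that do not request exclusivity behave as before, so existing userspace tools that legitimately map device memory keep working.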
The kernel can no longer be correctly compiled using GCC versions 3.0, 3.1, 4.1.0 and 4.1.1.
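Anyone scripting kernel builds may want to catch an affected compiler early. The following helper is a minimal sketch built only on the version list above. It assumes that "3.0" and "3.1" cover every point release in those series, while 4.1.0 and 4.1.1 are matched exactly; the function name is hypothetical.

```python
# Releases the article lists as unable to build 2.6.29 correctly.
UNSUPPORTED_SERIES = {(3, 0), (3, 1)}          # assumed to cover every 3.0.x / 3.1.x
UNSUPPORTED_RELEASES = {(4, 1, 0), (4, 1, 1)}  # exact releases only


def gcc_is_unsupported(version):
    """Return True if a dotted GCC version string is on the article's list."""
    parts = tuple(int(p) for p in version.split("."))
    parts = parts + (0,) * (3 - len(parts))     # "4.1" is treated as 4.1.0
    return parts[:2] in UNSUPPORTED_SERIES or parts[:3] in UNSUPPORTED_RELEASES
```

A build script could pass in the version string reported by the compiler and abort with a clear message instead of producing a miscompiled kernel.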
Yet more changes
In addition to the new features described in this article, the kernel development team has also included a large number of additional patches in 2.6.29. The kernel hackers have, for example, combined the directories containing the code for supporting SPARC and SPARC64 and have extended the ARM code to add support for the i.MX31. Details of these and many less major, but in no way insignificant, changes can be found in the list below. The individual entries link to the relevant commit in the Linux source code control system, where further information on the change and the relevant patch can be found.
Fastboot
- ahci: add a module parameter to ignore the SSS flags for async scanning
- async: don't do the initcall stuff post boot
- async: make async a command line option for now
- async: make the final inode deletion an asynchronous event
- bootchart: improve output based on Dave Jones' feedback
- bootgraph: make the bootgraph script show async waiting time
- fastboot: Make libata initialization even more async
- fastboot: make scsi probes asynchronous
- fastboot: make the libata port scan asynchronous
- libata: Add a per-host flag to opt-in into parallel port probes
- libata: only ports >= 0 need to synchronize
- partial revert of asynchronous inode delete
Tracing
- blktrace: port to tracepoints
- ftrace: add function tracing to single thread
- ftrace: add quick function trace stop
- tracing: add a tracer to catch execution time of kernel functions
- tracing: add "power-tracer": C/P state tracer to help power optimization
- tracing/function-graph-tracer: enabled by default
- tracing/function-graph-tracer: support for x86-64
- tracing/function-return-tracer: change the name into function-graph-tracer
- tracing/function-return-tracer: set a more human readable output
- tracing: likely/unlikely branch annotation tracer
- tracing: profile likely and unlikely annotations
- tracing, x86: add low level support for ftrace return tracing
Crypto/Security
- CRED: Documentation
- CRED: Inaugurate COW credentials
- CRED: Make execve() take advantage of copy-on-write credentials
- CRED: Separate task security context from task_struct
- CRED: Use RCU to access another task's creds and to release a task's own creds
- crypto: aes - Precompute tables
- introduce new LSM hooks where vfsmount is available.
- libcrc32c: Move implementation to crypto crc32c
- netlabel: Update kernel configuration API
- selinux: Deprecate and schedule the removal of the the compat_net functionality
- smack: Add support for unlabeled network hosts and networks
- user namespaces: document CFS behavior
Virtualisation
- Add DEVPTS_MULTIPLE_INSTANCES config token
- Add domain flag DOMAIN_FLAG_VIRTUAL_MACHINE
- Add domain_flush_cache
- add frontend implementation for the IOMMU API
- Add global iommu list
- Add/remove domain device info for virtual machine domain
- AMD IOMMU: add domain map function for IOMMU API
- AMD IOMMU: add Kconfig entry for statistic collection code
- AMD IOMMU: add protection domain flags
- cgroups: add a per-subsystem hierarchy_mutex
- cgroups: clean up Kconfig
- cgroups: consolidate cgroup documents
- cgroups: documentation updates
- cgroups: make cgroup config a submenu
- Document usage of multiple-instances of devpts
- Enable multiple instances of devpts
- introcude linux/iommu.h for an iommu api
- KVM: change KVM to use IOMMU API
- KVM: Enable MSI for device assignment
- KVM: Enable MTRR for EPT
- KVM: ia64: Re-organize data sturure of guests' data area
- KVM: MSI to INTx translate
- KVM: ppc: Implement in-kernel exit timing statistics
- KVM: support device deassignment
- KVM: use the new intel iommu APIs
- KVM: VMX: Add PAT support for EPT
- KVM: VMX: Provide support for user space injected NMIs
- KVM: x86 emulator: add Src2 decode set
- KVM: x86 emulator: Extend the opcode descriptor
- KVM: x86: Support for user space injected NMIs
- select IOMMU_API when DMAR and/or AMD_IOMMU is selected
- x86: vmware: look for DMI string in the product serial key
Architecture
Generic
- CVE-2009-0029: Convert all system calls to return a long
- CVE-2009-0029: Make sys_pselect7 static
- CVE-2009-0029: Make sys_syslog a conditional system call
- CVE-2009-0029: Move compat system call declarations to compat header file
- CVE-2009-0029: powerpc: Enable syscall wrappers for 64-bit
- CVE-2009-0029: Remove __attribute__((weak)) from sys_pipe/sys_pipe2
- CVE-2009-0029: Rename old_readdir to sys_old_readdir
- CVE-2009-0029: s390: enable system call wrappers
- CVE-2009-0029: s390 specific system call wrappers
- CVE-2009-0029: sparc: Enable syscall wrappers for 64-bit
- CVE-2009-0029: System call wrapper infrastructure
- CVE-2009-0029: System call wrappers part 01
- CVE-2009-0029: System call wrappers part 02
- CVE-2009-0029: System call wrappers part 03
- CVE-2009-0029: System call wrappers part 04
- CVE-2009-0029: System call wrappers part 05
- CVE-2009-0029: System call wrappers part 06
- CVE-2009-0029: System call wrappers part 07
- CVE-2009-0029: System call wrappers part 08
- CVE-2009-0029: System call wrappers part 09
- CVE-2009-0029: System call wrappers part 10
- CVE-2009-0029: System call wrappers part 11
- CVE-2009-0029: System call wrappers part 12
- CVE-2009-0029: System call wrappers part 13
- CVE-2009-0029: System call wrappers part 14
- CVE-2009-0029: System call wrappers part 15
- CVE-2009-0029: System call wrappers part 16
- CVE-2009-0029: System call wrappers part 17
- CVE-2009-0029: System call wrappers part 18
- CVE-2009-0029: System call wrappers part 19
- CVE-2009-0029: System call wrappers part 20
- CVE-2009-0029: System call wrappers part 21
- CVE-2009-0029: System call wrappers part 22
- CVE-2009-0029: System call wrappers part 23
- CVE-2009-0029: System call wrappers part 24
- CVE-2009-0029: System call wrappers part 25
- CVE-2009-0029: System call wrappers part 26
- CVE-2009-0029: System call wrappers part 27
- CVE-2009-0029: System call wrappers part 28
- CVE-2009-0029: System call wrappers part 29
- CVE-2009-0029: System call wrappers part 30
- CVE-2009-0029: System call wrappers part 31
- CVE-2009-0029: System call wrappers part 32
- CVE-2009-0029: System call wrappers part 33
- CVE-2009-0029: System call wrapper special cases
- byteorder: make swab.h include asm/swab.h like a regular header
Generic - Cpumasks
- cpumask: Add alloc_cpumask_var_node()
- cpumask: Add CONFIG_CPUMASK_OFFSTACK
- cpumask: add sysfs displays for configured and disabled cpu maps
- cpumask: centralize cpu_online_map and cpu_possible_map
- cpumask: CONFIG_DISABLE_OBSOLETE_CPUMASK_FUNCTIONS
- cpumask: convert shared_cpu_map in acpi_processor * structs to cpumask_var_t
- cpumask: documentation for cpumask_var_t
- cpumask: fix CONFIG_NUMA=y sched.c
- cpumask: Introduce cpumask_of_{node,pcibus} to replace {node,pcibus}_to_cpumask
- cpumask: Introduce topology_core_cpumask()/topology_thread_cpumask(): ia64
- cpumask: Introduce topology_core_cpumask()/topology_thread_cpumask(): powerpc
- cpumask: Introduce topology_core_cpumask()/topology_thread_cpumask(): s390
- cpumask: Introduce topology_core_cpumask()/topology_thread_cpumask(): sparc
- cpumask: make CONFIG_NR_CPUS always valid.
- cpumask: sh: Introduce cpumask_of_{node,pcibus} to replace {node,pcibus}_to_cpumask
- cpumask: switch over to cpu_online/possible/active/present_mask: core
- cpumask: Use accessors code in core
- sysfs: add documentation to cputopology.txt for system cpumasks
- x86 smp: modify send_IPI_mask interface to accept cpumask_t pointers
- x86: Update io_apic.c to use new cpumask API
ARM
- Add basic support for MX31PDK board.
- Add default configuration for MX31PDK board.
- ARM: 5290/1: [AT91] Add support for the Adeneo NeoCore 926 board
- ARM: 5319/1: AT91: support AT91CAP9 revC CPUs
- ARM: 5338/1: Add Nuvoton W90P910 Platform support
- ARM: Arrange for platforms to select appropriate CPU support
- ARM: clps7500: remove support
- ARM: CPUFREQ: S3C24XX serial CPU frequency scaling support.
- ARM: DSM320: Add support for the DSM320
- ARM: MX2 pcm038: add 1-wire master support
- ARM: MX31: basic support for mx31moboard platform
- ARM: OMAP3: Add basic support for Pandora handheld console
- ARM: OMAP3: LDP: Add Ethernet device support to make ldp boot succeess
- ARM: pcm037: add 1wire support
- ARM: pcm037: add support for the on-board LAN9217 network controller
- ARM: pxa: add basic support for HP iPAQ h5000
- ARM: pxa/MioA701: add camera support for Mio A701 board.
- ARM: S3C64XX: Basic CPU detection and map initialisation
- i.MX31: framebuffer driver
- i.MX31: Image Processing Unit DMA and IRQ drivers
Blackfin
- Blackfin arch: Add BF537-STAMP platform support for ENC28J60 SPI Ethernet MAC
- Blackfin arch: Add document about bfin-gpio
- Blackfin arch: add support for Blackfin latest processor family BF51x
- Blackfin arch: BF538/9 Linux kernel Support
- Blackfin arch: change HWTRACE Kconfig and set it on default
- Blackfin arch: Cleanup and unify Blackfin IRQ and GPIO IRQ handling
- Blackfin arch: Enable ISP1760 USB Host Driver in platform device initialization code.
- Blackfin arch: Faster C implementation of no-MPU CPLB handler
- Blackfin arch: merge adeos blackfin part to arch/blackfin/
- Blackfin arch: smp patch cleanup from LKML review
- Blackfin arch: SMP supporting patchset: BF561 related code
- Blackfin arch: SMP supporting patchset: Blackfin CPLB related code
- Blackfin arch: SMP supporting patchset: Blackfin header files and machine common code
- Blackfin arch: SMP supporting patchset: Blackfin kernel and memory management code
- Blackfin arch: SMP supporting patchset: some other misc code
MIPS
- MIPS: Add Cavium OCTEON cop2/cvmseg state entries to processor.h.
- MIPS: Add Cavium OCTEON processor constants and CPU probe.
- MIPS: Add Cavium OCTEON processor CSR definitions
- MIPS: Add Cavium OCTEON processor support files to arch/mips/cavium-octeon.
- MIPS: Add Cavium OCTEON processor support files to arch/mips/cavium-octeon/executive and asm/octeon.
- MIPS: Add Cavium OCTEON slot into proper tlb category.
- MIPS: Add Cavium OCTEON specific register definitions to mipsregs.h
- MIPS: Add Cavium OCTEON specific registers to ptrace.h and asm-offsets.c
- MIPS: Add Cavium OCTEON to arch/mips/Kconfig
- MIPS: Add defconfig for Cavium OCTEON.
- MIPS: Add SMP_ICACHE_FLUSH for the Cavium CPU family.
- MIPS: Alchemy: devboards: consolidate files
- MIPS: Alchemy: Move development board code to common subdirectory
- MIPS: Alchemy: new userspace suspend interface for development boards.
- MIPS: Alchemy: RTC counter clocksource / clockevent support.
- MIPS: Alchemy: update core interrupt code.
- MIPS: IP27: Switch from DMA_IP27 to DMA_COHERENT
- MIPS: Use hardware watchpoints on all R1 and R2 CPUs.
Power
- powerpc/85xx: Enable SMP support
- powerpc: Change u64/s64 to a long long integer type
- powerpc/mm: Introduce MMU features
- powerpc/oprofile: IBM CELL: add SPU event profiling support
- powerpc: Rewrite sysfs processor cache info code
- Update powerpc maintainers
S390
- S390: hvc_iucv: Update function documentation
- S390: improve idle cputime accounting
- S390: introduce vdso on s390
- S390: update documentation for hvc_iucv kernel parameter.
SH
- doc: Update sh cpufreq documentation.
- sh: Add platform-specific constants for SH7709
- sh: Add support for SH7201 CPU subtype.
- sh: allow CONFIG_CPU_IDLE
- sh: allow CONFIG_PM
- sh: Generic kgdb stub support.
- sh: sh7760fb: Add support SH7720/SH7721 of Renesas
SPARC
- MAINTAINERS: update sparc maintainer
- sparc64: Use unsigned long long for u64.
- sparc,sparc64: unify Kconfig files
- sparc,sparc64: unify kernel/
- sparc,sparc64: unify Makefile
- sparc: Use 64BIT config entry
x86
- Documentation/x86/boot.txt: payload length was changed to payload_length
- pci: add PCI IDs for devices that need boot irq quirks
- x86, 64-bit: update address space documentation
- x86-64: seccomp: fix 32/64 syscall hole
- x86-64: syscall-audit: fix 32/64 syscall hole
- x86: add cache descriptors for Intel Core i7
- x86: add Dell XPS710 reboot quirk
- x86: add memory hotremove config option
- x86: add X86_FEATURE_HYPERVISOR feature bit
- x86: APIC: enable workaround on AMD Fam10h CPUs
- x86, apm: remove CONFIG_APM_REAL_MODE_POWER_OFF in favor of a kernel parameter
- x86, bts: base in-kernel ds interface on handles
- x86, bts: provide in-kernel branch-trace interface
- x86: change OPTIMIZE_INLINING help to say enabling makes smaller kernels
- x86: cleanup remaining cpumask_t ops in smpboot code
- x86: default to SWIOTLB=y on x86_64
- x86: enable cpus display of kernel_max and offlined cpus
- x86: enable MAXSMP
- x86, mm: enable GBPAGES option by default
- x86: MSI start irq numbering from nr_irqs_gsi
- x86: offer frame pointers in all build modes
- x86: only scan the root bus in early PCI quirks
- x86/oprofile: fix pci_dev use count for AMD northbridge devices
- x86: PAT: update documentation to cover pgprot and remap_pfn related changes - v3
- x86, pci: introduce config option for pci reroute quirks (was: PATCH 0/3 Boot IRQ quirks for Broadcom and AMD/ATI)
- x86, pci: introduce pci=ioapicreroute kernel cmdline option
- x86, pci: introduce pci=noioapicquirk kernel cmdline option
- x86: remove init_mm export as planned for 2.6.26
- x86: Set CONFIG_NR_CPUS even on UP
- x86: some lock annotations for user copy paths
- x86: some lock annotations for user copy paths, v2
- x86: some lock annotations for user copy paths, v3
- x86, sparseirq: clean up Kconfig entry
- x86: support always running TSC on Intel CPUs
- x86: turn CONFIG_SPARSE_IRQ off by default
- x86: update CONFIG_NUMA description
- x86: use NR_IRQS_LEGACY
- x86: use possible_cpus=NUM to extend the possible cpus allowed
Miscellaneous
- avr32: Hammerhead board support
- IA64: enable setting DMAR on by default
- xtensa: Add xt2000 support files.
mm
- Fix page writeback thinko, causing Berkeley DB slowdown
- memcg: explain details and test document
- memcg: fix swap accounting leak
- memcg: handle swap caches
- memcg: introduce charge-commit-cancel style of functions
- memcg: memory cgroup hierarchy documentation
- memcg: mem+swap controller core
- memcg: mem+swap controller Kconfig
- memcg: move all acccounting to parent at rmdir()
- memcg: new force_empty to free pages under group
- memcg: show reclaim stat
- memcg: swappiness
- memcg: synchronized LRU
- mm: add_active_or_unevictable into rmap
- mm: add add_to_swap stub
- mm: add dirty_background_bytes and dirty_bytes sysctls
- mm: add Set,ClearPageSwapCache stubs
- mm: direct IO starvation improvement
- mm: further cleanup page_add_new_anon_rmap
- mm: OOM documentation update
- mm: optimize get_scan_ratio for no swap
- mm: remove gfp_mask from add_to_swap
- mm: remove try_to_munlock from vmscan
- mm: report the MMU pagesize in /proc/pid/smaps
- mm: report the pagesize backing a VMA in /proc/pid/smaps
- mm: show node to memory section relationship with symlinks in sysfs
- mm: vmalloc make lazy unmapping configurable
- NOMMU: Make mmap allocation page trimming behaviour configurable.
- NOMMU: Make VMAs per MM as for MMU-mode linux
- NOMMU: Support XIP on initramfs
- Remove obsolete CONFIG_RESOURCES_64BIT
- shmem: unify regular and tiny shmem
- vmscan: improve reclaim throughput to bail out patch
Miscellaneous
- allow stripping of generated symbols under CONFIG_KALLSYMS_ALL
- bitmap: find_last_bit()
- checkpatch: Add warning for p0-patches
- debug: add notifier chain debugging
- debugobjects: add boot parameter default value
- do_mounts: add device info to mount message
- driver core: add root_device_register()
- file capabilities: add no_file_caps switch (v4)
- fix modules_install via NFS
- kbuild: introduce $(kecho) convenience echo
- kbuild: reintroduce ALLSOURCE_ARCHS support for tags/cscope
- kbuild: strip generated symbols from *.ko
- kbuild: use KECHO convenience echo
- kconfig: add script to manipulate .config files on the command line
- kconfig: improve error messages for bad source statements
- LOCKD: Make lockd_up() and lockd_down() exported GPL-only
- lockstat: contend with points
- module: add MODULE_STATE_LIVE notify
- modules: Use a better scheme for refcounting
- NFS: add "no resvport" mount option
- NFS: "no resvport" mount option changes mountd client too
- oops: increment the oops UUID every time we oops
- PATCH: fast vdso implementation for CLOCK_THREAD_CPUTIME_ID
- PATCH: idle cputime accounting
- PATCH: improve precision of idle time detection.
- PATCH: improve precision of process accounting.
- proc: add /proc/*/stack
- proc: stop using BKL
- regulator: sysfs attribute reduction (v2)
- resource: allow MMIO exclusivity for device drivers
- RTC: Remove the BKL.
- Sanitize gcc version header includes
- scripts: script from kerneloops.org to pretty print oops dumps
- setlocalversion: add git-svn support
- slab: introduce kzfree()
- SLUB: failslab support
- softlockup: increase hung tasks check from 2 minutes to 8 minutes
- sparse irq_desc array: core kernel and x86 changes
- swiotlb: add support for systems with highmem
- sysfs: clarify SYSFS_DEPRECATED help text
- timers: split process wide cpu clocks/timers
- trivial: Update MAINTAINERS entry
- UIO: Pass information about ioports to userspace (V2)
Further background and information about developments in the Linux kernel and its environment can also be found in previous issues of the kernel log at The H Open Source:
- Kernel Log: What's new in 2.6.29 - Part 7: Audio, FireWire, USB, Video and more
- Kernel Log: What's new in 2.6.29 - Part 6: Storage, IDE/PATA, SCSI
- Kernel Log: What's new in 2.6.29 - Part 5: Filesystems Btrfs, SquashFS, Ext4 without journaling
- Kernel Log: Morton questions acceptance of Xen Dom0 code; file systems for SSDs
- Kernel Log: Stable series development is speeding up, X Server 1.6 available soon
- Kernel Log: What's new in 2.6.29 - Part 4: ACPI, PCI, PM - notebooks and power saving improvements
- Kernel Log: New stable kernels, AMD 3D documentation and Mesa 7.3 released
- Kernel Log: What's new in 2.6.29 - Part 3: Kernel controlled graphics modes
- Kernel Log: main development phase for 2.6.29 ends, new X.org drivers
- Kernel Log: What's new in 2.6.29 - Part 2: WiMax
- Kernel Log: What's new in 2.6.29 - Part 1: Dodgy Wifi drivers and AP support
Older Kernel logs can be found in the archives or by using the search function at The H Open Source. (thl/c't)
(djwm)