Apache Lucene and Solr 4.0 enter alpha phase
Version 4.0 of both the Apache Lucene text search engine and the Apache Solr search platform has gone into the alpha testing phase, offering a preview of the improvements that will arrive in the final releases.
The announcement for Solr 4.0 alpha suggests it is the more visibly enhanced of the two projects, especially with the integration of SolrCloud, a set of features that make it easier to scale Solr installations. SolrCloud offers central configuration, automatic failover and durable writes to a cluster of sharded search servers, managed using Apache ZooKeeper. SolrCloud configurations should offer high availability, automatic replication and load-balanced queries distributed across the cluster.
For users who rely on Solr as a primary NoSQL data store, there are also enhancements such as better durability for updates through a transaction log, fast retrieval of the latest version of a document, versioning, optimistic locking and atomic updates of document fields. Other Solr features coming in version 4.0 include pivot faceting, pseudo fields (by aliases or metadata), a spellchecker that works with the main index, pseudo joins and a new web administration interface with support for SolrCloud.
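To give a rough idea of how atomic updates and optimistic locking fit together, the following minimal sketch builds the kind of JSON body a client might send to Solr's update handler. It assumes the JSON update syntax documented for the Solr 4.x series (the `set` and `inc` modifiers and the `_version_` field), which may differ in the alpha. The document id, the field names and the helper `build_atomic_update` are hypothetical, and nothing is actually sent to a server.

```python
import json


def build_atomic_update(doc_id, field_changes, expected_version=None):
    """Build a JSON body for a partial ("atomic") update of one document.

    field_changes maps a field name to a (modifier, value) pair, for
    example ("set", 9.99) or ("inc", 1). The helper itself is hypothetical;
    only the payload shape is meant to mirror Solr's JSON update syntax.
    """
    doc = {"id": doc_id}
    for field, (modifier, value) in field_changes.items():
        # Each changed field is wrapped in a one-key object naming the modifier.
        doc[field] = {modifier: value}
    if expected_version is not None:
        # Optimistic locking: the server is expected to reject the update
        # if the stored version no longer matches this value.
        doc["_version_"] = expected_version
    return json.dumps([doc], sort_keys=True)


# Hypothetical document and fields, for illustration only.
body = build_atomic_update(
    "book1",
    {"price": ("set", 9.99), "copies_sold": ("inc", 1)},
    expected_version=1234567890,
)
print(body)
```

A client would typically read the document, remember its `_version_`, and resend the update with a fresh version if the server reports a conflict.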
Solr's improvements are, of course, built upon the foundation of the Lucene search technology, and the Lucene 4.0 alpha announcement details many of the under-the-hood enhancements that improve performance or flexibility in that foundation. These include pluggable codecs for handling the indexes of terms, postings, stored fields and term vectors, support for per-document values such as custom scoring or pre-sorted Sort values, improved indexing performance and a new DirectSpellChecker. Other performance improvements include a "100 to 200 times faster" fuzzy query and a more efficient in-memory representation.
Lucene and Solr 4.0 alpha are recommended only for early adopters, and the only guarantee the developers give is that the index format is supported for the life of the 5.x series, unless a critical, data-corrupting bug forces that to change. The Apache 2 licensed alpha releases are available to download (Lucene, Solr) and the Apache developers request feedback in the appropriate discussion forums (Lucene, Solr).
(djwm)