The Lucene project management committee has announced the availability of Apache Lucene 3.5.0 and Apache Solr 3.5.0. Lucene is a high-performance, full-featured text search library. Solr is a standalone search server that uses Lucene at its core for indexing and search.
The major changes for the Lucene 3.5.0 release are:
- Lower Memory Consumption. There is a substantial reduction (3-5X) of memory needed to hold the terms index. This has been achieved by creating a more memory efficient data structure for holding terms.
- Deep Paging Support. Added IndexSearcher.searchAfter which returns results after a specified ScoreDoc. You can pass the last document on the previous page to the searchAfter method to get to the next page of results.
- SearcherManager. The org.apache.lucene.search.SearcherManager class has been added to simplify the sharing and reopening of IndexSearcher across multiple search threads. Underlying IndexReader instances are safely closed once they are no longer referenced, using the IndexReader’s reference counts. The acquire method retrieves an IndexSearcher, and the release method hands it back once the search is finished; the underlying reader is closed only when no references remain.
- SearcherLifetimeManager. The org.apache.lucene.search.SearcherLifetimeManager class has been added to provide a consistent view of the index across multiple requests. It simplifies the usage of the same IndexSearcher instance between requests, which provides a better user experience when paging or drilling down/up on search results.
- IndexWriter.optimize() Deprecated. IndexWriter.optimize has been deprecated and renamed to forceMerge. This is to discourage the use of this method since it is a very costly operation and only justified if the index is static.
- IndexReader.reopen() Renamed. IndexReader.reopen has been replaced by openIfChanged. IndexReader.openIfChanged returns null if there are no changes in the index. This method is typically less costly than opening a new IndexReader.
- NGramPhraseQuery. org.apache.lucene.search.NGramPhraseQuery is a PhraseQuery which is optimized for n-gram phrase queries. This can speed up queries 30-50% when n-gram analysis is used.
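The paging contract described for IndexSearcher.searchAfter can be illustrated without Lucene at all. The following is a minimal, library-free sketch: the Hit class and the searchAfter helper are hypothetical stand-ins written for this example, not Lucene's classes or implementation. It assumes hits are ranked by descending score with ties broken by document id, and it shows only how passing the last hit of one page yields the next page.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Library-free illustration of "search after" paging; this is not Lucene code.
final class SearchAfterSketch {

    // Hypothetical stand-in for a scored hit (a document id plus a score).
    static final class Hit {
        final int doc;
        final float score;

        Hit(int doc, float score) {
            this.doc = doc;
            this.score = score;
        }
    }

    // Assumed rank order: higher score first, ties broken by lower doc id.
    static final Comparator<Hit> RANK = (a, b) -> {
        int byScore = Float.compare(b.score, a.score);
        return byScore != 0 ? byScore : Integer.compare(a.doc, b.doc);
    };

    // Returns up to n hits that rank strictly after `after`.
    // A null `after` yields the first page.
    static List<Hit> searchAfter(Hit after, List<Hit> allHits, int n) {
        List<Hit> sorted = new ArrayList<>(allHits);
        sorted.sort(RANK);
        List<Hit> page = new ArrayList<>();
        for (Hit h : sorted) {
            if (after != null && RANK.compare(h, after) <= 0) {
                continue; // this hit was already shown on an earlier page
            }
            page.add(h);
            if (page.size() == n) {
                break;
            }
        }
        return page;
    }

    public static void main(String[] args) {
        List<Hit> hits = new ArrayList<>();
        hits.add(new Hit(0, 0.5f));
        hits.add(new Hit(1, 2.0f));
        hits.add(new Hit(2, 2.0f));
        hits.add(new Hit(3, 1.0f));

        Hit last = null; // no cursor yet: start from the first page
        List<Hit> page;
        while (!(page = searchAfter(last, hits, 2)).isEmpty()) {
            for (Hit h : page) {
                System.out.println("doc=" + h.doc + " score=" + h.score);
            }
            System.out.println("-- end of page --");
            last = page.get(page.size() - 1); // cursor for the next page
        }
    }
}
```

The sketch sorts the whole hit list for simplicity. The appeal of a cursor-style call is that the caller only ever asks for one page of hits, rather than for every hit up to the requested offset.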
To see the full list of changes in Lucene 3.5, please visit the Lucene 3.5 Release Notes.
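The acquire/release pattern behind SearcherManager rests on reference counting, which can be sketched in plain Java. The Resource and Manager classes below are hypothetical names invented for this illustration and do not reproduce Lucene's code. The sketch assumes the manager itself holds one reference to the current resource, and that a refresh swaps in a new resource while the old one stays open until its last borrower releases it.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Library-free illustration of reference-counted acquire/release; not Lucene code.
final class RefCountSketch {

    // Hypothetical stand-in for a reader that must be closed exactly once.
    static final class Resource {
        // Starts at 1: the manager's own reference.
        private final AtomicInteger refCount = new AtomicInteger(1);
        private volatile boolean closed = false;
        final int generation;

        Resource(int generation) {
            this.generation = generation;
        }

        // Takes a reference unless the count has already dropped to zero.
        boolean tryIncRef() {
            int count;
            while ((count = refCount.get()) > 0) {
                if (refCount.compareAndSet(count, count + 1)) {
                    return true;
                }
            }
            return false;
        }

        // Drops a reference; the last one out closes the resource.
        void decRef() {
            if (refCount.decrementAndGet() == 0) {
                closed = true;
            }
        }

        boolean isClosed() {
            return closed;
        }
    }

    static final class Manager {
        private volatile Resource current = new Resource(0);

        // Borrow the current resource; retry if it was retired in the meantime.
        Resource acquire() {
            Resource r;
            do {
                r = current;
            } while (!r.tryIncRef());
            return r;
        }

        // Hand a borrowed resource back.
        void release(Resource r) {
            r.decRef();
        }

        // Publish a new generation, then drop the manager's reference to the old one.
        synchronized void refresh() {
            Resource old = current;
            current = new Resource(old.generation + 1);
            old.decRef();
        }
    }
}
```

With a pattern like this, a search thread would typically call acquire, run its search inside a try block, and call release in the finally block so that a reference is never leaked.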
The major changes for the Solr 3.5.0 release are:
- Lucene 3.5.0. Fixes and enhancements from Lucene 3.5.0, most notably the substantial reduction of memory needed for holding the term index.
- Distributed Result Grouping. Support for distributed search result grouping, also called field collapsing. This feature limits the number of documents shown for each “group”, defined as the unique values in a field, and now works with distributed search.
- Language Detection. The new contrib module “langid” adds the ability to detect the language of a document before indexing, so appropriate decisions can be made. It is implemented as an UpdateRequestProcessor using Apache Tika’s LanguageIdentifier or Cybozu’s language-detection library.
- Numeric sortMissingFirst and sortMissingLast Support. Numeric types including Trie field types and dates now support sortMissingFirst and sortMissingLast.
- HunspellStemFilterFactory. Added support for Lucene’s HunspellStemFilter, which supports stemming for 99 languages. Hunspell originated as an advanced spell checker, most famously used in the OpenOffice suite; Solr uses it for stemming.
- hl.q parameter. The optional hl.q parameter has been added, and if specified, overrides the q parameter in the Highlighter (HighlightComponent).
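As an illustration of the hl.q parameter, the sketch below assembles a query string in which the main query and the highlighting query differ. It uses only the JDK. The class name, the field names, and the example host and path are assumptions made for this example; hl and hl.fl are the standard Solr highlighting parameters that switch highlighting on and select the fields to highlight.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Builds a Solr-style query string that highlights a different query than the one searched.
final class HighlightQuerySketch {

    static String buildQueryString(String q, String hlQ, String hlFields) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", q);          // main query: decides which documents match
        params.put("hl", "true");    // turn highlighting on
        params.put("hl.fl", hlFields); // fields to highlight
        params.put("hl.q", hlQ);     // overrides q for highlighting only

        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : params.entrySet()) {
            if (sb.length() > 0) {
                sb.append('&');
            }
            sb.append(encode(e.getKey())).append('=').append(encode(e.getValue()));
        }
        return sb.toString();
    }

    private static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    public static void main(String[] args) {
        // Hypothetical host and handler path, for illustration only.
        String base = "http://localhost:8983/solr/select?";
        System.out.println(base + buildQueryString("category:books", "lucene", "title"));
    }
}
```

Here the documents are selected by the category filter in q, while the highlighted snippets are built around the term given in hl.q.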
To see the full list of changes in Solr 3.5, please visit the Solr 3.5 Release Notes.