I recently described the new lucene-c-boost GitHub project, which provides amazing speedups (up to 7.8X faster) for common Lucene query types using specialized C++ implementations via JNI.
The code works with a stock Lucene 4.3.0 JAR and default codec, and has a trivial API: just call `NativeSearch.search` instead of `IndexSearcher.search`.
Now, a quick update: I've optimized `PhraseQuery` as well:

| Task | QPS base | StdDev base | QPS opt | StdDev opt | Speedup |
|---|---|---|---|---|---|
| HighPhrase | 3.5 | (2.7%) | 6.5 | (0.4%) | 1.9 X |
| MedPhrase | 27.1 | (1.4%) | 51.9 | (0.3%) | 1.9 X |
| LowPhrase | 7.6 | (1.7%) | 16.4 | (0.3%) | 2.2 X |

~2X speedup (~90% - ~119%) is nice!
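To make the arithmetic behind those figures explicit, here is a minimal sketch that recomputes the speedup factor and the percent change from the QPS columns of the table. The class `PhraseSpeedup` and its method names are hypothetical helpers written for this post; they are not part of lucene-c-boost or of the benchmark tooling.

```java
import java.util.Locale;

// Hypothetical helper for illustration only; not part of lucene-c-boost.
final class PhraseSpeedup {

    // Speedup factor: optimized QPS divided by baseline QPS.
    static double speedup(double baseQps, double optQps) {
        return optQps / baseQps;
    }

    // Percent change in QPS relative to the baseline.
    static double percentChange(double baseQps, double optQps) {
        return (optQps - baseQps) / baseQps * 100.0;
    }

    public static void main(String[] args) {
        String[] tasks = {"HighPhrase", "MedPhrase", "LowPhrase"};
        double[] base = {3.5, 27.1, 7.6};   // "QPS base" column of the table
        double[] opt = {6.5, 51.9, 16.4};   // "QPS opt" column of the table

        // Print one line per task: name, speedup factor, percent change.
        for (int i = 0; i < tasks.length; i++) {
            System.out.printf(Locale.ROOT, "%-10s %.1f X  (%+.0f%%)%n",
                tasks[i],
                speedup(base[i], opt[i]),
                percentChange(base[i], opt[i]));
        }
    }
}
```

The factors computed this way round to the 1.9 X, 1.9 X and 2.2 X shown in the table. The percentages may differ slightly from the ~90% - ~119% range quoted above, since the QPS figures in the table are themselves rounded.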
Again, it's great to see reduced variance in the runtimes, since HotSpot is mostly not an issue. It's odd that `LowPhrase` gets lower QPS than `MedPhrase`: these queries look mislabelled (I see the `LowPhrase` queries getting more hits than `MedPhrase`!).
All changes have been pushed to lucene-c-boost; next I'd like to figure out how to get facets working.