Build your own finite state transducer
Have you always wanted your very own Lucene finite state transducer (FST) but you couldn't figure out how to use Lucene's crazy APIs? Then today is your lucky day! I just built a simple web application...
Screaming fast Lucene searches using C++ via JNI
At the end of the day, when Lucene executes a query, after the initial setup the true hot-spot is usually rather basic code that decodes sequential blocks of integer docIDs, term frequencies and...
A new Lucene suggester based on infix matches
Suggest, sometimes called auto-suggest, type-ahead search or auto-complete, has been an essential search feature ever since Google added it almost 5 years ago. Lucene has a number of implementations; I...
2X faster PhraseQuery with Lucene using C++ via JNI
I recently described the new lucene-c-boost github project, which provides amazing speedups (up to 7.8X faster) for common Lucene query types using specialized C++ implementations via JNI. The code...
A new version of the Compact Language Detector
It's been almost two years since I originally factored out the fast and accurate Compact Language Detector from the Chromium project, and the effort was clearly worthwhile: the project is popular and...
SuggestStopFilter carefully removes stop words for suggesters
Lucene now has a nice set of suggesters that use an analyzer to tokenize the suggestions: AnalyzingSuggester, FuzzySuggester and AnalyzingInfixSuggester. Using an analyzer is powerful because it lets...
Three exciting Lucene features in one day
Yesterday was a productive day: suddenly, there are three exciting new features coming to Lucene. Expressions module: the first feature, committed yesterday, is...
Lucene now has an in-memory terms dictionary, thanks to Google Summer of Code
Last year, Han Jiang's Google Summer of Code project was a big success: he created a new (now, default) postings format for substantially faster searches, along with smaller indices. This summer, Han...
Playing a sound (AIFF) file from Python using PySDL2
Sometimes you need to play sounds or music (digitized samples) from Python, which really ought to be a simple task. Yet it took me a little while to work out, and the resulting source code is quite...
Pulling H264 video from an IP camera using Python
IP cameras have come a long ways, and recently I upgraded some old cameras to these new Lorex cameras (model LNB2151/LNB2153) and I'm very impressed. These cameras record 1080p wide-angle video at 30...
Fast range faceting using segment trees and the Java ASM library
In Lucene's facet module we recently added support for dynamic range faceting, to show how many hits match each of a dynamic set of ranges. For example, the Updated drill-down in the Lucene/Solr issue...
Geospatial (distance) faceting using Lucene's dynamic range facets
There have been several recent, quiet improvements to Lucene that, taken together, have made it surprisingly simple to add geospatial distance faceting to any Lucene search application, for example:...
Finding long tail suggestions using Lucene's new FreeTextSuggester
Lucene's suggest module offers a number of fun auto-suggest implementations to give a user live search suggestions as they type each character into a search box. For example, WFSTCompletionLookup...
Using Lucene's search server to search Jira issues
You may remember my first blog post describing how the Lucene developers eat our own dog food by using a Lucene search application to find our Jira issues. That application has become a powerful...
Testing Lucene's index durability after crash or power loss
One of Lucene's useful transactional features is index durability which ensures that, once you successfully call IndexWriter.commit, even if the OS or JVM crashes or power is lost, or you kill -KILL...
Choosing a fast unique identifier (UUID) for Lucene
Most search applications using Apache Lucene assign a unique id, or primary key, to each indexed document. While Lucene itself does not require this (it could care less!), the application usually...
A new proximity query for Lucene, using automatons
The simplest Apache Lucene query, TermQuery, matches any document that contains the specified term, regardless of where the term occurs inside each document. Using BooleanQuery you can combine multiple...
Scoring tennis using finite-state automata
For some reason having to do with the medieval French, the scoring system for tennis is very strange. In actuality, the game is easy to explain: to win, you must score at least 4 points and win by at...
Apache Lucene™ 5.0.0 is coming!
At long last, after a strong series of 4.x feature releases, most recently 4.10.2, we are finally working towards another major Apache Lucene release! There are no promises for the exact timing (it's...
Where are my new blog posts?
Some of you have noticed that I'm not writing much in this blog lately. But fear not: exciting changes are still happening in Lucene, and I am still writing about them! It's just that most of what I...