There have been several recent, quiet improvements to Lucene that, taken together, have made it surprisingly simple to add geospatial distance faceting to any Lucene search application, for example:
< 1 km (147)
< 2 km (579)
< 5 km (2775)
Such distance facets, which allow the user to quickly filter their search results to those that are close to their location, have become especially important lately now that most searches come from mobile smartphones.
In the past, this has been challenging to implement because it's so dynamic and so costly: the facet counts depend on each user's location, and so cannot be cached and shared across users, and the underlying math for spatial distance is complex.
But several recent Lucene improvements now make this surprisingly simple!
First, Lucene's dynamic range faceting has been generalized to accept any ValueSource, not just a numeric doc values field from the index. Thanks to the recently added expressions module, this means you can offer dynamic range facets computed from an arbitrary JavaScript expression, since the expression is compiled on-the-fly to a ValueSource using custom generated Java bytecodes with ASM. Lucene's range faceting is also faster now, using segment trees to quickly assign each value to the matching ranges.
Second, the Haversine distance function was added to the expressions module. The implementation uses impressively fast approximations to the normally costly trigonometric functions, poached in part from the Java Optimized Development Kit project, without sacrificing too much accuracy. It's unlikely the approximations will ever matter in practice, and there is an open issue to further improve the approximation.
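For reference, the Haversine formula gives the great-circle distance between two latitude/longitude points on a sphere. The following is a minimal sketch of the exact formula in plain Java, without the fast trigonometric approximations described above. It is not Lucene's implementation, and the fixed mean Earth radius of 6371 km is an assumption of this sketch, so its results can differ slightly from Lucene's.

```java
/** Exact Haversine great-circle distance; a reference sketch, not Lucene's approximated version. */
final class HaversineSketch {
    // Assumption: spherical Earth with a mean radius of 6371 km.
    static final double EARTH_RADIUS_KM = 6371.0;

    private HaversineSketch() {}

    /** Distance in kilometers between two points given in decimal degrees. */
    static double distanceKm(double lat1, double lon1, double lat2, double lon2) {
        double phi1 = Math.toRadians(lat1);
        double phi2 = Math.toRadians(lat2);
        double dPhi = Math.toRadians(lat2 - lat1);
        double dLambda = Math.toRadians(lon2 - lon1);

        // hav(theta) = sin^2(theta / 2); the central angle c satisfies hav(c) = a.
        double sinHalfDPhi = Math.sin(dPhi / 2);
        double sinHalfDLambda = Math.sin(dLambda / 2);
        double a = sinHalfDPhi * sinHalfDPhi
                + Math.cos(phi1) * Math.cos(phi2) * sinHalfDLambda * sinHalfDLambda;

        // Clamp so rounding can never push a above 1 for near-antipodal points.
        double c = 2 * Math.asin(Math.sqrt(Math.min(1.0, a)));
        return EARTH_RADIUS_KM * c;
    }
}
```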
Suddenly, armed with these improvements, if you index latitude and longitude as DoubleDocValuesFields in each document, and you know the user's latitude/longitude location for each request, you can easily compute facet counts and offer drill-downs by any set of chosen distances.
First, index your documents with latitude/longitude fields:
    Document doc = new Document();
    // ...
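At search time, obtain the ValueSource by building a dynamic expression that invokes the Haversine function:
    private ValueSource getDistanceValueSource() {
      // ...
    }
Instead of hardwiring a latitude/longitude into the expression, as the example does, you should fill in the user's location.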
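Conceptually, the ValueSource built from the expression maps each document to its distance from the user, and it must be rebuilt for every request because the origin changes. A Lucene-free sketch of that shape follows, under two assumptions: the indexed coordinates are available as plain arrays indexed by document ID (standing in for the DoubleDocValuesFields), and the distance formula is passed in, for example a Haversine implementation. DistanceValues and LatLonDistance are hypothetical names.

```java
import java.util.function.IntToDoubleFunction;

/** Hypothetical stand-in for the distance ValueSource, bound to one user's location. */
final class DistanceValues {

    /** Any point-to-point distance in km, for example a Haversine implementation. */
    @FunctionalInterface
    interface LatLonDistance {
        double km(double lat1, double lon1, double lat2, double lon2);
    }

    private DistanceValues() {}

    /**
     * Binds the user's location once per request and returns docId -> distance in km.
     * docLats/docLons play the role of the per-document latitude/longitude doc values.
     */
    static IntToDoubleFunction forUser(double userLat, double userLon,
                                       double[] docLats, double[] docLons,
                                       LatLonDistance distance) {
        if (docLats.length != docLons.length) {
            throw new IllegalArgumentException("latitude and longitude columns differ in length");
        }
        // Computed lazily: only documents that are actually asked about pay for a distance.
        return docId -> distance.km(userLat, userLon, docLats[docId], docLons[docId]);
    }
}
```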
Using that ValueSource, compute the dynamic facet counts like this:
    FacetsCollector fc = new FacetsCollector();
    // ...
The example searches with the top-level-browse MatchAllDocsQuery; normally you'd use a "real" query instead.
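Stripped of Lucene's APIs, dynamic range faceting evaluates the per-document value for each hit and tallies the ranges it falls in. The sketch below is a simplified, hypothetical version of the approach mentioned earlier, not Lucene's code: it splits the number line at every range endpoint into elementary intervals, binary-searches each hit's value into one interval, then rolls the interval counts up into the (possibly overlapping) ranges with a plain loop instead of a real segment tree. Ranges are assumed half-open, [min, max).

```java
import java.util.Arrays;
import java.util.TreeSet;
import java.util.function.IntToDoubleFunction;

/** Hypothetical sketch of dynamic range facet counting over a per-document value function. */
final class RangeFacetCounter {

    /** A labeled half-open range [min, max). */
    static final class Range {
        final String label;
        final double min;
        final double max;

        Range(String label, double min, double max) {
            if (!(min < max)) {
                throw new IllegalArgumentException("empty range: " + label);
            }
            this.label = label;
            this.min = min;
            this.max = max;
        }
    }

    private RangeFacetCounter() {}

    /** Counts, for each range, how many of the given hits have a value inside it. */
    static int[] count(int[] hits, IntToDoubleFunction valueSource, Range... ranges) {
        // 1. Every distinct endpoint becomes a boundary between elementary intervals.
        TreeSet<Double> endpoints = new TreeSet<>();
        for (Range r : ranges) {
            endpoints.add(r.min);
            endpoints.add(r.max);
        }
        double[] bounds = endpoints.stream().mapToDouble(Double::doubleValue).toArray();

        // 2. Interval i+1 is [bounds[i], bounds[i+1]); interval 0 is everything below bounds[0].
        int[] elementaryCounts = new int[bounds.length + 1];
        for (int doc : hits) {
            double v = valueSource.applyAsDouble(doc);
            if (Double.isNaN(v)) {
                continue; // no value: counts toward no range
            }
            elementaryCounts[intervalOf(bounds, v)]++;
        }

        // 3. Roll up: a range covers the elementary intervals between its two endpoints.
        int[] counts = new int[ranges.length];
        for (int r = 0; r < ranges.length; r++) {
            int from = Arrays.binarySearch(bounds, ranges[r].min);
            int to = Arrays.binarySearch(bounds, ranges[r].max);
            for (int j = from; j < to; j++) {
                counts[r] += elementaryCounts[j + 1];
            }
        }
        return counts;
    }

    /** Index of the elementary interval containing v, i.e. the number of boundaries <= v. */
    private static int intervalOf(double[] bounds, double v) {
        int pos = Arrays.binarySearch(bounds, v);
        return pos >= 0 ? pos + 1 : -pos - 1;
    }
}
```

With nested "< N km" ranges like those at the top of the post, each hit costs a single binary search however many ranges contain it; only the final roll-up touches every range.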
Finally, once the user picks a distance for drill-down, use the Range.getFilter method and add that to a DrillDownQuery using ConstantScoreQuery:
    public TopDocs drillDown(DoubleRange range) throws IOException {
      // ...
    }
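Drilling down amounts to filtering: keep only the hits whose computed distance falls in the range the user picked. A hypothetical, Lucene-free sketch follows. It assumes half-open [min, max) ranges; whichever convention you choose, the drill-down should treat range endpoints exactly as the counting did, or the number of results can disagree with the facet count the user clicked.

```java
import java.util.Arrays;
import java.util.function.IntToDoubleFunction;

/** Hypothetical drill-down: keeps the hits whose distance lies in [min, max). */
final class DistanceDrillDown {
    private DistanceDrillDown() {}

    static int[] drillDown(int[] hits, IntToDoubleFunction distanceSource, double min, double max) {
        return Arrays.stream(hits)
                .filter(doc -> {
                    double d = distanceSource.applyAsDouble(doc);
                    // Must treat endpoints exactly as the facet counting did; otherwise the
                    // number of results can differ from the count shown to the user.
                    return d >= min && d < max;
                })
                .toArray();
    }
}
```

No scoring happens here; in the Lucene version, wrapping the filter in ConstantScoreQuery likewise gives every matching document the same score.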
See the full source code here, from the lucene/demo module.
When I first tested this example, there was a fun bug, and then later the facet APIs were overhauled, so you'll need to wait for the Lucene 4.7 release, or just use the current 4.x sources, to get this example working.
While this example is simple and works correctly, some clear performance improvements are possible, such as using a bounding box as a fast match to avoid computing the Haversine distance for hits that are clearly outside the range of possible drill-downs (patches welcome!). Even so, this is a nice step forward for Lucene's faceting, and it's amazing that geospatial distance faceting with Lucene can be so simple.
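The bounding-box fast match is left as future work above. One way it could look, as a rough Lucene-free sketch assuming a spherical Earth of radius 6371 km and exact Haversine distances: hits outside the largest range contribute to no facet count, so build the latitude/longitude box enclosing a circle of that radius around the user and compute Haversine only for points inside it. The latitude extent is ±d radians, where d = maxKm / R; the longitude half-width is asin(sin d / cos lat), because the simpler d / cos lat is slightly too narrow and would wrongly reject some in-range points. If the circle reaches a pole, every longitude is possible. BoundingBoxFilter is a hypothetical name, and with an approximate distance function such as Lucene's, the small padding would need to grow to cover the approximation error.

```java
/** Hypothetical bounding-box fast match in front of an exact Haversine check (spherical Earth). */
final class BoundingBoxFilter {
    private static final double EARTH_RADIUS_KM = 6371.0; // assumed mean Earth radius
    private static final double PAD_DEG = 1e-9;            // guards against rounding at the edge

    private final double centerLat, centerLon, maxKm;
    private final double minLat, maxLat, lonHalfWidth;    // all in degrees

    BoundingBoxFilter(double centerLat, double centerLon, double maxKm) {
        this.centerLat = centerLat;
        this.centerLon = centerLon;
        this.maxKm = maxKm;
        double d = maxKm / EARTH_RADIUS_KM;                // angular radius, radians
        double phi = Math.toRadians(centerLat);
        double minPhi = phi - d;
        double maxPhi = phi + d;
        double halfWidth;
        if (minPhi > -Math.PI / 2 && maxPhi < Math.PI / 2) {
            // Largest longitude offset of any point on the circle (clamped against rounding).
            halfWidth = Math.toDegrees(Math.asin(Math.min(1.0, Math.sin(d) / Math.cos(phi))));
        } else {
            // The circle covers a pole: clamp latitude, allow every longitude.
            minPhi = Math.max(minPhi, -Math.PI / 2);
            maxPhi = Math.min(maxPhi, Math.PI / 2);
            halfWidth = 180.0;
        }
        minLat = Math.toDegrees(minPhi) - PAD_DEG;
        maxLat = Math.toDegrees(maxPhi) + PAD_DEG;
        lonHalfWidth = halfWidth + PAD_DEG;
    }

    /** Cheap test; false means the point is certainly farther than maxKm. */
    boolean mayMatch(double lat, double lon) {
        if (lat < minLat || lat > maxLat) {
            return false;
        }
        double dLon = Math.abs(lon - centerLon) % 360.0;  // handle the antimeridian
        if (dLon > 180.0) {
            dLon = 360.0 - dLon;
        }
        return dLon <= lonHalfWidth;
    }

    /** Fast match first; exact Haversine only for points inside the box. */
    boolean matches(double lat, double lon) {
        return mayMatch(lat, lon) && haversineKm(centerLat, centerLon, lat, lon) <= maxKm;
    }

    static double haversineKm(double lat1, double lon1, double lat2, double lon2) {
        double sinHalfDPhi = Math.sin(Math.toRadians(lat2 - lat1) / 2);
        double sinHalfDLambda = Math.sin(Math.toRadians(lon2 - lon1) / 2);
        double a = sinHalfDPhi * sinHalfDPhi
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                  * sinHalfDLambda * sinHalfDLambda;
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(Math.min(1.0, a)));
    }
}
```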