At Radar, performance is a feature. Our platform processes over 1 billion API calls per day from hundreds of millions of devices worldwide. We provide geolocation infrastructure and solutions, including APIs for:
But as our products and data scale, so do our engineering challenges.
To support this growth, we developed HorizonDB, a geospatial database written in Rust that consolidates multiple location services into a single, highly performant binary. With HorizonDB, we are able to power all of the above use cases with an excellent operational footprint:
Before HorizonDB, we split geocoding across Elasticsearch and microservices for forward geocoding, and MongoDB for reverse.
Operating and scaling this stack was costly: Elasticsearch frequently fanned queries to all shards and required service-orchestrated batch updates, while MongoDB lacked true batch ingestion, required overprovisioning, and had no reliable bulk rollback for bad data.
Our goals for this service included:
With these goals in mind, we built HorizonDB using RocksDB, S2, Tantivy, FSTs, LightGBM and FastText.
Data assets are preprocessed using Apache Spark, ingested in Rust and stored as versioned assets in AWS S3.
Rust
https://www.rust-lang.org/
A compiled systems programming language originally developed at Mozilla. There are many aspects the team liked about Rust:
RocksDB
https://rocksdb.org/
RocksDB, an in-process log-structured merge (LSM) tree store, serves as our primary record store. It is incredibly fast, typically responding in microseconds, and remains faster than other high-performance solutions even on a much larger dataset.
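To make the LSM idea concrete, here is a toy sketch using only the Rust standard library. It is not RocksDB and not HorizonDB code: `ToyLsm`, `memtable_limit`, and the in-memory "runs" are hypothetical names for illustration, and real LSM stores add write-ahead logs, on-disk files, bloom filters, and compaction.

```rust
use std::collections::BTreeMap;

/// Toy LSM store: writes land in a sorted in-memory table and are
/// periodically frozen into immutable sorted runs.
struct ToyLsm {
    memtable: BTreeMap<String, String>,
    runs: Vec<Vec<(String, String)>>, // oldest run first
    memtable_limit: usize,            // hypothetical flush threshold
}

impl ToyLsm {
    fn new(memtable_limit: usize) -> Self {
        Self { memtable: BTreeMap::new(), runs: Vec::new(), memtable_limit }
    }

    /// Insert or overwrite a key; flush once the memtable is full.
    fn put(&mut self, key: &str, value: &str) {
        self.memtable.insert(key.to_string(), value.to_string());
        if self.memtable.len() >= self.memtable_limit {
            self.flush();
        }
    }

    /// Freeze the memtable into a sorted, immutable run.
    fn flush(&mut self) {
        let run: Vec<(String, String)> =
            std::mem::take(&mut self.memtable).into_iter().collect();
        if !run.is_empty() {
            self.runs.push(run);
        }
    }

    /// Check the memtable first, then runs from newest to oldest,
    /// so the most recent write for a key always wins.
    fn get(&self, key: &str) -> Option<&str> {
        if let Some(v) = self.memtable.get(key) {
            return Some(v.as_str());
        }
        for run in self.runs.iter().rev() {
            if let Ok(i) = run.binary_search_by(|(k, _)| k.as_str().cmp(key)) {
                return Some(run[i].1.as_str());
            }
        }
        None
    }
}
```

Writes only ever touch the in-memory table and append whole runs, which is what makes this layout friendly to bulk ingestion.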
S2
http://s2geometry.io/
S2 is Google's spatial indexing library. It projects a quadtree onto a sphere, turning O(n) point-in-polygon lookups into cacheable, constant-time lookups. While writing HorizonDB, we wrote Rust bindings for Google's C++ library, which we plan to open-source soon.
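The sketch below illustrates why a cell-based index makes lookups cheap. It is a simplified planar quadtree over raw latitude and longitude, not the S2 library and not our bindings: real S2 projects cube faces onto the sphere and orders cells along a Hilbert curve. The function names, the rectangular "region", and the chosen level are all assumptions for illustration.

```rust
use std::collections::HashMap;

/// Quadtree cell key for a point at a fixed level (level <= 31).
/// Assumption: a flat lat/lng rectangle stands in for S2's sphere.
fn cell_id(lat: f64, lng: f64, level: u32) -> u64 {
    let (mut lat_lo, mut lat_hi) = (-90.0_f64, 90.0_f64);
    let (mut lng_lo, mut lng_hi) = (-180.0_f64, 180.0_f64);
    let mut id = 1u64; // sentinel bit keeps cells of different levels distinct
    for _ in 0..level {
        let lat_mid = (lat_lo + lat_hi) / 2.0;
        let lng_mid = (lng_lo + lng_hi) / 2.0;
        let mut quadrant = 0u64;
        if lat >= lat_mid { quadrant |= 2; lat_lo = lat_mid; } else { lat_hi = lat_mid; }
        if lng >= lng_mid { quadrant |= 1; lng_lo = lng_mid; } else { lng_hi = lng_mid; }
        id = (id << 2) | quadrant;
    }
    id
}

/// All cells at `level` that touch a lat/lng bounding box.
fn cover_rect(lat_min: f64, lat_max: f64, lng_min: f64, lng_max: f64, level: u32) -> Vec<u64> {
    let n = (1u64 << level) as f64;
    let (dlat, dlng) = (180.0 / n, 360.0 / n);
    let row = |lat: f64| ((lat + 90.0) / dlat).floor().clamp(0.0, n - 1.0) as u64;
    let col = |lng: f64| ((lng + 180.0) / dlng).floor().clamp(0.0, n - 1.0) as u64;
    let mut cells = Vec::new();
    for i in row(lat_min)..=row(lat_max) {
        for j in col(lng_min)..=col(lng_max) {
            // The cell centre identifies the cell unambiguously.
            let lat = -90.0 + (i as f64 + 0.5) * dlat;
            let lng = -180.0 + (j as f64 + 0.5) * dlng;
            cells.push(cell_id(lat, lng, level));
        }
    }
    cells
}

/// Precompute cell -> region name. Each region is a hypothetical
/// bounding box given as [lat_min, lat_max, lng_min, lng_max].
fn build_index(regions: &[(&'static str, [f64; 4])], level: u32) -> HashMap<u64, &'static str> {
    let mut index = HashMap::new();
    for &(name, [lat_min, lat_max, lng_min, lng_max]) in regions {
        for cell in cover_rect(lat_min, lat_max, lng_min, lng_max, level) {
            index.insert(cell, name);
        }
    }
    index
}

/// Lookup is one cell computation plus one hash probe.
fn lookup(index: &HashMap<u64, &'static str>, lat: f64, lng: f64, level: u32) -> Option<&'static str> {
    index.get(&cell_id(lat, lng, level)).copied()
}
```

A covering over-approximates the shape it covers, so a production system would still run an exact geometry check for cells on the boundary. The hash probe only narrows the candidates.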
FSTs
https://github.com/BurntSushi/fst
Finite-state transducers (FSTs) are a data structure offering efficient string compression and prefix queries. This blog post by Andrew Gallant describes in great detail how this is achieved. We found that 80% of our queries were well-formed and wanted an efficient way to cache these “happy paths”. Using FSTs, we were able to cache millions of these happy paths in memory on the order of megabytes, and prefix candidates were often returned within single-digit milliseconds.
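As a rough illustration of the prefix query itself, the sketch below uses a sorted, immutable list of strings with a binary search. This is a stand-in, not an FST: it offers the same ordered prefix lookup but none of the compression an FST provides. `PrefixCache` and its methods are hypothetical names.

```rust
/// Immutable, sorted set of cached "happy path" queries.
struct PrefixCache {
    keys: Vec<String>,
}

impl PrefixCache {
    /// Sort and deduplicate once at build time, much like an FST
    /// is built once from sorted input.
    fn new(mut keys: Vec<String>) -> Self {
        keys.sort();
        keys.dedup();
        Self { keys }
    }

    /// Return up to `limit` cached entries that start with `prefix`.
    fn candidates(&self, prefix: &str, limit: usize) -> Vec<&str> {
        // All keys sharing a prefix are contiguous in sorted order,
        // so binary-search for the first key >= prefix and scan forward.
        let start = self.keys.partition_point(|k| k.as_str() < prefix);
        self.keys[start..]
            .iter()
            .take_while(|k| k.starts_with(prefix))
            .take(limit)
            .map(|k| k.as_str())
            .collect()
    }
}
```

For example, a cache containing "841 broadway" and "841 broadway new york" returns both entries, in sorted order, for the prefix "841 b".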
Tantivy
https://github.com/quickwit-oss/tantivy
An in-process inverted index library similar to Lucene.
We made the decision to use an in-process index over an external service such as Elasticsearch or Meilisearch for a few reasons:
FastText
https://fasttext.cc/
To improve precision and search quality, we implemented a FastText model trained on a mix of our geocoder corpus and query logs. With FastText, we can represent the words in a query as numeric vectors that capture their meaning, a format suitable for ML applications. FastText is typo-tolerant and handles out-of-vocabulary words through its use of character n-grams. “Nearby” vectors represent semantically similar words, which lets our ML algorithms understand the semantics of a given word in a search query.
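The typo tolerance comes from the character n-grams: a misspelled word still shares most of its n-grams with the correct spelling. The sketch below extracts n-grams with `<` and `>` boundary markers, as in FastText's published approach, and measures overlap between two words. It is not the FastText library and does not compute embeddings; the function names and the use of trigrams are assumptions for illustration.

```rust
use std::collections::HashSet;

/// Character n-grams of a word wrapped in boundary markers,
/// for every n in min_n..=max_n.
fn char_ngrams(word: &str, min_n: usize, max_n: usize) -> Vec<String> {
    let chars: Vec<char> = format!("<{}>", word).chars().collect();
    let mut grams = Vec::new();
    for n in min_n.max(1)..=max_n {
        if n > chars.len() {
            break;
        }
        for window in chars.windows(n) {
            grams.push(window.iter().collect());
        }
    }
    grams
}

/// Jaccard overlap of the trigram sets of two words (0.0 to 1.0).
/// A real model would instead sum learned vectors for each n-gram.
fn trigram_overlap(a: &str, b: &str) -> f64 {
    let set_a: HashSet<String> = char_ngrams(a, 3, 3).into_iter().collect();
    let set_b: HashSet<String> = char_ngrams(b, 3, 3).into_iter().collect();
    let shared = set_a.intersection(&set_b).count() as f64;
    let total = set_a.union(&set_b).count() as f64;
    if total == 0.0 { 0.0 } else { shared / total }
}
```

With this sketch, "broadway" and the typo "brodway" share half of their combined trigrams, while "broadway" and "houston" share none.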
LightGBM
https://github.com/microsoft/LightGBM
We have trained multiple LightGBM models to classify query intent and to tag parts of the query depending on that intent. This allows us to “structure” our queries, improving search performance and precision. For example, a query classified as regional, such as “New York”, can skip address search, whereas a query like “841 broadway” allows us to skip searching POIs and regions.
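The routing step that follows classification can be sketched as below. The classifier here is a deliberately naive placeholder (a leading house number means an address, a tiny hard-coded list means a region), standing in for the trained LightGBM models. All type names, `classify_stub`, `KNOWN_REGIONS`, and the exact index sets per intent are assumptions for illustration.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Intent { Region, Address, Unknown }

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Index { Addresses, Pois, Regions }

// Hypothetical sample list; a real system would use a trained model.
const KNOWN_REGIONS: [&str; 2] = ["new york", "california"];

/// Placeholder for the intent model: NOT LightGBM, just two rules.
fn classify_stub(query: &str) -> Intent {
    let lower = query.trim().to_lowercase();
    let first = lower.split_whitespace().next().unwrap_or("");
    if !first.is_empty() && first.chars().all(|c| c.is_ascii_digit()) {
        Intent::Address
    } else if KNOWN_REGIONS.contains(&lower.as_str()) {
        Intent::Region
    } else {
        Intent::Unknown
    }
}

/// Pick which indexes to search for a given intent.
fn indexes_for(intent: Intent) -> &'static [Index] {
    match intent {
        // Regional queries skip address search.
        Intent::Region => &[Index::Regions, Index::Pois],
        // Address queries skip POIs and regions.
        Intent::Address => &[Index::Addresses],
        // No confident intent: search everything.
        Intent::Unknown => &[Index::Addresses, Index::Pois, Index::Regions],
    }
}
```

The benefit is that each skipped index is work that never runs, so a confident classification directly reduces query latency.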
Apache Spark
https://spark.apache.org/
With Spark, we are able to process hundreds of millions of data points in less than an hour, with near-linear scalability. We often had to tune or refactor jobs to achieve optimal performance when performing joins or aggregations.
Since our data is written to S3, it is trivial to inspect results with Amazon Athena, a managed query service based on Presto that can query object storage assets using SQL. DuckDB is another lightweight tool that our engineers use to inspect these assets on the fly.
HorizonDB has transformed both the operational and developmental aspects of our geolocation offerings. We've achieved improvements across the board for cost, performance, and scalability:
We are happy with our design decisions with HorizonDB and are prepared for our scale for the foreseeable future. We will touch on how we designed particular features of the system in future blog posts.
Many thanks to our hard-working engineers Bradley Schoeneweis, Jason Liu, Jacky Wang, Binh Robles, Greg Sadetsky, David Gurevich, and Felix Li who made this system a reality.
Radar is more than just an API layer. Across SDKs, maps, databases, and infrastructure, we're rethinking geolocation from the ground up to offer the fastest, most developer-friendly location stack available.
If this blog post was interesting to you, we're hiring great engineering talent across the board.
Check out our jobs page for more information.