TL;DR: Radar built an open source tool for performing side-by-side evaluations of JSON APIs. We hope you find it useful, too.
Radar engineers needed a tool for side-by-side evaluations of changes to the services of our recently released suite of geocoding APIs. We couldn’t find a suitable one, so we wrote our own, very imaginatively called api-diff. We’re proud to release it on npm and open source it on GitHub today.
A screenshot of api-diff in console mode, comparing identical API requests sent to two different API servers
Development of api-diff was prompted by our work on the Pelias geocoder, but it works with any JSON HTTP API. This post focuses primarily on how we use it against the public Pelias API, but the tool itself is generic: it can be, and has been, used against all sorts of JSON APIs inside Radar.
A geocoder is a piece of software that translates freeform strings into coordinates on the earth, along with structured metadata about that place. For example, "20 Jay St" becomes "40.7041, -73.9867", along with the metadata that it’s an exact street address in Brooklyn, in New York City, in New York State, in the US. Geocoders are challenging pieces of software because, in the real world, both addresses and queries are messy.
A sampling of different ways users could search for one address
It’s hard to have an exhaustive set of tests for every way a query might be written. Pelias currently has a great set of regression tests, but we wanted a tool that would let us understand how changes to ranking would affect real-world queries where we didn’t have a verified golden result.
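Each of those real-world queries is ultimately just an HTTP request. Pelias answers them through its /v1/search endpoint, which takes the query string in a text parameter (the same endpoint and parameter we pass to api-diff below), so a single query can be sketched against a hypothetical local instance like this (the host and port are placeholders):
# geocode one freeform query against a local Pelias server (the port is an assumption)
curl 'http://localhost:4100/v1/search?text=20+Jay+St'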
Enter api-diff, an incredibly boringly named suite of tools for evaluating quality or ranking changes between JSON API endpoints. api-diff can send the same requests to two different API servers, diff the JSON responses while ignoring fields you don’t care about, and output the results as a colorized console diff, machine-readable JSON, or an interactive HTML report for manual evaluation.
All you need to get started is a list of queries, either as a CSV or as URL paths, and two running copies of a server with some changes between them.
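For example, the CSV can be as simple as a single column of queries. A hypothetical addresses.csv (the file name and contents here are purely illustrative) might look like this, with the query heading remapped to Pelias’s text parameter in the command below:
query
20 Jay St
20 Jay Street, Brooklyn
20 jay st dumbo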
At Radar, our two main use cases for api-diff are 1) making code changes to the parsing and ranking algorithms in Pelias, and 2) updating the data in the search index. In both cases, we are looking to see what changed in the search results. When making a ranking change, there is usually a tradeoff between improved queries and some unintended losses, so we use api-diff to get a more quantitative answer as to whether the improvements outweigh the losses enough to proceed with the change. When updating the data in our index, we usually expect a steady increase in quality and coverage, and are mostly on the lookout for surprising regressions, often caused by changes in data formats.
We run api-diff with a command that looks something like this:
# --new.host / --old.host: these can be any accessible http servers; sometimes we run against our prod index
# --input_csv: csv input, using the headings as query parameter keys
# --key_map: remap csv column "query" to query param "text"
# --endpoint: run against /v1/search on our hosts
# --extra_params: extra options to append to every query
# --ignored_fields: ignore all fields with these names when computing the diff
# --response_filter: trim our results down to just the first entry under the addresses key
# --output_mode / --output_file: output an interactive html diff; other options are text and json
api-diff \
  --new.host localhost:3100 \
  --old.host localhost:4100 \
  --input_csv ~/RadarCode/geocode-acceptance-tests/input/addresses.csv \
  --key_map query=text \
  --endpoint /v1/search \
  --extra_params sources=oa,osm \
  --ignored_fields bbox geometry attribution timestamp \
  --response_filter '$.addresses[0]' \
  --output_mode html \
  --output_file diffs.html
Depending on the output mode, this produces a colorized console diff:
Or an interactive HTML diff:
api-diff can be installed as a command-line tool with npm install -g @radarlabs/api-diff. If your servers require an API key, you’ll want to create a config file for it.
Our workflow is to use the interactive evaluation capabilities of api-diff’s HTML output mode. We manually evaluate each change in a browser using keyboard shortcuts. When we’re done, we look at the number of wins and losses to make an informed decision on whether or not to go ahead with the change. Oftentimes we notice, long before we are done with the evaluation, that there are classes of unintended changes that need to be addressed, so we quit the eval early, make fixes, and run the eval all over again. The HTML output mode remembers quality judgements we’ve made in the past if it sees them again, so the evaluation task is likely to go faster the second time.
We can also save the diff and evaluation output to a JSON file that we can share with our collaborators. For instance, here is the evaluated diff from a change we are working on integrating into Pelias. Load it into https://radarlabs.github.io/api-diff/compare.html and see if you agree with our evaluations.
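For the raw diff, the JSON output mode mentioned above can write it straight to a file. A sketch that reuses the flags from the earlier example, changing only the output options (the values are illustrative):
api-diff \
  --new.host localhost:3100 \
  --old.host localhost:4100 \
  --input_csv ~/RadarCode/geocode-acceptance-tests/input/addresses.csv \
  --key_map query=text \
  --endpoint /v1/search \
  --output_mode json \
  --output_file diffs.json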
We’ve had the opportunity to use this tool on pelias-api, pelias-parser, and on radar.io’s app server. In every case, it has given us an understanding of the quality impact of our changes that we couldn’t have gotten any other way. We hope it’s useful to other people as well.
Have any features you’d like to see? Feel free to open an issue on our GitHub repository.
Interested in building the future of location technology? We’re hiring!
See what Radar’s location and geofencing solutions can do for your business.