Open-sourcing api-diff, a tool for side-by-side evaluations of JSON APIs

TL;DR: Radar built an open source tool for performing side-by-side evaluations of JSON APIs. We hope you find it useful, too.


Radar engineers needed a tool for side-by-side evaluations of changes to the services of our recently released suite of geocoding APIs. We couldn’t find a suitable one, so we wrote our own, very imaginatively called api-diff. We’re proud to release it on npm and open source it on GitHub today.

A screenshot of api-diff in console mode, comparing identical API requests sent to two different api servers


Development of api-diff was prompted by our work on the Pelias geocoder, but it works with any JSON HTTP API. This post focuses on how we use it against the public Pelias API, but the tool itself is generic: it can be, and has been, used against all sorts of JSON APIs inside Radar.

A geocoder is a piece of software that translates freeform strings to coordinates on the earth with structured metadata associated with that place. For example, "20 Jay St" becomes "40.7041, -73.9867", along with the metadata that it’s an exact street address in Brooklyn, in New York City, in New York State, in the US. Geocoders are challenging pieces of software because in the real world, both addresses and queries are messy.
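As a sketch, that result pairs coordinates with structured metadata. The field names below are illustrative only, not Pelias's actual response schema:

```python
# Illustrative geocode result for "20 Jay St"; field names are made up
# and do not match Pelias's actual response format.
result = {
    "query": "20 Jay St",
    "lat": 40.7041,
    "lon": -73.9867,
    "match_type": "exact street address",
    "borough": "Brooklyn",
    "city": "New York City",
    "state": "New York",
    "country": "US",
}
print(f'{result["lat"]}, {result["lon"]}')  # prints 40.7041, -73.9867
```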


A sampling of different ways users could search for one address

It’s hard to have an exhaustive set of tests for every way a query might be written. Pelias currently has a great set of regression tests, but we wanted a tool that would let us understand how changes to ranking would affect real-world queries where we didn’t have a verified golden result.

What it is

Enter api-diff, an incredibly boringly named suite of tools for evaluating quality or ranking changes between JSON API endpoints. api-diff can:

  • Compare the responses of a large number of queries between two servers
  • Output diffs to the console, as JSON, or as an interactive static HTML page
  • Compare a set of saved query/response pairs against responses from a new server
  • Allow a human scorer to quickly score diffs as positive, neutral, or negative with keyboard shortcuts
  • Generate a baseline set of responses for later comparison

All you need to get started is a list of queries (as a CSV or as URL paths) and two running copies of a server with some changes between them.
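For example, a minimal query CSV might look like this (the file and its values are illustrative; each column heading becomes a query parameter key, and columns can be remapped with --key_map):

```csv
query,size
20 Jay St,1
30 W 26th St,1
jay st brooklyn,1
```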

At Radar, our two main use cases for api-diff are 1) making code changes to the parsing and ranking algorithms in Pelias, and 2) updating the data in the search index. In both cases, we want to see what changed in the search results. A ranking change usually involves a tradeoff between improved queries and some unintended losses, so we use api-diff to get a more quantitative answer on whether the improvements outweigh the losses enough to proceed with the change. A data update usually brings a steady increase in quality and coverage, so there we are mostly on the lookout for surprising regressions, often caused by changes in data formats.

We run api-diff with a command that looks something like this:

# localhost:3100 and localhost:4100 can be any accessible http servers;
# sometimes we run against our prod index.
#
# --input_csv:        CSV input; column headings become query parameter keys
# --key_map:          remap the CSV column "query" to the query param "text"
# --endpoint:         run against /v1/search on both hosts
# --extra_params:     extra parameters appended to every query
# --ignored_fields:   fields ignored when computing the diff
# --response_filter:  trim results to the first entry under the addresses key
# --output_mode:      interactive html diff (other modes: text, json)
api-diff \
  localhost:3100 \
  localhost:4100 \
  --input_csv ~/RadarCode/geocode-acceptance-tests/input/addresses.csv \
  --key_map query=text \
  --endpoint /v1/search \
  --extra_params sources=oa,osm \
  --ignored_fields bbox geometry attribution timestamp \
  --response_filter '$.addresses[0]' \
  --output_mode html \
  --output_file diffs.html
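Conceptually, --ignored_fields and --response_filter prune each response before the diff is computed. Here is a minimal sketch of that pruning in Python (the sample responses and the strip_ignored helper are our own illustration, not api-diff's actual implementation):

```python
# Hypothetical old/new server responses (made-up data, not real Pelias output).
old = {"attribution": "v1", "addresses": [
    {"name": "20 Jay St", "borough": "Brooklyn", "bbox": [0, 0, 1, 1]}]}
new = {"attribution": "v2", "addresses": [
    {"name": "20 Jay Street", "borough": "Brooklyn", "bbox": [0, 0, 1, 1]}]}

IGNORED_FIELDS = {"bbox", "geometry", "attribution", "timestamp"}

def strip_ignored(value):
    # Recursively drop ignored fields so cosmetic churn never shows up as a diff.
    if isinstance(value, dict):
        return {k: strip_ignored(v) for k, v in value.items()
                if k not in IGNORED_FIELDS}
    if isinstance(value, list):
        return [strip_ignored(v) for v in value]
    return value

# --response_filter '$.addresses[0]' trims each response to its first address.
old_first = strip_ignored(old)["addresses"][0]
new_first = strip_ignored(new)["addresses"][0]

# Any remaining field-level difference is a candidate win/loss for a human scorer.
changed = sorted(k for k in old_first if old_first[k] != new_first.get(k))
print(changed)  # prints ['name']
```

Ignoring volatile fields like timestamps and bounding boxes keeps the diff focused on the changes a human scorer should actually judge.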

Which can output a colorized console diff:


Or an interactive HTML diff:


Workflow and usage

api-diff can be installed as a command-line tool with npm install -g @radarlabs/api-diff. If your servers require an API key, you’ll want to create a config file for it.

Our workflow is to use the interactive evaluation capabilities of api-diff's HTML output mode. We manually evaluate each change in a browser using keyboard shortcuts. When we're done, we examine the number of wins and losses to make an informed decision on whether or not to go ahead with the change. Often, long before we finish an evaluation, we notice classes of unintended changes that need to be addressed, so we quit the eval early, make fixes, and run it all over again. The HTML output mode remembers quality judgements we've made in the past if it sees the same diffs again, so the evaluation is likely to go faster the second time.

We can also save the diff and evaluation output to a JSON file to share with collaborators. For instance, here is the evaluated diff from a change we are working on integrating into Pelias. Load it into api-diff's HTML viewer and see if you agree with our evaluations.

We’ve had the opportunity to use this tool on pelias-api, pelias-parser, and on Radar's app server. In every case, it has given us an understanding of the quality impact of our changes that we couldn't have gotten any other way. We hope it's useful to other people as well.

Have any features you’d like to see? Feel free to open an issue on our GitHub repository.

Interested in building the future of location technology? We’re hiring!