Boldly refactoring complex code
At Vinted, new code is deployed to production machines all the time with zero downtime for members. Deployment changesets are small. So we can move fast and fearlessly.
To keep the speed of development from decreasing we have an extensive test suite for easy refactoring.
But in cases where it’s difficult to reproduce a complete dataset from a production environment there are other approaches to testing changes safely.
Here’s what we do:
- we run both old and new code at the same time and discard the result of new code,
- we log old and new code output diff,
- by analyzing the logs we tell if new code performs as expected.
Even if it doesn’t, it won’t disrupt anything in production.
This approach has some caveats. Both code paths must be idempotent. For example, imagine code being tested increments a counter in a database. If we run both old and new code, the counter will be incremented twice. Such cases must be considered in advance.
Query code on the other hand is a good candidate for such testing.
We used to do this by rolling an ad hoc solution for each case. It was cumbersome and had limited logging capabilities. There ought to have been a better way, and there was. :)
Scientist by GitHub provides a simple framework for conducting such testing. It’s a gem for carefully refactoring critical paths.
We use an adapter affectionately called VintedExperiment
.
It logs discrepancies to our centralized logging facility (Graylog) and counts matches and mismatches in statsd.
For data visualization and dashboards we use Grafana with Graphite.
The combination of all of these tools allow us to easily test critical paths.
Let’s say we want to improve a method called find_items
. We extract the old implementation into
old_find_items
temporarily and put the new one into new_find_items
.
And then we set up the test like this:
VintedExperiment
is our wrapper over the Experiment
class of scientist
:
Scientist will always run the code block provided to use
, and it will run the try
block only if the run_if
block returns
true or it is not specified.
If the try
block is run, the results from use
(control) and try
(candidate) will be compared. If a simple
==
comparison is not enough, then the compare
block can be specified.
Results can be inspected in a Grafana dashboard. In the provided example the test was disabled after running it for a while. Multiple lines correspond to countries where our app is deployed.
If there are discrepancies, log messages can be viewed in Graylog. They include details of what exactly did not match.
While recording matches or mismatches as well as the runtime duration the log enables all sorts of live benchmarking possibilities. You can inspect mean, median or any other statistic of runtime durations.
Once the result is satisfying - no discrepancies and increased runtime duration - the test can be removed in favor of the new code.
VintedExperiment
can be extended to support several candidate logging, scientist
supports that, but we haven’t had the use for it.
Vinted is a peer-to-peer marketplace. Searching for items is a core part of our
product. The code inside find_items
is one of the most critical code paths
in our app. Moreover, it’s one of our oldest code. We wrote it nearly 6 years ago.
To make it more complicated, we run many A/B tests inside this code.
All of these factors add up to a lot of complexity in one place.
Scientist
proved to be a great tool for doing a lot of refactoring on this bit of our codebase.
Best part being that no user suffered the consequences of refactoring gone bad.