Why Data Scientists Need Unit Tests


As data scientists, we are not known for writing the best code. Data science code often comes from a process of iterative tinkering, and it looks like it. For small problems, drop-in requests, or one-off analyses, this is typically fine. For big problems, systems development, or analyses we'd like to repeat, this is a problem.

Why is the way I write a problem?

Code written through iterative trial-and-error development, the style we often adopt in a Jupyter (Python) or Gorilla (Clojure) notebook, often has many of the problems associated with "Spaghetti Code", a term for code that is dense and incomprehensible. If you can't get a sense of how the code is structured at a glance, then you're probably looking at Spaghetti Code. This is especially true with modern IDEs and code highlighting.

Code highlighting is designed to help you understand what the separate parts of a piece of code do. If you're looking at a piece of code and the highlighting does not correspond to some sensible ordering, you're either reaching for a fork (Spaghetti Code) or due for a new IDE.

Unit tests to the rescue

A good way to prevent Spaghetti Code and to generally improve program design is to write unit tests. Unit tests are code written before your code that tests aspects of your code. These tests operate as a specification for your code. That is, if the tests fail, then your code is failing. If the tests pass, then your code meets the specification and is "good enough".
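To make this concrete, here's a minimal sketch of a test acting as a specification. The function `normalize` and its behavior are invented for illustration; the point is that the tests state, up front, what "good enough" means.

```python
import unittest

# Hypothetical function we might be specifying: scale a list of
# numbers so they sum to 1.0.
def normalize(values):
    total = sum(values)
    return [v / total for v in values]

class TestNormalize(unittest.TestCase):
    def test_sums_to_one(self):
        # Specification: normalized values must sum to 1.0
        self.assertAlmostEqual(sum(normalize([2, 3, 5])), 1.0)

    def test_preserves_length(self):
        # Specification: one output value per input value
        self.assertEqual(len(normalize([2, 3, 5])), 3)
```

Running `python -m unittest` on a file containing this code executes both tests; if either fails, the code does not meet its specification.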

Show me an example

Imagine we want to create an image classifier that differentiates between pictures of bunnies, ferrets, and hamsters. It's going to ingest an image at a URL and return a JSON payload describing the classification. At a minimum, we may want to have the following tests:

  1. Test that the API function returns a valid JSON string
  2. Test that the API function returns all the right keys
  3. Test that the API function throws an error if a non-image is provided
  4. Test that the classification function returns a tuple of floats

The first two tests ensure our output is going to be correct, the third ensures that we've included some reasonable safeguards, and the last ensures that our classifier function is working properly. As you can see, these statements outline some design decisions.

  1. We've made the design choice to have two functions: one for the API and one for the classification.
  2. We've made the choice to return JSON.
  3. We've made the choice to represent our classification results as a tuple of likelihoods.

In practice, those unit tests could look something like this:

import unittest
import json

class TestAPI(unittest.TestCase):

    x = my_API("path_to_img.png")

    def test_API_returns_JSON(self):
        # json.loads raises an error if x is not valid JSON
        self.assertTrue(isinstance(json.loads(self.x), dict))

    def test_API_returns_correct_keys(self):
        self.assertEqual(set(json.loads(self.x).keys()),
                         {"bunny", "ferret", "hamster"})

    def test_API_throws_error(self):
        # A non-image input should raise an error
        self.assertRaises(ValueError, my_API, "path_to_textfile.txt")

class TestClassifier(unittest.TestCase):

    p = my_classify("path_to_img.png")

    def test_container_is_tuple(self):
        self.assertTrue(isinstance(self.p, tuple))

    def test_probs_are_floats(self):
        self.assertTrue(all(isinstance(prob, float) for prob in self.p))

if __name__ == "__main__":
    unittest.main()

I'm not going to get into the gritty details of implementing unit tests here, but I will explain the code. First, we create two classes, TestAPI and TestClassifier, both derived from unittest.TestCase, which is how we organize our unit tests. Typically, we'll have one of these classes for each function we'd like to test. Each TestCase will have several methods, each representing something different we want to test.

You'll notice we have a test which, intuitively, checks that our API returns a JSON object. We also have a test that checks the classifier returns its likelihoods inside a tuple.

Inheriting our classes from TestCase gives us access to several convenient assertion methods. Two common ones are assertEqual and assertTrue: assertEqual takes two values and checks that they are equal, while assertTrue takes a single expression and checks that it is true.
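As a standalone sketch of how these two assertion methods behave (independent of the classifier example above):

```python
import unittest

class TestAssertions(unittest.TestCase):
    def test_assert_equal(self):
        # assertEqual compares two values for equality
        self.assertEqual(2 + 2, 4)

    def test_assert_true(self):
        # assertTrue checks that a single expression is truthy
        self.assertTrue(isinstance((0.9, 0.1), tuple))
```

Both tests pass; change `4` to `5` and assertEqual reports exactly which comparison failed.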

The last bit of code at the bottom lets us run all of these tests by executing the script, assuming we've saved it somewhere. If we've defined all our functions but not yet implemented them correctly, these tests will fail.

How does unit testing help?

Unit testing helps by forcing us to think about what our code is going to look like before we write it. It forces us to make decisions about our code, and even use our code, before we actually "code" anything. Often, when we do this, we'll be able to work through mistakes we would have made quickly and inexpensively (time-wise) and save ourselves time later on.

Additionally, unit tests typically check things like data structures and return values. These are the places where the rubber meets the road in our code. A lot of problems can be solved by using the right data structures to store our information and keeping track of which data structures we're using.

For example, np.NaN values are not considered False by if statements in Python, whereas empty strings, 0, and 0.0 are. If you've got np.NaNs lurking in your data, that could cause trouble. Unit tests will allow you to flag issues like this in advance.
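A quick demonstration of this pitfall, using the standard library's `float("nan")`, which behaves the same as np.NaN in a boolean context:

```python
import math

nan = float("nan")  # same truthiness behavior as np.NaN

assert not ""    # empty string is falsy
assert not 0     # 0 is falsy
assert not 0.0   # 0.0 is falsy

# NaN, however, is truthy: `if nan:` takes the True branch
assert bool(nan) is True

# The reliable check is math.isnan (or np.isnan), not truthiness
assert math.isnan(nan)
```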

Won't unit testing slow me down?

On the contrary, unit testing will probably make you a more productive programmer, not a less productive one. Programmers who write unit tests spend less time debugging code and less time wondering what part of the program to work on next.

The first, that programmers who write unit tests spend less time debugging code, makes sense because programmers who are programming against a test have laid out in advance how their code is supposed to work. And when it is working, it's working as expected, not merely as written. (Computers do what we ask, not what we want.)

The second, that programmers who write unit tests spend less time wondering what to program next, also makes sense. Once you've written unit tests, you have a blueprint for how to write your code: you write your code to pass the tests. Once all the tests pass, you either brush yourself off and call it a day, or you write more tests.

A hidden benefit to unit tests---in my opinion the largest benefit---comes into play when we're refactoring code. Code that has unit tests associated with it will let you know what you've broken when you refactor it. Code without unit tests may look fine, for a while... but then break later on.

What should I be writing unit tests for?

I'd suggest writing unit tests for code in the following situations:

  • Producing a report that the customer wants on a regular basis
  • Scraping data that will be fed into a commercial off-the-shelf platform
  • An API to perform analysis given an input URL
  • Analyzing data for and feeding data to dashboards that someone will look at every day

I'd say it's probably okay to ignore unit tests in these situations:

  • Your customer wants a one-time piece of analysis on unique data
  • You're the only programmer on a demonstration project and all the code is going to be rewritten if the project gets picked up by the customer

As a rule of thumb: if other people are going to depend on it, write tests for it.

Closing comments

Unit testing has not always been a part of my workflow, and for pet projects it's often something I neglect. However, for production-quality code, tests are imperative. Tests demonstrate that you've thought through your code; tests allow others to validate that your code is working (i.e., that none of the features you claim are there are missing); tests give others an opportunity to see your code at work; and tests ensure that your code will complain if it breaks down the road.

Though it is not something that is typically taught to data scientists, unit testing is a skill that can go a long way towards refining the craft of a young data scientist.


Data scientists who are interested in testing and test-driven data science should become familiar with the two Python testing frameworks: unittest and doctest. unittest is the classic framework for writing unit tests, based on the xUnit style.

The Wikipedia page on Unit Testing is informative, as are the many books on test-driven development.

I would also strongly recommend that anyone interested in a test-driven workflow consider the Extreme Programming (XP) methodology.

Mastering Large Datasets

My new book, Mastering Large Datasets, is in early release now. Head over to Manning.com and buy a copy today.