Testing and External Dependencies

Sep 29, 2017 19:55 · 986 words · 5 minutes read docker python testing

The style of TDD (Test-Driven Development) advocated in Growing Object-Oriented Software, Guided by Tests is a practical, top-down approach. In very rough terms, we write test cases in the following way:

  1. Start with an acceptance test
  2. Make the acceptance test pass
  3. Refactor

The acceptance test allows us to define the behavior of the system from the perspective of the customer/user. After or as we confirm that the functionality works as intended, we can build unit tests to support the classes and methods we introduce during step 2.

Often when writing acceptance tests you run across something that isn’t easy to mock out, like a database or queueing system. I call these external dependencies, as opposed to internal dependencies, which would be something like a third-party library.

If the dependency is internal, you can usually mock it using one of the many mocking frameworks (Python has one built in, unittest.mock).
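As a minimal sketch of the internal case: `ThirdPartyClient` and `price_in` below are hypothetical names invented for illustration (a stand-in library class and the application code that uses it). Only `unittest.mock` itself is real.

```python
from unittest import mock


class ThirdPartyClient:
    """Hypothetical stand-in for a class from a third-party library."""

    def fetch_rate(self, currency):
        raise RuntimeError("would hit the network in real life")


def price_in(currency, amount, client):
    """Application code under test: converts an amount using the client's rate."""
    return amount * client.fetch_rate(currency)


def test_price_in_uses_the_rate_from_the_client():
    # spec= makes the mock reject attributes the real class does not have
    client = mock.Mock(spec=ThirdPartyClient)
    client.fetch_rate.return_value = 2.0

    assert price_in("EUR", 10, client) == 20.0
    client.fetch_rate.assert_called_once_with("EUR")
```

Passing the client in as an argument keeps the example short; `mock.patch` does the same job when the dependency is imported at module level.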

What are the options for dealing with external dependencies?

Don’t mock

The most naive approach is not to mock the dependency at all but to provide a real instance (such as a real db server) that only gets used in your testing environment. This is disadvantageous for a few reasons:

  1. Somebody has to maintain and patch it
  2. Somebody has to restart it when it goes down
  3. If it’s running on-prem, it’s taking up resources on local hardware; if it’s in the cloud, it’s costing money per hour
  4. It might not be accessible from all locations. Can you get to it in your CI environment? Over VPN? Offline?

Not mocking the dependency is arguably the best option because you are using a real instance of the dependency, so whatever features you are testing will always be supported. It’s arguably the worst option in light of the issues above.

Mock the driver

This approach just doesn’t work, but for the sake of argument suppose we have something simple (Python and MongoDB):

def insert_a_sample_document():
    collection.insert({"test": 1})

We might write a test like:

from unittest.mock import patch

@patch("module.collection.insert")
def test_insert_will_insert_the_sample_document(insert):
    insert_a_sample_document()
    insert.assert_called_with({"test": 1})

Now what if our function changes slightly?

def insert_a_sample_document():
    collection.insert({"test": 1})
    return collection.find_one({"test": 1}) is not None

Well now we have some problems. Something like this will make the test pass:

@patch("module.collection.insert")
@patch("module.collection.find_one", return_value={"test": 1})
def test_insert_will_insert_the_sample_document(find_one, insert):
    # Decorators apply bottom-up, so the find_one mock is passed first
    assert insert_a_sample_document()
    insert.assert_called_with({"test": 1})

But now we’ve tightly coupled our test case to the implementation, which reduces the value of the test. Every time I want to insert a different value in the test document, I have to change both the test case and the code it tests.

What we really want is to just say

assert insert_a_sample_document()

Which we can’t do with this method.
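One way to see how little the driver-level mock verifies is a rough sketch like the one below. The module-level `collection` here is a `MagicMock` standing in for the real pymongo collection (an assumption made so the example runs without a database), and the bug in the query is deliberate.

```python
from unittest import mock

# Hypothetical stand-in for the module-level pymongo collection
collection = mock.MagicMock()


def insert_a_sample_document():
    collection.insert({"test": 1})
    # Deliberate bug: this query looks for a document that was never inserted
    return collection.find_one({"tset": 99}) is not None


def test_passes_despite_the_bug():
    # The canned return value answers any query, right or wrong
    collection.find_one.return_value = {"test": 1}
    assert insert_a_sample_document()
    collection.insert.assert_called_with({"test": 1})
```

The test passes even though the function would return False against a real server, because the mock never evaluates the query.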

Use something like mongomock

Software like mongomock is a good option if you can get it. Libraries like this provide a simulated server that you can write your tests against. This is very close to our first option (actually running a server) but without all the hassle.

The main point against this method is when the abstraction starts to break down. Depending on the complexity of your application, you can easily get into the situation where you want to use some advanced features of your dependency (Mongo’s aggregation framework or bulk writes are a good example) and you find that the mocking library doesn’t support it.
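To make the breakdown concrete, here is a hand-rolled sketch of a simulated collection. `FakeCollection` is a hypothetical class written for this example and is not how mongomock is implemented; it only shows the shape of the problem, where the simple operations work and the advanced one does not exist.

```python
class FakeCollection:
    """Hypothetical in-memory stand-in for a MongoDB collection."""

    def __init__(self):
        self._docs = []

    def insert(self, doc):
        # Store a copy so later changes by the caller don't leak in
        self._docs.append(dict(doc))

    def find_one(self, query):
        # Supports only exact matches on top-level fields
        for doc in self._docs:
            if all(doc.get(key) == value for key, value in query.items()):
                return doc
        return None

    def aggregate(self, pipeline):
        # The point where the simulation stops matching the real server
        raise NotImplementedError("aggregation is not simulated")


def insert_a_sample_document(collection):
    collection.insert({"test": 1})
    return collection.find_one({"test": 1}) is not None


def test_simple_insert_and_check():
    assert insert_a_sample_document(FakeCollection())
```

The simple test is exactly what we wanted, but the first feature that needs `aggregate` forces a choice between extending the fake and switching approach.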

Containers

Running an instance of your dependency inside a container is a great way to solve this issue. We know that we will always be compatible with whatever features our actual dependency has because we aren’t mocking them out, we’re actually using them. We work around the maintenance costs by bringing up the instance on our local infrastructure through containerization, so no servers (virtual or otherwise) have to be used. We can also run the infrastructure anywhere we can run the container daemon.

Using Docker for the sake of example, if my application depends on MongoDB I can write a Dockerfile for my app and then write a simple docker-compose.yaml to link my dependency:

version: '3'
services:
    web:
        build: .
        ports:
            - "5000:5000"
        links:
            - mongo

    mongo:
        image: "mongo:3.4"

This causes the MongoDB container to come up when the web container starts (web depends on mongo). For the acceptance tests, we can then write code like we wanted to before:

def test_simple_insert_and_check():
    assert insert_a_sample_document()

    # And even query the database to find out if the
    # document was actually inserted or not. MongoDB adds an
    # _id field on insert, so project it out before comparing.

    assert collection.find_one({}, {"_id": 0}) == {"test": 1}
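For the test code to reach the database, the client has to point at the mongo service, which Compose exposes to the other services under its service name as a hostname. A minimal sketch of that wiring follows. `MONGO_HOST` and `MONGO_PORT` are variable names I made up for the example, and the database and collection names in the comment are placeholders.

```python
import os


def mongo_uri(environ=os.environ):
    """Build a MongoDB connection URI for the test run.

    MONGO_HOST and MONGO_PORT are hypothetical variable names. The default
    host "mongo" is the service name from docker-compose.yaml.
    """
    host = environ.get("MONGO_HOST", "mongo")
    port = environ.get("MONGO_PORT", "27017")
    return "mongodb://{}:{}/".format(host, port)


# In the test module, the URI would be handed to the driver, e.g.
#     collection = pymongo.MongoClient(mongo_uri())["test_db"]["samples"]
# ("test_db" and "samples" are placeholder names).
```

Keeping the host overridable means the same tests can run against a local mongod on a laptop by setting `MONGO_HOST=localhost`.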

Test cases must be run inside the container for this to work, so we do:

docker-compose run --rm web pytest

Watching source files for changes also works with this approach, which is a nice bonus:

docker-compose run --rm web ptw

Are containers the silver bullet for writing acceptance tests? Nope. There are still dependencies that you can’t run inside of a container. Take for example Spaces from DigitalOcean. There’s no way to run this inside of a container, which means we fall back to either:

  • Finding a mocking library for it
  • Running with some type of test account and separating those tests reliant on it from the rest of our tests to improve run time.
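The second option can be sketched with the standard library alone. pytest markers would serve the same purpose; I use `unittest.skipUnless` here only to keep the example dependency-free. `RUN_LIVE_TESTS` is a hypothetical opt-in flag and the test bodies are placeholders.

```python
import os
import unittest

# RUN_LIVE_TESTS is a hypothetical opt-in flag, not a standard variable
LIVE = os.environ.get("RUN_LIVE_TESTS") == "1"


class SpacesLiveTests(unittest.TestCase):
    @unittest.skipUnless(LIVE, "set RUN_LIVE_TESTS=1 to run against the test account")
    def test_upload_round_trip(self):
        # A real test would talk to the test account here
        self.fail("placeholder for a test that needs the live service")


class FastTests(unittest.TestCase):
    def test_runs_everywhere(self):
        self.assertEqual(1 + 1, 2)
```

By default only the fast tests run, so the slow, account-dependent ones can be left to a nightly or pre-release job that sets the flag.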

Conclusion

The best option in most cases is to start with a container. If you can push your dependency into a container, you get all of the benefits of running an actual instance of your dependency with none of the maintenance downsides.

If your dependency happens to be one of the few systems that you can’t containerize then your best bet is still finding a mocking library like moto or mongomock. Containerize where possible, mock where necessary, use a live instance as a last resort.

Side Note: I had to use Spaces as an example to prove my point. Originally I was going to use S3, but you can actually run S3 inside of a container (sort of) with Minio. It’s compatible with the S3 API, which means it should work with boto. Pretty cool!