Posted by Antoine Picard

We have become test hoarders. Our focus on test-driven development, developer testing, and other testing practices has allowed us to accumulate a large collection of tests of various types and sizes. Although this is laudable and beneficial, it is too easy to forget that each test, whether a unit test or a manual test, has a cost as well. That cost should be balanced against the benefits of the test when deciding whether a test should be deleted, or whether it should be written in the first place.

Let's start with the benefits of a test. It is all too easy to think that the benefit of a test is to increase coverage or to satisfy an artificial policy set by the Test Certified program. Not so. Although it is difficult to measure for an individual test, a test's real benefit is the number of bugs it keeps from reaching production.


There are side benefits to well-tested code as well, such as enforcing good design practices (decomposition, encapsulation, and so on), but these are secondary to avoiding bugs.

Short examples of highly beneficial tests are hard to come by; counter-examples, however, abound. The following examples have been anonymized but were all found at various times in our code tree. Take this test:

def testMyModule(self):
  mymodule.main()


Although it probably generates a lot of coverage in mymodule, this test will only fail if main throws an exception. Certainly that is a useful condition to detect, but it is wasteful to consume the time of a full run of mymodule without verifying its output.
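Assuming, purely for illustration, that mymodule.main() returns the result it computes (the real interface may well differ), a sketch of a more valuable version might look like this:

def testMyModuleProducesExpectedResult(self):
  result = mymodule.main()
  # Check the output, not just that main() ran without raising.
  self.assertEqual(result.status, 'OK')
  self.assertTrue(result.entries)

Let's look at another low-value test: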

def testFooInitialization(self):
  try:
    foo = Foo()
    self.assertEquals(foo.name, 'foo')
    self.assertEquals(foo.bar, 'bak')
  except:
    pass


This one probably hits the bottom of the value scale for a test. Although it exercises Foo's constructor, catching all exceptions means that the test will never fail. It creates coverage but never catches any bugs.
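Simply dropping the blanket except restores the test's ability to fail (a sketch, reusing the attributes from the example above):

def testFooInitialization(self):
  foo = Foo()
  # With no try/except, a broken constructor or a wrong attribute value
  # now actually fails the test.
  self.assertEqual(foo.name, 'foo')
  self.assertEqual(foo.bar, 'bak')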

A fellow bottom-dweller of the value scale is the test that doesn't get fixed. This can be a broken test in a continuous build or a manual test that generates a bug that stays open: either way, it's a waste of time. If it's an automated test, it is worth deleting and was probably not worth writing in the first place. If it's a manual test, it's probably a sign that QA is not testing what the PM cares about. Some even apply the broken-window principle to these tests, arguing that tests that don't get fixed give the impression that testing itself is not valuable.

Our final specimen is of slightly higher value, but still not much. Consider the function Bar:

def Bar():
  SlowBarHelper1()
  SlowBarHelper2()
  SlowBarHelper3()

We could employ stubs or mocks to write a quick unit test of Bar, but all we could assert is that the three helpers got called in the right order. Hardly an insightful test. In non-compiled languages, this kind of test does serve as a substitute syntax checker, but it provides little value beyond that.
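For concreteness, here is a sketch of what such a mock-based test might look like, assuming Bar and its helpers live in a hypothetical module named barmodule:

import unittest
from unittest import mock

import barmodule  # hypothetical module containing Bar and its slow helpers


class BarTest(unittest.TestCase):

  def testBarCallsHelpersInOrder(self):
    manager = mock.Mock()
    # Patch the slow helpers with child mocks of one manager so that the
    # relative order of their calls is recorded in one place.
    with mock.patch('barmodule.SlowBarHelper1', manager.helper1), \
         mock.patch('barmodule.SlowBarHelper2', manager.helper2), \
         mock.patch('barmodule.SlowBarHelper3', manager.helper3):
      barmodule.Bar()
    # All this test can assert is that the helpers ran, in the right order.
    manager.assert_has_calls(
        [mock.call.helper1(), mock.call.helper2(), mock.call.helper3()])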

Let's now turn our attention to the dark side of testing: its cost. The budget of a whole testing team is easy to understand but what about the cost of an individual test?

The first such cost is the one-time cost of creating the test. Whether it is the time it takes to write down the steps to reproduce a manual test or the time it takes to code an automated one, this cost depends mostly on the testability of the system or the code. Keeping it down is an essential part of test-driven development: think about your tests before you start coding.

While the creation of a test has a significant cost, it can be dwarfed by the incremental cost of running it. This is the most common objection to manual testing, since the salary of the tester must be paid for every run of the test, but it applies to automated tests too. An automated test occupies a machine while it runs, and that machine and its maintenance both have a cost. If a test requires specialized hardware, those costs go up. Similarly, adding a test that takes 20 minutes to run will consume 20 minutes of the time of each engineer who runs it, every single time! If it's a test that's run before each check-in, its cost will grow rapidly. It could be worth the engineering time to reduce its run time to a more reasonable level.
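For a rough sense of scale, here is a back-of-the-envelope calculation with purely hypothetical numbers:

# Daily cost of a slow pre-check-in test; all numbers are illustrative,
# not measurements.
test_minutes = 20
engineers = 10
checkins_per_engineer_per_day = 2

daily_cost_minutes = test_minutes * engineers * checkins_per_engineer_per_day
print(daily_cost_minutes / 60.0)  # ~6.7 engineer-hours, every day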

There is one more incremental cost to a test: the cost of its failure. Whenever a test fails, time is spent to diagnose the failure. The reduction of this cost is the reason behind two key principles of good testing:
- don't write flaky tests: flaky tests waste time by making us investigate failures that are not really there
- write self-diagnosing tests: a test should make it clear what went wrong when it fails, so that we can rapidly move towards a fix (as sketched below)
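As an illustration of the second principle, with hypothetical names throughout, here is an assertion that carries its own diagnosis:

def testAllOrdersHaveOwners(self):
  orders = LoadTestOrders()  # hypothetical test fixture
  for order in orders:
    # The message points at the offending record; a bare assertIsNotNone
    # would only report an unexpected None and leave us to dig.
    self.assertIsNotNone(
        order.owner, 'Order %s has no owner; was the importer run?' % order.id)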

The 'economics' of testing can be used to analyze various testing methodologies. For example, true unit tests (small, isolated tests) take one approach to this problem: they minimize the repeated costs (by being cheap to run and easy to diagnose) while incurring a slightly higher creation cost (having to mock/stub, refactor, ...) and delivering slightly smaller benefits (confidence about a small piece of the system rather than the system as a whole). By contrast, regression tests tend to incur a greater cost (since most regression tests are large tests) but attempt to maximize their benefits by targeting areas of previous failures, under the assumption that those are the areas most likely to have bugs in the future.

So think about both the benefits and the costs of each test that you write. Weigh the one-time costs against the repeated costs that you and your team will incur and make sure that you get the benefits that you want at the least possible cost.