Testing Software

You only need to test the bits you want to see work in the end.

The Basic Truths of Testing

If your software system has no features, you will not need to test it (beyond testing the overhead that comes with building a system that does nothing but still needs to install cleanly). You may start out with a system that has very few features and not require much testing, because it is easy enough to test manually, and the developer probably tested the only thing the system does as they wrote it.

As you add features and complexity, the cost of thoroughly testing everything every time the hard way grows, and it very quickly becomes prohibitively expensive, forcing shortcuts and the taking of risks until things blow up in some painfully stressful way.

The only real choice here is when to acknowledge the forces of nature and change your approach to one where you can lock in your progress as you go.

Let's get the clichés out of the way:

  • anything but the most trivial single-line piece of source code probably won't work as planned on the first try
  • any tested piece of software being worked on will probably get broken by whoever is working on it
  • you will not know that software is broken until you use it
  • you have a choice between your customer finding bugs in your system and your developers finding bugs during testing
  • it is less expensive to fix or address a problem if you find it sooner

These are probably obvious to you. How about these?

  • very few developers can write good tests for their own software as there is an inherent conflict of interest between wanting to make their code work and wanting to break their code
  • existing problems -- known or unknown -- are not only a liability on account of whatever they keep from working properly for however long they remain in the system; they also mask other problems, either through direct obstruction or by creating noise
  • manual testing is a useful tool for finding out how incomplete your automated test suite is: bugs found indicate holes in your test suite
  • errors in the test code can find some really interesting errors in the code to be tested
  • there are development tools and there are testing tools; some development environments embrace/accommodate testing better than others
  • knowing tests will catch your mistakes lowers your stress level and lets you focus on the creative and fun aspects of coding
  • if you had a really good test suite and an infinite cloud budget, you would not need any developers at all

Testing Methodology

Well-intentioned, smart people can get rigid and downright crusty about testing. Maybe they are just tired of being told by some hot-shot clever developer that his/her code is flawless. More telling is what is said during discussions about how/what/when to test:

  • we have always done it that way
  • this is what everybody else does
  • this is the default setting of our development/test tools
  • we have no time to instrument/look/think/do

If you hear these reasons, you might be in trouble, because what is missing is a deep understanding of how your testing is going to give you a better product. With that deep understanding in place, you tend to hear this instead:

  • this is an essential part of the methodology we agreed on at the outset, and the situation has not changed
  • this will give us the best possible odds for delivering the quality we need

If testing is meant to give you the best chance to find a problem before your customer does, then everything about your testing methodology must flow from that. There are no shortcuts to understanding how and why that works. This is why it is a good idea to take a deep breath, put down your Best Practices, and come to clarity on what exactly you need to do in order to get the necessary quality into this particular piece of software at minimum cost before you get started.

Your overall development methodology and the type of system you are building will determine your choices for testing methodology. Within those constraints, there are many choices to be made in terms of what to test, how, and in what order. Your testing tools will have some influence as well in that they may make certain things easier than others, but the tools should be chosen consciously to serve their purpose within the test framework, as opposed to letting your testing tools dictate what and how you test.

The Many Flavors of Testing

In order to keep things sane and manageable, software systems get broken down into components. Each component can have smaller components inside it. Each component usually gets connected to some other components with which it exchanges data according to some protocol.

The neat thing about this way of breaking things up is that it also gives us a handle on making even the most complicated systems work: by making them work one component at a time. This is done by exercising different subsets of the system, usually starting with smaller subsets, then moving towards the whole, the reason being that there is little hope of having larger pieces cooperate properly if their smaller building blocks are known to be broken. You can also test just about any subset of your system by replacing parts of it with mocks, stubs, or reference implementations, allowing you to develop and test modules in any order you like, or to exercise them sooner, faster, or more thoroughly than would ever be possible with their final intended counterparts.
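
A minimal sketch of what that looks like in Ruby, using a hand-rolled stub: Checkout, StubPaymentGateway, and the amounts are all hypothetical, and the stub simply stands in for a slow, costly, or not-yet-written payment service so the component under test can be exercised right now.

    require 'minitest/autorun'

    # Stand-in for the real payment service: always succeeds, instantly, offline.
    class StubPaymentGateway
      def charge(_amount)
        :ok
      end
    end

    # The component under test; it only cares that its collaborator answers #charge.
    class Checkout
      def initialize(gateway)
        @gateway = gateway
      end

      def purchase(amount)
        @gateway.charge(amount) == :ok ? 'thank you' : 'payment failed'
      end
    end

    class TestCheckout < Minitest::Test
      def test_purchase_with_stubbed_gateway
        checkout = Checkout.new(StubPaymentGateway.new)
        assert_equal 'thank you', checkout.purchase(42)
      end
    end

Swapping the stub for the real gateway (or a reference implementation) later requires no change to Checkout itself.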

Unit testing exercises one component at a time, stubbing out as many of the required neighbouring components as necessary.

Integration testing makes sure that components interact properly; this is done by exercising a subset of the component graph, stubbing out as few of the remaining components as necessary.

System testing is essentially a special case of integration testing where all components are being used. System testing can be done for a variety of reasons, in different environments, and with varying expectations of success.

Regression testing tells you quickly whether or not that little innocent change you just made unfixed or broke anything.

Acceptance testing is done to reconcile what you built with what you were supposed to build; it is the only type of testing that should really be done by somebody other than the developer, due to the obvious conflict of interest.

Sanity testing tentatively makes sure that nothing is going to blow up too spectacularly.

Smoke testing is a quick, shallow pass (usually over the whole system) that confirms it starts up and its most basic functions work; testing under heavy load is stress or load testing.

Installation/deployment testing focusses on all those painful issues you will encounter when putting your software onto something other than your own uniquely sterile development system.

If you can open the hood of a component to tweak and examine its innards, then you are white-box testing.

If the hood of the component is welded shut, then you are pretty much restricted to functional testing, because you can only test its function, not its method. This type of testing is also called black-box testing, unless you know something about what is inside, in which case you are grey-box testing.

Where it gets messy is when you look at components which have smaller components hidden inside. In this situation, terms such as component vs system or black vs white become relative to your point of view.

Who Writes Tests and Why?

Developers implementing new features or extending functionality write tests for five primary reasons:

  • functional tests tell them what to do. If you're doing Test Driven Development, the feature-stories-turned-tests really drive everything
  • the same tests tell them when they are done and can move on to the next task
  • white box unit tests help them understand which part of their code is not working yet
  • integration tests tell them why their functional tests fail even though their unit tests are passing
  • regression testing tells them whether or not they broke something along the way

Ultimately, they want to have tests because they know -- the good ones anyways -- that they make mistakes (i.e.: there will sometimes be a difference between the intended and observed outcomes of their actions), and that it is hard to judge whether a change helped or hurt the final product just by looking at it, ESPECIALLY if you are looking at your own work.

Having a tireless, impartial, and dependable entity go over everything they did to make sure their actions match their intentions in every detail frees them from worry and doubt, and lets them focus their energy on the creative and fun part of their job, ignoring everything other than the problem they are solving. This, in turn, increases their productivity while lowering their stress levels.

Release Engineers (not every team has them) are responsible for bundling up a certain version of the software and putting it into a state where it can be sent off and installed elsewhere. They focus on the system as a whole:

  • system testing (including smoke and installation) makes sure that the system works
  • integration testing tells them which components are not up to spec
  • regression testing lets them make sure nothing got accidentally unfixed somewhere along the way

The people paying for development want tests for completely different reasons:

  • the test statistics can be a good indicator of how well your development team is doing: will they finish on time and within budget?
  • acceptance testing tells them whether or not the work is complete and whether or not the system as built fulfills its purpose

Tests can be written and maintained by developers, testers (who are really developers with different skill sets), and sometimes even by product managers. Tests are often considered part of the software, although they are rarely shipped alongside the software.

White box tests are almost always written by the developers working on the tested modules. These tests are often tied to the current inner workings of the modules and will probably be thrown out or redone when those inner workings change.

Functional tests have to be written and maintained according to the changing specifications for a software module. They do not need to be written by the developers who know how the code works. In fact, it is often better to have them created by developers who understand the specifications but have no knowledge of the implementation. This is because developers who know the implementation will carry the same invalid assumptions and blind spots into their test cases.

What to Test When

Testing being in support of development, it is important to think about the order in which tests are written. Some tests may be easy to write but allow developers to discover the kinds of issues or oversights they need to find early, while other tests may uncover annoying corner cases which need to be discovered eventually but are not likely to shake the foundations of the system being built. Testing in the right order will keep your developers productively tackling the foundational issues first while your QA people come up with ever more picky corner cases. Testing in the wrong order can lull your development team into a false sense of mission accomplished early on, followed by a train wreck late in the schedule.

As for what to test, you should basically test the code you need to have working. What this really means is that you should test everything, but there are some notable exceptions:

  • code outside your control (although you may still want to know whether or not it works)
  • errors so "cheap to find later" that it is not worth testing for them now, especially if time is tight

Other than that, developers should write whatever tests they need in order to make sure their implementation works, and those responsible for acceptance or integration testing need to have automated tests for whatever functional aspects of the specifications can be automatically tested, and a defensible plan for whatever cannot be automated.

Relying on APIs

Your system will likely depend on a large number of services provided by the surrounding operating environment, including third party code (open source or not) and APIs. Every time something in your environment is patched or upgraded, there is a chance that some of your tests will start failing, which could cause you to waste a lot of effort trying to figure out what you broke in your code (nothing). Instead of writing a test for every service and piece of software you depend on, you can manage this by running your tests before making changes to the code, and after any significant update to the operating environment.

If your code ties into network APIs, then you need to be able to trace, record, compare, and play back the API interactions so that you can a) tell when somebody else's network service has started to misbehave and b) keep testing with a "working" mock-up. You may even wish to create tests to ensure that the APIs your code needs behave according to your expectations. These tests have an important and independent role outside the development process, because the behaviour of somebody else's network API can and will change completely outside of your control and without warning, meaning that even long after your development cycle is complete, somebody can break your code for you. These tests, then, should probably be run not only during development, but regularly for as long as your code is in production.
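
One way to get that record-and-replay capability in Ruby is the VCR gem (together with WebMock); the configuration below is a sketch, and the URL and cassette name are made up.

    require 'net/http'
    require 'uri'
    require 'vcr'   # gem 'vcr' plus gem 'webmock' in your Gemfile

    VCR.configure do |config|
      config.cassette_library_dir = 'spec/cassettes'  # where recorded interactions live
      config.hook_into :webmock                       # intercept HTTP at the library level
    end

    # The first run records the real HTTP exchange into a cassette file; later runs
    # replay it, so tests keep working even when the remote service changes or is down.
    VCR.use_cassette('geocoder_lookup') do
      response = Net::HTTP.get(URI('https://api.example.com/geocode?q=Berlin'))
      # ... assertions against the (recorded or replayed) response go here ...
    end

Re-recording the cassette against the live service from time to time is what tells you that somebody else's API has drifted away from your expectations.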

The Cost of Testing

Nobody wants to pay for testing; they want to pay for quality. So the question is how to get the most (and sufficient) quality per unit of cost -- and cost comes in the form of time, money, or features.

If your development is test driven, then large parts of the functional test suites are hopefully just user stories captured in a machine-readable format, meaning that they come almost for free, will be in sync with the specifications, and are needed for acceptance testing anyways. They can also be used for system testing, and you could even use them to drive some top-down integration testing, as well as unit testing for your top-level code.

This is a good example of how the same work product, the specifications, can be reused in many places. Some amount of test-only infrastructure is necessary to manifest those tests based on the specifications, and it is worth the time to structure the supporting test code so that it can also be used to create test scenarios for additional tests without too much additional work or unnecessary duplication (i.e.: DRY).
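
As a hypothetical illustration of a user story captured in machine-readable form, here is roughly what a small cucumber feature and its Ruby step definitions might look like (assuming cucumber with rspec-expectations; the Account model and the numbers are made up):

    # features/withdrawal.feature
    Feature: Account withdrawal
      Scenario: Withdrawing less than the balance
        Given an account with a balance of 100
        When I withdraw 30
        Then the balance should be 70

    # features/step_definitions/account_steps.rb
    Given(/^an account with a balance of (\d+)$/) do |amount|
      @account = Account.new(balance: amount.to_i)
    end

    When(/^I withdraw (\d+)$/) do |amount|
      @account.withdraw(amount.to_i)
    end

    Then(/^the balance should be (\d+)$/) do |amount|
      expect(@account.balance).to eq(amount.to_i)
    end

The feature file doubles as a specification a product manager can read, while the step definitions are the only part that needs to change when the implementation underneath does.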

How do you know when you have a sufficiently large test suite? Here are some indications:

  • your test coverage is 100% (Note: complete test coverage and all tests passing does NOT mean your code is bug free; see the coverage sketch after this list)
  • your test scenarios represent all conceivable configuration states and classes of input
  • adding new tests is not discovering any new bugs
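
Measuring coverage in Ruby does not take much; here is a sketch using the SimpleCov gem, with the filter path being an assumption about your project layout:

    # spec/spec_helper.rb (or test/test_helper.rb) -- must run before your own code is loaded
    require 'simplecov'
    SimpleCov.start do
      add_filter '/spec/'   # do not count the test code itself towards coverage
    end

After a test run, SimpleCov writes an HTML report showing exactly which lines were never executed; again, 100% there says nothing about whether the executed lines behave correctly.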

Building test scenarios efficiently is an art form and can be a lot of fun. It requires a sense of humour as well as analytical skills relevant to what is being tested. If, for example, the code being tested is a network service for maintaining a graph such as a social network or simulated digital circuit, then knowledge of graph theory as well as number theory can be immensely helpful in constructing an exhaustive list of representative test scenarios with little effort. In practice, however, the most junior or expendable development team members are often relegated to the test team while the cool kids get to write production code, which can produce nifty but somewhat under-tested code.

Limited reference implementations for all or parts of the code can also be a good way to create test scenarios because they take care of coming up with the expected behaviour. They can also be used during unit and integration testing at various levels. If you want a new test case, just add random input and the reference implementation will tell you what the output should be. This can be immensely helpful for developers because it gives them an oracle that answers any what-if questions they could think of.
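
A minimal sketch of the oracle idea in Ruby: FastSorter below is a hypothetical stand-in for the clever implementation under test, and Ruby's built-in Array#sort plays the slow-but-trusted reference. Random input then produces as many scenarios as you have patience for.

    require 'minitest/autorun'

    # Hypothetical optimized implementation under test (here just a placeholder).
    class FastSorter
      def self.sort(array)
        array.sort
      end
    end

    class TestFastSorterAgainstReference < Minitest::Test
      def test_matches_reference_on_random_input
        100.times do
          input = Array.new(rand(0..50)) { rand(-1_000..1_000) }
          # Array#sort acts as the reference implementation / oracle.
          assert_equal input.sort, FastSorter.sort(input), "mismatch for #{input.inspect}"
        end
      end
    end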

Manual Testing

When I say "manual testing", I am talking about ...

  • somebody interacting with some sort of computer human interface (screen, art installation, rocket launcher) and...
  • raising a fuss somehow when...
  • things do not work as they should.

While this is better than nothing, it's really not much better than that. Here are some reasons why:

  • humans are inconsistent. Human attention varies from person to person and over time, meaning that no two people are going to find the same problems, and the same person may find different problems at different times based on the time of day or what they had for dinner the night before
  • humans have state: they learn, and they cannot help it. Their interactions with the software being tested interfere with future interactions. In other words, their recall of previous working or broken features will hamper their required ability to be impartial and thorough.
  • humans are lazy. In their effort to be efficient, they will make invalid or hopeful assumptions about the state of the system under test. These patterns of assumptions (blind spots) are highly idiosyncratic and complex.
  • humans are well intentioned (the ones you would want working on your projects, anyways). Developers, especially, are often handicapped by their innate (and usually highly valued) 'make it work' mindset.

I just gave you two lists. Each of the steps of the first list interacts with problems from the second. I invite you to imagine all the different ways in which that can go wrong on any given day. As you go through the matrix of possible interactions, you will quickly understand why manual software testing does not provide much of a guarantee in terms of quality.

In addition, manual testing is slow, expensive, and does not scale. It often takes lead time, and it may involve talking people into doing things they would otherwise prefer not to do, or which they consider below their pay grade. For most creative, successful, and busy individuals, it's right up there with doing dishes, sweeping floors, or checking the dog for lice. Thus, in practice, it is avoided by those who are tasked to do it, even if it is the only type of testing being done.

That said, manual testing is not a bad thing. Every craftsperson who takes pride in their work will take one last personal look at their creation before it goes out the door, to the showroom floor, or the dinner table. Experience will teach you (if it has not already) that if you rely on your automation 100%, you will wind up in some horribly embarrassing fully automated high-volume just-in-time disaster. So, if you have a sense of ownership and pride in what you and your team are building, or you are curious what kind of experience your customers are going to have as they interact with the systems you are building (when you are not around to look over their shoulder), or you have some sort of secondary responsibility for the outcome (because you are higher up the management chain), or you have a vested (think shares, dividends, or pay check) material interest in the value of the system you are building, then it is probably a good idea to get your hands dirty and play with the system, understand its shortcomings, and see if you can help make it better.

So regardless of whether or not manual testing is your only testing, your sanity testing, or the QA to your QA, there are some simple rules you can follow to make it more productive:

  • make a test plan at the outset that covers the minimum set of features you want to see working, then expand from there. This is much like scripting a demo for a client presentation. A test plan is made up of a series of tests, each of which has some starting requirements and then describes a sequence of interactions with the system as well as the expected outcome at an appropriate level of detail. DRY (Don't Repeat Yourself) test plans are easier to maintain: they define common tasks once upfront and then refer to that information in the rest of the test plan instead of replicating it (use text macros if you feel like you really must inline). If your test plan is machine readable (i.e.: in some domain specific yet human readable programming language -- see Ruby DSLs and the sketch after this list), you have a chance of later using the test plan to drive automated tests instead. In this (yet another) way, manual testing can be a stepping stone towards automation.
  • use it as an opportunity to learn: members of your talent pool may themselves benefit from playing with the latest new version of the system which they will later be asked to market or support. The raised awareness and ensuing interactions may create valuable opportunities for them to contribute in unexpected ways.
  • co-ordinate testing so as to avoid unnecessary manual work (which would be expensive and tends to be bad for morale). Test plans help here, too, by breaking up the universe of things to test among the available testers. This division can be done top down or in a more collaborative way. Note that who tests what affects the outcome: testers have different expectations and preconceived ideas about how something is supposed to work or about what would be harder for the system to get right. Also, they may gain some useful or valuable insight from testing a certain area of the system, which will force them to understand something better as well as make them use the system the way a new customer might.
  • try to get help from different people. Depending on the level of documentation and the type of system, they do not need to be experts in the field or even be technically inclined. You will want people who are fun to work with (always) but who can also be brutally picky, annoyingly pedantic, mercilessly thorough, who like to think outside of the box, are likely to do the kinds of things no person in their right mind would do, and who get some sort of cruel satisfaction out of breaking stuff. Think of your old grammar teacher (but with a wicked sense of humour), on too much coffee, gone black hat.
  • reward them for finding bugs, but be mindful to tie incentives to finding the kinds of problems you need to find (which is tricky because obviously you were not planning to find them). Continued survival can be an incentive, and can be used to motivate a large and diverse population of people (every single employee of the startup plus their family members and friends), but should probably be used judiciously and effectively (see test plan and coordination)
  • after you have exhausted the test plan, refuse to believe there are no more bugs to find. There are always more bugs to find. Make it fun, maybe even competitive, embrace your dark side: breaking stuff can be fun and creative, which is why this stage is sometimes called 'creative testing'.
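
To make the machine-readable test plan from the first point above a little more concrete, here is a tiny hand-rolled Ruby DSL; every name and step in it is hypothetical. Today it only prints a checklist for a human tester, but the same plan objects could later feed an automated driver.

    # A tiny hand-rolled test plan DSL -- a sketch, not a published library.
    class TestPlan
      Step = Struct.new(:action, :expectation)

      def initialize(name, &block)
        @name  = name
        @steps = []
        instance_eval(&block)      # lets the block call the `step` method below
      end

      def step(action, expect:)
        @steps << Step.new(action, expect)
      end

      def to_s
        ([@name] + @steps.map { |s| "  - #{s.action}  => expect: #{s.expectation}" }).join("\n")
      end
    end

    plan = TestPlan.new('Password reset') do
      step 'request a reset link for a known email address',
           expect: 'confirmation message shown, reset email sent'
      step 'follow the link and choose a new password',
           expect: 'login succeeds with the new password only'
    end

    puts plan   # today: a checklist for a human tester; later: input for automation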

Manual testing can also tell you what you are missing in your automated testing. It is the Sanity Check to your Quality Assurance. Not surprisingly, it is best done by somebody other than the people who set up your automated testing framework. You might even offer a bug bounty to the people who developed the system under test, because they will have insider knowledge (unavailable to a black box test team) on how to break it. Over the years, I have worked with some truly amazing QA talents who could gently, patiently, and professionally poke holes in the most thoroughly tested software systems. It is hard to put a price on this type of talent, and it is sometimes found in unexpected places.

Once a problem is found manually, it should be made part of some sort of automated test suite. If your developer is agile, they will start by writing a failing test that reproduces the manually observed problem before even thinking about how to fix it. If not, they will hopefully add it to the regression test suite later, instead of just manually retesting their fix once and hoping for the best.
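
A sketch of what that first failing test might look like, with everything hypothetical: Order is the class under test, and it still contains the manually reported bug (a fully discounted order ends up with a negative total because a flat credit is applied unconditionally), so the test fails first -- which is the point.

    require 'minitest/autorun'

    # Hypothetical class under test, still containing the reported bug.
    class Order
      def initialize(price:, discount_percent:)
        @price = price
        @discount_percent = discount_percent
      end

      def total
        @price * (1 - @discount_percent / 100.0) - 2.50   # the bug: credit applied even at 100% discount
      end
    end

    class TestOrderTotalRegression < Minitest::Test
      def test_full_discount_does_not_produce_a_negative_total
        order = Order.new(price: 25.0, discount_percent: 100)
        assert_equal 0.0, order.total
      end
    end

Once the fix is in, the same test stays in the regression suite and quietly keeps the bug from coming back.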

As with all other types of testing, finding problems is of no value unless there is somebody or something to capture the work product in the form of issues or problem reports. There are many ways of doing this, some more convoluted than others, but you will need something in place that makes the testers feel like theirs was a worthwhile effort, and makes them feel like they succeeded in helping every time they discover something useful. You will want them to spend their time finding problems instead of filing problem reports.

Finally, it is worth noting that developers and testers both being human can create dynamics which just don't exist in automated test scenarios. You might need to shield testers from defensive developers whose creations they tortured and whose (creations') shortcomings they proudly document with their problem reports. (Note: it also helps when people writing up problem reports are not overly gleeful and I-told-you-so-ish about their observations.)

Testing Ruby on Rails

In the Ruby/Rails world, some of the terms mentioned above have taken on a more specific meaning.

  • controller testing is generally called functional testing, even though it may include some white box aspects
  • model testing is used as a synonym for unit testing, probably because models do not require controllers or views
  • integration testing focusses on the interaction between controllers in addition to models
  • behavioural testing is what otherwise would be called functional testing

Much of this comes from a Rails culture of following convention, in this case the convention set by default test code generators. This keeps things simple but can be confusing for developers who were using these same terms in other contexts before Rails.

Currently, the important open source tools for Rails testing are

  • cucumber - behavioural testing for story based test driven development
  • rspec or test/unit - for building and running MVC test suites
  • capybara or webrat - for scripting web server/client interactions, with or without cucumber
  • mocha, chai, and konacha - for testing JavaScript/CoffeeScript as integrated into the asset pipeline

There are many other tools, and as with all things Rails, there is rapid ongoing evolution in this space.
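
To tie the vocabulary to something concrete, here is roughly what a minimal Rails "model test" (unit test) looks like in RSpec; the User model and its email validation are assumptions, not part of any particular application.

    # spec/models/user_spec.rb
    require 'spec_helper'

    describe User do
      it 'is invalid without an email address' do
        user = User.new(email: nil)
        expect(user).not_to be_valid
      end
    end

The equivalent cucumber feature would describe the same rule in the language of the user ("signing up without an email shows an error"), which is what makes it behavioural rather than unit testing.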

Conclusion

Software testing should not be an afterthought delegated to the least capable developers on the project. It requires good analytical skills and a solid foundation in software engineering principles. As with most things, a sense of humour helps, as does knowing what you are trying to achieve, and what price you are willing to pay.