Thursday, November 13, 2014

Code Coverage - Good or Evil?

Probably every serious software project has some kind of code coverage measurement in place, ideally as part of its Continuous Integration setup, so you can see which code is executed by the automated tests.

But why do we have this feature? What is the real benefit, and are there downsides or risks? Let's have a closer look.

At first glance, it sounds quite obvious: one look and you know to what degree your software is tested by automated tests. The more the better. But is it that easy? Coverage only states that the code was touched, NOT that it was verified. And there is no indication of what kind of test touched it. I think the type of test makes a huge difference. There are, for example:

* Integration Tests (High Level) - If some code is executed by an integration test, the code seems to be at least relevant for that feature. But to what degree? Would the test fail if we comment out the code?

* Unit Tests (Low Level) - Since a Unit Test (at least a good one) covers much less code than an Integration Test, there is a good chance that the test really verifies the code.

* "Coverage Tests" - That is what I call tests without any assertion, for example an Integration Test just "clicking" on buttons or calling a service API. Most often I have seen this at the Unit Test level. It sounds like a stupid thing to do, but I think it can happen without anyone realizing it.

A little example: After fixing a bug, I wondered how it could have happened. I had been confident that the functionality was covered by tests. Why did no test fail? I realized that the assertion was not correct, and there was actually no way the test could have failed.
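To make the difference concrete, here is a minimal sketch in Python. The function `apply_discount` and the three tests are hypothetical illustrations, not the code from the bug described above. All three tests generate exactly the same coverage, but only the last one can actually fail when the calculation is wrong.

```python
def apply_discount(price, percent):
    """Return the price reduced by the given percentage."""
    return price - price * percent / 100


def test_coverage_only():
    # Executes the code but asserts nothing: a pure "coverage test".
    apply_discount(200, 10)


def test_vacuous_assertion():
    # Has an assertion, but it holds for almost any implementation,
    # so a wrong calculation would go unnoticed.
    result = apply_discount(200, 10)
    assert result is not None


def test_verifying():
    # Pins down the expected value, so a wrong calculation fails here.
    assert apply_discount(200, 10) == 180


if __name__ == "__main__":
    for test in (test_coverage_only, test_vacuous_assertion, test_verifying):
        test()
    print("all tests passed")
```

If the subtraction in `apply_discount` were changed to an addition, the first two tests would still pass and the coverage number would not move at all.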

That is why I have mixed feelings when it comes to code coverage. What you are really interested in is VERIFIED code, not COVERED code.

For me, the most helpful aspect of coverage is the trend over time. The absolute numbers don't matter much; as long as the trend keeps going up, it is a good sign. Another aspect is that code coverage can be used to check your assumptions. For example, if some code is not covered at all but you thought it would be executed in some scenario, it might be worth a closer look.

But code coverage can have its downsides as well, especially when people mix up metrics and KPIs. Let's look at the difference:

* A 'metric' is just one dimension you are interested in from INSIDE, to check where you stand with your project. As already mentioned, this is especially useful when looking at the trend.

* A KPI (Key Performance Indicator) is used to assess the performance from OUTSIDE. This implies that higher is better, and exactly that can be very misleading when talking about coverage. 85% is not in every case better than 80%. You have to understand where the coverage is coming from.

So from my point of view, coverage is a great and important metric, but a bad and sometimes even dangerous KPI.

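Why dangerous? When coverage is seen as a KPI, it is only a small step to setting a goal from outside. And once the number itself is the goal, the easiest way to reach it is with tests that execute code without verifying it.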

Overall, code coverage is very important to me for seeing where the project stands in terms of tests. But I also think it is important not to trust it blindly.

Lately, I have gotten more and more used to looking at it from another angle. 80% code coverage sounds good at first, but it still means that at least 20% of the codebase can be removed without a single test failing. Looked at that way, it is sometimes hard not to lose trust in automated testing...

Monday, May 21, 2012

Validate Your Tests With Jester


What is Jester?

Jester is available on SourceForge (http://jester.sourceforge.net/), and on the project page the tool is described as a "JUnit test tester". I think that describes it very precisely. Jester basically tests how many changes to the code are detected by the tests.


What does Jester do?

What Jester does is rather simple. The tool scans the source code for possible mutations. For each mutation, Jester modifies the source code and runs the tests again. If at least one test fails, everything is fine: the tests have detected the change. After that, the change is reverted and Jester continues scanning.

What is a mutation?
A mutation is a configurable substitution: wherever a pattern <A> is found, it is replaced with <B>. Some of the pre-configured mutations are:


  • "true" -> "false"
  • "false" -> "true"
  • "if(" -> "if(false && "
  • "if(" -> "if(true || "
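As a rough illustration of what such pattern substitutions produce, the following Python sketch applies the four rules listed above to a line of Java text. The Java snippet and the helper `mutants` are hypothetical, and mutating exactly one occurrence per copy is an assumption of this sketch rather than a statement about Jester's internals.

```python
# The four pre-configured rules listed above, as (pattern, replacement) pairs.
RULES = [
    ("true", "false"),
    ("false", "true"),
    ("if(", "if(false && "),
    ("if(", "if(true || "),
]

# Hypothetical Java snippet, used only as input text.
JAVA_SOURCE = "if(user.isActive()) { return true; }"


def mutants(source, rules):
    """Yield one mutated copy of the source per pattern occurrence."""
    for pattern, replacement in rules:
        start = source.find(pattern)
        while start != -1:
            yield source[:start] + replacement + source[start + len(pattern):]
            start = source.find(pattern, start + len(pattern))


for mutant in mutants(JAVA_SOURCE, RULES):
    print(mutant)
```

For this one line, the sketch prints three mutants: one with `return false`, one with `if(false && ...` and one with `if(true || ...`.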

After Jester is done with scanning for mutations, there are 2 results:

  1. A score between 0 and 100. Example: If there were 100 mutations and 30 were not detected by the tests, the score is ... 70! Surprising, right? :)
  2. An HTML report showing all mutations that "survived" the tests.
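
The arithmetic behind the score can be written down as a small Python sketch. The function name `jester_score` is made up here, and rounding down to a whole number is an assumption; how Jester itself rounds is not stated above.

```python
def jester_score(total_mutations, survived):
    """Percentage of mutations that made at least one test fail."""
    if total_mutations == 0:
        raise ValueError("no mutations were applied, so there is no score")
    detected = total_mutations - survived
    # Assumption: the result is rounded down to a whole number.
    return 100 * detected // total_mutations


print(jester_score(100, 30))  # the example from the list above: 70
```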


How to use Jester?

Basically, Jester needs to know 3 things in order to do its job:
  1. Where to find the source code?
  2. How to build the project?
  3. How to run the tests?
So, a bit simplified, you just need a directory and 2 Ant scripts. In addition, Jester needs Python to be installed if you want the HTML report mentioned above, but this is optional.


Why use Jester?

I find Jester very useful and interesting for 2 reasons.

Validate your coverage

Of course, measuring code coverage is very useful, and I think important as well. BUT it is very hard to interpret. The fact that X% of your code is covered by your tests is good to know, no question about that. But what does it mean? I believe that on its own it means NOTHING. It is too easy to write tests that execute a lot of code but validate only a little. I don't necessarily mean by accident, but unless you do strict TDD, the coverage will always look a bit better than what you really test.

I believe the Jester score can be some kind of hint on how valid your code coverage is. So what does the Jester score mean?

100 - Most likely any change (bug?) introduced to your existing code will be discovered
0 - You probably can do anything with the code, the tests won't break. I like to call such tests "coverage tests". They are good for generating code coverage, but for nothing else.

Of course, the truth will be somewhere in between. At least I find it very interesting to find out which end I am nearer to :)

Find missing tests

The above-mentioned HTML report with all the mutations that "survived" the tests is nothing less than a list of missing tests. If "true" can be changed to "false" without any test failing, there are 2 possible reasons:

  1. A test is missing to test the need for that "true".
  2. That "true" is not needed and can be removed.

Summary

When I first heard about Jester, I was very curious about how it would work within the project I work on. Now that it works fine, I really like the insight I get from it. I can really recommend it and would like to hear about other experiences.