Many organizations use code coverage tools to measure the effectiveness of regression testing in detecting design errors. While this practice is not devoid of usefulness, there are pitfalls associated with it that one must be careful to avoid in order to derive meaningful information from it.
In my experience, if the development of diagnostics is guided by code coverage rather than an understanding of the design, then only about half of the bugs will be found by increasing the measured coverage from 0% to its maximal value (e.g. 99%). The remainder of the bugs will be found by tests that do not improve the coverage measurements, or by tests outside the regression (in the worst case, by your customers). Experiencing bugs outside of the regression is particularly costly, because it may be difficult or impossible to isolate and diagnose the problem.
So, then, what accounts for these "dark" bugs?
#include <assert.h>

/* Each assertion expects the other argument to be nonzero. */
void ex1(int a, int b) {
    if (!a) {
        assert(b);
    }
    if (!b) {
        assert(a);
    }
}
Every line and conditional is covered by the cases (0, 1) and (1, 0), but the case (0, 0) exposes a bug.
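A minimal sketch of why the covering cases miss the failure, assuming a non-aborting stand-in for ex1 so that all input combinations can be checked in one run. The names `ex1_holds` and `ex1_count_failures` are hypothetical and are not part of the original example.

```c
#include <stdbool.h>

/* Hypothetical non-aborting variant of ex1: returns true when both
 * assertions would hold, false when either would fire. */
bool ex1_holds(int a, int b)
{
    bool ok = true;
    if (!a) {
        ok = ok && (b != 0);   /* stands in for assert(b) */
    }
    if (!b) {
        ok = ok && (a != 0);   /* stands in for assert(a) */
    }
    return ok;
}

/* Counts the input pairs in {0,1} x {0,1} for which ex1 would fail. */
int ex1_count_failures(void)
{
    int failures = 0;
    for (int a = 0; a <= 1; a++) {
        for (int b = 0; b <= 1; b++) {
            if (!ex1_holds(a, b)) {
                failures++;
            }
        }
    }
    return failures;
}
```

Calling `ex1_holds(0, 1)` and `ex1_holds(1, 0)` exercises every line and both branches, yet only the exhaustive loop in `ex1_count_failures` reaches the failing combination (0, 0).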
While it is a good idea to architect the system such that interdependencies among subsystems are minimized, there usually remain interdependencies that are beyond the comprehension of modern coverage tools. Furthermore, many real systems are poorly architected, which makes the problem even more intractable.
Observability problems can sometimes be addressed by either adding assertions to the design or adding properties to the Verification Specification. There are legacy maintenance issues associated with adding such constraints, so it is often preferable to address observability by improving diagnostics unless the required effort is prohibitive.
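#include <assert.h>

void ex2(int a, int b) {
    int c = a + b;   /* signed overflow is undefined behavior in C; it typically wraps */
    if (c < a) {
        assert(b < 0);
    }
}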
Every line and conditional is covered by the cases (0, 0) and (0, -1), but the case (INT_MAX, 1) exposes a bug.
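The failing case depends on a + b not fitting in an int, which line and branch coverage do not reveal. As a sketch, assuming one wants to identify that class of inputs without performing the overflowing addition, a range check along these lines could be used. The helper name `add_overflows` is hypothetical.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: true when a + b cannot be represented in an int.
 * The comparisons are arranged so that no intermediate value overflows. */
bool add_overflows(int a, int b)
{
    if (b > 0) {
        return a > INT_MAX - b;   /* sum would exceed INT_MAX */
    }
    if (b < 0) {
        return a < INT_MIN - b;   /* sum would fall below INT_MIN */
    }
    return false;                 /* adding zero never overflows */
}
```

Both covering cases, (0, 0) and (0, -1), give false here, while (INT_MAX, 1) gives true, so the inputs that achieve full coverage never enter the overflowing region.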
Beyond the limitations of specific code coverage metrics, there are several other reasons not to ascribe too much importance to coverage without addressing other considerations.
One way to combat design modeling issues is to devote some resources to verifying a model that has as much fidelity to the actual product as possible, regardless of its inefficiency in comparison to the usual verification model. This normally occurs shortly before release. Another approach is to use some form of static code analysis, such as lint or code reviews.
One must be careful not to equate coverage with quality. According to testability theory, the final quality, Q, as a function of initial quality, Y, and coverage, c, is given by:
| $Q = Y^{1-c} \approx 1 - (-\ln Y)(1 - c)$ |
Therefore, the level of initial design quality also plays an important role in determining final design quality.
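As a worked illustration of the formula, with values chosen arbitrarily for this example (they are assumptions, not measurements), compare two designs at the same coverage c = 0.9 but different initial quality:

```latex
Y = 0.5:\quad Q = 0.5^{\,0.1} \approx 0.9330, \qquad 1 - (-\ln 0.5)(0.1) \approx 0.9307
\\[4pt]
Y = 0.9:\quad Q = 0.9^{\,0.1} \approx 0.9895, \qquad 1 - (-\ln 0.9)(0.1) \approx 0.9895
```

At identical coverage, the design that started at Y = 0.5 ends noticeably below the one that started at Y = 0.9. The linear approximation is also tighter when Y is close to 1.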
Another alternative to code coverage as a metric of testing is to insert a statistically meaningful number of realistic bugs to see how many are detected. Unfortunately, automated means of bug insertion tend to be very unrealistic, and manual bug insertion is time-consuming and still not necessarily realistic. It is also somewhat dangerous to generate corrupted code, because somebody might mistake it for production code and incorporate it into a release.
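The bug-insertion idea can be sketched on a toy scale as follows. Everything here is hypothetical: the function under test (`max_ref`), the three hand-seeded bugs, and the deliberately small regression are invented for illustration, and three bugs are far too few to be statistically meaningful.

```c
#include <stdbool.h>
#include <stddef.h>

typedef int (*max_fn)(int, int);

/* Reference implementation under test. */
int max_ref(int a, int b) { return a > b ? a : b; }

/* Hand-seeded bugs (hypothetical examples). */
int max_bug_swapped(int a, int b)  { return a < b ? a : b; }             /* returns the minimum */
int max_bug_first(int a, int b)    { (void)b; return a; }                /* ignores b */
int max_bug_negative(int a, int b) { return (a > b || b < 0) ? a : b; }  /* wrong when a < b < 0 */

/* A deliberately small regression: true if f passes every case. */
bool regression_passes(max_fn f)
{
    static const int cases[][3] = { {1, 2, 2}, {2, 1, 2}, {3, 3, 3} };
    for (size_t i = 0; i < sizeof cases / sizeof cases[0]; i++) {
        if (f(cases[i][0], cases[i][1]) != cases[i][2]) {
            return false;
        }
    }
    return true;
}

/* Counts how many of the seeded bugs the regression detects. */
int count_detected(void)
{
    const max_fn seeded[] = { max_bug_swapped, max_bug_first, max_bug_negative };
    int detected = 0;
    for (size_t i = 0; i < sizeof seeded / sizeof seeded[0]; i++) {
        if (!regression_passes(seeded[i])) {
            detected++;
        }
    }
    return detected;
}
```

In this sketch the regression detects two of the three seeded bugs. It misses `max_bug_negative` because no test case uses negative inputs, even though the three cases exercise both outcomes of the comparison in `max_ref`.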
| Coverage tools should be used to see what you've missed only after you think you're done developing diagnostics, and not to direct the development of diagnostics. |
Code coverage metrics are never a substitute for an intelligent and thorough verification effort, but they can be used to make good verification even better.
The only true measure of a verification effort is how effectively and rapidly it discovers real bugs. The problem with code coverage metrics is that they are easily subverted. When an organization attaches a great deal of importance to them, it creates an incentive to subvert them, and the actual effectiveness of verification suffers accordingly.
Anders Johnson, last modified 2002/02/05