Coverage: Don’t Believe The Hype
Dave G. | April 22nd, 2008 | Filed Under: Industry Punditry
More and more I hear people discussing coverage in terms of security testing. I am here to give you some bad news. You will rarely get a genuine answer on how much coverage you actually received. It depends on approach, methodology, tools and skill set.
51% of Wikipedia editors agree that the most common forms of coverage testing are:
- Function coverage - Has each function in the program been executed?
- Statement coverage - Has each line of the source code been executed?
- Condition coverage - Has each evaluation point (such as a true/false decision) been executed?
- Path coverage - Has every possible route through a given part of the code been executed?
- Entry/exit coverage - Has every possible call and return of the function been executed?
Each one of these will either result in too little testing or too much testing. What’s worse: it’s unlikely that any two organizations will ever be able to measure this in a way that even allows you to have a conversation about whether or not effective levels of coverage have been obtained.
None of these are going to be effective measures of how well your software has been tested. All are, however, guaranteed to increase the amount of time you spend testing.
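To make the "too little testing" case concrete, here is a minimal sketch in Python. The function `apply_discount` and both tests are hypothetical and not taken from any real code base. The two tests together execute every statement and take every branch both ways, yet the one path that carries the bug is never run.

```python
def apply_discount(price, is_member, has_coupon):
    """Hypothetical pricing helper, invented for illustration only."""
    discount = 0
    if is_member:
        discount += 10
    if has_coupon:
        discount += 95
    # Bug: with both flags set the discount is 105%, so the price goes negative.
    return price * (100 - discount) / 100


def test_member_only():
    assert apply_discount(100, True, False) == 90.0


def test_coupon_only():
    assert apply_discount(100, False, True) == 5.0


if __name__ == "__main__":
    # Both tests pass, and together they reach 100% statement and branch coverage.
    test_member_only()
    test_coupon_only()
    # The untested path (member AND coupon) is where the bug lives.
    print(apply_discount(100, True, True))
```

Path coverage would catch this one, but the number of paths doubles with every independent condition, which is the "too much testing" side of the same problem.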
The problem with security testing is that the devil really is in the details. And there are enough of them that the traditional QA coverage models mentioned above aren’t really effective.
For security testing, let's add:
- Input Coverage - Has every input (e.g. form field, packet fields) been tested?
- Vuln. Class Coverage - Has every form of vulnerability been tested?
- Threat Based Coverage - Has every threat been evaluated?
By combining these with the QA coverage models above, maybe you have an answer that means something. It is obviously still incomplete. There are still application, network and host states that all impact each of the above tests. Also, there are attacks that specifically relate to state and not inputs.
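One way to picture input coverage crossed with vulnerability class coverage is as a matrix of (input, class) cells. The sketch below is only an illustration: the input names, vulnerability classes and the set of tested pairs are all made up, and it deliberately ignores state, which is exactly the part that makes the resulting percentage hard to trust.

```python
from itertools import product

# Hypothetical inventory, invented for illustration.
inputs = ["username", "password", "search"]
vuln_classes = ["sqli", "xss", "overflow"]

# Hypothetical record of which (input, class) pairs a test actually exercised.
tested = {("username", "sqli"), ("search", "xss"), ("search", "sqli")}


def matrix_coverage(inputs, vuln_classes, tested):
    """Return the fraction of (input, class) cells tested and the untested cells."""
    cells = set(product(inputs, vuln_classes))
    covered = cells & tested
    return len(covered) / len(cells), sorted(cells - covered)


if __name__ == "__main__":
    ratio, missing = matrix_coverage(inputs, vuln_classes, tested)
    print(f"{ratio:.0%} of cells tested")
    for cell in missing:
        print("untested:", cell)
```

Even a fully filled matrix only says each cell was tried once, in one application, network and host state.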
Now we have arrived at one of the places where the security testing world differs from the QA testing world. For the average application, you can make certain assumptions about the environment it will run in to guide the likelihood of certain states. But in the security testing world, an attacker is actively trying to induce any state that will give them an advantage.
So, let me ask the readers of this blog some questions:
- What is an acceptable level of coverage in a security test?
- And if you happen to own security somewhere, what would it take for you to actually find a coverage % credible? I am going to guess the M word will rear its ugly head.
- Do you ever have anything that you can even come close to measuring? There are so many states inside of real world applications, even pen test specific forms of coverage aren’t going to come close to being complete.
- If yes, can you effectively convey that to anyone in a way that will actually give them some level of assurance (the a-word of computer security)?
Caution: I am not actually saying, “Don’t try.” All I’m really saying is, “Measuring this stuff is hard, and the amount of time to do it in a credible way is probably best spent on actually testing more.”


2guesswhat
April 22nd, 2008 10:06 pm
It takes a village of security researchers to measure a system.
Andre Gironda
April 23rd, 2008 1:17 am
I don’t like to mix the “code coverage” terminology with the “surface/vuln/threat coverage” terminology.
Code coverage is a unique way of understanding the output of unit and/or functional white-box tests. You can have 10% code coverage and find all of the bugs (Yes, that’s not a typo. 10%, not 100%). You can have 30% or 85%. This all depends on where the conditions and decisions in the code are (sometimes referred to as “meatballs” and “gravy”). Condition-decision coverage is usually the best one to apply to languages such as C/C++, C#, and Java from a developer-tester POV.
Often, problems with measuring code coverage with NCSS come from issues such as getters/setters, as well as not measuring “statements” correctly in the first place (e.g. measuring brackets/braces on new lines). I’m sure there are other issues, but I figure I’d bring the more obvious ones up front and center.
In the case of automated fuzz testing with code knowledge (either from EFS/PaiMei or by actually using the source code), line/statement coverage is often enough to go digging for untested inputs. In other words, when you’re looking to get inside the heads of the product release team. In this case, you’re not looking to “get” a certain percentage of code coverage — you’re looking “at” the code coverage statistics to find out where to start/stop testing.
As far as “Vuln. Class Coverage” and “Threat Based Coverage” go… these don’t belong in the same discussion as code coverage. They’re just totally different things. What you refer to as Vuln/Threat coverage is a matter of risk assessment / risk management. The answer to the question, the question that drives us - “What is risk management?” is not always answered in a simple way. Everyone has their own perspective, as well as their own risk tolerance. The only answer is “we don’t have an answer or a clear way of measuring this universally yet”.
I could suggest a few books on the subject of measuring security and evaluating risk, although it sounds like the people that you are griping about don’t read the literature in the first place.
Dave G.
April 23rd, 2008 11:49 am
@Dre
For security testing, what good is code coverage without saying what you tested the code for? If you can test 10% and shake out all of the bugs, then you can test 90% and shake out none of them.
Andre Gironda
April 23rd, 2008 8:16 pm
Dave G: unit tests that verify input validation on inputs?
Dave G.
April 24th, 2008 1:28 pm
And what did you verify them against?
Andre Gironda
April 25th, 2008 5:52 am
Methods should respond to invalid input by throwing an exception.
Example one: When testing methods that accept an integer within a specific range, submit an integer outside of that range and then verify that the application throws an ArgumentOutOfRangeException.
Example two: When you test methods that accept input as a string of limited length, submit a string that is too long or does not meet other requirements to be valid.
Example three: If you determine that your application should reject any input containing HTML, test the method by submitting HTML, and fail the test unless the correct exception is detected.
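A rough sketch of the three examples as unit tests, written in Python with the standard `unittest` module. The `set_age` and `set_name` functions, the 0 to 150 range and the 32-character limit are all hypothetical, and `ValueError` stands in for .NET's ArgumentOutOfRangeException.

```python
import unittest

MAX_NAME_LEN = 32  # hypothetical length limit, chosen for illustration


def set_age(age: int) -> int:
    """Hypothetical method that accepts an integer within a specific range."""
    if not 0 <= age <= 150:
        raise ValueError("age out of range")
    return age


def set_name(name: str) -> str:
    """Hypothetical method that rejects long strings and anything resembling HTML."""
    if len(name) > MAX_NAME_LEN:
        raise ValueError("name too long")
    if "<" in name or ">" in name:
        raise ValueError("HTML is not allowed")
    return name


class RejectInvalidInputTests(unittest.TestCase):
    def test_integer_outside_range(self):
        # Example one: out-of-range integer must raise.
        with self.assertRaises(ValueError):
            set_age(151)

    def test_string_too_long(self):
        # Example two: over-length string must raise.
        with self.assertRaises(ValueError):
            set_name("a" * (MAX_NAME_LEN + 1))

    def test_html_rejected(self):
        # Example three: the test fails unless the exception is raised.
        with self.assertRaises(ValueError):
            set_name("<b>bob</b>")
```

Saved as a module, this could be run with `python -m unittest`. Note that the tests only prove the specific bad values chosen here are rejected, which is the "what did you verify them against?" question in miniature.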
Some methods sanitize input rather than reject it outright. For example, methods that accept Web input may need to encode HTML characters, and methods that submit string input to a database may need to parse SQL delimiters. Methods that sanitize malicious input are much more difficult to test, because they don’t simply throw an exception or otherwise return an easily testable condition. In these circumstances, write a test that checks both the input and the output to verify that the input was successfully sanitized.
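For the sanitizing case, a minimal sketch of a test that looks at both the input and the output. `render_comment` is a hypothetical method; the encoding itself is done by Python's standard `html.escape`, and the payload is just one sample value, not a complete attack corpus.

```python
import html


def render_comment(text: str) -> str:
    """Hypothetical method that encodes HTML characters instead of rejecting input."""
    return "<p>" + html.escape(text) + "</p>"


def test_comment_is_sanitized():
    payload = '<script>alert("x")</script>'
    output = render_comment(payload)

    # Output check: no raw markup from the input survives.
    assert "<script>" not in output
    assert output == "<p>&lt;script&gt;alert(&quot;x&quot;)&lt;/script&gt;</p>"

    # Input check: decoding the body gives back exactly what was submitted,
    # so the method encoded the input rather than silently dropping parts of it.
    body = output[len("<p>"):-len("</p>")]
    assert html.unescape(body) == payload


if __name__ == "__main__":
    test_comment_is_sanitized()
    print("sanitization test passed")
```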
Note that this usually works best when done with TDD by developer-testers before SQEs get their dirty hands on the code. I suggest doing this in a separate test environment if possible, both in-IDE as well as using a continuous integration server. This seems like more work, but when you integrate regression testing in the same manner, and take into account refactorings (as well as other similar design patterns found in modern development shops), it becomes more obvious that operational excellence is the result.