C++: A Cautionary Tale, or, 1 Hour Of Your Black Hat Trip is Spoken For
[Update 7/12]
This post touched a nerve on Reddit. C++ programmers, unsurprisingly, have some issues with this post. Go read the Reddit comments; they’re excellent. And then vote my post up!
1.
Almost 10 years ago, during the dot-com bubble, I did what many security pros before and after me did and “gave up on security” to “go do something meaningful” [∗]. I was at Network Associates (after they bought Secure Networks), and David Meltzer, my alter-ego at ISS, recruited me into a startup along with Danny Dulai and Tim Newsham. We wanted to build the chat system of the future, and we ended up with application-layer multicast streaming media. In 1999. We were a bit ahead of our time [∗∗].
This is good to know because it will help you avoid ever starting a discussion with me about reliable multicast protocols (down with forward error correction!) or source-specific multicast routing (down with source-specific multicast routing!). But why I bring it up is, we wrote it in C++. We wrote a lot of C++. A lot. We used ACE. Used ACE before? You know how much C++ we were swimming in. A lot of C++. Template-y, Boost-y, Alexandrescu-understandingy C++.
2.
Now this is a security blog, and so I’m supposed to be using this time to make a point about security, and that point is this: the notion that C++ is a more secure language than C is a myth. C++ gives you a dynamically-resizeable string class, which makes it less likely that you are going to write the splitvt overflow. But it also gives you a dozen new features which, if you use them wrong, segfault your program.
3.
Take exceptions. C++ has built-in support for exceptions; when something horrible happens, you “throw” a variable that any stack frame up the call chain can “catch”. This is better than returning a cryptic error code, because you can’t forget to check it.
But. When you throw an exception, you effectively “abort” your current function, and all the functions in the call chain up to the point where the exception was caught. If any of these functions aren’t written to anticipate getting preemptively aborted, and hold on to a pointer or a chunk of memory, you’ve got a memory lifecycle bug.
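To make that concrete, here is a minimal sketch of a function that isn't written to anticipate being aborted. `Session`, `parse`, `handle_request`, and `leaked_after` are hypothetical names invented for illustration; the instance counter exists only so the leak is observable.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical resource type; live_count tracks how many instances exist.
struct Session {
    static int live_count;
    Session() { ++live_count; }
    ~Session() { --live_count; }
};
int Session::live_count = 0;

// Hypothetical parser that rejects empty input by throwing.
void parse(const std::string& input) {
    if (input.empty()) throw std::invalid_argument("empty input");
}

// Not exception-safe: if parse() throws, the delete below is skipped
// and the Session is never destroyed.
void handle_request(const std::string& input) {
    Session* s = new Session();
    parse(input);  // may throw, aborting this function right here
    delete s;      // never reached on the throwing path
}

// Runs one request and reports how many Sessions it left behind.
int leaked_after(const std::string& input) {
    int before = Session::live_count;
    try {
        handle_request(input);
    } catch (const std::invalid_argument&) {
        // The caller recovers, but the Session is already orphaned.
    }
    return Session::live_count - before;
}
```

Calling `leaked_after("ok")` reports zero leaked objects, while `leaked_after("")` reports one: the caller caught the exception and carried on, and nothing in between cleaned up.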
This problem is well known to the C++ community. Herb Sutter wrote a famous article about it, which invented the notion of “exception-safe” C++ programming. Joel Spolsky wrote a JoelOnSoftware about it. There’s a debate about whether C++ exceptions are evil.
But it’s not a well-known problem in the security community. A year or so ago, Mark Dowd found yet another Sendmail vulnerability. Sendmail is written in C, not C++, but it uses Unix signals and “longjmp” to emulate C++ exceptions, and (it’s Sendmail, after all) isn’t written exception-safe. You can trigger a timeout exception, and Sendmail will retain an invalid pointer into the stack that it would have cleared out if the exception hadn’t occurred. That pointer can be used to scribble over stack frames.
Any time you have a language feature, and you have to think about writing code to be “that-language-feature-safe”, you have a security problem. Because that feature is creating bug classes. Bug classes are bad. Splitvt? That’s a bug. Stack overflows? Bug class. Stack overflows cost the industry over $700MM. Exception-safety problems? Bug class. One that C++ introduces.
4.
Want another example? Destructors. Mark Dowd and John McDonald wrote a blog post about it a few months back. Do you audit code at your job? It’s one of the top 10 most valuable blog posts of the year. Long story short? If you call “delete[]” instead of “delete”, which is an exceedingly common C++ error, you’ve introduced a potential vulnerability. Like integer overflows, a bug class.
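A hedged sketch of the pairing rule, using a hypothetical `Widget` type that counts its destructor calls. Mixing the two forms in either direction is undefined behavior, so the sketch only shows the correct pairing, then the same thing with library owners (`std::vector` and `std::unique_ptr`, the latter a C++11 facility that postdates this post) where there is no form to pick wrong.

```cpp
#include <memory>
#include <vector>

// Hypothetical type that counts destructor calls.
struct Widget {
    static int destroyed;
    ~Widget() { ++destroyed; }
};
int Widget::destroyed = 0;

// Correct manual pairing: new[] with delete[], new with delete.
// Swapping either delete form for the other is undefined behavior.
int destroy_manually() {
    Widget::destroyed = 0;
    Widget* many = new Widget[3];
    Widget* one = new Widget;
    delete[] many;  // array form matches new[]
    delete one;     // scalar form matches new
    return Widget::destroyed;
}

// Letting the library own the memory removes the choice entirely.
int destroy_with_owners() {
    Widget::destroyed = 0;
    {
        std::vector<Widget> many(3);              // assumes C++11 or later
        std::unique_ptr<Widget> one(new Widget);  // assumes C++11 or later
    }  // both owners run the right destructors here
    return Widget::destroyed;
}
```

Both functions destroy four `Widget` objects; the difference is that the second one has no delete expression for an auditor to check.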
5.
Here’s another example: the STL. STL (or, more properly, the Standard C++ Library) is the collection of container classes C++ provides. In C, if you want a hash table, you have to implement it yourself. In C++... bad example. But if you want a red-black tree with the same API as a hash table, it’s provided for you. Also linked lists, resizeable arrays, and something called a deque (pronounced “woon”). The STL is one of the great features of C++.
It’s also a reliability disaster. Here’s why: STL containers try to hide pointers from you. Instead of pointers, you get “iterators”, which are objects with a variety of interesting methods and generic functions that operate on them and all sorts of other fancy gunk and at the end of the day it’s all just 1500 lines of C++ template code wrapping: a pointer.
STL tries to avoid the most common pointer bugs, like walking off the end of an array into bad memory. But some bugs can’t be avoided. So for instance: if you modify an STL map (which is a red-black tree) [∗∗∗], vector, or deque, you can invalidate your outstanding iterators. Modifying a container potentially moves or frees the elements they pointed to, and all that fancy OOP gunk aside, iterators are just pointers, not magic. If you hold on to those invalidated iterators, they now point to invalid addresses.
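Here is a small sketch of the vector case. The function names are made up for illustration. The first function only observes that growing a vector moves its storage (it records the address as an integer, so it never touches a stale pointer); the second shows one common workaround, which is to remember an index instead of an iterator.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Returns true if growing the vector moved its storage, which is what
// invalidates every outstanding iterator, pointer, and reference.
bool storage_moved_after_growth(std::vector<int>& v) {
    // Record the address as an integer so no stale pointer is ever used.
    std::uintptr_t before = reinterpret_cast<std::uintptr_t>(v.data());
    v.reserve(v.capacity() + 1);  // asks for more than capacity, forcing a reallocation
    std::uintptr_t after = reinterpret_cast<std::uintptr_t>(v.data());
    return before != after;
}

// Safer pattern: remember a position as an index and re-derive the
// element after any modification.
int read_first_after_appending(std::vector<int>& v, int extra) {
    std::size_t first = 0;  // an index survives reallocation
    v.push_back(extra);     // may reallocate
    return v[first];        // looked up fresh, so never stale
}
```

An iterator taken before either call would be the “just a pointer” from the paragraph above: still holding the old address after the elements have moved.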
This is an absurdly common problem. I’m an OK developer, if I do say so myself, but Danny Dulai, Kneel Fachan, and Tim Newsham are just fucking insanely talented developers and they ran into these problems, just like me.
And again, this problem is well known in C++-world (there’s a whole very excellent book about it). But it’s not well known in the security community. And when you screw it up, it’s potentially exploitable.
6.
C++ gives you a resizeable string, so you won’t write splitvt. But in 2007, code vulnerabilities don’t look like splitvt anymore, ever. We’ve moved on, through off-by-one errors into integer overflows and now uninitialized variables. On balance, the bug classes C++ introduces are way scarier than the ones it takes off the table.
So, to kick off our series of posts about which Black Hat talks you should be going to this year, I’m going to recommend this one. Mark Dowd and John McDonald, on stage, talking about the ways C++ screws software security that you hadn’t thought of before. “Recommend” is an understatement. If you get paid to find vulnerabilities in code, this is the most valuable talk at the conference this year.
See you there!
[∗] For the most recent example of this phenomenon, see Dug Song.
[∗∗] This is marketing-speak for “wrong”; you can say the same thing for a batter’s swing when he takes a strike.
[∗∗∗] Deleting map nodes invalidates some iterators in a map, but adding nodes doesn’t. Breathing on a deque invalidates iterators.
31 Comments so far
From the SGI STL documentation:
“Map has the important property that inserting a new element into a map does not invalidate iterators that point to existing elements.”
So you might want to rethink point 4. Your claim, however, is definitely true for vectors.
Point 5, of course…
Hmm, it seems more like a post about how people without a good grasp of C++ can create bugs - the main ’security’ problem here is that you should never, ever hire incompetents to do serious projects …
Stefan, I’m spacing on the map iterator invalidation rules, but you’re obviously right. There is some goofiness to map iterators; I’m recalling going through contortions to delete nodes from a map in a loop (for instance, removing events keyed by IP address by timestamp staleness).
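For reference, a sketch of that kind of loop, assuming C++11 or later, where `std::map::erase` returns the iterator following the erased element. The `drop_stale` name and the IP-to-timestamp map layout are assumptions made for illustration, not the original code.

```cpp
#include <cstddef>
#include <ctime>
#include <map>
#include <string>

// Removes every entry whose timestamp is older than `cutoff`.
// Returns the number of entries erased.
std::size_t drop_stale(std::map<std::string, std::time_t>& last_seen,
                       std::time_t cutoff) {
    std::size_t erased = 0;
    for (auto it = last_seen.begin(); it != last_seen.end(); ) {
        if (it->second < cutoff) {
            // Erasing invalidates `it` itself, so continue from the
            // iterator that erase() hands back.
            it = last_seen.erase(it);
            ++erased;
        } else {
            ++it;
        }
    }
    return erased;
}
```

Before C++11 the usual contortion was `last_seen.erase(it++)`, which advances the iterator before the node it pointed at is destroyed. Writing `erase(it)` followed by `++it` is the bug.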
A Bad Craftsman always blames the Tool.
cucu: two problems with that argument, which is old as the hills:
1. The bugs we’re talking about are so easy to create that the “best” minds in C++ had to write articles about how the features they involve should be removed from the language because they were so easy to misuse.
2. You can say the same thing about any exploitable C code bug; my premise is that C++ is less safe than C, and you’re not addressing that.
You don’t get away from these types of problems in any language. You can write software that runs in a JVM or in a CLR and still have vulnerabilities. You can sometimes even find vulnerabilities in the underlying language that the JVM and CLR were written in (which is most likely going to be a mixture of C++ and assembly).
C++ is less safe than C? Back in the DOS days, I hosed my entire partition table by accidentally writing to an invalid FILE pointer. I can do this in C++ too if I want to. So which one is better?
You still have buffer overrun problems in C. You still have memory leak problems in C. You still have security problems in C. You still have buggy code in C.
What you don’t have in C is the ambiguous meaning of operators. The -> operator in C means precisely that you are accessing a member from a pointer. In C++, it doesn’t always mean that. If you’re a C++ programmer with experience, it isn’t a problem. If you’re a C++ programmer without a lot of experience, well then yes it is a problem. C++ programmers have to be aware of what they are doing or else they run into the problems your article describes. Isn’t this the same with any implementation regardless of the technology behind it?
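A minimal illustration of that point about `->`, using a hypothetical `CountingHandle` class. With a raw pointer, `p->id` is a plain member access; with a class that overloads `operator->`, the same syntax runs arbitrary code first.

```cpp
#include <cstddef>

struct Record {
    int id;
};

// Hypothetical handle whose -> does extra work: it counts every access
// before forwarding to the real object.
class CountingHandle {
public:
    explicit CountingHandle(Record* target) : target_(target), accesses_(0) {}

    Record* operator->() {
        ++accesses_;     // side effect hidden behind ordinary-looking syntax
        return target_;
    }

    std::size_t accesses() const { return accesses_; }

private:
    Record* target_;
    std::size_t accesses_;
};
```

Given `Record r{42}` and `CountingHandle h(&r)`, the expression `h->id` reads 42 just as `(&r)->id` does, but it also bumps the counter, and nothing at the call site says so.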
This, of course, leads into the argument of maintainability of C versus C++ code bases. Limiting your implementation options to C based on an ambiguous idea of it being more secure isn’t going to mean squat if you find a security hole that requires you to rewrite the whole program to fix.
A bad tool routinely screws the craftsman.
Eng, I think I agree with you that all the extra machinery in the C++ language (for instance, operator overloading, or polymorphism, or whatever) provides opportunities for exploitable errors.
One thing to keep in mind is that aside from new bug classes, C++ definitely provides a myriad of new exploit toeholds. Everything in C++ is indirected; there are zillions of function pointers at predictable offsets.
It should be very much straightforward to avoid memory lifecycle problems when dealing with abrupt stack unwinding due to thrown exceptions. The general idea is to create objects on the stack which hold and manage pointer references. When the exception is thrown, the destructors of objects on the stack always get executed! Have you heard of auto_ptr for instance? I think I wouldn’t hire a C++ programmer who has such difficulties with these concepts.
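A sketch of the idea this comment describes, with `std::unique_ptr` standing in for `auto_ptr` (which was deprecated in C++11 and removed in C++17). `Buffer`, `validate`, `process`, and `live_after_failure` are hypothetical names; the counter is only there to make the cleanup visible.

```cpp
#include <memory>
#include <stdexcept>
#include <string>

// Hypothetical resource; live_count tracks how many instances exist.
struct Buffer {
    static int live_count;
    Buffer() { ++live_count; }
    ~Buffer() { --live_count; }
};
int Buffer::live_count = 0;

// Throws on empty input to simulate a failure mid-function.
void validate(const std::string& input) {
    if (input.empty()) throw std::runtime_error("empty input");
}

// The unique_ptr lives on the stack, so stack unwinding runs its
// destructor and frees the Buffer even when validate() throws.
void process(const std::string& input) {
    std::unique_ptr<Buffer> buf(new Buffer());
    validate(input);
}

// Returns the number of Buffers still alive after a failed call.
int live_after_failure() {
    try {
        process("");
    } catch (const std::runtime_error&) {
        // By the time control reaches here the Buffer is already gone.
    }
    return Buffer::live_count;
}
```

This only protects memory the stack object owns. A raw pointer stashed somewhere else before the throw is still a raw pointer.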
Actually, I kinda like Joel’s articles, but I read the linked one about exceptions, and I think he is in the wrong this time. I am personally in favor of performance-oriented code, and I generally don’t use exceptions, but they clearly have a place and use which is not insignificant or buggy.
Cutting and pasting from a reddit comment of mine:
Yes, RAII addresses much of the problem with exceptions. SafeSTL addresses the problem of insecure iterator invalidation (at a cost). Vector addresses the problem of array destructors. Good programming addresses the problem of bad programming.
The problem is, as a security practitioner paid to search through the codebases of the largest, best-known C++ products, I keep finding bad stuff. Knowing the right thing isn’t enough. A CS freshman knows not to stuff a 200 character string in a 100 character stack array, but that vulnerability still happens.
Many of these difficulties of C++ are fixed in the D programming language, which is a reengineering of C++.
How about TCL?
Some aspects of STL add lots of complexity (and potential bugs) for very little benefit for the programmer. That’s why we had to reinvent the wheel - again! - and write our own classes. Yes, I wrote another String, List, Iterator. We had only one DoS vulnerability that was related to code logic.
I admit being a Qt fan. These days its non-GUI classes provide a pretty good choice. If its GPL license had been available sooner it would have saved a bunch of people hours of debugging.
My .02 -
My problem has always been that crazy deadline pressure from management / marketing makes it all that much easier to miss something obviously problematic. In which case, I am more concerned about making the deadline than I am about security concerns.
My point is - it’s not always enough to be a “good programmer” and be aware of the various exploit classes if you aren’t afforded the time necessary to do the job right.
The D programming language? Can I be devil’s advocate and propose that D is exactly the wrong approach?
D’s memory model resembles “managed C++”. But it still provides direct hardware access, along with malloc and free. Programs written directly to hardware inherit memory lifecycle vulnerabilities.
What troubles me is when languages go through effort to paper over that problem. C doesn’t. If you understand C, you know that fundamentally you’re working with raw memory. C++ tricks you into believing maybe you’re not, until you delete something out of a container and one of your iterators turns into a wild pointer. Doesn’t D do the same thing?
I think the issue here is, lightweight/static runtime versus heavyweight/dynamic runtime.
In D, the normal practice is to use the garbage collected heap, so wild pointers into memory are not going to happen.
It does allow you to call malloc directly, and such, but while in C these are necessary, in D they are very rarely necessary, and so it is much less work to audit the few cases where they do occur.
The idea is to make the easy, straightforward way to code things also the safe, efficient way. To do things the unsafe way, you have to do extra work.
For another example, variables and fields are always initialized. To leave them uninitialized requires you to do specific extra work. In contrast, C/C++ by default leave them uninitialized, which is a far more error prone situation.
I’d recommend giving Ada a look-see for large, collaborative projects. My personal experience is that switching from C/C++ to Ada was like going from a street-tuner Honda to a Rolls-Royce.
shee, hi Tom…
i’ve not “given up on security” to “go do something meaningful” - this is crazy talk. that may have been how you felt about NAI, but i’m glad to have had the opportunity to save the internet with you at Arbor.
Stuart Staniford may be a better example for your purposes:
http://www.econbrowser.com/archives/2007/05/northern_ghawar.html
i’m sorry i’ll miss horizon and duke’s talk, which promises to be amazing (and never were there two nicer guys in the industry). i keep hoping someone takes on Python next, and a Li Gong finally emerges in that community.
see you at WOOT07!
Try and take a look at Java’s fail fast iterators. It’s pretty easy to do the same thing for invalid C++ iterators.
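A rough sketch of what a fail-fast scheme could look like in C++, loosely modeled on the modification counter Java’s collections use. `CheckedInts`, `Cursor`, and `is_stale` are invented for this illustration and are not part of any standard library.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical wrapper: every structural change bumps a version counter.
class CheckedInts {
public:
    class Cursor {
    public:
        Cursor(const CheckedInts& owner, std::size_t index)
            : owner_(&owner), index_(index), version_(owner.version_) {}

        // Throws instead of reading through a possibly stale position.
        int get() const {
            if (version_ != owner_->version_)
                throw std::logic_error("container modified since cursor was created");
            return owner_->items_.at(index_);
        }

    private:
        const CheckedInts* owner_;
        std::size_t index_;
        std::size_t version_;  // container version when the cursor was made
    };

    void push_back(int value) {
        items_.push_back(value);
        ++version_;  // invalidates every cursor handed out so far
    }

    Cursor cursor_at(std::size_t index) const { return Cursor(*this, index); }

private:
    std::vector<int> items_;
    std::size_t version_ = 0;
};

// Returns true if reading through the cursor is rejected.
bool is_stale(const CheckedInts::Cursor& c) {
    try {
        c.get();
        return false;
    } catch (const std::logic_error&) {
        return true;
    }
}
```

A cursor created before a `push_back` refuses to read afterwards, which turns silent memory corruption into a loud failure. The check costs a comparison per access, and it does nothing for a cursor that outlives the container itself.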
Tom, great write up, point 6 and ref link to best book on market for basic pen/vaap. And thank you for the stl ref book. I had only seen Secord’s till this point. Looking forward to Hat and Woot.
Just days away, I think Elvis is being queued up somewhere….
Best, Hal
Thomas Ptacek July 12th, 2007
What!? Programmers are still managing their own memory!? What time is it? (2007!)
[ptacek] “On balance, the bug classes C++ introduces are way scarier than the ones it takes off the table.” We’ve gone through all the low hanging fruit. I think lots of buffer overflows are scarier than a few hard to understand and hard to exploit flaws.
[kumar] “A Bad Craftsman always blames the Tool.” A good craftsman also knows the limitation of his tools, wishes for (and builds) better tools, and groans when he realizes how many bad craftsmen are using inappropriate tools. I don’t think one or two bad craftsmen are the problem here. We’ve got a pandemic.
Tim, confirm for me that you really believe that the balance of trivial overflows remaining in mainstream code is scarier than the balance of C++ flaws (say, iterator invalidation, smart pointer aliasing, and exception safety) in mainstream code.
You’ll probably mostly be telling me something about how effective you think auditing is (ie, have we squeezed most of the silly stuff out yet).
Quick couple of notes…
Note one…
Sure, C++ has new things that can be screwed up and that can have potential security implications, but that’s true for anything new. This applies not only to languages, but also to applications. Let’s say a solid product A just introduced a new feature. There’s a good chance that this new feature might have a vulnerability or two.
Note two…
Have you heard about hash maps… They are now in STL, so you can have your hash tables
I wanted to clarify what I was trying to say…
Thomas is right on the money saying, “C++ is less secure than C”. It just seems like it’s stating the obvious. Besides, C++ wasn’t designed to be more secure. It was designed to be easier for things like object oriented design while still retaining the performance potential of C. As for STL. STL is not a part of the C++ language itself, so it shouldn’t be thrown in the mix.
As for people who say that C++ is more secure than C they:
a. Know nothing about security and
b. Know nothing about C++
[ptacek] “confirm for me that you really believe that the balance of trivial overflows remaining in mainstream code is scarier than the balance of C++ flaws.” Of the C++ code I read (not a lot these days), I don’t really see heavy use of C++-isms (biased sample, mind you). I still do see buffer overflow and integer overflows often. I can’t speak at all to “the balance.”
I don’t know that we’ve “squeezed all of the silly stuff out”… there’s a lot of old code that’s been cleaned up a bit, but there’s still plenty of new code being written and not everybody is statically analyzing their code or auditing it. Then you have a lot of people who just plain aren’t using C++ or C anymore. Good for them, I say. (Nothing against C, I just don’t think it’s the right tool for most people. Programming speed, correctness and code size are so much more important than cpu speed and memory and disk footprint these days. And next year it will be 1.5x more important).
You’re really a believer in static analysis now?
Depends on what you mean by ‘believer’. Does static analysis replace an auditor? Hell no! (disclaimer: I’m an auditor). Does it point developers in the right direction to finding and fixing security bugs? Yup. People who use static analyzers have much more secure code (there’s my bold, unbacked assertion for 2007, you heard it first here!). In fairness, some of the correlation is due to them caring about security in the first place (people who use static analyzers also audit code and run fuzz testers).
Tools help us out. Writing and auditing code is hard. Static analyzers show us some bug potentials deep in the code that we might not have considered, especially if there is a lot of distance between the source of the data and the code that incorrectly handles it. Similarly fuzzers show us some bugs that are fairly shallow in the code that we might not have considered simply because the corner cases are slight and easily missed. Both of these tools miss lots of issues that a programmer who understands the code well might notice.
[…] other point about C++ and constructors throwing exceptions, I refer him to Cargill, or back to our blog, noting that exceptions are themselves inherently dangerous. “When a language feature […]
A Good Craftsman Always Chooses The Right Tool.
C++ Is Not It.
I’d love if you could write a post about Python and security. I’m subscribed to your blog so I won’t miss it.