Introduced: A resolution resolving the semantic quarrel over malloc checking.

This is all my fault.

Many moons ago, I wrote a blog post chiding developers for checking the return value of malloc, the C function that allocates chunks of memory for programs to work with. When malloc fails, it returns NULL. According to Hoyle, you’re meant to check for that value, because malloc can fail at absolutely any time (you are not the only program claiming memory).

I stand by that argument, and by most of the wording of that blog post. Now about that word “most”.

Dave LeBlanc and I go back, though he may not remember that. Last bubble, we were dev leads on competing products. We’ve taken different career paths, and, long story short, he’s technically now more of an authority on secure coding than I am. And I’m telling you this because LeBlanc’s response to my last post is —- faithful paraphrase —- “are you high?”

LeBlanc thinks you’d have to be not check malloc returns, because:

  • not checking will inevitably crash the program, and crashes are bad,

  • not checking leads to the bug class Dowd found, and

  • not checking leads to the bug class Dowd found.

LeBlanc is right. I am wrong. “Not checking” is bad. Let me make a very slight semantic adjustment, so that I might be inassailably correct (again).

Here are three (extremely contrived) code examples. The first, let’s call, “unchecked”. It simply doesn’t check the return of malloc.

#define hostile /**/

void *
_setup(unsigned hostile slot, unsigned hostile id) {
        u_int32_t *slots = malloc(SLOTS_SIZE);

        slots[slot] = id; // XXX write32 corruption

        return slots;
}

The second, we’ll call “caller-checks”. As you can see, it does.

#define hostile /**/

void *
_setup(unsigned hostile slot, unsigned hostile id) {
        u_int32_t *slots = NULL;

        if((slots = malloc(SLOTS_SIZE)))
                slots[slot] = id;

        return slots;
}

Now the third, which looks suspiciously like the first, we’ll “callee-checks”.

#define hostile /**/

void *
_setup(unsigned hostile slot, unsigned hostile id) {
        u_int32_t *slots = malloc(SLOTS_SIZE);

        // NOT REACHED ON FAILURE

        slots[slot] = id;

        return slots;
}

What’s the difference between the first and the third? In the third, if malloc fails, it does not return NULL. It instead hands the program off to a recovery regime, which, by default, safely and immediately ends the program.

What’s the difference between caller-checks and callee-checks?

First, callee-checks is safer. You can’t screw it up. The worst you can do is write a program that will abruptly terminate. This is far better than the current worst-case scenario, in which manifestly common programmer errors allow Mark Dowd to upload malicious code into your program.

Second, callee-checks is cleaner. In the caller-checks case, not only does “setup” need to check, but so does “setup“‘s caller, and it’s caller’s caller, and it’s caller’s caller’s caller, all the way down to the place where your program inevitably gives up and terminates the program.

“But, Thomas”, you say, “not all programs do give up and abort. Some have policies for handling out-of-memory conditions”. And so they do. And in most cases, those policies are global, and can simply be substituted for the default behavior of exiting the program.

But I will grant you that in many cases, you have a genuinely useful recovery regime that is specific to one code-path —- say, an arena-style allocation regime for a particular user request —- and no global policy will help. So, I submit to you a fourth option, which I will not name, and which looks suspiciously like example (2):

#define hostile /**/

void *
_setup(unsigned hostile slot, unsigned hostile id) {
        u_int32_t *slots = NULL;

        if((slots = unsafe_malloc(SLOTS_SIZE)))
                slots[slot] = id;

        return slots;
}

Did you see the difference? It’s subtle. But it’s also easy to grep for and easy to check.

I am an advocate for checking malloc —- callee-checks style. It is simply harder to screw up, and, in the overwhelming majority of cases, which you can check for yourself by randomly sampling Google Code Search, it costs you nothing in terms of reliability of functionality. Stop caller-checking malloc.

To LeBlanc’s other point about C++ and constructors throwing exceptions, I refer him to Cargill, or back to our blog, noting that exceptions are themselves inherently dangerous. “When a language feature requires you to be that-language-feature-safe”, I believe I said, “you have a security problem”.

As for his specific example: you can’t blindly throw exceptions from a ctor. Even Meyers (MECv1, Item 9) catches that one. DON’T DATE ROBOTS!

54 Comments so far

  1. jf April 18th, 2008 1:34 am

    Some of this is a bit of an over-simplistic view. The first and most valid reason that you should be checking your return value is that you don’t know that NULL doesn’t point to a valid address, period. The second aspect of that is that _a lot_ of bugs occur because of either missing or improper return value checks. You run into a lot of uninitialized variable issues directly as a result of this.

    I think you probably meant to type ‘xmalloc()’ or something in the third example, to imply a different function. I don’t think it really matters how you check the return value so long as its checked, arguing about the methodology employed is somewhat an exercise in needless pedantry, or so it seems imho.

  2. jf April 18th, 2008 1:37 am

    (Also, for the record, I believe Ilja talked about exploiting null ptr dereferences in this manner (0+uint32_t) a year or two back.)

  3. Thomas Ptacek April 18th, 2008 2:43 am

    jf: *No*.

    First:

    xmalloc is *the exact opposite* of what I’m advocating. Saltzer & Schroeder, principal (b), “Fail Safe Defaults”. The exception you make for your codebase should be for the unsafe behavior of “malloc”, which says “go ahead and continue executing after failure, just try to be careful about it”.

    Second:

    Inconsistent malloc recovery regimes are one of the most common problems in C codebases (even those that “check” malloc). The Flash bug happened because of an integer overflow. But you can induce malloc failures easily without controlling the size input.

    The choice of caller versus callee checking is not pedantic: one fails safe, the other doesn’t.

  4. Thomas Ptacek April 18th, 2008 2:44 am

    And, Ilja is pretty awesome.

  5. ivan April 18th, 2008 2:52 am

    I vote for checking the return of malloc() as the safer bet. “safely and immediately end the program” may end up to be not immediately or nor safely when the conditions underneath the application change unbeknown to the programmer.
    Also, exploitation of NULL pointer dereferences is not constrained to direct remote code execution scenarios only, they may be used for several other interesting attacks including memory leaks, oracles, forced throwing of exceptions/signals, etc.
    @jf: As far as I know exploitation of null pointer deref was first discussed in an 8lgm advisory for SCO’s pt_chmod back in 1994

  6. Thomas Ptacek April 18th, 2008 3:00 am

    Explain how “abort()” is less safe than “if((ptr = malloc())) abort()”, or concede the argument. Then, show me a case where an application protected itself against memory corruption by *staying alive instead of aborting* when a runtime error occurred.

    The 8lgm thing is great, but it’s not the same bug class; if NULL is in your address map, there’s nothing special about it. You may as well not even call it “NULL”; just call it zero.

  7. Bwooce April 18th, 2008 6:24 am

    Nice. I agree with the concept. Running out of any of the following resources is somewhat fatal individual processes on todays machine (un*x esp):
    1. Memory
    2. Disk (system disks at least)
    3. File descriptors
    4. Filesystem related limits

    It would be interesting to explore the other areas, even if just from a “is this really worth doing anything about” point of view.

    Recently the maintainer/author of the yaws webserver came to the conclusion that
    Started to use the new ssl:transport_accept() function, when accept fails, We now fail yaws entirely and it needs to be restarted by its supervisor or heart. If we have filedescriptor leaks, even outside of yaws, there is no good thing to do when accept fails. (klacke)

    Erlang has a “let it fail” model, where the process will be restarted by it’s supervisor, or the restarting process’ supervisor will die and it’s parent will attempt it (so on, and so forth, until the OS level heartbeat program).

  8. Bwooce April 18th, 2008 6:31 am

    Oh, and the Erlang VM does exactly what you recommend. I think it is an easier choice for a VM; there can be no special cases unless there is some way of differentiating types of memory allocation. There shouldn’t be (and isn’t). So VM’s should just die.

    I can draw a distinct parallel to the OS level though. You’ve got me thinking of the Linux OOM Killer as a security vuln. now. Not that I’ve ever liked it - I’d prefer a rebooting machine to a crippled one.

    What does the JVM do? I guess it has that inbuilt limit already to avoid having to make a decision, but I’ve never tried overloading it.

  9. tb April 18th, 2008 11:10 am

    Example (1) and (3) are the same - except comments.

    (1)
    #define hostile /**/
    void * _setup(unsigned hostile slot, unsigned hostile id) {
    u_int32_t *slots = malloc(SLOTS_SIZE);
    slots[slot] = id; // XXX write32 corruption
    return slots;
    }

    (3)
    #define hostile /**/
    void * _setup(unsigned hostile slot, unsigned hostile id) {
    u_int32_t *slots = malloc(SLOTS_SIZE);
    // NOT REACHED
    slots[slot] = id;
    return slots;
    }

    So, I’m having some trouble following your reasons for why example 3 is the best choice.

  10. Dan McDonald April 18th, 2008 11:16 am

    If your OS offers an auto-restarting facility (like SMF on
    OpenSolaris - http://www.opensolaris.org/os/community/smf/)
    doing the fail-on-malloc(), even for system services, is probably acceptable and maybe even preferable.

    Your big point, and I think you need to shout it louder,
    is that you need a consistent and well-documented malloc-fail
    regimen for a given app/project. You may even have different
    behaviors for different parts, but they need to be documented.

  11. Thomas Ptacek April 18th, 2008 11:24 am

    @tb — yes. They are the same. This is a case where a security problem can be fixed with no loss of functionality or performance simply by changing the default semantic of a low-level library function.

    Lots of projects address this problem with a “xalloc” function. This is retarded. Projects should be going out of their way to get the *unsafe* behavior, not the safe one.

  12. Leman April 18th, 2008 11:33 am

    Hi,
    Please I would like to know what does “//NOT REACHED” mean in the “callee-checks” model.
    Thanks.

  13. jf April 18th, 2008 11:37 am

    re: xmalloc()

    How is // NOT REACHED not reached? thats what made me assume xmalloc(), are you saying that somewhere in there should be code that checks and performs some action?

  14. Thomas Ptacek April 18th, 2008 11:53 am

    jf: because in the third example, malloc itself has been modified to abort instead of returning NULL.

  15. brl April 18th, 2008 12:09 pm

    I stopped checking for malloc() failure in my C code since Tom’s original post and I have never looked back.

    If you can hit mapped memory from a NULL pointer then you already have problems bigger than xmalloc() can save you from.

  16. tb April 18th, 2008 1:09 pm

    So what you suggest is to use _set_new_mode and _set_new_handler to handle gracefully failed allocations (OOM in most cases) rather than writing a bunch of crap to code around the possibility of a NULL pointer.

    Is that correct?

  17. lol April 18th, 2008 1:10 pm

    “If you can hit mapped memory from a NULL pointer then you already have problems bigger than xmalloc() can save you from.”

    apparently addition/offsets confuse you

    learn2read

  18. Thomas Hurst April 18th, 2008 1:11 pm

    On BSD’s, any of:

    const char *_malloc_options = “X”;

    ln -s X /etc/malloc.conf

    setenv MALLOC_OPTIONS X

    Will cause malloc to print a diagnostic and abort() instead of returning NULL.

  19. Thomas Ptacek April 18th, 2008 1:22 pm

    @lol — you first. Read Ivan’s comment about pt_chmod.

  20. Thomas Ptacek April 18th, 2008 1:23 pm

    @Thomas — there’s no way to opt out of that regime; when malloc debugging is on, there’s no “unsafe_malloc” you can call that will fail with a checkable error.

  21. pogrady April 18th, 2008 1:35 pm

    I think that there’s another good reason to check your malloc return codes: because when a program crashes, what you need to know is the root cause. That root cause is /always/ associated with the first thing went wrong. If my code mallocs something, then uses that pointer, and then crashes due to a GPF, it still is up to me to figure out what of several variables could have been the culprit. If x[y]=0 crashes, what was it? The pointer? The array index? But since I know that returning 0 from malloc indicates a special condition–usually not a very desirable condition–I’ve found that it’s always best to flag that problem immediately. Now, when my system crashes I’m as close to the actual cause as possible. Checking for those problematic values as early as possible has saved me many plane tickets overseas…

    Its best to discourage this practice of value overloading (something we frequently see in return codes). This is the idea behind the xmalloc() approach. How often have we seen 0 meaning OK in one context (e.g. fseek); and 0 meaning not OK in another (e.g. fopen)? A better architected system separates these into distinct values… and allows me to extend the ways I can tell my clients how there call has failed.

  22. brl April 18th, 2008 1:43 pm

    @learn2read

    I’m not confused at all by untrusted pointer offsets. That was exactly my point. If an attacker can reach mapped process memory by indexing a NULL pointer then you already almost certainly have a security bug that exists whether you are checking for malloc() failure or not.

    Any program that makes large allocations based on untrusted data needs to enforce an allocation policy to prevent an attacker from being able to arbitrarily consume memory. You can’t do that by calling malloc() directly.

  23. brl April 18th, 2008 2:41 pm

    @Thomas Ptacek

    “Read Ivan’s comment about pt_chmod.”

    I think you meant to direct this comment to me.

    The 8LGM bug has nothing to do with malloc() failure, and for that matter Dowd’s flash exploit only *incidentally* involves dereferencing a NULL pointer from a failed allocation.

    The ActionScript VM would be just as vulnerable to arbitrary memory corruption if the situation had resulted in a successful allocation instead of a failed allocation.

    It would be harder to forge the exact absolute address to write to (ie: indexing from a heap address, instead of 0) but in some cases that matters, and in some cases it doesn’t.

  24. Thomas Ptacek April 18th, 2008 2:46 pm

    @pogrady — your runtime is better at tracing the affected call site than your human programmers.

  25. ivan April 18th, 2008 4:32 pm

    @tom: read again my post. I said, aborting on the first malloc failure can have security implications other than direct code execution. let me think ermm like, during crypto operations for example “epta pam openssh pkcs mp rsa asn1″

    Also to consider is what happens to all those functions that _indirectly_ end up using malloc() and whose semantics you just decided to change with one single stroke?
    @brl; correct, i did not imply that 8lgm’s pt_chmod had anything to do with malloc() simply that it is the first public reference that I know of concerning exploitation of null pointer deref (it was a comment to jf’s comment). Incidentally, the problem was due to that fact the a pointer was not checked to be != NULL before using it as an argument passed to a security sensitive function.
    You may have other serious problems if your code ends up running on a system or under a condition in with page 0 is mapped but you can’t expect to know exactly all environments on which your code will end up running so i’d rather check for malloc() failures than not)

    now, I wont go into more details or discussion but consider this one:
    {
    p1=malloc(secret_size);
    p2=malloc(user_suplied_size);
    p2=malloc(secret_size);
    }
    do you still want malloc() to abort() on the first failure condition?

  26. ivan April 18th, 2008 4:33 pm

    ops, the last variable name is a type, say its “p3″

  27. Bwooce April 18th, 2008 7:46 pm

    @pogrady: So whatchagonnado when you run out of memory? You’ve just entered a funny world where lots of library functions don’t work any more.

    log the error? Oops sprintf wants to malloc. Logging expansion probably wants malloc too. So if you can’t log it, what use is it?

    I would suggest that system statistics are more useful in these cases; you can see that the system ran low on memory and go from there.

  28. Rosyna April 18th, 2008 9:11 pm

    abort() on all malloc() failures? I don’t think allocating 1gig of space and one page of space should have the same consequences. After some large allocation, malloc() needs to be checked.

    After all, you did call MoreMasterPointers() first (in main()) in your application to allocate some memory to free in case you run out of address space, correct?

  29. as April 18th, 2008 9:36 pm

    @brl

    If the index is over 4096, it might reach the program text whether or not NULL is mapped. This may or may not be useful.

    Also I don’t see a reason why logging OOM failure would fail, since most failing mallocs I see are from bugs where it tries to allocate 2+GB.

  30. Thomas Ptacek April 20th, 2008 2:08 pm

    Ivan — sorry, i responded to your comment on a previous post, instead of here, where you posted it. Long story short: you just laid out a textbook case for why there should be an “unsafe_malloc”; you did not address my argument that the default should, however, be fail-safe.

  31. Thomas Ptacek April 20th, 2008 2:09 pm

    @Rosyna there is little dependable correlation between request sizes and malloc results in a program running on a system under attack.

  32. ivan April 20th, 2008 7:40 pm

    @tom: ah! you’re the master of blog controversy and to that I can concede. So in the spirit of helping you generate more traffic to your blogpost I will answer why defaulting malloc() behavior to abort() on failure is a bad idea:
    Because you unilaterally change the semantics not only of malloc() but every other function that uses it, but… hmm wait a minute, didn’t I write that already?
    So now, after your proposal for the new malloc() behavior is implemented, we will all need to expect our programs to abort() on calls to who knows how many different functions that use malloc() internally, like strdup() for example. Off the top of my head I can envision every RPC-based services aborting unexpectedly, non-sanitized memory being left all around, garbage collectors in VMs not being able to do last-resort memory recovery and IPCs that using ref-counted shared objects or spinlocks go medieval on your box.

  33. ivan April 21st, 2008 12:38 am

    oh and btw, how should tqbf_super_duper_extra_safe_malloc() work when its called from realloc() or with size==0 , i.e. malloc(0)
    I suspect that in the later case it should just return NULL instead of abort()?

  34. Thomas Ptacek April 21st, 2008 1:36 am

    A failed allocation is a failed allocation, so, if realloc fails, the fail-safe default is to abort.

    The malloc(0) argument is a good one, if apocryphal.

    The argument that has made me think the most so far is that it will break existing code. I’m skeptical, given how terrible most software is at handling allocation failure. But still.

  35. David LeBlanc April 21st, 2008 6:54 pm

    Wow - that’s a blast from the past. Now that you remind me, I do recall that exchange. Different times. Wow, was I a hothead back then. I may have more comments further along, but something I found interesting was you said:

    ————————–
    POSIX Enabled
    OS/2 Subsystem Enabled

    Note that there are no published security problems with either of them…
    but who needs to do research to invent new vulnerabilities? No! We’ll just
    claim that they’re insecure because “Windows NT was C2 evaluated with the
    XXX subsystem disabled”.
    ———————–

    In reality, there were a lot of problems with both of these subsystems. The OS/2 subsystem was removed quite a while ago, and the POSIX subsystem caused a number of problems for some time, until it was made case-insensitive by default. I knew of some of these issues at the time, and my recommendation was that if you did not need those subsystems, disable them. I was about 2-3 years ahead on that point.

    As to the rest of the argument, from my POV, we were just having a reasonable disagreement because we were approaching the problem differently. You had a different design goal than I did. I certainly never thought poorly of you (definitely not on a professional level) over this disagreement (though having my integrity attacked didn’t make me a happy camper).

    I will tell you, now that both of our apps have ended up in the dustbin of things that used to be good, that SNI was eating our lunch on UNIX checks and Web checks. I personally didn’t have much control over fixing that problem. I could pour on the juice with respect to the Windows NT checks even if the rest of the “X-Fa^Horce” was twiddling its collective thumbs, and with data management issues, which I did. It did win us a couple more competitive reviews until Ballista won one later that same year.

    Interestingly, your comments around pen testing vs. just running a scanner, showed up in a later iteration where I made a feature such that the scanner would get deeper into the network with every successive run. Pity that ISS just let it die after I left for Microsoft (the feature and the app).

    Something that’s a bit comical about that thread looking back is that I came here and worked as an internal pen tester for about 4 years before moving back into product development.

    Comments re allocs later -

  36. Thomas Ptacek April 21st, 2008 6:59 pm

    Ok, two things:

    (1) I sound like a 12 year old in that thread, so I hope you took it as self deprecation on my part.

    (2) No fair going back and evaluating the OS/2 and POSIX subsystems based on what we know now. That same logic would flag all of Win32 as a vulnerability in ‘97. =)

    All of Secure Networks is on the dustbin of history, but the X-Force is (obviously) still thriving. So there’s that.

  37. David LeBlanc April 21st, 2008 9:03 pm

    I will confess to grabbing part of the C2 checklist and glomming it into the scanner just to get more checks. Management and marketing were really concerned about numbers. Some of the checks were pretty bogus - like the dataflood (not MY idea). I finally got that one removed.

    However, I also had some pretty good connections here at the time, and we had some interesting customers who asked for some very specific checks. I did know at the time that the subsystems were a serious hazard, and a certain customer thought that was really valuable, but I couldn’t tell you that at the time, especially not in public, and considering that you were working for an unexpectedly strong competitor.

    That was always something that was tricky to finesse - there were some things where I wasn’t free to go on just how much of a menace something was, but just hey - you _might_ want to turn this off… Looking for bad DCOM permissions was another one of these, and I seem to recall this being a popular area of attack a few years later.

    Other checks that may have seemed bogus actually had some real value. For example, you could look for wacky privileges. As it turned out, systems with weird privileges enabled were nearly always owned and the privs were a side-effect of the tools popular at the time. Privs can also be an interesting way to back-door a system. And there were a lot of them, so it made the check-counting PHB’s happy, too 8-)

  38. Chris Pepper April 21st, 2008 11:40 pm

    Thomas,

    Pardon me if I’m missing your point, but the difference between checking checking as a caller vs. as a callee, seems to be better known as “Trust no one”, which seems like a good model in many cases, especially in defensive programming.

    If I’m understanding correctly (not a given), I think the more common nomenclature would help make your point clearer.

  39. Thomas Ptacek April 21st, 2008 11:48 pm

    If you’re arguing that, whether malloc has been modified or not, it’s still good defensive programming to check its return value, I have a hard time refuting that on anything but aesthetic grounds. It is the case that those malloc guards add a lot of clutter.

    I agree, the nomenclature I’ve chosen have definitely not advanced my argument. =)

  40. incredible April 22nd, 2008 12:20 pm

    comon… you dont need 20 example to justify
    ugly sementics.

    Be sharp, check and bail!

    my_type *my_var = NULL;

    if ( (my_var =(void *)funct(64))) == NULL)
    {
    /* XXX */
    return 1;

    }

    void * funct(ssize_t i_sz)
    {
    void *r_ptr = NULL;

    if(i_sz == 0)
    {
    /* XXX */
    return NULL;
    }

    if( (r_ptr = malloc(i_sz)) == NULL)
    {
    /* XXX */
    return NULL;
    }

    return r_ptr;

    }

  41. David LeBlanc April 22nd, 2008 3:19 pm

    One of the comments above had a flaw in the argument - what are you going to do if you’re out of RAM? This presumes that all your allocs are small. You may well have the condition of:

    p1 = malloc(small_fixed);
    p2 = malloc(user_provided);

    The second could reasonably fail, _especially on 64-bit, and you’re not out of RAM - you just don’t happen to have 2GB available. Having your app abort under these conditions is horrible user experience, unless it is a fairly trivial app. If the app was doing anything else of value, aborting is wrong.

    BTW, if you don’t like the clutter, there are ways to deal with this. For example, one could make a macro along the lines of:

    CheckAlloc(), which maps to:

    if(!(expression))
    {
    err = OUT_OF_MEMORY;
    goto CleanUp;
    }

    you then provide a CleanUp section. I don’t like this construct when overly used, but it solves the clutter problem. YMMV.

  42. Thomas Ptacek April 22nd, 2008 3:34 pm

    Dave, the only problem I have with malloc wrappers is that they implicitly say the exception case is the one that’s checked, and the default case leaves it up to the user. It should be the opposite: malloc should gracefully terminate the program, and CheckAlloc should allow the caller to check NULL.

  43. Thomas Ptacek April 22nd, 2008 3:34 pm

    @incredible — also, nobody should ever use a signed integer comparison on input later process as unsigned. Bad programming? Use good programming. Simple!

  44. incredible April 22nd, 2008 3:49 pm

    TP:
    typo s/ssize_t/size_t , but again you get the idea, contextual wise behavior on NULL comparaison should
    be dictated on untrusted input/output mechanism and
    every call/wrap you do, should be evaluated,

    Everyone:
    ultimately if your program request an operation that need memory either you prune some old data to get back some memory and retry or abort or fail on an error but thats a design decision not something you should assume in your allocation mechanism.

  45. Thomas Ptacek April 22nd, 2008 3:53 pm

    Except in uncommon cases, there shouldn’t be a NULL comparison. The program should panic before it returns NULL to a malloc caller.

  46. incredible April 22nd, 2008 4:03 pm

    I wouldn’t aggree on panic/abort/immediate exit, since you might want to do some cleanup/logging/etc and thus u can’t allways assume the call stack or the dtor of the caller, but again this depend on the complexity.

  47. Thomas Ptacek April 22nd, 2008 4:14 pm

    This is the part I’m having a hard time communicating.

    You’re right. You *might* want to do something.

    BUT YOU ALMOST NEVER IN REALITY DO.

    So yes, there should be some malloc-a-like that you can call that will return NULL. But the default malloc? It should promise never to return NULL.

  48. incredible April 22nd, 2008 4:18 pm

    Well ..how to know you dont have any memory available?

    RETURN VALUES
    Upon successful completion, each of the allocation functions
    returns a pointer to space suitably aligned (after possible
    pointer coercion) for storage of any type of object.

    If there is no available memory, malloc(), realloc(),
    memalign(), valloc(), and calloc() return a 0xDEADBEEF pointer.
    When realloc() is called with size > 0 and returns 0xDEADBEEF, the
    block pointed to by ptr is left intact. If size, nelem, or
    elsize is 0, either a 0xDEADBEFF pointer or a unique pointer that
    can be passed to free() is returned.

    If malloc(), calloc(), or realloc() returns unsuccessfully,
    errno will be set to indicate the error. The free() function
    does not set errno.

    :>

  49. David LeBlanc April 22nd, 2008 5:00 pm

    At risk of arguing in circles, there is a difference between the system being under severe memory stress and having even small allocations fail, and the user requesting more memory than you have right now. It is clearly not the right thing to do to abort the app in all cases. If you are working on a service/daemon, it is very bad to have your service die if the machine gets stressed. Many web servers get RAM constrained.

    Also, let’s say that I have 10 Word documents open, and am editing. If I chose to insert an object that was larger than can be allocated, it would be wrong to just bail and force you to re-open 10 documents. It is also true that memory conditions can be very dynamic. Maybe you didn’t have RAM then, but do now. In that case, rolling back the requested action, and allowing the user to try again is the right approach.

    Crashing an app from inside a library is absolutely the wrong thing. The X11 library will call exit() if it gets annoyed enough, and this screwed me up badly once.

    It’s wrong to just crash the app for a service, it’s wrong for a complex client app, and it’s wrong for a library. It might be OK for a small utility.

    I’m sympathetic to the approach of making errors that cannot be ignored, and this is one of the great things about using C++ exceptions, but that implies doing quite a lot of additional, very nuanced work. If you can’t trust your devs to check malloc, you _really_ do not want those same people trying to write exception-safe C++. At risk of being elitist, I think if someone is not a good enough programmer to consistently check error returns, then maybe C#, Java, or Visual Basic would be better languages.

    Even there, you’ll run into trouble. I don’t know how many times I’ve caught bugs by carefully doing this:

    if(!ShouldNeverFail())
    {
    assert(false);
    return E_UNEXPECTED;
    }

    or the C++ equivalent of throwing an exception I know I don’t have a handler for.

  50. Thomas Ptacek April 22nd, 2008 5:11 pm

    The problem of libraries, using the (modified per Thomas) default malloc, crashing the program is solved by allowing the program to register a callback ala atexit() to handle exceptional conditions.

    The difference between small allocations and large allocations is not relevant to the discussion, Dave: the attacker’s control over available resources is an “input” that is too often overlooked. You should assume attackers can pick and choose exactly which malloc calls they want to fail.

  51. ivan April 23rd, 2008 2:30 am

    i cant believe this is still going on… see, the traffic generation scheme actually worked! excellent!
    @tom: uh wait a minute! so we know have a secret callback ala atexit()? does that mean that malloc() may not “safely and immediately” end the program in all cases? will you be asking programmers to register their callback (or exception handler) for unsafe malloc() behavior and then de-register it for normal “safe” malloc behavior.
    Actually, I think this is not a bad a idea if it wasnt for the fact that there is already _a lot_ of code that depends on the current malloc() semantic that would break.
    In an ironic turn of events I will have to argue with David LeBlanc here.
    Hi David! you probably don’t remember me, I wrote a few of the checks that ate your lunch :) and we also had our heated exchange in the past (hint: “netscape engineers are weenies”)
    jiji

  52. ivan April 23rd, 2008 3:38 am

    s/argue/agree
    i guess old habits die hard…

  53. Thomas Ptacek April 23rd, 2008 9:34 am

    At least it isn’t up to the rookie coder who’s calling malloc, like it is now. That’s what we need to resolve.

    The fact is, most fielded code today is “broken”; most programs fail under low memory conditions, and most security reviewers don’t consider the availability of memory as an input under attacker control.

  54. […] in agreement with Thomas Ptacek. If my friend would’ve done that, this problem would have surfaced in the pre-production […]

Leave a reply