Four C Programming Anti-Idioms

Thomas Ptacek | January 29th, 2006 | Filed Under: Development

Excuse me, because this is off-topic to most of our readers, but very on-topic to me. The following are Four C Programming Anti-Idioms. Each one makes you go through extra effort to get worse code. Stop perpetuating them.

  1. Checking the return value of malloc() Heresy! Blasphemy! Shoddy engineering! Fuck you. So malloc() failed. What are you going to do about it? If you’re 99% of the malloc-checking C programmers in the world, when malloc fails inside a subroutine f(), you’ll do one of three things:
    • You’re going to return -1 from f(). Then, to be safe, in each of f()’s callers, you’ll check f()’s return value, returning -1 on failure. This recurs all the way back to main(), where you will detect -1 and exit.
    • You’ll return from f() without an abnormal error. f()’s callers won’t have an abnormal return value to check for, but that won’t matter, because by aborting f() you left the program in an unknown state and it’s going to segfault no matter what you do.
    • You’ll immediately abort the program.
    My affinity is clearly with the “immediately abort” crowd, and I add only this sentiment: cut out the middleman. Depending on whether you’re shipping production or not, you want one of two things to happen:
    1. You want the failed alloc to cause a segfault that you can catch and debug.
    2. You want the failed alloc to gracefully abort the program.

    (1) is accomplished by doing nothing. (2) is accomplished by preloading a version of malloc() that checks its return value and gracefully aborts on failure (it’s 2006. Why are you wasting time with “xmalloc”? Did you remember “xstrdup” too? Didn’t think so.)

    If you’re one of those clever people that wants to keep an “emergency reserve” of memory to gracefully close the whole program (instead of aborting), it’s still better to preload a new malloc than to explicitly check malloc’s return value.

    Less typing, safer code.

  2. Casting void-star

    It’s the void pointer’s job to stand in for other pointer types. In ANSI C99, the “cast” is implied when you pointers to or from void-star, and redundant when supplied explicitly.

    So this:

    int *i = malloc(sizeof(*i));

    is equivalent to this:

    int *i = (int*) malloc(sizeof(*i));

    But the redundant int-star cast isn’t just extra noise. Because the “implicit cast” behavior is unique to void-star, casting explicitly hides an error: if variable types change (say, you add an argument to a function), and what you thought was a void-star ceases to be a void-star, a casted assignment will suppress a

    valuable warning. For instance:

    void *v = NULL; int *i = v; // just peachy long long *ll = NULL; i = ll; // warning: assignment from incompatible type i = (int*) ll; // no warning, broken assignment

    Yes, yes, I know, you can’t assign FROM void-star without a cast in C++. Stop using C++. And look askance at ANY pointer cast in straight C code.

    Less typing, safer code.

  3. Passing pointers to tiny structures

    You want to return a pointer to a buffer and its length. You use the equivalent of:

    struct iov { u_char *base; size_t len; };

    On LP32, struct iov is as hard to pass in and out of a function as long long. Put differently, unless you go out of your way to individually allocate and pass pointers to long long to avoid the “overhead” (use of successive registers) of copying, you don’t get to pass pointers to struct iov.

    Dragging the point out further, I could say that if you use long long in your code, you should be passing “pair” structs instead of using out-args to store lengths. But I’m not going to go there, because that’d be hypocritical even if it was correct.

    Less typing, safer code.

  4. Wiring down pointer sizes in malloc()

    Moral cousin to gripe (2). You can “dereference” an undefined pointer in a “sizeof” expression. So this:

    v = malloc(sizeof(*v));

    equivalent to this:

    v = malloc(sizeof(struct V));

    except that in the latter example, if the type of “v” changes, you have two places your code needs to change, and god help you if you forget and the resulting code compiles anyways.

    Less typing, safer code.

Viewing 16 Comments

    • ^
    • v
    About number 2:

    Casting a void pointer has always been wrong in C even though *everybody* does it. With malloc() there is at least one other subtle but possibly catastrophic problem. If you forget to include stdlib.h and you are casting all of the malloc return values the compiler will not give you a hint that the malloc prototype is missing. The return value will then default to integer. Now if you or somebody else tries to use your software on some platform where integers and pointers have different sizes or semantics you have a bug. If you're really unlucky the bug will even be exploitable.
    • ^
    • v
    Great point. This is the one bug I ever spotted in qmail (but it's not exploitable there).

    Don't you love that just missing a header file can create vulnerabilities?
    • ^
    • v
    I liked this and I agree except that you say that when malloc fails, exit inmediatly. I say this is a terrible way of doing things, what happens to the memory prevously allocated ? are you going to leave it there and assume it gets free'd ? This might not impact modern OS'ses but what about embeded devices etx ?
    I hope to hear your opinion on this.
    -daniel
    • ^
    • v
    Daniel, it's not a good idea to exit immediately in an embedded system. But exiting immediately from a normal desktop program where the malloc:ed memory is freed during exit of the program is no problem. If you have other resources allocated, you of course should free this during exit. Why not replace exit()?
    /Matt
    • ^
    • v
    Re #1:

    Once I had to debug memory corruption caused by something like this:
    s = malloc(10000)
    s[1025] = 'a';

    malloc returned NULL but the assignment didn't segfault. What happened?

    The problem was the page at 0 was protected, but the next page wasn't. s[1025] = 'a' was writing to somewhere on the heap. This was a pain to debug.

    I prefer using a malloc() wrapper that asserts on the return value or just asserting on return value if there isn't one. Not checking the return value of a function is suspect to me.
    • ^
    • v
    Your comments about trusting malloc are spot on--we really should trust it to do what its supposed to. I've seen lots of programs (particularly embedded ones) where malloc lets you to configure a function pointer (or jump_buf) so you can get a callback on allocation failures... which then makes a great place for a debugger breakpoint.
    • ^
    • v
    The sizeof example can be shortened further without loss of clarity. sizeof is an operator, not a function.

    v = malloc(sizeof *v);

    will do the trick. Less typing, easier to read.

    The parentheses are only needed when you're naming a type, e.g. "malloc(sizeof (struct V))".
    • ^
    • v
    I found #2 and #4 enlightening, and I think I'll start using those techniques in my code. I got in the habit of casting the return of malloc, and I *think* it's because I got compiler warnings when I didn't, but that was back in the day and I can't reproduce this warning with a modern gcc.

    I can't agree with #1, at least when it comes to writing a library. It's just not the library-writer's place to make decisions like that for an application, and if the library-writer's answer to me was "just preload a malloc that does what you actually want" I'd be pissed. Preloading is a useful and clever hack when you need it, but it's no way to engineer the common case. What if one of your libraries statically linked libc into it? What if some libc implementation inlines malloc?

    When I'm really investing a lot of effort into a library, or writing a library that is intended to be useful in space-constrained situations, I'll usually either avoid allocating memory internally or let the client pass me a pointer to their own malloc/free (with an additional user-defined void* so it can be reentrant, if desired). That gives the client maximal flexibility to define their own memory-allocation policy.
    • ^
    • v
    I for one always thought that checking the value of malloc was stupid...
    If X's program is so badly written that the heap can overflow (several GB in todays date) without X being aware of that when writing the program, trying to fix the symptom by checking the value of malloc and exiting is a waste of X's time.
    Better that X advertise X's incompetency by letting the user see a SIGSEGV or Fatal error dialog box depending on the platform.
    • ^
    • v
    Joshua, I will have a hard time arguing with you on that point, but I will wag my finger and say that it is harder than it looks to write a library that can sanely and consistently handle memory exhaustion. I will win the argument that it is better to abort a program than to handle exhaustion less than perfectly. Never limp. Ever.
    • ^
    • v
    You might want to consider longjmp()'ing to a graceful_terminate() (or maybe use atexit()).
    • ^
    • v
    Things you can do with a preloaded malloc replacement, all good.
    • ^
    • v
    "...it’s 2006. Why are you wasting time with
    'xmalloc'?"

    It's 2007. Why are you wasting time with C?
    • ^
    • v
    My past life (kernel networking code), I had very positive results by wrapping malloc/free into foo_malloc/foo_free, with one caveat: when you called foo_free, you had to pass in the size that you foo_malloc'd. This got set/validated by the foo_malloc/foo_free.

    Especially when malloc/free were in separate modules (passing pointers around), this caught all kinds of bugs (arrays of arrays, etc) that assumed different things. Don't seem to have this problem anymore with Ruby. ;-)

    See: http://labs.musecurity.com/2007/07/23/writing-c...
    • ^
    • v
    "About number 2: Casting a void pointer has always been wrong in C even though *everybody* does it."

    nope... unfortunately, you forget that on most systems malloc used to return a char *. (char *malloc(unsigned size);) the cast was a necessity. pre ansi-C89 void pointers were not even guaranteed on your compiler -- much less the behavior regarding their assignment.

    when c++ eventually became its own forked language (still pre-ansi-c89) this behavior remained (though i believe c++ adopted the void * return value for malloc as well).
    • ^
    • v
    Honestly, my opinion is that one may consider usign an exception handling library.
    If you want to follow old-school C programming techniques (which is a valid argument), I would suggest using a errno-like global variable to store the cause of the fault and then do check for errors all the way back to main(). Functions would return the so-called -1 on error, which would be checked when called (possibly with a ALL-CAPS satellite macro to perform the check more verbosely), but generally ignore the errno-like value (unless if it really needs to). Then, centralize all the user error messages in one function.
    This way, the resources can be unallocated easily, and you can reuse the code easily and in many scenarios.

Trackbacks

close Reblog this comment
blog comments powered by Disqus