Archive for September, 2007

The X86 Memory System And Why It’s Hard To Virtualize Securely

Thomas Ptacek | September 28th, 2007 | Filed Under: Defenses, Uncategorized

The X86 memory system causes surprising confusion. By way of analogy: take iTunes on Windows, to machines talking to the Internet from a corporate network. Here’s how memory breaks down in the analogy:

  • User virtual addresses: Pretend iTunes is a VMware guest image, running in bridged networking mode, assigned 10.0.1.8 (a private IP).

  • Kernel virtual addresses: Windows iTunes runs under the Windows NTOSKRNL kernel; in our analogy, this is VMWare’s host operating system. Like the bridged VMware guest, it’s on the internal net, with 10.0.1.4.

  • Physical addresses: NTOSKRNL works with the CPU’s MMU to map virtuals into physical addresses. In our analogy, the MMU is like a NAT firewall, translating 10.0.1.4 and 10.0.1.8 to external addresses, like 66.178.176.10.

  • IO Addresses: A firewall can’t get you onto the Internet without a router. An MMU can’t get you to memory or devices without a chipset. The “northbridge” in the chipset is like that router. A router figures out that when you use an external address like 66.178.176.10, you mean ESCAPE.COM, but when you say 69.61.87.163, you mean SKODA.SOCKPUPPET.ORG, which is on your own network. The chipset northbridge figures out that some physical addresses go to memory, and some go to devices.

  • The PCI Bus: is like the Internet. Nobody on the Internet can talk to you using your 10.0.1.8 address. 10-net addresses don’t route. PCI devices don’t use virtual addresses, like iTunes sees. They use physical addresses. The backbone routers PCI devices use to get to physical memory are, in X86 parlance, the “southbridge”.

  • Sound cards: are machines on the Internet. They want to talk to iTunes, which is on a private network. This works for PCI the same way it does for Skype on the real Internet. iTunes, via the kernel, somehow tells the sound card the extenal address of a block of memory it will read and write to. The sound card talks to the chipset to get access to that memory. That’s called DMA. It’s how pretty much all your peripherals work.

In pictures. The memory and DMA story:

dma-small.png

Compared to the enterprise networking story:

enp-small.png

Clear as mud? You’re welcome. Why is this important for security? Mostly because of virtualization.

In 5 years, nobody will deploy new servers for applications. Every application will run on a virtual machine. Inevitably. And as we approach this end-state, more and more of our applications are virtualized, and performance becomes a bigger concern.

One of the ways virtualization hurts performance is to come between the operating system and the hardware.

So we’d like some way of giving virtual machines direct access to slices of hardware. For example: a high-performance web server might want to own a NIC. The problem is, the web server lives alongside 10 other guest VMs that don’t trust it. The NIC’s DMA access forces them to.

X86 hypervisors do an elaborate dance with guest OS kernels, tricking them into believing they control the physical/virtual “NAT” table. That dance happens entirely within the CPU. On the Internet, Petko D. Petkov does not care about your 10.0.1.8 address. If your firewall will let him talk to you at 69.61.87.160, he can attack you. On your computer, a NIC does not care about your kernel and user virtual addresses. It’s using physical addresses, which address all the memory shared by all the VMs in the machine.

This is a hole in the X86 VMM security strategy. Today, if you want your guest VMs not to have to trust each other, and one of them needs direct access to NIC, you have to trust that the NIC can’t be coerced into copying network packets with executable code directly into the kernel memory of the other machines.

Nate Lawson is writing a series on the nuts and bolts of this. CPU vendors are proposing features, such as IOMMUs, that will regulate device access to memory in much the same manner as the CPU’s MMU regulates access between processes. These features may or may not work in the long run; in the short run, it looks like they provide show-stopping performance problems of their own.

You only need to read Nate’s posts if you (a) believe virtualization will be ubiquitous within the next 5 years, (b) work in security, and (c) believe you need to sound like you know what you’re talking about. Otherwise, Nate’s posts are entirely optional.

Comment Bubble 12 Comments

Adam Bozanich Did Not Uncover An NSA IPsec Conspiracy: Diffie Hellman Parameter Validation Explained

Thomas Ptacek | September 25th, 2007 | Filed Under: Defenses, New Findings, Uncategorized

introduction

Alice and Bob want to arrange a tryst. Bob’s wife Eve is sniffing their Instant Messenger traffic.

No problem! Alice and Bob will encrypt their messages with AES-256-CBC. Eve won’t be able to decode the traffic before the heat death of the universe.

One small problem: what AES key to use? Actually: not a small problem. No matter what key Alice comes up with, she can’t send it to Bob without exposing it. She can’t encrypt the key; chicken and egg.

number theory f.t.w.

There’s a solution to this problem. Alice and Bob will run the Diffie-Hellman protocol (DH) to securely exchange a key.

Alice and Bob agree on a prime number p, and a smaller number g with a special relationship with p ([1]). These are parameters to Diffie Hellman. “23” and “7” are valid p and g parameters. So are “37” and “5”. They aren’t secret. Think of them like a “version” of DH that Alice and Bob agree to use.

For the sake of argument, Alice and Bob agree on “37, 5” DH.

Bust out your calculator. I’ll be Bob.

Generate a random number a, modulo 37 (divide your number by 37 and a is the remainder).

I’ll do the same thing to generate a different b.

Now take your a and make A. Raise 5 to the a‘th power, modulo 37 (in Ruby, do “(5 ** a) % 37”. I’ll do the same with b to make B.

A is your public key. a is your private key. Send me A. I’ll send you B. It’s 29.

Take 29 and raise it to the a‘th power mod 37 (I don’t know what a is; it’s your private key, so I can’t tell you what you’ll get.) That’s “(B ** a) % 37” in Ruby. I’ll do the same with B.

We just arrived at the same number. Was your private key 7? We came up with 8. Was your private key 26? We came up with 27. 27 is our session key.

Funny thing about our session key: you know it, and I know it, but Eve can’t know it. Even though we did this computation out in the open. This is the deep magic, yo.

quick aside:

Well OK, Eve totally knows what number we came up with, because we used a ridiculously small p. Instead of 37, 5, try this:

FFFFFFFF FFFFFFFF C90FDAA2 2168C234 C4C6628B 80DC1CD1
29024E08 8A67CC74 020BBEA6 3B139B22 514A0879 8E3404DD
EF9519B3 CD3A431B 302B0A6D F25F1437 4FE1356D 6D51C245
E485B576 625E7EC6 F44C42E9 A637ED6B 0BFF5CB6 F406B7ED
EE386BFB 5A899FA5 AE9F2411 7C4B1FE6 49286651 ECE45B3D
C2007CB8 A163BF05 98DA4836 1C55D39A 69163FA8 FD24CF5F
83655D23 DCA3AD96 1C62F356 208552BB 9ED52907 7096966D
670C354E 4ABC9804 F1746C08 CA237327 FFFFFFFF FFFFFFFF

Just use 2 as g. Remember now that your private key a is a random number modulo this huge p. It will also be huge.

Raising a number to a power modulo another number is modular exponentiation (modexp). A modexp is straightforward to compute.

The opposite of exponentiation is a logarithm. Taking a log mod another number is a discrete logarithm. Discrete logs are extremely hard to solve. With large enough numbers, “racing heat death of the universe” hard; “every atom in the Solar System is a computer trying to solve it, still unsolvable” hard.

So you have two closely related operations, exponentiation and logarithm mod p, one of which is easy to compute and one of which is hard. This is a cryptographic building block and the foundation of DH.

back to alice and bob

We got 26 for our session key. Let’s say we MD5 the text value of “26”, to get “4e732ced3463d06de0ca9a15b6153677”. Tack a “0x” at the front of that value, and we now have a 128 bit number we can use as an AES key. Eve can’t know we came up with 26, and so can’t know what the MD5 of the session key is, and so doesn’t know what our AES key is.

Problem solved! Cue arbitrary foreshadowing device. We’ll come back to this.

what adam bozanich found

Adam works for Mu Security. Mu is writing an IKE fuzzer. IKE is a key exchange protocol for IPSec. IKE is probably one of the worst protocols ever to come out of the IETF, and I’m not going to explain it.

What you need to know is that IKE generates the keys that secure dynamically-keyed IPSec tunnels (layer-3 VPNs), and that IKE uses DH to do that.

Adam remembers a DH gotcha. When Alice and Bob run DH, they have to be careful not to allow any of the parameters —- p, g, A, or B —- be zero or one mod p.

Why’s that? Bust out your 37, 5 DH math again. Alice’s session key computation is “(B ** a) % 37”. Bob sends B. If B is 1, the session key is… wait for it… 1. If B is 0, it’s 0. No matter what Alice’s private key is.

If Eve sees a public key that works out to 0 or 1 mod p, Eve knows what the session key was; it’s zero or one. Remember that there are infinitely many values that work out to 0 or 1 mod p. 0 is 0 mod 37. So is 37. So is 74. And so on. Bob could have sent any one of those bad public key values.

Alice and Bob are supposed to check for this. Adam looked at a bunch of IKE implementations. None of them did.

context

This is a well-known problem. It’s in Eric Rescorla’s DH RFC. It’s in an ANSI standard. You can’t use p - 1 mod p either. You also have to be careful with g; broken values will generate subgroups, not the entire group. And real DH implementations have optimizations that can be attacked mathematically if you’re not careful.

Just because it’s well known doesn’t mean people won’t make the mistake. That’s why Nate and I pointed the problem out last year (almost to the day!).

The general class of problems we’re talking about is parameter validation. You attack them by looking at the messages a protocol exchanges, and sending malicious bad values that will break the computation.

Are you a security researcher? Go look for some of these bugs. They rock. More in a sec.

but back to adam

So it’s interesting that IKE implementations don’t validate parameters, but it’s not particularly meaningful.

IKE Bob can send IKE Alice a bad public key. This key will cause Eve to know the session key Bob and Alice use. But why would Bob do this? He’s negating his own security!

Eve can try to inject a bad public key into the session. But so what? If Bob and Alice don’t agree on A and B, the public keys, they can’t run DH. Maybe a different attacker, Mallory, can get between Alice and Bob and proxy the messages. She’ll be a “man in the middle”.

In this case, Mallory wins. No matter what parameters you use and what bugs your code has, Mallory beats DH. This is a basic challenge with DH. The solution involves more crypto.

DH is a useful building block, but protocols that use DH usually depend on some other operation to ensure security. For instance, DH SSL uses RSA certificates to beat Mallory. DH is useful to SSL; it gives you perfect forward secrecy by not tying your SSL session key to a fixed RSA key that can be compromised. But DH SSL depends on RSA for security.

So it’s not a good sign that IKE implementations aren’t smart enough to do parameter validation. But it doesn’t make much of a difference. The “vulnerability” allows an IKE participant to elect an insecure session.

accident? or… murder!

Adam wants to take this somewhere. So he writes a blog post suggesting that the uniform weakness of IKE implementations could be evidence of a conspiracy.

I don’t think he’s being entirely serious, but here’s his argument: the NSA —- no wait, it’s 2007, let’s make it the RIAA —- demands the ability to snoop on everyone’s IPSec sessions. They get all the IKE vendors to ship backdoored IKE agents. On some secret signal, the IKE agent will send public keys that work out to 0 mod p, and the RIAA can break these sessions with the sniffers they’ve installed at every exchange point on the Internet.

There’s a reason this is silly besides the fact that it involves backdooring one of the least important security mechanisms on the Internet: it’s a dumb attack.

Mallory knows p and g. So does Seth, the secops guy. Mallory can watch for A and B values that are 0 mod p. So can Seth. Why would the RIAA install a secret backdoor you could write an IDS signature for?

There are much more subtle things you can do to backdoor DH. Start with, IKE implementations can just be backdoored with a known private key value, or a root backdoor private key and an algorithm for generating variants of it. Move on from there.

This isn’t a conspiracy. This is just ignorance.

and I know that because…?

Because this vulnerability happens all over the place.

Take SRP for example. SRP is a derivative of DH. Alice is a client, Bob is a server. Alice and Bob know Alice’s password. Alice generates a public key that is related to the SHA-1 hash of the password. Alice and Bob do a DH exchange to prove Alice knows the password, and if the exchange works, Alice wins.

Mallory sends Bob a public key of 0 mod p. Mallory wins. She can log in as Alice without knowing Alice’s password.

This vulnerability is well known. It’s in the RFC for SRP. But I’ve tested SRP implementations written by smart security people, and they’ve had this vulnerability. (Due credit: I got this trick from Trevor Perrin, via Nate Lawson).

Compare the SRP vulnerability to the DH vulnerability. The SRP parameter validation problem is really, really bad. It’s auth bypass. The DH vulnerability is just a dumb way to backdoor an IKE agent.

well played, adam

IKE is a mess of a protocol. Crypto parameter validation attacks are totally underappreciated. Adam’s blog post is cool. It was smart of him to check for this, and interesting that he found it.

and what have we learned?

Not much about IKE. It’s as secure as you thought it was before you heard about Adam’s finding.

Hopefully a lot about crypto, though. Like I said, we wrote a post last year talking about parameter validation tricks for DH, SRP, Elliptic Curve, and RSA.

Forget about algorithmic attacks and advances in factorization and quantum computers. Parameter validation attacks are a systems flaw; systems depend on code, and code is riddled with bugs. All bugs are vulnerabilities. Most… any? cryptosystems will be beaten by stupid bugs like this.

Go look for them!


[1]: g is a primitive root modulo p. For values of x from 0 to p - 1, raise g to x modulo p. If g is a primitive root modulo p, you’ll end up with every number between 1 and p - 1, in some order. For instance, the following Ruby snippet asks if 3 is a primitive root mod 37.

lst = [] ; 0.upto(36) {|d| lst << ((3 ** d) % 37)} ; lst.sort

elicits the following output:

[1, 1, 1, 3, 3, 4, 4, 7, 7, 9, 9, 10, 10, 11, 11, 
 12, 12, 16, 16, 21, 21, 25, 25, 26, 26, 27, 27, 28, 
 28, 30, 30, 33, 33, 34, 34, 36, 36]

2’s missing. So’s 5 and 6, etc. 3 is not a primitive root mod 37. Now try 5:

lst = [] ; 0.upto(36) {|d| lst << ((3 ** d) % 37)} ; lst.sort

[1, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 
 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 
 29, 30, 31, 32, 33, 34, 35, 36]

All the numbers between 1 and 36 are there. 5 is primitive root mod 37.

Why is this important? Because if you’re working with a 1024 bit prime, you want session key values to possibly be any one of those 1024 bit values. Another word for “primitive root” is generator. Generators generate the entire group of values mod p. Bad values of g generate subgroups, with fewer than 1024 bits of possible values. The worst g values generate only two possible values!

Comment Bubble 7 Comments

On the subject of PDF vulnerabilities

Dave G. | September 23rd, 2007 | Filed Under: Industry Punditry

I suspect this will become a lot more common as critical software applications continue to get a grip on software vulnerabilities. We will see attackers spread out to the most commonly installed software on computers. And PDF is a likely target.

Why? Because:

  1. It is complicated. And by that I mean over 1000 page of specifications.
  2. It involves a crap-ton of parsing. Don’t believe me? Take the above specifications and load it into a text editor. You can just smell the memory coruption. Did I mention it is 31 Megs?
  3. Modern PDF Readers do crazy things. Like embed remote web pages. That means they talk to the internet. That means more network attack surface!
  4. Deployed Everywhere. Most desktops have at least one PDF reader on them. The Mac uses PDF all over the place.

These conditions create the perfect storm for the modern attacker. This is going to get worse not better.

Comment Bubble 7 Comments

Quit Your Job! Come Join Us!

Max Caceres | September 18th, 2007 | Filed Under: Matasano, Navel Gazing

Hello all, I’m Max Caceres, a new addition to the Matasano team and to this blog. Most recently I ran product management for Core Security, where I got the chance to lead crazy smart folks in the development of CORE IMPACT, a very successful commercial penetration testing product you may have heard about.

I’ve recently joined Matasano to help grow the product side of the house, and in that light I’m happy to inform that we are hiring! We are currently looking for a software developer to work full time on product dev. Just in case you are getting to this post from one of the ads we’ve published elsewhere, or because someone forwarded a link to you, here’s a quick run down of what we are about and what we are looking for.

We’re Matasano Security:

  • an established, profitable indie information security company, with

  • a significant presence in Chicago and NYC, and

  • founded by key players from top industry names.

You’re a software developer:

  • with mastery of one/more of Python, Ruby, Lisp, or C, and

  • mastery of Unix (Linux, BSD, or Solaris), and

  • strong web app skills, including

  • 1-3 years pro dev experience (preferred), and

  • familiarity with networking and security, and

  • enthusiastic about working on a Rails project.

We’re: taking our first product from alpha (now) to launch (soon) in the span of a few months. You: think this sounds like a great chance to work in a bootstrapped startup environment. We: should talk.

Check out this blog to learn more about about us. Or read this press hit for more on the product we’re talking about.

This is a 105% get-stuff-done climate: a full-time dev role on a small product team in a thriving consultancy, with no middle management, minimal politics, and a breakneck schedule.

We will be looking at filling more positions at Matasano pretty soon, please drop us a line at careers@matasano.com if you are interested in joining us.

Comment Bubble 4 Comments

A Tale Of Two OWASPS

Dave G. | September 14th, 2007 | Filed Under: Gatherings

It was the best of times, it was the worst of times. I take that back. It was mostly the latter. I had to be out the door by 4:30AM to catch my plane to Chicago. Which would be bad enough under normal conditions, but I just moved to a new apartment, and those noisy kids are keeping me up past 11PM again. Two boot, when I get downstairs I realize that the cab strike has begun. Still, capitalism prevailed and a cab pulled over and took me to JFK.

Chicago

I was flying to Chicago for both a meeting and to go to their OWASP meetup. The meeting was hosted at ABN AMRO by Cory Scott (ex-@stake). If I had to guess there were close to thirty people there. Everyone really seemed to know each other. There were two talks:

  1. “Automated Thrash Testing - Andrew Gironda (“Dre” on our blog).” While I am still not 100% clear on what Automated Thrash Testing is, Dre did map a lot of QA processes back to security. I think. I learned a lot about QA processes, but it would have been nice if it focused a little more on security testing. To be fair, my mind was slowing turning into mush thanks to a lack of sleep around the middle of his talk.

  2. “Defeating Information Leak Prevention - Matasano’s Own Eric Monti (This is Eric’s talk from Blackhat).” I didn’t get to see all of Eric’s talk at Blackhat, so getting to see the parts that I missed was excellent. Unfortunately, I am biased so I will refrain from commenting on Eric’s talk.

This followed by drinking. By the time drinking was underway I was only partially conscious. There was a lot of good conversation, ranging from application testing to how one could make their friends fight to the death if you win the lottery. Then I cab it back to the airport where I fall asleep and wake up to a security guard knocking on my door wondering how I overslept a wake-up call and room service. This knock was clearly outlined in the manual under “Is your guest deceased?”.

New York

I catch the next flight back to New York. After two hours of sitting next to the Chicago finalist for the World’s Sexiest bartender contest, I arrive back in New York City. Just in time for day 2 of the New York City Cab Strike. After the world’s longest taxi experience, I arrive at Matasano HQ on Wall St. This gives me just enough time to check email, relay events and head to OWASP New York.

Hosted at the American Stock Exchange, its a totally different vibe. Somewhere in the neighborhood of 75 people. Wooden room. Sandwiches. A podium. Sandwiches. A lot more people dressed formally. Sandwiches. Four speakers:

  1. “OWASP update.” Tom Brennan For better or for worse, OWASP NY/NJ is a lot more connected to OWASP Central. In this case, it meant not listening to Jeff’s message to all of OWASP due to technical difficulties. There was definitely more mention of memberships. Which, as I said before, is a good thing. Tom also ran through some OWASP stats and some questions to engage the audience. The stats weren’t too surprising but there was a funny moment when the stat on what part of the industry came out and there were zero law enforcement on the list. This is funny because of:

  2. “Hackers…BotNets oh My! Obtain a briefing on the current BotNet investigations etc.”, NYC FBI Cyber Crime Unit. This was a pretty good session. Not surprising is the fact that BotNet authors are getting smarter. Surprising is that international law enforcement co-operation appears to be getting better. Especially Romania <-> USA co-operation. Apparently, getting a wiretap in Romania is unusually fast. I thought I detected a little bit of envy that it was so easy. That look disappeared quickly as several in the audience basically volunteered to hack Romanian BotNet herders. The questions at this point were all about slicing and dicing what would be tolerated by the FBI. This part of the Q&A should have ended with him yelling in an Austrian accent “It’s not a tumoor!”, but he managed to not pull out his gun and rid the world of three heckling pen testers.

  3. “Why today’s vulnerability assessments are failing and a case for industry standardization”, Mark Clancy. This speaker pointed out that it is really difficult to look at the results of a vulnerability assessment and determine the competency of the testers, the severity of the results, the amount of coverage and more. This is a real problem for consumers of penetration testing results (think Enterprises). When you have an F10 organization, the number of vulnerability assessments that have to reviewed is astonishing. They are often coming from vendors who hired their own penetration testing team. Then they are often editted down by the vendor. There was a great moment where he stopped talking and people started to solution, but then realized that there were no easy answers.

  4. “Stock fluctuation from an unrecognized influence”, Justine Bone-Aitel, Immunity Security. I’m not sure this was an accurate title for the talk, but it’s always good to see the Immunity folks. The best part of the talk was the statistics on Immunity’s zero day. I am so serious when I say they should do an entire talk around their statistics on tracking their zero day.

  5. “Financial Real-Time Threats: Impacting Trading Floor Operations/JBroFuzz: Effective Fuzzing for Network and Web Applications”, Dr. Yiannis Pavlosoglou , Information Risk Management. This session wasn’t as crisp as I would have liked. I think it was an anatomy of an attack combined with this is what a trading floor might look like. The anatomy of an attack part didn’t seem to be particularly convincing. Due to time constraints and technical difficulties, this talk got cut short.

They also went out drinking, but I was out of steam.

What did I learn from this whole lotta OWASP?

There is a lot of regionality to OWASP. The feel at both OWASPs were totally different and both totally appropriate for their locations. While both OWASPs had their differences, thematically they were on target. Most importantly, OWASP members like to go out drinking.

Comment Bubble 5 Comments

Enough With The Rainbow Tables: What You Need To Know About Secure Password Schemes

Thomas Ptacek | September 10th, 2007 | Filed Under: Defenses, Uncategorized

The socialbookmarkosphere is abuzz with talk of “rainbow tables”, what they mean for password security, and why they prove that Microsoft did a shoddy job of securing Windows for Workgroups 15 years ago. This really freaks me out. If the “advanced” pole of your threat model is “rainbow tables”, stop working on your social shopping cart calendar application right now: I can’t trust you with my Reddit karma score, let alone my credit card number.

.

To begin, password storage 101: servers don’t usually store actual passwords. Instead, they hash the password, store the hash, and discard the password. The hash can verify a password from a login page, but can’t be reversed back to the text of the password. So when you inevitably lose your SQL password table, you haven’t exposed all the passwords; just the crappy ones.

Now let’s re-explain rainbow tables:

  1. take a “dictionary” —- say, of all combinations of alphanumerics less than 15 characters

  2. hash all of them

  3. burn the results onto a DVD.

You now have several hundred billion hash values that you can reverse back to text —- a “rainbow table”. To use,

  1. take your stolen table of hashes

  2. for each hash

  3. find it in the rainbow table.

If it’s there, you cracked it.

.

Here’s what you need to know about rainbow tables: no modern password scheme is vulnerable to them.

Rainbow tables are easy to beat. For each password, generate a random number (a nonce). Hash the password with the nonce, and store both the hash and the nonce. The server has enough information to verify passwords (the nonce is stored in the clear). But even with a small random value, say, 16 bits, rainbow tables are infeasible: there are now 65,536 “variants” of each hash, and instead of 300 billion rainbow table entries, you need quadrillions. The nonce in this scheme is called a “salt”.

Cool, huh? Yeah, and Unix crypt —- almost the lowest common denominator in security systems —- has had this feature since 1976. If this is news to you, you shouldn’t be designing password systems. Use someone else’s good one.

.

No, really. Use someone else’s password system. Don’t build your own.

Most of the industry’s worst security problems (like the famously bad LANMAN hash) happened because smart developers approached security code the same way they did the rest of their code. The difference between security code and application code is, when application code fails, you find out right away. When security code fails, you find out 4 years from now, when a DVD with all your customer’s credit card and CVV2 information starts circulating in Estonia.

.

Here’s a “state of the art” scheme from a recent blog post on rainbow tables and salts:

hash = md5('deliciously-salty-' + password)

There are at least two problems with this code. Yeah, the author doesn’t know what a salt is; “deliciously-salty-” is not a nonce (also, Jeff, your computer really doesn’t care if you seperate the password from the nonce with a dash; it’s a computer, not a 2nd grade teacher).

But there’s a much bigger problem with this code: the letters “md5”.

Two reasons.

1.

You’re expecting me to go off on a rant about how there is no redeeming quality to justify using MD5 in 2007. That’s true (MD5 is broken; it’s too slow to use as a general purpose hash; etc). But that’s not the problem.

2.

The problem is that MD5 is fast. So are its modern competitors, like SHA1 and SHA256. Speed is a design goal of a modern secure hash, because hashes are a building block of almost every cryptosystem, and usually get demand-executed on a per-packet or per-message basis.

Speed is exactly what you don’t want in a password hash function.

Modern password schemes are attacked with incremental password crackers.

Incremental crackers don’t precalculate all possible cracked passwords. They consider each password hash individually, and they feed their dictionary through the password hash function the same way your PHP login page would. Rainbow table crackers like Ophcrack use space to attack passwords; incremental crackers like John the Ripper, Crack, and LC5 work with time: statistics and compute.

The password attack game is scored in time taken to crack password X. With rainbow tables, that time depends on how big your table needs to be and how fast you can search it. With incremental crackers, the time depends on how fast you can make the password hash function run.

The better you can optimize your password hash function, the faster your password hash function gets, the weaker your scheme is. MD5 and SHA1, even conventional block ciphers like DES, are designed to be fast. MD5, SHA1, and DES are weak password hashes. On modern CPUs, raw crypto building blocks like DES and MD5 can be bitsliced, vectorized, and parallelized to make password searches lightning fast. Game-over FPGA implementations cost only hundreds of dollars.

Using raw hash functions to authenticate passwords is as naive as using unsalted hash functions. Don’t.

.

What is the state of the art here?

1.

First, what your operating system already gives you: a password scheme “optimized” to be computationally expensive. The most famous of these is PHK’s FreeBSD MD5 scheme.

The difference between PHK’s scheme and the one you were about to use for your social shopping cart 2.0 application is simple. You were just going to run MD5 on a salt and a password and store the hash. PHK runs MD5 for thousands of iterations. That’s called “stretching”.

PHK’s MD5 scheme is straightforward to code and comes with Linux and BSD operating systems. If you have to choose between the PHP code you have now and PHK’s scheme, you choose PHK’s scheme or you fail your PCI audit. [∗]

2.

The best simple answer is “adaptive hashing”, which Neils Provos and David Mazieres invented for OpenBSD in 1999. Their original scheme is called “bcrypt”, but the idea is more important than the algorithm.

There are three big differences between Provos-Mazieres and PHK’s scheme:

  1. Bcrypt was invented by two smart guys and PHK’s was only invented by one smart guy. That’s literally twice the smart.

  2. Bcrypt uses Blowfish instead of MD5. Blowfish is a block cipher with a notoriously expensive setup time. To optimize Blowfish to run much faster, you’d have to contribute a major advance to cryptography. We security practioners are all “betting people”, and we usually like to place our bets on the side that “demands major advances in cryptography”.

  3. Provos and Mazieres extended Blowfish. They call theirs “Eksblowfish”. Eksblowfish is pessimized: the setup time takes even longer than Blowfish. How long? Your call. You can make a single password trial take milliseconds, or you can make it take hours.

Why is bcrypt such a huge win? Think of the problem from two perspectives: the server, and the attacker.

First, the server: you get tens of thousands of logins per hour, or tens per second. Compared to the database hits and page refreshes and IO, the password check is negligable. You don’t care if password tests take twice as long, or even ten times as long, because password hashes aren’t in the 80/20 hot spot.

Now the attacker. This is easy. The attacker cares a lot if password tests take twice as long. If one password test takes twice as long, the total password cracking time takes twice as long.

Get it?

The major advantage of adaptive hashing is that you get to tune it. As computers get faster, the same block of code continues to produce passwords that are hard to crack.

3.

Finally, as your attorney in this matter, I am required to inform you about SRP.

SRP is the Stanford Secure Remote Password protocol. It is a public key cryptosystem designed to securely store and validate passwords without storing them in the clear or transmitting them in the clear.

That design goal is cooler than it sounds, because there’s usually a tradeoff in designing password systems:

  1. You can store a hash of the password. Now if you lose the password database, you haven’t exposed the good passwords. However, you also don’t know the password cleartext, which means that to validate passwords, your customers need to send them to you in the clear.

  2. You can use a challenge-response scheme, where both sides use a math problem to prove to each other that they know the password, but neither side sends the password over the wire. These schemes are great, but they don’t work unless both sides have access to the cleartext password —- in other words, the server has to store them in the clear.

Most practitioners will select the hashing scheme. Both attacks —- stolen databases and phished passwords —- happen all the time. But stolen databases compromise more passwords.

SRP resolves the tradeoff. It’s an extension of Diffie-Hellman. The salient detail for this post: instead of storing a salted password hash, you store a “verifier”, which is a number raised to the (obviously very large) power of the password hash modulo N.

If you understand DH, SRP is just going to make sense to you. If you don’t, the Wikipedia will do a better job explaining it than I will. For the test next Wednesday, you need to know:

  • SRP is related to Diffie-Hellman.

  • SRP is a challenge-response protocol that lets a server prove you know your password without your password ever hitting the wire.

  • SRP doesn’t require you to store plaintext passwords; you store non-reversable cryptographic verifiers.

  • “Cracking” SRP verifiers quickly would involve a significant advancement to cryptography.

  • SRP is simple enough to run out of browser Javascript.

Awesome! Why aren’t you using SRP right now? I’ll give you three reasons:

  • SRP is patented.

  • To make it work securely in a browser, you have to feed the login page over SSL; otherwise, like Meebo, you wind up with a scheme that can be beaten by anyone who can phish a web page.

  • SRP is easy to fuck up, so the first N mainstream Rails or PHP or Pylons SRP implementations are going to be trivially bypassable for at least the first year after they’re deployed.

.

What have we learned?

We learned that if it’s 1975, you can set the ARPANet on fire with rainbow table attacks. If it’s 2007, and rainbow table attacks set you on fire, we learned that you should go back to 1975 and wait 30 years before trying to design a password hashing scheme.

We learned that if we had learned anything from this blog post, we should be consulting our friends and neighbors in the security field for help with our password schemes, because nobody is going to find the game-over bugs in our MD5 schemes until after my Mom’s credit card number is being traded out of a curbside stall in Tallinn, Estonia.

We learned that in a password hashing scheme, speed is the enemy. We learned that MD5 was designed for speed. So, we learned that MD5 is the enemy. Also Jeff Atwood and Richard Skrenta.

Finally, we learned that if we want to store passwords securely we have three reasonable options: PHK’s MD5 scheme, Provos-Maziere’s Bcrypt scheme, and SRP. We learned that the correct choice is Bcrypt.

This blog post was brought to you in part by a grant from the Jon M. Olin Foundation. Major underwriting for Matasano Chargen is provided by Archer Daniel Midland Company. ADM: Feeding A Hungry World. And of course, readers like you!

[∗] Disclaimer: I cannot actually flunk your PCI audit.

Comment Bubble 120 Comments

Who We Are

Matasano is a team of internationally respected security experts who have led security efforts at @stake, Microsoft, ISS, Secure Computing, Arbor Networks, Secure Networks, Bloomberg, Sandia Labs, and others. Read more about our team and how we can help you today.