Archive for January, 2008

Funny WordPress Vulnerability

Mike Tracy | January 23rd, 2008 | Filed Under: Uncategorized

There’s an interesting discussion on reddit about a recently “fixed” WordPress vulnerability. tqbf brought it to my attention this evening and it made me laugh… A LOT. It also got me thinking about too many cooks, bad recipe books and dead tasters for no other reason than I was hungry and in search of a metaphor.

Originally reported by Michael Brooks, this simply has to be one of the silliest vulnerabilities ever:

Looking at the code that caused all the ruckus, we find:

return ($wp_query->is_admin || (stripos($_SERVER[‘REQUEST_URI’], ‘wp-admin/’) !== false));

In PHP, the difference between _SERVER[‘PHP_SELF’] and _SERVER[‘REQUEST_URI’] is that the latter also gives you the parameters that were passed to a GET request. For example, getting the url http://www.victim.com/victim_blog/index.php?var=val will set PHP_SELF to “/victim_blog/index.php” and REQUEST_URI to “victim_blog/index.php?var=val”. What does this mean? Well, in the case of the above code from WordPress, if you GET an URL like /victim_blog/index.php?foo=wp-admin/&realvar=realval, is_admin() will return true.

This might sound like the end of the world except that in WordPressLand(tm), the is_admin() function is supposed to be used to find the user’s context and not to determine if the user has admin privileges right? Even though this is pretty shoddy, there’s no real privilege escalation to be had… correct? Hoooold the phone there laddy buck… also in the reddit comments is a code snippet from wp-includes/query.php that looks like:

1172              if ( is_admin() )
1173                  $where .= " OR post_status = ‘future’ OR post_status = ‘draft’ OR post_status = ‘pending’";

Apparently, we’re using the is_admin() call to figure out if a user is administrator after all? This snippet adds to the WHERE clause in the SQL query being built to display a list of posts. I think. &get_posts() has to be the mother of all query building functions. You try reading that code and see if you can make sense of it. The WHERE clause addition adds posts that haven’t been published yet to the list of posts to be displayed. It’s based on whether is_admin() returns true. is_admin() is only supposed to be used to find the user’s context not to determine if a user has ad… well… you get the point.

This kind of thing is academically interesting because it’s the kind of thing that web penetration tests (especially automated ones) will rarely, if ever, find. The Unsafe Use of User Input goblin lurks just around every equals sign, wating to fling your deepest darkest diary entries at unsuspecting passers-by.

More amusing still is the “fix” for the problem. Instead of actually checking if the user is an administrator (using the recommended method whatever that might be) before appending this “invasion of posts I am trying to keep hidden omg!” to the where clause, they just kinda change the is_admin() function to prevent the mishigas described above. In my heart of hearts, I think whoever was doing the fix was afraid to change anything in &get_posts() and perhaps unwittingly unleash SkyNet. The is_admin() function is used to determine… well… ok… nevermind.”

Comment Bubble 9 Comments

Aguri: Coolest Data Structure You’ve Never Heard Of

Thomas Ptacek | January 21st, 2008 | Filed Under: Uncategorized

1.

Some remedial computer science, for the PCI auditors in my audience.

I hand you an array of random integers. How can you tell if the number three is in it?

Well, there’s the obvious way: check the numbers sequentially until you find the “3” or exhaust the array. Linear search. Given 10 numbers, you have to assume it could take 10 steps; N numbers, N steps.

Picture 1.png

Linear search is bad. It’s hard to do worse than linear. Let’s improve on it. Sort the array.

Picture 2.png

A sorted array suggests a different strategy: jump the middle of the array and see if the value you’re looking for is less than (to the left) or greater than (to the right). Repeat, cutting the array in half each time, until you find the value.

Binary search. Given 10 numbers, it will take as many as 3 steps —- log2 of 10 —- to find one of them in a sorted array. O(log n) search is awesome. If you have 65,000 elements, it’s only going to take 16 steps to find one of them. Double the elements, and it’s 17 steps.

But sorted arrays suck; for one thing, sorting is more expensive than linear search. So we don’t use binary search much; instead, we use binary trees.

Picture 3.png

To search a binary tree, you start at the top, and ask yourself “is my key less than (left) or greater than (right) the current node”, and repeat until ok, ok, ok, you know this stuff already. But that tree is pretty, isn’t it?

Search with a (balanced) binary tree is O(log n), like binary search, varying with the number of elements in the tree. Binary trees are awesome: you get quick lookup and sorted traversal, something you don’t get out of a hash table. Binary trees are a better default table implementation than hash tables.

2.

But binary trees aren’t the only tree-structured lookup mechanism. Binary radix tries, also called a PATRICIA trees, work like binary trees with one fundamental difference. Instead of comparing greater-than/less-than at each node, you check to see if a bit is set, branching right if it’s set and left if it isn’t.

Picture 4.png

I’m leaving out a lot about how binary radix tries work. This is a shame, because radix tries are notoriously underdocumented —- Sedgewick infamously screwed them up in “Algorithms”, and the Wikipedia page for them sucks. People still argue about what to call them! In lieu of an explanation of backlinks and bit-position-labelled edges, here’s a tiny Ruby implementation.

Here’s why radix tries are cool:

  • Search performance varies with the key size, not the number of elements in the tree. With 16 bit keys, you’re guaranteed 16 steps regardless of the number of the elements in the tree, without balancing.

  • More importantly, radix tries give you lexicographic matching, which is a puffed-up way of saying “search with trailing wildcard”, or “command-line-completion-style search”. In a radix tree, you can quickly search for “ro*” and get “rome” and “romulous” and “roswell”.

3.

I’ve lost you.

Let’s put this in context. Tries are a crucial data structure for Internet routing. The routing problem goes like this:

  • You have a routing table with entries for “10.0.1.20/32 -> a” and “10.0.0.0/16 -> b”.

  • You need packets for 10.0.1.20 to go to “a”

  • You need packets for 10.0.1.21 to to to “b”

This is a hard problem to solve with a basic binary tree, but with a radix trie, you’re just asking for “1010.0000.0000.0000.0000.0001.0100” (for 10.0.1.20) and “1010.” (for 10.0.0.0). Lexicographic search gives you “best match” for routing. You can try it in the Ruby code above; add *”10.0.0.0”.to_ip to the trie, and search for “10.0.0.1”.to_ip.

The correspondance between routing and radix tries is so strong that the most popular general-purpose radix trie library (the one from CPAN) is actually stolen out of GateD. It’s a mess, by the way, and don’t use it.

If you understand how a trie works, you also understand how regular expressions work. Tries are a special case of deterministic finite automata (DFAs), where branches are based exclusively on bit comparisons and always branch forwards. A good regex engine is just handling DFAs with more “features”. If my pictures make sense to you, the pictures in this excellent article on Thompson’s NFA-DFA reduction algorithm will too, and that article will make you smarter.

4.

You’re a network operator at a backbone ISP. Your world largely consists of “prefixes” —- IP network/netmask pairs. The netmasks in those prefixes are hugely important to you. For instance, 121/8 belongs to Korea; 121.128/10 belongs to Korea Telecom, 121.128.10/24 belongs to a KT customer, and 121.128.10.53 is one computer inside that customer. If you’re tracking down a botnet or a spamming operation or worm propagation, that netmask number is pretty important to you.

Unfortunately, important though they are, nowhere on an IP packet is there stamped a “netmask” —- netmasks are entirely a configuration detail. So, when you’re watching traffic, you essentially have this data to work with:

ips.png

Surprisingly, given enough packets to look at, this is enough information to guess netmasks with. While working at Sony, Kenjiro Cho came up with a really elegant way to do that, based on tries. Here’s how:

Take a basic binary radix trie, just like the ones used by software routers. But bound the number of nodes in the tree, say to 10,000. On a backbone link, recording addresses out of IP headers, you’ll exhaust 10,000 nodes in moments.

Store the list of nodes in a list, sorted in LRU order. In other words, when you match an IP address with a node, “touch” the node, sticking it at the top of the list. Gradually, frequently seen addresses bubble up to the top, and infrequently seen nodes sink to the bottom.

Picture 6.png

Now the trick. When you run out of nodes and need a new one, reclaim from the bottom of the list. But when you do, roll the data from the node up into its parent, like so:

Picture 5.png

10.0.1.2 and 10.0.1.3 are sibling /32s, the two halves of 10.0.1.2/31. To reclaim them, merge them into 10.0.1.2/31. If you need to reclaim 10.0.1.2/31, you can merge it with 10.0.1.0/31 to form 10.0.1.0/30.

Do this over, say, a minute, and the standout sources will defend their position in the tree by staying at the top of the LRU list, while ambient /32 noise bubbles up to /0. For the raw list of IP’s above, with a 100 node tree, you get this.

Cho calls this heuristic Aguri.

5.

Aguri is BSD licensed. You can go download it, and a driver program that watches packets via pcap, from Cho’s old home page.

6.

I’m going somewhere with this, but I’m 1300 words into this post now, and if you’re an algorithms person you’re tired of me by now, and if you’re not, you’re tired of me by now. So, let Aguri sink in, and I’ll give you something cool and useless to do with it later this week.

Comment Bubble 62 Comments

Predictions ‘07 Wrapup at Nate’s Blog

Thomas Ptacek | January 15th, 2008 | Filed Under: Uncategorized

Nate and I revisited our predictions from January 2007. I beat him! We’re working on ‘08 now —- last year, I made the mistake of predicting Schneier wouldn’t get a New York Times Op-Ed, ten days before he did.

Comment Bubble 1 Comment

NYSEC TIME!

Jeremy Rauch | January 14th, 2008 | Filed Under: Uncategorized

When: Tomorrow, Jan 15th @6PM

Where: Pound and Pence (http://www.poundandpence.com/). 55 Liberty St, at the corner of Liberty and Nassau. Its easily accessed from just about any of the subway lines, the PATH, NY Waterway, etc.

We’ve been seated in different areas the last few meetings. Rather than wander aimlessly around the bar, I’d recommend asking at the front where the NYSec people are. They should send you our way.

Comment Bubble 1 Comment

Who We Are

Matasano is a team of internationally respected security experts who have led security efforts at @stake, Microsoft, ISS, Secure Computing, Arbor Networks, Secure Networks, Bloomberg, Sandia Labs, and others. Read more about our team and how we can help you today.