Mike Tracy | January 23rd, 2008 | Filed Under: Uncategorized
There’s an interesting discussion on reddit about a recently “fixed”
WordPress vulnerability. tqbf brought it to my attention this evening
and it made me laugh… A LOT. It also got me thinking about too many
cooks, bad recipe books and dead tasters for no other reason than I was
hungry and in search of a metaphor.
Originally reported by Michael Brooks, this simply has to be one of the
silliest vulnerabilities ever:
Looking at the code that caused all the ruckus, we find:
return ($wp_query->is_admin || (stripos($_SERVER[‘REQUEST_URI’], ‘wp-admin/’) !== false));
In PHP, the difference between _SERVER[‘PHP_SELF’] and
_SERVER[‘REQUEST_URI’] is that the latter also gives you the parameters
that were passed to a GET request. For example, getting the url
http://www.victim.com/victim_blog/index.php?var=val will set PHP_SELF to
“/victim_blog/index.php” and REQUEST_URI to
“victim_blog/index.php?var=val”. What does this mean? Well, in the
case of the above code from WordPress, if you GET an URL like
/victim_blog/index.php?foo=wp-admin/&realvar=realval, is_admin() will
return true.
This might sound like the end of the world except that in
WordPressLand(tm), the is_admin() function is supposed to be used to
find the user’s context and not to determine if the user has admin
privileges right? Even though this is pretty shoddy, there’s no real
privilege escalation to be had… correct? Hoooold the phone there
laddy buck… also in the reddit comments is a code snippet from
wp-includes/query.php that looks like:
1172 if ( is_admin() )
1173 $where .= " OR post_status = ‘future’ OR post_status = ‘draft’ OR post_status = ‘pending’";
Apparently, we’re using the is_admin() call to figure out if a user
is administrator after all? This snippet adds to the WHERE clause in
the SQL query being built to display a list of posts. I think.
&get_posts() has to be the mother of all query building functions. You
try reading that code and see if you can make sense of it. The WHERE
clause addition adds posts that haven’t been published yet to the list
of posts to be displayed. It’s based on whether is_admin() returns true.
is_admin() is only supposed to be used to find the user’s context not to
determine if a user has ad… well… you get the point.
This kind of thing is academically interesting because it’s the kind
of thing that web penetration tests (especially automated ones) will
rarely, if ever, find. The Unsafe Use of User Input goblin lurks just
around every equals sign, wating to fling your deepest darkest diary
entries at unsuspecting passers-by.
More amusing still is the “fix” for the problem. Instead of actually
checking if the user is an administrator (using the recommended method
whatever that might be) before appending this “invasion of posts I am
trying to keep hidden omg!” to the where clause, they just kinda change
the is_admin() function to prevent the mishigas described above. In my
heart of hearts, I think whoever was doing the fix was afraid to change
anything in &get_posts() and perhaps unwittingly unleash SkyNet. The
is_admin() function is used to determine… well… ok… nevermind.”
9 Comments
Thomas Ptacek | January 21st, 2008 | Filed Under: Uncategorized
1.
Some remedial computer science, for the PCI auditors in my audience.
I hand you an array of random integers. How can you tell if the number
three is in it?
Well, there’s the obvious way: check the numbers sequentially until
you find the “3” or exhaust the array. Linear search. Given 10
numbers, you have to assume it could take 10 steps; N numbers, N
steps.

Linear search is bad. It’s hard to do worse than linear. Let’s improve
on it. Sort the array.

A sorted array suggests a different strategy: jump the middle of the
array and see if the value you’re looking for is less than (to the left)
or greater than (to the right). Repeat, cutting the array in half each
time, until you find the value.
Binary search. Given 10 numbers, it will take as many as 3 steps —-
log2 of 10 —- to find one of them in a sorted array. O(log n) search
is awesome. If you have 65,000 elements, it’s only going to take 16
steps to find one of them. Double the elements, and it’s 17 steps.
But sorted arrays suck; for one thing, sorting is more expensive than
linear search. So we don’t use binary search much; instead, we use
binary trees.

To search a binary tree, you start at the top, and ask yourself “is my
key less than (left) or greater than (right) the current node”, and
repeat until ok, ok, ok, you know this stuff already. But that tree is
pretty, isn’t it?
Search with a (balanced) binary tree is O(log n), like binary search,
varying with the number of elements in the tree. Binary trees are
awesome: you get quick lookup and sorted traversal, something you
don’t get out of a hash table. Binary trees are a better default table
implementation than hash tables.
2.
But binary trees aren’t the only tree-structured lookup
mechanism. Binary radix tries, also called a PATRICIA trees, work like
binary trees with one fundamental difference. Instead of comparing
greater-than/less-than at each node, you check to see if a bit is set,
branching right if it’s set and left if it isn’t.

I’m leaving out a lot about how binary radix tries work. This is a
shame, because radix tries are notoriously underdocumented —-
Sedgewick infamously screwed them up in “Algorithms”, and the
Wikipedia page for them sucks. People still argue about what to call
them! In lieu of an explanation of backlinks and bit-position-labelled
edges, here’s a tiny Ruby implementation.
Here’s why radix tries are cool:
Search performance varies with the key size, not the number
of elements in the tree. With 16 bit keys, you’re guaranteed
16 steps regardless of the number of the elements in the tree,
without balancing.
More importantly, radix tries give you lexicographic
matching, which is a puffed-up way of saying “search
with trailing wildcard”, or “command-line-completion-style
search”. In a radix tree, you can quickly search for
“ro*” and get “rome” and “romulous” and “roswell”.
3.
I’ve lost you.
Let’s put this in context. Tries are a crucial data structure for
Internet routing. The routing problem goes like this:
You have a routing table with entries for “10.0.1.20/32 -> a” and
“10.0.0.0/16 -> b”.
You need packets for 10.0.1.20 to go to “a”
You need packets for 10.0.1.21 to to to “b”
This is a hard problem to solve with a basic binary tree, but with a
radix trie, you’re just asking for
“1010.0000.0000.0000.0000.0001.0100” (for 10.0.1.20) and “1010.” (for 10.0.0.0).
Lexicographic search gives you “best match” for routing. You can try
it in the Ruby code above; add *”10.0.0.0”.to_ip to the trie, and
search for “10.0.0.1”.to_ip.
The correspondance between routing and radix tries is so strong that
the most popular general-purpose radix trie library (the one from
CPAN) is actually stolen out of GateD. It’s a mess, by the way, and
don’t use it.
If you understand how a trie works, you also understand how regular
expressions work. Tries are a special case of deterministic finite
automata (DFAs), where branches are based exclusively on bit
comparisons and always branch forwards. A good regex engine is
just handling DFAs with more “features”. If my pictures make sense to
you, the pictures in this excellent article on Thompson’s NFA-DFA
reduction algorithm will too, and that article will make you smarter.
4.
You’re a network operator at a backbone ISP. Your world largely
consists of “prefixes” —- IP network/netmask pairs. The netmasks in
those prefixes are hugely important to you. For instance, 121/8
belongs to Korea; 121.128/10 belongs to Korea Telecom,
121.128.10/24 belongs to a KT customer, and 121.128.10.53 is one
computer inside that customer. If you’re tracking down a botnet or a
spamming operation or worm propagation, that netmask number is pretty
important to you.
Unfortunately, important though they are, nowhere on an IP packet is
there stamped a “netmask” —- netmasks are entirely a configuration
detail. So, when you’re watching traffic, you essentially have this
data to work with:
Surprisingly, given enough packets to look at, this is enough
information to guess netmasks with. While working at Sony, Kenjiro Cho
came up with a really elegant way to do that, based on tries. Here’s
how:
Take a basic binary radix trie, just like the ones used by software
routers. But bound the number of nodes in the tree, say to 10,000. On
a backbone link, recording addresses out of IP headers, you’ll exhaust
10,000 nodes in moments.
Store the list of nodes in a list, sorted in LRU order. In other
words, when you match an IP address with a node, “touch” the node,
sticking it at the top of the list. Gradually, frequently seen
addresses bubble up to the top, and infrequently seen nodes sink to
the bottom.
Now the trick. When you run out of nodes and need a new one, reclaim
from the bottom of the list. But when you do, roll the data from the
node up into its parent, like so:

10.0.1.2 and 10.0.1.3 are sibling /32s, the two halves of 10.0.1.2/31. To
reclaim them, merge them into 10.0.1.2/31. If you need to reclaim
10.0.1.2/31, you can merge it with 10.0.1.0/31 to form 10.0.1.0/30.
Do this over, say, a minute, and the standout sources will defend
their position in the tree by staying at the top of the LRU list,
while ambient /32 noise bubbles up to /0. For the raw list of IP’s
above, with a 100 node tree, you get this.
Cho calls this heuristic Aguri.
5.
Aguri is BSD licensed. You can go download it, and a driver program
that watches packets via pcap, from Cho’s old home page.
6.
I’m going somewhere with this, but I’m 1300 words into this post now,
and if you’re an algorithms person you’re tired of me by now, and if
you’re not, you’re tired of me by now. So, let Aguri sink in, and I’ll give you
something cool and useless to do with it later this week.
62 Comments
Jeremy Rauch | January 14th, 2008 | Filed Under: Uncategorized
When: Tomorrow, Jan 15th @6PM
Where: Pound and Pence (http://www.poundandpence.com/). 55 Liberty St, at the corner of Liberty and Nassau. Its easily accessed from just about any of the subway lines, the PATH, NY Waterway, etc.
We’ve been seated in different areas the last few meetings. Rather than wander aimlessly around the bar, I’d recommend asking at the front where the NYSec people are. They should send you our way.
1 Comment