Applicable Lessons from the Embedded World (aka Forth rules)
Wes Brown | January 9th, 2009 | Filed Under: Development
Premise and Context
There’s a lot that we security researchers can learn from the embedded side of the computing world. Interestingly, some of the requirements that we have for injection vectors in exploits match problems that the embedded world has been solving for many years.
One area where we can benefit from the experience of embedded programmers is the art of writing shellcode. Shellcode needs to be compact enough to fit within the constraints of injection vectors, yet it needs enough functionality to be useful. In a survey of Windows shellcode, payloads typically ranged from 170 bytes to 320 bytes in size. Coincidentally, this is well within the memory budget of many small embedded devices. The ATmega8 used in early Arduino boards has 8K of flash memory and 1K of SRAM. Many early 8-bit computers also fell within this range of capabilities.
Shifting paradigms to view the target victim machine as an embedded device, and the host as a console and serial programmer, can be very useful. We can then start to apply principles and capabilities that the embedded world has embraced and used for years.
Enter Forth
One language that was quite popular in the early days of the 8-bit microcomputer, and often used in microcontroller projects, is Forth. In the hierarchy of languages, Forth lies between assembler and C, yet it has an interpreter. Forth programs are made up of ‘words’. Forth does not distinguish between words built into Forth itself and words that you have written yourself. Words can even be defined using an inline assembler.
I have used Lisp in the past to solve security problems, and Lisp is similar in that you can redefine the entire language. Lisp and Forth are languages for writing other languages in. The difference is that Forth is far closer to the metal, whereas Lisp abstracts the metal away with facilities such as a garbage collector. Using Lisp, Scott Dunlop and I implemented MOSREF, a remote execution framework that uses secure communications channels to transmit dynamically compiled bytecode to the remote victim drone. However, the virtual machine was 64K in size, which was small enough for a second stage payload, but not for first stage shellcode.
The core of Forth is a simple loop that parses input and executes Forth words, including the words responsible for control structures; this loop is the ‘compiler’ for Forth programs. Forth does not really have data structures. It just allocates bytes of memory and assigns a name to them. The name then returns the address, allowing operations on the allocated bytes of memory.
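The idea that a name simply returns an address can be modelled in a few lines of Python. This is only a toy sketch, not a real Forth: `memory`, `names`, `here`, `create`, `store`, and `fetch` are names I have made up for illustration, loosely echoing the traditional Forth words.

```python
# Toy model of Forth-style storage: a flat byte array plus a table of names.
# Every identifier here is an illustrative assumption, not part of any Forth.
memory = bytearray(64)   # the whole data space
names = {}               # word name -> address of its first byte
here = 0                 # next free byte, in the spirit of Forth's HERE


def create(name, size):
    """Reserve `size` bytes and bind `name` to their starting address."""
    global here
    if here + size > len(memory):
        raise MemoryError("data space exhausted")
    names[name] = here
    here += size
    return names[name]


def store(value, addr):
    """Write one byte at `addr`."""
    memory[addr] = value & 0xFF


def fetch(addr):
    """Read one byte from `addr`."""
    return memory[addr]


create("counter", 1)
create("buffer", 8)
store(42, names["counter"])      # the name yields an address, nothing more
store(7, names["buffer"] + 3)    # any "structure" is just address arithmetic
```

Nothing in the model knows that `buffer` is an array; the layout lives entirely in how the addresses are used.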
In Forth, there is little difference between the compiler, interpreter, and VM. The nature of the Forth VM allows the sequential expression and execution of code. Every instruction is either inherently understood by the VM, or is defined in terms of things that are inherently understood. It is orthogonal to the extreme.
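To illustrate the point that every word is either understood directly or defined in terms of words that are, here is a minimal Python sketch. The three primitives stand in for machine code, and `define`, `execute`, and `interpret` are hypothetical helper names rather than anything from a real Forth system.

```python
# Each word is either a primitive (a Python function standing in for machine
# code) or a list of other words. All names here are illustrative assumptions.
stack = []


def _dup():
    stack.append(stack[-1])


def _mul():
    stack.append(stack.pop() * stack.pop())


def _add():
    stack.append(stack.pop() + stack.pop())


primitives = {"DUP": _dup, "*": _mul, "+": _add}
definitions = {}  # word name -> list of the words it expands to


def define(name, body):
    """Define a new word purely in terms of existing words."""
    definitions[name] = body.split()


def execute(word):
    if word in primitives:
        primitives[word]()
    elif word in definitions:
        for inner in definitions[word]:
            execute(inner)
    else:
        stack.append(int(word))  # anything else is treated as a number


def interpret(source):
    """Run whitespace-separated words and return a copy of the stack."""
    for word in source.split():
        execute(word)
    return list(stack)


define("SQUARE", "DUP *")
define("CUBE", "DUP SQUARE *")
```

With these definitions, `interpret("3 CUBE")` leaves 27 on the stack, and `CUBE` is used exactly like a built-in word.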
In sum, Forth is very much like a higher level assembler with a stack. By using Forth, we can abstract out the lower level details of the platform by writing Forth words for that specific architecture. We get the power of assembler in a denser package that is more portable across platforms.
Leveraging Forth
By using a tokenizer and a vocabulary, we can store and use Forth words in a much smaller space than the equivalent assembler instructions. In theory, we could implement a virtual machine with a 2-bit Forth vocabulary consisting of the following instructions:
- LOAD - loads from memory address
- STORE - stores to memory address
- ADD - adds two values from the stack
- IF - conditional branching
In this case, each Forth word would be 2 bits wide. This can be a great advantage over a host instruction set that uses a 16-bit instruction word. If we expand this to 4 bits to allow 16 words in our vocabulary, we can still fit four Forth instructions into the space of every 16-bit assembler instruction word. We add a few Forth words for defining words, and then we can custom fit the remainder of the vocabulary to the exact purpose at hand.
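The packing arithmetic can be checked with a short Python sketch. The token numbering (LOAD = 0 through IF = 3), the choice to put the first token in the high bits, and the helper names `pack` and `unpack` are all my own assumptions for illustration.

```python
# Hypothetical token numbering for the four-word vocabulary listed earlier.
VOCAB = ["LOAD", "STORE", "ADD", "IF"]
TOKEN = {word: number for number, word in enumerate(VOCAB)}
BITS = 4                   # token width; 4 bits leaves room for 16 words
PER_CELL = 16 // BITS      # tokens that fit in one 16-bit cell
MASK = (1 << BITS) - 1


def pack(words):
    """Pack word names into 16-bit cells, first token in the high bits."""
    cells = []
    for i in range(0, len(words), PER_CELL):
        chunk = words[i:i + PER_CELL]
        cell = 0
        for word in chunk:
            cell = (cell << BITS) | TOKEN[word]
        cell <<= BITS * (PER_CELL - len(chunk))  # zero-pad a short last cell
        cells.append(cell)
    return cells


def unpack(cells, count):
    """Recover the first `count` word names from packed cells."""
    words = []
    for cell in cells:
        for slot in range(PER_CELL):
            shift = BITS * (PER_CELL - 1 - slot)
            words.append(VOCAB[(cell >> shift) & MASK])
    return words[:count]


program = ["LOAD", "LOAD", "ADD", "STORE"]
packed = pack(program)     # four Forth words in a single 16-bit cell: 0x0021
```

The word count has to travel alongside the cells, because a zero-padded slot is indistinguishable from token 0.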
If we implement a token-threaded Forth with one-byte tokens, we would have 256 possible words that we can define, and our programs would still be smaller than assembler programs even on 16-bit platforms. The advantage is further magnified on 32-bit and 64-bit instruction sets.
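A token-threaded inner interpreter with one-byte tokens might look roughly like the Python sketch below. The token numbers, the `VM` class, and the handler names are assumptions made for illustration; a real implementation would be a handful of assembler instructions indexing a jump table.

```python
# Sketch of a token-threaded interpreter: each byte of the program indexes
# a 256-entry dispatch table. Token numbers and names are hypothetical.
LIT, DUP, ADD, MUL, HALT = range(5)


class VM:
    def __init__(self):
        self.stack = []
        self.code = b""
        self.ip = 0
        self.running = False
        self.table = [None] * 256   # one slot per possible one-byte token

    def run(self, code):
        """Fetch a token, dispatch through the table, repeat."""
        self.code, self.ip, self.running = code, 0, True
        while self.running and self.ip < len(self.code):
            token = self.code[self.ip]
            self.ip += 1
            handler = self.table[token]
            if handler is None:
                raise ValueError(f"undefined token {token:#04x}")
            handler(self)
        return self.stack


def op_lit(vm):
    vm.stack.append(vm.code[vm.ip])  # the next byte is an inline literal
    vm.ip += 1


def op_dup(vm):
    vm.stack.append(vm.stack[-1])


def op_add(vm):
    vm.stack.append(vm.stack.pop() + vm.stack.pop())


def op_mul(vm):
    vm.stack.append(vm.stack.pop() * vm.stack.pop())


def op_halt(vm):
    vm.running = False


def make_vm():
    vm = VM()
    for token, handler in [(LIT, op_lit), (DUP, op_dup), (ADD, op_add),
                           (MUL, op_mul), (HALT, op_halt)]:
        vm.table[token] = handler
    return vm


# 6 DUP * 6 + encoded in eight bytes.
program = bytes([LIT, 6, DUP, MUL, LIT, 6, ADD, HALT])
result = make_vm().run(program)   # leaves 42 on the stack
```

Only five of the 256 table slots are filled here; the rest are free for words fitted to the task at hand.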
There is a 3-instruction Forth on the MC68HC11 platform, which is 66 bytes when assembled. If we can get close to this target size on a modern platform such as Intel’s x86, then we would have a potential win in size compared to the 300-byte Windows shellcode. We would have vast tracts of hundreds of bytes to write our programs within!
Expanding Further
Once we have implemented a minimal Forth virtual machine, we can then define special purpose Forth words for our shellcode injection purposes. By defining words for functionality that every exploit needs, such as ‘ESTABLISH’ to establish a communications channel to the attacker’s host, and ‘SEND’ and ‘RECEIVE’, we can abstract out common library functionality whose implementation differs with each payload. One payload’s ESTABLISH could use TCP connections, while another uses ICMP packets. By using a common vocabulary of words, we can have lower level implementations that differ dramatically in execution. This allows us to have the same higher level logic across different platforms and operating systems, while still keeping close-to-metal control. Not only do we gain the equivalent of a cross platform macro assembler, we also get much higher code density, allowing us to fit more functionality into a smaller space.
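The separation between a shared vocabulary and differing implementations can be sketched in Python. This is a pure in-memory model with no networking: `StreamChannel` and `PacketChannel` are hypothetical loopback stand-ins for a stream-style and a packet-style transport, and `make_vocabulary` and `echo_logic` are names invented for this example.

```python
from collections import deque


class StreamChannel:
    """In-memory stand-in for a stream-oriented transport."""

    def __init__(self):
        self.buffer = bytearray()
        self.is_open = False

    def establish(self):
        self.is_open = True

    def send(self, data):
        self.buffer += data

    def receive(self):
        data, self.buffer = bytes(self.buffer), bytearray()
        return data


class PacketChannel:
    """In-memory stand-in for a packet transport with a tiny payload limit."""

    MAX_PAYLOAD = 4

    def __init__(self):
        self.packets = deque()
        self.is_open = False

    def establish(self):
        self.is_open = True

    def send(self, data):
        # Split the message into fixed-size packets.
        for i in range(0, len(data), self.MAX_PAYLOAD):
            self.packets.append(bytes(data[i:i + self.MAX_PAYLOAD]))

    def receive(self):
        data = b"".join(self.packets)
        self.packets.clear()
        return data


def make_vocabulary(channel):
    """Bind the three shared words to one concrete implementation."""
    return {
        "ESTABLISH": channel.establish,
        "SEND": channel.send,
        "RECEIVE": channel.receive,
    }


def echo_logic(vocab, message):
    """Higher level logic written only against the shared vocabulary."""
    vocab["ESTABLISH"]()
    vocab["SEND"](message)
    return vocab["RECEIVE"]()
```

`echo_logic` runs unchanged over either channel, which is the property the common vocabulary is meant to provide.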
As Forth is a dynamic language, we can redefine words on the fly. This would allow us to inject our Forth virtual machine as shellcode, then establish communications back to the host. Once we have communications, we leverage the embedded philosophy of having the debugger and compiler on the host side. We can then modify the environment to fit our current needs and interactively control the remote side. The host would run a much larger version of our tiny Forth environment, and do the assembly for our remote drone. The vocabulary on the target would be redefined as we add more words, and then we can use these words in an interactive fashion.
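The host-compiles, target-executes split can be modelled in a single Python process. This is a sketch under several assumptions of my own: the `Host` and `Target` classes, the token numbering, and the `install` call that stands in for sending a definition over the channel are all hypothetical.

```python
# The target only knows how to run tokens; the host owns all the names.
LIT, DUP, ADD, MUL = range(4)


def _dup(stack):
    stack.append(stack[-1])


def _add(stack):
    stack.append(stack.pop() + stack.pop())


def _mul(stack):
    stack.append(stack.pop() * stack.pop())


PRIMS = {DUP: _dup, ADD: _add, MUL: _mul}


class Target:
    """Tiny target: a few primitives plus whatever the host installs."""

    def __init__(self):
        self.stack = []
        self.defs = {}   # token -> token sequence supplied by the host

    def install(self, token, body):
        self.defs[token] = bytes(body)   # installing again redefines the word

    def run(self, code):
        ip = 0
        while ip < len(code):
            token = code[ip]
            ip += 1
            if token == LIT:
                self.stack.append(code[ip])   # inline one-byte literal
                ip += 1
            elif token in self.defs:
                self.run(self.defs[token])
            elif token in PRIMS:
                PRIMS[token](self.stack)
            else:
                raise ValueError(f"undefined token {token}")
        return self.stack


class Host:
    """Larger side: keeps the name table and compiles source to tokens."""

    def __init__(self, target):
        self.target = target
        self.tokens = {"LIT": LIT, "DUP": DUP, "ADD": ADD, "MUL": MUL}
        self.next_token = 4

    def compile(self, source):
        out = bytearray()
        for word in source.split():
            if word in self.tokens:
                out.append(self.tokens[word])
            else:
                out += bytes([LIT, int(word)])   # literals must fit in a byte
        return bytes(out)

    def define(self, name, source):
        body = self.compile(source)
        if name not in self.tokens:
            self.tokens[name] = self.next_token
            self.next_token += 1
        self.target.install(self.tokens[name], body)

    def evaluate(self, source):
        return self.target.run(self.compile(source))
```

For example, after `host.define("SCALE", "2 MUL")` the source `5 SCALE` leaves 10 on the target's stack; calling `host.define("SCALE", "3 MUL")` afterwards reuses the same token, so the same source then yields 15 without the target ever storing a name.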
In Conclusion
With the lessons that embedded programming has to offer us in the form of Forth and remote control of a minimal target, we gain much more functional and portable shellcode. By defining a common vocabulary of optimized Forth words in assembler, we gain an enormous amount of flexibility in mixing and matching needed functionality, flexibility that is not otherwise available. We also gain a valuable concept that the embedded world developed out of the constraints of minimal target platforms: modifying the target environment on the fly from a more capable host.
Thanks and Credits
Thanks go to Scott Dunlop and Jim Burnes for being my sounding board and offering feedback for this project.