Hacking Firmware Tips

More a shameless copy/paste compilation of some articles (in case they vanish from the internet)

Some tips

Tip 00x01 - I normally like to look for bootcode if I have the choice. If not, I just deal with what I have.
Tip 00x02 - Firmware files are in hex and have a ton of unreadable data in them. I am interesting in about 1% of the file which contains ASCII text. The first thing I do is run firmware thru the *Nix command; Strings. Strings is a command that will print out any ascii character sequences that is followed by a an unprintable character. The default character limit is 4 but you can change that. A example command would look like this:

strings -t firmwareName.bin

Tip 00x03 - Some firmware will be compressed in what is called ZLIB compressed chunks. In outsourced code larger then 3Meg this is very common. There is a great tool called DeeZee which is part of the Black Bag Tool Kit from Matasano. It is older but works really well still for binary dissection. DeeZee will search thru a binary file for ZLIB signatures then extract them and print out the results. Human behavior is such that we write and comment stuff out all the time. Look at the best practices for a simple ACL. If I run a file thru strings or view it in a hex editor and see nothing but unreadable crap, then I assume it must be ZLIB'ed or encrypted but that is very rare. I run it thru DeeZee, with the command:

./deezee firmwarename.bin

Quick 'n Dirty Firmware Reversing

1. In Which I Get The Image

I start out with a couple megs worth of firmware image file. This file is sent to the device when you do a firmware upgrade the official way. You can usually find firmware images on the vendor’s download site, or on the CDs that came in the box.

2. In Which I Scan The Image

I feed it to deezee. Deezee is part of Matasano’s homegrown “Black Bag” toolkit. What deezee does is search through a binary file for compressed data by looking for zlib signatures, and then extracting anything it finds. Its crazy how often this seems to work on unknown binary files. As it turns out, zlib is the industry standard compression format, even for toasters.

Sure enough, deezee finds not one, but 3 distinct blobs of compressed data packed in the file. Hmm… really… deezee should tell me where it finds these things I think. Lets fix that now. Much better.

3. In Which I Read

With addresses for compressed chunks in hand, I open up the file in my hex editor.

Deezee found the first compressed segment around 384 bytes from the beginning of the file. Here’s where technique comes into the picture.

Deezee found three “hits” in the image. I assume they aren’t just sprinkled haphazardly throughout the image; there should be some kind of header on them. I want to know what that header looks like. Maybe it’s used for things besides compressed blobs? The way I’m going to do that is, I’m going to compare the bytes surrounding the compressed blobs for all three hits.

So, comparing the preceding 384 bytes of that and the other two hits, I see several similarities:

A 4-byte signature. I can tell because it’s always the same four bytes, in the same place relative to the blob.
An ASCIIZ string, apparently describing the firmware version, padded to 128 bytes. You see “padded strings” all the time in binaries; what they are is a C-struct with a “char version128”.
A 16-byte ASCIIZ string of numbers describing just the version, again padded with NULs.
4 bytes which, when unpacked as an integer, happens to match the size of the compressed chunk. (If your hex editor won’t tell you the big and little endian 32 bit value of four highlighted bytes, get a new hex editor). This is the trick that cracks most image formats: you look for 16 or 32 bit fields that look like lengths, and try to reconcile them to the file and its features.
4 bytes which… hmm… could be CRC32 checksum? This is a bit of a shot in the dark, but it’s easy enough to check:

$ cat app.bin | ruby -e &#92;
'require "zlib"; puts Zlib.crc32( STDIN.read ).to_s(16)'
a76ea2ad

And… they match! I’ll need to keep this checksum in mind when I try modifying the file. Bodes well for no code signing!

Why did I try a CRC32 checksum first? Well… first off I assumed the field was 32 bits long. And CRC’s (Cyclic Redundancy Check) are used a lot in file formats that encapsulate other files. As you can see above, I used Zlib’s crc32 function to check my file chunk. Could just easily have been OpenSSL’s since it too is a standard Ruby library. Two shining examples of how common CRC32 is.

4. In Which Other Headers Are Found

Now that I know what to look for, I find out that this header also prefixes some other chunks. Ones that aren’t compressed and so weren’t noticed by deezee. I also confirm that CRC32 header field for each chunk and it’s used on the other’s too. Five total, including the compressed ones. The last one is a big chunk of base64 which looks like it might be an encryption signature. Hmm… might be used for code signing somehow? Looks like I still need to confirm or discount this possibility.

5. In Which I Check For Metadata

Go back and look again at the original chunks of compressed data. The zlib headers in the file included the original filenames which give me some clues as to what each is. Going on filenames, I’m thinking I’ve got phase one and two boot-loaders and the third chunk is the actual app. I’m interested in the application. The Unix binutils ‘file’ command doesn’t tell me anything useful about any of them, but it’s always worth a shot.

6. Behold: Low Hanging Fruit

Strings the application file using GNU strings with ‘-t x’ so I get nice hex offsets for the where the strings are found. Lots of interesting stuff:

Uh… well there’s this: Nice ASCII art. Just a wild guess, but I think maybe this is VxWorks?
The regular string ‘vxworks’ shows up in a lot more places too.
“GNU ld version 2.9-mips3264-010729”. Ok so it’s MIPS. This is also good news. It means that somewhere out there, there’s probably a GNU binutils distribution that can understand this file format.
Lots of what looks like function names close together. When do function names occur in compiled output? Two reasons: debugging code, or a symbol table. A symbol table would be a score; I make a note of that for later.
Lots of assertion messages with references to source code line numbers xxx.c(###). Always handy. Even if you don’t have symbols for functions, with an hour of data entry in your disassembler you can usually fake them based on debug and assertion strings. You want to take the time to do this. By the time you’ve guessed even 10% of the symbols in an image, you’re usually going to be able to comprehend 90% of what the image does, without reading any assembly.
The strings you see when the device boots up.
Lots of error, exception, and status messages such and one typically finds in a program. Sometimes very handy as we’ll soon see.

7. In Which Firmware Is Patched

I’ve got a pretty good idea of what the program is and how it’s rolled up in the firmware image file.

My job right now is to figure out whether I can change the firmware and load it onto the toaster. So I make a simple change to it: there’s a string which gets displayed when the system boots up. Using my hex editor, I change it to something noticeably different, being careful to keep it the same size as the original. Then I re-compress it at the same level as the original, re-roll it into the header format, and change the CRC32 checksum.

I load it onto my target using the bug I found… and… sweet satisfaction! If there’s any code signing here, it doesn’t work. That spooky looking signature at the end is for something else. Knowing this, It’s well worth the effort to keep reversing the image.

8. In Which A Loader Offset Is Sought

Now, the binutils ‘file’ command didnt recognize the compressed image, nor did it recognize either of the apparent “boot-loaders”. If it was a well-known format like ELF or COFF, it would have.

Unfortunately, the executable headers are what tell us how to load the whole program into memory. Without a known executable format, I’ll need to find out the load offsets myself. If I don’t, when I load the program into a disassembler, none of the data and instruction offsets will make sense. You can read a binary image with broken offsets, but it’s not fun.

Headers are usually located at the beginning of the file. But I’m not seeing any strong indications of any sort of header information at all in this one. My guess: an older version of VxWorks, predating VxWorks ELF. On the bright side, firmware images aren’t usually relocatable. Unlike a Windows PE program, which is literally edited and patched up by the linker at runtime, firmware tends to be loaded at a specific address.

I could try reversing either or both of the bootloaders to find that address, but… I might just run into the same problem walking all the way back up to the first boot-loader. To be honest, I’m not really interested in unlocking the secrets of the toaster boot loader. If I was trying to jailbreak the toaster to heat unauthorized off-brand Pop Tarts, maybe I would be. Exercise for the reader.

9. In Which I Infer A Loader Offset

I’ve got another idea. Lets look closer at some of these strings and their surrounding data. I scroll around in my hex editor around the sections with strings until I’ve found what I’m looking for.

What we’re looking for is a list of strings preceded by their address table. At this point, I really don’t care what the strings are, just that they are recognizable text. The addresses prefixing the strings in the table are 32-bit big endian numbers pointing to the ASCIIZ strings. Only parts of the addresses match up to real addresses in the file, though, and this is key.The first entry (blue) points to 0x104DEDD8. The string actually lives at 0x004D9DD8 in the file. Subtract the two and you get 0x10005000 - the load offset for this segment.

If I’m unsure of my assumptions here, I check some of the other entries in the table (red, green, and brown). Same pattern holds. We’ve got a winner.

10. In Which I Sanity Check The Offset

Search for 0x10005000 and 0x5000 near the beginning of the file. I want to confirm my assumptions about the lack of a header. Nothing comes up. I’m still very certain this is my winner. The lack of a search hit here suggests again that the loader decides where to load everything, though this program’s compiler knew where that would be when it was compiled too.

11. In Which I Dissassemble

We are now at step 11 and it is time to load the file in a disassembler.

We use IDA Pro. IDA wants to know the architecture of the file. We choose a processor type of “mipsb”; that’s MIPS, which we learned from scanning the image format for strings, and running in big-endian mode.

How do we know it’s big endian? Big endian is a safe guess for non-X86 architectures, but in this case it’s more than a guess: the addresses and lengths in the file were big endian. What if we hadn’t seen a string identifying it as MIPS? I’d probably take a fragment of the file and feed it through “objdump -d” multiple times, specifying MIPS, PowerPC, ARM, and SPARC, in that order. You know you have a match when the instructions sort of make sense.

I let it load as one big segment. Once loaded in IDA, I go to “Edit -> Segments -> Rebase Segment” and enter the address I came up with in step #9. Rebasing tells the disassembler what the load offset is.

Here I have to tell you about Thomas’ irrational fear of IDA’s rebasing. According to Tom this is based on a bad experience rebasing a 10 megabyte firmware image, waiting a day for it to finish, and only then noticing he got the address wrong. Here’s a handy trick Thomas uses instead of rebasing: feed the file to binutils “objcopy”. Objcopy is a best-kept-secret for reversing: you can feed it a file of type “binary”, and tell it to spit out an ELF file, specifying the header values. Several other tools suck at handling raw files, but rock at handling ELF. Why is his fear irrational? Well… I just disable IDA’s auto-analyze feature before I rebase. Then I do some manual poking around first to make sure I’m on the right track before turning it back on again. But… nobody tell Thomas this! He always comes up with crazy cool tricks using other tools when he gets pissed off at IDA.

12. In Which Work Out A Symbol Table

Based on where I found strings in the image, I have a pretty good idea of the general region of memory where program data lives. Remember those function names I filed for later? Now is later.

I take the addresses of a few function name strings I find close together and search the file for the four bytes making up their (newly rebased) addresses. I’m consistently getting search hits near the end of my probable data segment, all in the same general area. Look around there and start converting every 4-byte aligned chunk starting with 0x104 to an offset. I start seeing offsets to strings of what looks like function names at 16-byte intervals. The string offsets are right next to offsets pointing way back into the code area and some pointing into the data area. I look a little closer at the hex dump of this area:

Blue is what points to the “name” of the symbol, red points to the actual thing in memory. The last column is the “type” of thing. Green things (0x500) are functions, and purple (0x700) are data. There’s also some oddballs strewn about with a type of 0x900. These point way out of bounds past the end of my rebased file. Could be a segment I don’t know about from this file, or it could just be something else created at run-time. I don’t stress about this, stick with what I know for sure right now.

I take this pattern and translate it into a structure for IDA, then find where the table begins and ends based on the pattern. This is an array, so that’s how I define it in IDA. The last element of the array is 16 bytes of 0x00 helping me (and IDA) see where it ends.

13. In Which I Import The Table Into IDA

Things start getting recognized quickly by IDA’s auto-analysis once I tell it about all these symbol offsets. But I also want IDA to know the names of everything in the symbol table.

I write a quick Ruby script to write an IDC script to do this work for me. IDC is the scripting language built into IDA; in the 21st century, most people write their IDA tools in Python or Ruby, but it’s sometimes faster to write short scripts that use IDC as a half-way format.Run ruby… run IDC… take a look at the results. Awesome! Just about everything was identified in this symbol table! Furthermore, everything was identified in the same segment off of one load address.

Just to make sure about that missing header I check my final results from the symbol table. Indeed, two of the function symbols point right at 0x10005000, the beginning of the program. One’s called ‘sysInit’ and the other ‘start’. There can’t be a header there, it’s a function. Though I can tell as much from the name “start”, a little googling and research tells me “sysInit” is what VxWorks usually uses as its program entry point.