"Learning Notes on IDA Reverse Engineering from Scratch - 14 (Introduction to Program Unpacking)"

What is packing?

This chapter demonstrates how to unpack a UPX-packed program.

Packing hides a program's executable code by compressing or encrypting it, which makes reverse engineering harder. The packer adds a small loader (the STUB), usually in an extra section. At run time the stub decrypts the packed data into another section in memory, or recreates the original sections of the program, and then jumps to the restored code to execute it.
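The idea can be illustrated with a toy model in Python. This is only a sketch and not how UPX is implemented: `pack` and `run_stub` are hypothetical names, and zlib stands in for whatever compression or encryption a real packer uses.

```python
import zlib

def pack(original_code: bytes) -> dict:
    """Toy 'packer': keep only a compressed copy of the code on disk."""
    return {
        "packed_data": zlib.compress(original_code, 9),  # what the file stores
        "unpacked_size": len(original_code),             # memory the stub must reserve
    }

def run_stub(packed: dict) -> bytes:
    """Toy 'stub': rebuild the original code in a reserved memory region."""
    memory = bytearray(packed["unpacked_size"])          # empty region with no file data behind it
    memory[:] = zlib.decompress(packed["packed_data"])   # restore the original bytes
    return bytes(memory)                                 # a real stub would now jump here

original = b"\x55\x8b\xec\x90" * 256   # stand-in bytes, not real program code
packed = pack(original)
restored = run_stub(packed)
print(len(packed["packed_data"]), len(original), restored == original)
```

The packed copy is smaller than the original, and the restored bytes only exist in memory after the stub has run, which is why the packed file cannot be analyzed statically as it stands.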

Most packers protect the file by tampering with the Import Address Table (IAT) and the file header (HEADER). They also add anti-debugging code to prevent the original file from being unpacked.

Use Detect It Easy (DIE) to check whether the program is packed.

The image above shows that the program is packed with UPX version 3.91, and it is a 32-bit program with i386 architecture.
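One of the simplest hints a detector can use is the section names in the PE header. The sketch below is not how DIE works internally; it only parses the section table with the standard library and looks for names starting with "UPX". `list_sections` and `looks_upx_packed` are hypothetical helper names, and since section names can be changed freely, a hit is a hint rather than proof.

```python
import struct

def list_sections(pe: bytes) -> list:
    """Return (name, virtual_size, virtual_address, raw_size) for each PE section."""
    if pe[:2] != b"MZ":
        raise ValueError("not an MZ executable")
    pe_off, = struct.unpack_from("<I", pe, 0x3C)           # e_lfanew: offset of the PE header
    if pe[pe_off:pe_off + 4] != b"PE\0\0":
        raise ValueError("PE signature not found")
    num_sections, = struct.unpack_from("<H", pe, pe_off + 6)
    opt_size, = struct.unpack_from("<H", pe, pe_off + 20)  # SizeOfOptionalHeader
    table = pe_off + 24 + opt_size                         # section table follows the optional header
    sections = []
    for i in range(num_sections):
        name, vsize, vaddr, raw_size = struct.unpack_from("<8sIII", pe, table + 40 * i)
        sections.append((name.rstrip(b"\0").decode("ascii", "replace"), vsize, vaddr, raw_size))
    return sections

def looks_upx_packed(pe: bytes) -> bool:
    """Heuristic only: True if any section name starts with 'UPX'."""
    return any(name.upper().startswith("UPX") for name, *_ in list_sections(pe))
```

To try it, read a file in binary mode and pass the bytes to `looks_upx_packed`.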

Loading the packed file

When loading the packed file, uncheck "Create imports segment" and check "Manual load".

After clicking OK, another window pops up; click OK again.

The image above shows the entry point of the packed program.

The image above shows the entry point of the original program.

The entry point of the packed program is at 0x409BE0, while the entry point of the original file is at 0x401000.

File and memory usage

Comparing the sections of the two files shows that the packed file has an additional section called "upx0" right after the file header, and that it occupies more memory than the sections of the original file.

The image above shows the sections of the original file.

The image above shows the sections of the packed file.

The upx0 section in the packed file ends at 0x409000, while the sections below the header in the original file span 0x401000 to 0x408200. A program's size on disk and its size in memory can differ widely: it may occupy only 1 KB on disk but 20 KB or more once loaded.

As shown in the image above, the starting address of the CODE section in the original file is 0x401000, the size of the section in the file is 0x600 bytes, and the virtual size in memory is 0x1000 bytes.

In the packed file, as shown in the image above, the upx0 section also starts at 0x401000, but its size in the file is 0 while it occupies 0x8000 bytes in memory. The packer reserves this space so that the stub can rebuild the original code there and then jump to it.
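The comparison between the two sections can be written down as a small check, using only the numbers quoted above. The `Section` type and the helper names are hypothetical, introduced just for this sketch.

```python
from typing import NamedTuple

class Section(NamedTuple):
    name: str
    start: int          # virtual address where the section begins
    raw_size: int       # bytes stored in the file
    virtual_size: int   # bytes reserved in memory

def end_address(section: Section) -> int:
    """First address past the section in memory."""
    return section.start + section.virtual_size

def is_placeholder(section: Section) -> bool:
    """True when the section has no file data but still reserves memory."""
    return section.raw_size == 0 and section.virtual_size > 0

code = Section("CODE", 0x401000, 0x600, 0x1000)   # original file
upx0 = Section("upx0", 0x401000, 0x0, 0x8000)     # packed file

for s in (code, upx0):
    print(s.name, hex(end_address(s)), is_placeholder(s))
# CODE 0x402000 False
# upx0 0x409000 True
```

A section with no file data and a large virtual size is a typical sign that something will be written there at run time.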

The image above shows the view after jumping to 0x401000 in the packed file.

The dword_ prefix at 0x401000 indicates data of type DWORD, "?" means the location is reserved but holds no stored content, and the dup count is 0xC00 dwords, which is 0x3000 bytes. A second array at 0x404000 reserves another 0x1400 dwords, which is 0x5000 bytes.

So a total of 0x8000 bytes are reserved to hold the original code.
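The size arithmetic can be checked quickly. The sketch assumes that both dup counts are in dwords (4 bytes each); reading the count at 0x404000 as 0x1400 dwords is what makes the total come out to 0x8000 bytes.

```python
DWORD_SIZE = 4  # bytes per DWORD

def dup_bytes(count: int, element_size: int = DWORD_SIZE) -> int:
    """Bytes reserved by a 'dd <count> dup(?)' style declaration."""
    return count * element_size

first = dup_bytes(0xC00)     # array starting at 0x401000
second = dup_bytes(0x1400)   # array starting at 0x404000 (count read as dwords: an assumption)

print(hex(first), hex(second), hex(first + second))
# 0x3000 0x5000 0x8000
print(hex(0x401000 + first), hex(0x401000 + first + second))
# 0x404000 0x409000
```

The second line confirms that the first array ends exactly where the second begins, and that the second ends at 0x409000, the end of upx0.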

As shown in the image above, pressing "x" at 0x401000 lists two references to this address (we will come back to them later).

The image above shows the references to executable code.

The upx1 section occupies 0xe00 in the file and 0x1000 in memory.

The image above shows the file and memory usage of the upx1 section.
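With the start address and memory size of both sections known, it is easy to check which section holds the packed entry point 0x409BE0 and which holds 0x401000. The sketch below uses only the values given in this chapter; `section_of` is a hypothetical helper name.

```python
# (name, start address, size in memory) of the packed file's sections, as given in this chapter
PACKED_SECTIONS = [
    ("upx0", 0x401000, 0x8000),
    ("upx1", 0x409000, 0x1000),
]

def section_of(address: int, sections) -> str:
    """Return the name of the section containing address, or None."""
    for name, start, virtual_size in sections:
        if start <= address < start + virtual_size:
            return name
    return None

print(section_of(0x409BE0, PACKED_SECTIONS))  # upx1
print(section_of(0x401000, PACKED_SECTIONS))  # upx0
```

An entry point that sits in a later section, while the first section is empty on disk, is another common indicator of a packed file.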

The packer has probably applied compression or some simple encryption to hide the original code. There are several references to 0x409000, the start of the upx1 section.

The image above shows the references to 0x409000.

One of the references comes from the executable code further down; click it to jump to that location.

The image above shows the program entry.

Stub and OEP

In the stub that follows the program entry shown in the image above, the ESI register is loaded with the address 0x409000. As shown in the image below, the stub's executable code sits after the packed data of the original file, and both belong to the upx1 section. The upx1 section therefore contains the encrypted content of the original file followed by the stub code, which starts at 0x409be0.

The image above shows the traced executable code.

In the upx0 section, there is a reference as shown in the image below.

The image above shows the reference at 0x401000.

In the image below, there is an unconditional jump to 0x401000; this is the source of the reference to 0x401000 seen in the previous image.

The image above shows the jump to 0x401000 in the packed file.

A near jmp transfers control directly to the specified address within the current code segment. So once the stub has finished decrypting and rebuilding the original code, it jumps to 0x401000, the Original Entry Point (OEP), which is where the original program starts executing. The stub's own entry point is 0x409be0.

For a packed program, the location of the OEP is generally not known in advance. In this case we can confirm that the OEP is indeed 0x401000 because the original file is available for comparison.
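The near jump mentioned above is commonly encoded on 32-bit x86 as opcode E9 followed by a signed 32-bit displacement, relative to the address of the next instruction. The sketch below shows that calculation. The address of the jmp itself (0x409D00) is a made-up value for illustration, since the chapter does not state it; only the target 0x401000 comes from the text.

```python
import struct

JMP_REL32_LEN = 5  # one opcode byte (E9) plus a 4-byte displacement

def encode_jmp_near(jmp_address: int, target: int) -> bytes:
    """Encode 'jmp target' placed at jmp_address as E9 rel32."""
    rel = target - (jmp_address + JMP_REL32_LEN)   # relative to the next instruction
    return b"\xE9" + struct.pack("<i", rel)

def decode_jmp_near(jmp_address: int, code: bytes) -> int:
    """Return the absolute target of an E9 rel32 jump located at jmp_address."""
    if len(code) < JMP_REL32_LEN or code[0] != 0xE9:
        raise ValueError("not an E9 rel32 jump")
    rel, = struct.unpack_from("<i", code, 1)
    return (jmp_address + JMP_REL32_LEN + rel) & 0xFFFFFFFF

HYPOTHETICAL_JMP_ADDRESS = 0x409D00   # assumption: not taken from the chapter
encoded = encode_jmp_near(HYPOTHETICAL_JMP_ADDRESS, 0x401000)
print(encoded.hex(), hex(decode_jmp_near(HYPOTHETICAL_JMP_ADDRESS, encoded)))
```

Because the displacement is negative here, the jump goes backwards from the stub in upx1 to the start of upx0.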

Finding OEP

In most cases the original program is not available, so the OEP of the packed program cannot be read off directly. The following shows how to find it.

Once the STUB has finished decrypting and rebuilding the original code, it jumps to that code. In general, the first instruction executed in the section that receives the unpacked code (upx0 here) is the OEP.

Set a breakpoint on the jump that leads to the OEP, and check whether the original code has already been rebuilt by the time execution reaches it.
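The rule that the first instruction executed inside the unpacked section is the OEP can be sketched as a filter over a list of executed addresses. The trace below is invented for illustration (only 0x409BE0 and 0x401000 come from this chapter), and `find_oep` is a hypothetical helper, not an IDA function.

```python
def find_oep(trace, section_start: int, section_size: int):
    """Return the first executed address that falls inside the given section."""
    section_end = section_start + section_size
    for eip in trace:
        if section_start <= eip < section_end:
            return eip
    return None   # execution never entered the section

# Illustrative trace: the stub runs in upx1, then control moves into upx0.
# The intermediate stub addresses are assumptions, not values from the chapter.
trace = [0x409BE0, 0x409BE1, 0x409BE6, 0x409D00, 0x401000, 0x401002]

oep = find_oep(trace, section_start=0x401000, section_size=0x8000)
print(hex(oep))  # 0x401000
```

In a debugger, the same idea corresponds to stopping as soon as execution enters the section that was empty on disk.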

Select "Local Windows Debugger" as the debugger, start debugging, and run to the breakpoint.

Execution stops at the breakpoint on the jump; press F8 to execute it.

A warning appears saying that the upx0 section was originally interpreted as data; click "Yes" to have it interpreted as code.

The program has now decrypted the original code and jumped to it. The code here is very similar to the code at 0x401000 in the original program. However, because it is only a label (loc_401000) and not yet defined as a function, the graph view is not available. IDA can fix this automatically.

In the lower left corner of the IDA window there is a hidden context menu: right-click there and select "Reanalyze Program". Going back to 0x401000 now shows sub_401000, which indicates that it is a function. Press the space bar to switch to the graph view.

After reanalysis, the code at 0x401000 is recognized as a function.

So far, two ways of finding the OEP have been covered (following the stub's jump statically, and breaking on it in the debugger), and memory now holds the decrypted original code. The next step is to dump the process and rebuild the IAT to obtain an unpacked, runnable program.