We will start a series of lessons where we compile some very simple C programs and disassemble them.
Download these:
- Borland's free C++ 5.5 compiler - https://downloads.embarcadero.com/item/24778 (Free Registration Required)
- IDA Pro Freeware version 4.3 - http://www.hex-rays.com/idapro/idadownfreeware.htm
- Install the compiler and the disassembler with the default options, but don't start IDA Pro yet.
Compile this empty C program (rem001.c) with Borland's C++ compiler:
main(int argc, char **argv) { }
You will need the BIN directory of the Borland C++ compiler in your PATH:
PATH=%PATH%;c:\Borland\BCC55\Bin
bcc32 -Ic:\Borland\BCC55\Include -Lc:\Borland\BCC55\Lib rem001.c
This produces the executable rem001.exe.
- Start IDA Pro and click OK.
- Accept the EULA.
- Select New.
- Open the rem001 executable.
- Click OK.
- Wait for the autoanalysis to finish.
IDA Pro should show you this, starting at .text:00401150:
.text:00401150 ; int __cdecl main(int argc,const char **argv,const char *envp)
.text:00401150 _main proc near ; DATA XREF: .data:004090D0
.text:00401150
.text:00401150 argc = dword ptr 8
.text:00401150 argv = dword ptr 0Ch
.text:00401150 envp = dword ptr 10h
.text:00401150
.text:00401150 push ebp
.text:00401151 mov ebp, esp
.text:00401153 pop ebp
.text:00401154 retn
.text:00401154 _main endp
.text is the section of the PE (Portable Executable) file where the Borland compiler puts the code. Read about the PE format here (http://en.wikipedia.org/wiki/Portable_Executable) and in the MSDN articles referenced there.
The first line is a comment, indicated by the semicolon (;).
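_main is a label used to reference the address of the main function, 00401150. "proc near" marks the start of a near procedure, one that is called and returns within the same segment (here .text); "_main endp" marks its end. argc, argv and envp are constants: IDA defines them as the offsets of main's arguments on the stack.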
The first assembly instruction that produces code is push ebp; it is one byte long (the next instruction starts at 00401151). It pushes the ebp register onto the stack. You should read about assembly; look at the books referenced here (http://en.wikipedia.org/wiki/Assembly_language). For now, read about CPU architecture, registers and the stack [1]. Pushing ebp onto the stack saves its value, so that ebp can be used for other things; the saved value is retrieved from the stack with pop ebp (address 00401153). mov ebp, esp copies the content of register esp into register ebp; esp is the stack pointer, which points to the top of the stack. Finally, retn returns from the main function to its caller.
You'll find this code at the beginning of most functions:
.text:00401150 push ebp
.text:00401151 mov ebp, esp
This is the prologue: it saves the caller's ebp and sets ebp to the current stack pointer, before the actual code of the function runs.
And this is the epilogue:
.text:00401153 pop ebp
.text:00401154 retn
It comes after the actual code: it restores the caller's ebp and returns.
There is no actual code to execute since our main function is empty.
Questions:
Q: Does IDA always point to the beginning of a program post-analysis?
A: We usually speak of the entry point: the address where execution of the program really starts. IDA Pro positions you at the entry point, unless it recognizes a more useful starting point, such as the main function of a C program.
Q: Are the contents of ebp cleared when it is popped?
A: This pop instruction retrieves a 32-bit value from the stack and stores it in register ebp. The previous value of ebp is overwritten.
Q: What is the significance of the values 8, 0Ch and 10h assigned to argc, argv and envp respectively?
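A: They can be used as offsets (relative to ebp) to access the values of argc, argv and envp. After the prologue, [ebp] holds the saved ebp and [ebp+4] the return address, so the first argument starts at ebp+8. As these arguments are not referenced in the C code, no actual assembler code that uses these constants was emitted.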