
Reverse Engineering Mentoring Lesson 002


Now compile this second example (rem002.c) with the same command:

PATH=%PATH%;c:\Borland\BCC55\Bin
bcc32 -Ic:\Borland\BCC55\Include -Lc:\Borland\BCC55\Lib rem002.c

rem002.c:

main(int argc, char **argv)
{
   int a;

   a = 1;
}

Disassemble it with IDA Pro. What's special about the disassembled code?


Put your answer here:

On compiling I got these messages, which are OK (I had already named the file test1.c instead of rem002.c):

Borland C++ 5.5.1 for Win32 Copyright (c) 1993, 2000 Borland
test1.c:
Warning W8070 test1.c 5: Function should return a value in function main
Warning W8004 test1.c 5: 'a' is assigned a value that is never used in function main
Warning W8057 test1.c 5: Parameter 'argc' is never used in function main
Warning W8057 test1.c 5: Parameter 'argv' is never used in function main
Turbo Incremental Link 5.00 Copyright (c) 1997, 2000 Borland


Here is what I see. It does not appear different from the first compiled program, even though we added an integer variable:

.text:00401150 ; =============== S U B R O U T I N E =======================================
.text:00401150 
.text:00401150 ; Attributes: bp-based frame
.text:00401150 
.text:00401150 ; int __cdecl main(int argc,const char **argv,const char *envp)
.text:00401150 _main           proc near               ; DATA XREF: .data:004090D0↓o
.text:00401150 
.text:00401150 argc            = dword ptr  8
.text:00401150 argv            = dword ptr  0Ch
.text:00401150 envp            = dword ptr  10h
.text:00401150 
.text:00401150                 push    ebp
.text:00401151                 mov     ebp, esp
.text:00401153                 pop     ebp
.text:00401154                 retn
.text:00401154 _main           endp
.text:00401154 
.text:00401154 ; ---------------------------------------------------------------------------

--Pand0ra 00:39, 10 February 2007 (UTC)


The code is the same because we are using an optimizing compiler [1]: the compiler detected that assigning a value to a variable without using that variable later on in the code has no effect, so the variable assignment is not compiled into the executable code. Use the -Od option to disable all optimization and recompile:

PATH=%PATH%;c:\Borland\BCC55\Bin
bcc32 -Od -Ic:\Borland\BCC55\Include -Lc:\Borland\BCC55\Lib rem002.c

Before you open the recompiled executable with IDA Pro, you should know the following. IDA Pro saves the disassembly in a database with the file extension .idb; you see this dialog when you save or close it:

[Screenshot Rem001-07: IDA Pro dialog shown when saving or closing the database]

When you open an executable that you've already analyzed with IDA Pro, IDA Pro will ask whether you want to reuse the previous analysis (Load existing) or start a new one (Overwrite):

[Screenshot Rem002-01: IDA Pro asking whether to load the existing database or overwrite it]

In our case, we are analyzing a new executable (we recompiled it), so we select overwrite.

Include the new IDA Pro disassembly here:

--Didier Stevens



OK, so now this is what I see:

.text:00401150 ; =============== S U B R O U T I N E =======================================
.text:00401150 
.text:00401150 ; Attributes: bp-based frame
.text:00401150 
.text:00401150 ; int __cdecl main(int argc,const char **argv,const char *envp)
.text:00401150 _main           proc near               ; DATA XREF: .data:004090D0↓o
.text:00401150 
.text:00401150 var_4           = dword ptr -4
.text:00401150 argc            = dword ptr  8
.text:00401150 argv            = dword ptr  0Ch
.text:00401150 envp            = dword ptr  10h
.text:00401150 
.text:00401150                 push    ebp
.text:00401151                 mov     ebp, esp
.text:00401153                 push    ecx
.text:00401154                 mov     [ebp+var_4], 1
.text:0040115B                 pop     ecx
.text:0040115C                 pop     ebp
.text:0040115D                 retn
.text:0040115D _main           endp
.text:0040115D 
.text:0040115D ; ---------------------------------------------------------------------------

OK, now I see the Base Pointer register being used to move the integer (or at least that is what I assume it is). What is var_4, and what are the dword ptrs?

On a side note, why is the .exe so large (46 KB) when there is very little code?


Auto variables in C are stored on the stack. What you see in the disassembled code is the assignment of the 32-bit integer value 1 to auto variable a, which is on the stack. To pass function arguments and store auto variables on the stack, the compiler sets up a stack frame. Read about stack frames here: [2]. BTW, if you watch IDA Pro's autoanalysis process attentively, you will notice that var_4 is not present at first and appears later during the analysis.


.text:00401150 var_4           = dword ptr -4
.text:00401150 argc            = dword ptr  8
.text:00401150 argv            = dword ptr  0Ch
.text:00401150 envp            = dword ptr  10h

var_4, argc, argv and envp are assembly labels; they are not real machine instructions. Labels are a type of assembler directive. Assembler directives instruct the assembler to operate in a certain way. These labels are just symbolic names for values: in our case, var_4 is equal to -4. So each time the assembler finds var_4 in the assembly code, it substitutes the value -4 for it, and that's how you have to read it too. We'll look into argc, argv and envp in a later example.
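To make "labels are just values" concrete, here is a minimal C sketch that writes IDA's frame labels as plain integer constants and resolves an operand like [ebp+var_4] to an address. The FRAME_ prefix and the helper ebp_relative are hypothetical names chosen for this sketch; only the offsets themselves come from the listing.

```c
#include <stdint.h>

/* IDA's frame labels as C constants: each one is only a number, an offset
   relative to ebp. The FRAME_ prefix is a naming choice for this sketch,
   to avoid clashing with main's own argc/argv parameters. */
enum {
    FRAME_var_4 = -4,    /* auto variable a               */
    FRAME_argc  = 0x08,  /* argc: first argument of main  */
    FRAME_argv  = 0x0C,  /* argv: second argument         */
    FRAME_envp  = 0x10   /* envp: third argument          */
};

/* The assembler substitutes the number for the label; at run time the CPU
   adds it to ebp. 32-bit unsigned arithmetic wraps, so adding
   (uint32_t)-4 is the same as subtracting 4. */
static uint32_t ebp_relative(uint32_t ebp, int32_t offset)
{
    return ebp + (uint32_t)offset;
}
```

For example, with a made-up ebp value of 0x0012FF80, ebp_relative(0x0012FF80, FRAME_var_4) gives 0x0012FF7C, while FRAME_argc resolves to 0x0012FF88, above the saved ebp and the return address.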

.text:00401150                 push    ebp
.text:00401151                 mov     ebp, esp
.text:00401153                 push    ecx

These are assembly instructions to set up the stack frame.

1. The ebp register is pushed to the stack to save its content

2. The content of the esp register (the esp register contains the stack pointer) is copied to ebp. This overwrites the current value of the ebp register, but we will restore it from the stack at the end.

3. The ecx register is pushed to the stack, but not to save its content!

In the stack frames document I referenced, you've read that the entry sequence ends with a 'sub esp, X' instruction, where X is the total size, in bytes, of all automatic variables used in the function. This reserves space on the stack for the auto variables. So in our case, with one auto variable of 4 bytes, we expect to see 'sub esp, 4'. But we don't; we see 'push ecx' instead. This is a compiler trick: the effect of 'push ecx' on the esp register is the same as that of 'sub esp, 4', because pushing a 32-bit register also decreases the stack pointer by 4. The compiler uses 'push ecx' because this instruction is smaller (only 1 byte, compared to 3 bytes for the sub instruction) and is probably also faster (to be sure, you should look up the number of execution cycles each instruction takes).
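The equivalence can be checked with a toy model. The C sketch below is a simplified simulation, not a real CPU: it keeps only esp, ecx and a tiny stack area, and STACK_TOP, STACK_SIZE, Cpu, push_ecx and sub_esp are hypothetical names. The two byte arrays are the standard 32-bit x86 encodings of the two instructions.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model: only esp, ecx and a tiny stack area. STACK_TOP and
   STACK_SIZE are arbitrary values for this sketch, not real addresses. */
#define STACK_SIZE 64u
#define STACK_TOP  0x1000u  /* stack area covers [STACK_TOP - STACK_SIZE, STACK_TOP) */

typedef struct {
    uint32_t esp;
    uint32_t ecx;
    uint8_t  mem[STACK_SIZE];
} Cpu;

/* push ecx: esp -= 4, then ecx's current (irrelevant) value is stored there. */
static void push_ecx(Cpu *c)
{
    c->esp -= 4;                                   /* the stack grows downwards */
    uint32_t i = c->esp - (STACK_TOP - STACK_SIZE);
    assert(i <= STACK_SIZE - 4);                   /* toy stack-overflow check */
    for (int b = 0; b < 4; b++)                    /* little-endian store */
        c->mem[i + b] = (uint8_t)(c->ecx >> (8 * b));
}

/* sub esp, n: only moves esp; nothing is written to memory. */
static void sub_esp(Cpu *c, uint32_t n)
{
    c->esp -= n;
}

/* Standard 32-bit x86 encodings of the two alternatives. */
static const uint8_t PUSH_ECX[]  = { 0x51 };             /* push ecx   */
static const uint8_t SUB_ESP_4[] = { 0x83, 0xEC, 0x04 }; /* sub esp, 4 */
```

Both paths leave esp 4 lower. The only difference is that 'push ecx' also copies ecx's leftover value into the new slot, which does not matter because the function writes a before anything reads it. The IDA addresses agree with the sizes: 'push ecx' is at 00401153 and the next instruction starts at 00401154.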


.text:00401154                 mov     [ebp+var_4], 1

This is the assembly instruction for the C statement 'a = 1'. It stores 1 into the memory location pointed to by ebp+var_4. ebp contains the value the stack pointer esp had before space was reserved for the auto variables, so ebp always points to the top (because the stack grows downwards) of the memory reserved for the auto variables. The first 4-byte space below that top is used to store auto variable a.

These are examples of different addressing schemes:

mov     ebp, 1                  : set register ebp equal to 1
mov     dword ptr [ebp], 1      : set the 4 bytes of memory at the address in register ebp equal to 1
mov     dword ptr [ebp-4], 1    : set the 4 bytes of memory at address ebp - 4 equal to 1
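The difference between the three forms is whether ebp is the destination itself or only supplies an address. Here is a minimal C sketch of that distinction, assuming a toy memory of 32 bytes addressed from 0; MEM_SIZE, Machine and the mov_* helpers are hypothetical names for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Toy memory of MEM_SIZE bytes, addressed from 0; the size and the
   addresses used with it are arbitrary values for this sketch. */
#define MEM_SIZE 32u

typedef struct {
    uint32_t ebp;
    uint8_t  mem[MEM_SIZE];
} Machine;

/* Store a 32-bit value at addr, little-endian as on x86. */
static void store32(Machine *m, uint32_t addr, uint32_t value)
{
    assert(addr <= MEM_SIZE - 4);
    for (int b = 0; b < 4; b++)
        m->mem[addr + b] = (uint8_t)(value >> (8 * b));
}

/* Load a 32-bit little-endian value from addr. */
static uint32_t load32(const Machine *m, uint32_t addr)
{
    assert(addr <= MEM_SIZE - 4);
    return (uint32_t)m->mem[addr]
         | (uint32_t)m->mem[addr + 1] << 8
         | (uint32_t)m->mem[addr + 2] << 16
         | (uint32_t)m->mem[addr + 3] << 24;
}

/* mov ebp, v                  : changes the register itself */
static void mov_reg(Machine *m, uint32_t v) { m->ebp = v; }

/* mov dword ptr [ebp], v      : ebp is only used as an address */
static void mov_mem(Machine *m, uint32_t v) { store32(m, m->ebp, v); }

/* mov dword ptr [ebp+disp], v : address is ebp plus a displacement,
   e.g. disp = -4 for [ebp-4], which is [ebp+var_4] in IDA's notation */
static void mov_mem_disp(Machine *m, int32_t disp, uint32_t v)
{
    store32(m, m->ebp + (uint32_t)disp, v);
}
```

With ebp set to 16, mov_mem_disp(&m, -4, 1) writes 1 at address 12 and leaves ebp unchanged; only mov_reg changes the register.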


.text:0040115B                 pop     ecx
.text:0040115C                 pop     ebp

These are assembly instructions to dispose of the stack frame.

1) The ecx register is popped from the stack; this is done to increase esp by 4 (the value loaded into ecx is not used).

2) The ebp register is popped from the stack; this restores its original content.

.text:0040115D                 retn

We return to the calling function.
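To tie the prologue, the body and the epilogue together, here is a toy C model that replays the unoptimized (-Od) _main instruction by instruction and lets you check that esp and ebp end up where they started. It is a simplified simulation, not a real CPU: STACK_TOP, STACK_SIZE, Cpu and run_main are hypothetical names, and retn is not modelled.

```c
#include <assert.h>
#include <stdint.h>

/* Toy 32-bit machine: three registers plus a small stack area.
   STACK_TOP and STACK_SIZE are arbitrary values for this sketch. */
#define STACK_SIZE 64u
#define STACK_TOP  0x2000u  /* initial esp; stack area is [STACK_TOP - STACK_SIZE, STACK_TOP) */

typedef struct {
    uint32_t esp, ebp, ecx;
    uint8_t  stack[STACK_SIZE];
} Cpu;

/* Translate a 32-bit address into a pointer into the toy stack area. */
static uint8_t *at(Cpu *c, uint32_t addr)
{
    uint32_t i = addr - (STACK_TOP - STACK_SIZE);
    assert(i <= STACK_SIZE - 4);  /* toy bounds check */
    return &c->stack[i];
}

static void write32(Cpu *c, uint32_t addr, uint32_t v)
{
    uint8_t *p = at(c, addr);
    for (int b = 0; b < 4; b++)
        p[b] = (uint8_t)(v >> (8 * b));  /* little-endian */
}

static uint32_t read32(Cpu *c, uint32_t addr)
{
    uint8_t *p = at(c, addr);
    return (uint32_t)p[0] | (uint32_t)p[1] << 8
         | (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

static void push(Cpu *c, uint32_t v) { c->esp -= 4; write32(c, c->esp, v); }
static uint32_t pop(Cpu *c) { uint32_t v = read32(c, c->esp); c->esp += 4; return v; }

/* Replays the -Od listing of _main, minus the final retn. */
static void run_main(Cpu *c)
{
    push(c, c->ebp);            /* push ebp                 */
    c->ebp = c->esp;            /* mov  ebp, esp            */
    push(c, c->ecx);            /* push ecx   (reserves a)  */
    write32(c, c->ebp - 4, 1);  /* mov  [ebp+var_4], 1      */
    c->ecx = pop(c);            /* pop  ecx   (releases a)  */
    c->ebp = pop(c);            /* pop  ebp                 */
}
```

One detail the model makes visible: 'pop ecx' loads the slot that held a, so ecx ends up containing 1. The epilogue simply ignores that value; the pop is only there to move esp back up by 4.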

In the next lesson, to help you understand this, we will step through this code with a debugger.

The file is 46 KB because it also contains library functions and other environment setup code (the startup code that runs before main). The PE file format also provides for data, such as strings. We'll look at this later.

While you wait for me to publish the debugging lesson, try some programs with several variables, like these:

main(int argc, char **argv)
{
   int a;
   int b;

   a = 1;
   b = 2;
}

main(int argc, char **argv)
{
   int a;
   char c;

   a = 1;
   c = 2;
}

--Didier Stevens

Cool beans. I kinda figured ahead of time what the results would be like. However, I don't understand where "add esp, 0FFFFFFF8h" came from. It seems to be associated with var_8, though I don't understand why. I say this because when I right-click on it in IDA, it lets me switch it to var_8 or convert it to binary, decimal, etc.

.text:00401150 ; int __cdecl main(int argc,const char **argv,const char *envp)
.text:00401150 _main           proc near               ; DATA XREF: .data:004090D0↓o
.text:00401150 
.text:00401150 var_8           = dword ptr -8
.text:00401150 var_4           = dword ptr -4
.text:00401150 argc            = dword ptr  8
.text:00401150 argv            = dword ptr  0Ch
.text:00401150 envp            = dword ptr  10h
.text:00401150 
.text:00401150                 push    ebp
.text:00401151                 mov     ebp, esp
.text:00401153                 add     esp, 0FFFFFFF8h
.text:00401156                 mov     [ebp+var_4], 1
.text:0040115D                 mov     [ebp+var_8], 2
.text:00401164                 pop     ecx
.text:00401165                 pop     ecx
.text:00401166                 pop     ebp
.text:00401167                 retn
.text:00401167 _main           endp

--Pand0ra 20:18, 19 February 2007 (UTC)

Remember what I explained about stack frames: in the stack frames document I referenced, you've read that the entry sequence ends with a 'sub esp, X' instruction, where X is the total size, in bytes, of all automatic variables used in the function. You have 2 auto variables (2 32-bit integers); they require 8 bytes on the stack. So you would expect to see this:

.text:00401151                 mov     ebp, esp
.text:00401153                 sub     esp, 08h
.text:00401156                 mov     [ebp+var_4], 1

but you see this:

.text:00401151                 mov     ebp, esp
.text:00401153                 add     esp, 0FFFFFFF8h
.text:00401156                 mov     [ebp+var_4], 1

In fact, 0FFFFFFF8h is the 32-bit Two's Complement [[2]] representation of -8. So you have "add esp, -8", which is equivalent to "sub esp, 8".
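A quick way to convince yourself is to do the arithmetic the way a 32-bit register does it, modulo 2^32. The C sketch below is an illustration only; twos_complement, add_esp and sub_esp are hypothetical helper names, and the sample esp value used in the note is made up.

```c
#include <stdint.h>

/* Two's complement of n in 32 bits: invert all bits, then add 1. */
static uint32_t twos_complement(uint32_t n)
{
    return ~n + 1u;
}

/* 32-bit register arithmetic wraps modulo 2^32, just like esp does. */
static uint32_t add_esp(uint32_t esp, uint32_t imm) { return esp + imm; }
static uint32_t sub_esp(uint32_t esp, uint32_t imm) { return esp - imm; }
```

twos_complement(8) is 0xFFFFFFF8, and for any esp, such as a hypothetical 0x0012FF80, add_esp(esp, 0xFFFFFFF8) equals sub_esp(esp, 8), both giving 0x0012FF78 here. Size is no argument either way: the add form is encoded as 83 C4 F8 with a sign-extended 8-bit immediate (3 bytes, matching the jump from 00401153 to 00401156), and 'sub esp, 8' would also be 3 bytes (83 EC 08).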

--Didier Stevens 19:41, 20 February 2007 (UTC)

I see that the var changed when using a char instead of an int for the second example. I'm not sure why it is var_5 instead of var_8, other than that it has to do with the type of the value.

.text:00401150 ; =============== S U B R O U T I N E =======================================
.text:00401150 
.text:00401150 ; Attributes: bp-based frame
.text:00401150 
.text:00401150 ; int __cdecl main(int argc,const char **argv,const char *envp)
.text:00401150 _main           proc near               ; DATA XREF: .data:004090D0↓o
.text:00401150 
.text:00401150 var_5           = byte ptr -5
.text:00401150 var_4           = dword ptr -4
.text:00401150 argc            = dword ptr  8
.text:00401150 argv            = dword ptr  0Ch
.text:00401150 envp            = dword ptr  10h
.text:00401150 
.text:00401150                 push    ebp
.text:00401151                 mov     ebp, esp
.text:00401153                 add     esp, 0FFFFFFF8h
.text:00401156                 mov     [ebp+var_4], 1
.text:0040115D                 mov     [ebp+var_5], 2
.text:00401161                 pop     ecx
.text:00401162                 pop     ecx
.text:00401163                 pop     ebp
.text:00401164                 retn
.text:00401164 _main           endp

--Pand0ra 21:30, 19 February 2007 (UTC)

The name of the label var_5 is not important here, you can rename labels if you want to (select a label and press N).

A char variable in C is 1 byte long; that's why the label is a byte pointer and not a dword (double word, or 4 bytes) pointer:

.text:00401150 var_5           = byte ptr -5

The following move instruction:

.text:0040115D                 mov     [ebp+var_5], 2

sets exactly 1 byte in memory equal to 2, because var_5 is a byte pointer.

The following move instruction:

.text:00401156                 mov     [ebp+var_4], 1

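sets 4 bytes in memory, because var_4 is a dword pointer: 3 of them equal to 0 and one equal to 1. Since x86 stores values in little-endian order, the byte equal to 1 is the one at the lowest address, ebp-4.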
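The byte-versus-dword difference can be sketched in C by modelling the 8 reserved bytes as an array, where index 0 is ebp-8 and index 7 is ebp-1. This is an illustration under that assumption; FRAME_BYTES, idx, mov_dword and mov_byte are hypothetical names, not anything the compiler or IDA uses.

```c
#include <assert.h>
#include <stdint.h>

/* The 8 bytes reserved by 'add esp, 0FFFFFFF8h', as an array:
   frame[0] is at ebp-8 and frame[7] at ebp-1. */
#define FRAME_BYTES 8

/* Map an ebp-relative offset (-8..-1) to an array index. */
static int idx(int ebp_offset)
{
    assert(ebp_offset >= -FRAME_BYTES && ebp_offset < 0);
    return FRAME_BYTES + ebp_offset;
}

/* mov [ebp+var_4], v where var_4 is "dword ptr -4": writes 4 bytes,
   least significant byte first (x86 is little-endian). */
static void mov_dword(uint8_t frame[FRAME_BYTES], int off, uint32_t v)
{
    assert(off <= -4);  /* all 4 bytes must fit inside the frame */
    for (int b = 0; b < 4; b++)
        frame[idx(off) + b] = (uint8_t)(v >> (8 * b));
}

/* mov [ebp+var_5], v where var_5 is "byte ptr -5": writes exactly 1 byte. */
static void mov_byte(uint8_t frame[FRAME_BYTES], int off, uint8_t v)
{
    frame[idx(off)] = v;
}
```

If the frame starts out filled with leftover values (say 0xAA), running mov_dword(frame, -4, 1) and then mov_byte(frame, -5, 2) gives 01 00 00 00 at ebp-4..ebp-1 and 02 at ebp-5, while ebp-8..ebp-6 keep their old contents: those three bytes are reserved but unused.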

This will become clearer with the debugger in Reverse Engineering Mentoring Lesson 003, where you can see the code in action.

--Didier Stevens 19:41, 20 February 2007 (UTC)

--New Visitor 5/4/2007

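Q: Why, in the second example using char c, does the assembly still show it reserving 8 bytes on the stack when sizeof(c) is one byte?

A: For efficiency. 32-bit processors work best when accessing memory 32 bits (4 bytes) at a time. To optimize for performance, the stack is managed in 4-byte blocks: the 5 bytes needed for a (4 bytes) and c (1 byte) are rounded up to 8, even though c uses only one byte. That is also why the epilogue can release the space with two 'pop ecx' instructions, each of which removes 4 bytes.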
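The rounding described in the answer is the usual "round up to a multiple of a power of two" calculation. Here is a minimal C sketch of it; round_up is a hypothetical helper, and the 4-byte slot size is the 32-bit x86 value discussed above, passed in as a parameter.

```c
#include <assert.h>
#include <stddef.h>

/* Round size up to the next multiple of align (which must be a power of two). */
static size_t round_up(size_t size, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);  /* power of two */
    return (size + align - 1) & ~(align - 1);
}
```

round_up(4 + 1, 4) gives 8 for the int and char example, and round_up(4 + 4, 4) also gives 8 for the two ints, which is why both listings contain the same 'add esp, 0FFFFFFF8h'.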
