Disassembly of some function

14 comments, last by fir 10 years, 1 month ago
When I compile


#include <windows.h>
#include <stdlib.h>

void ERROR_EXIT(char* text)
{
       MessageBox(NULL, text, "error", MB_OK | MB_ICONEXCLAMATION);
       exit(-1);
}
with MinGW and disassemble it with objconv by Agner Fog (I like it, so I sometimes use it as a disassembler), I get:


 
 


ALIGN   4
 
__Z10ERROR_EXITPcS_:
        sub     esp, 28                 ; 0040217C _ 83. EC, 1C
        mov     dword [esp+0CH], 48     ; 0040217F _ C7. 44 24, 0C, 00000030
        mov     eax, dword [esp+24H]    ; 00402187 _ 8B. 44 24, 24
        mov     dword [esp+8H], eax     ; 0040218B _ 89. 44 24, 08
        mov     eax, dword [esp+20H]    ; 0040218F _ 8B. 44 24, 20
        mov     dword [esp+4H], eax     ; 00402193 _ 89. 44 24, 04
        mov     dword [esp], 0          ; 00402197 _ C7. 04 24, 00000000
        call    _MessageBoxA@16         ; 0040219E _ E8, 000034DD
        sub     esp, 16                 ; 004021A3 _ 83. EC, 10
        mov     dword [esp], -1         ; 004021A6 _ C7. 04 24, FFFFFFFF
        call    _exit                   ; 004021AD _ E8, 000033B6
 


I do not understand these values like 0040218B. What do they really stand for? Some previous examples seem to suggest it is the real process memory address where this code will reside (after loading into RAM), but I am not sure.
- I also do not understand what numbers like 000033B6 mean. Why are they different?
- What the dot after 89. etc. means is also a mystery to me.
Could someone explain it a bit?

Also, what does @16 at the end of _MessageBoxA@16 mean?


I'll try and answer what I know.


I do not understand these values like 0040218B. What do they really stand for? Some previous examples seem to suggest it is the real process memory address where this code will reside (after loading into RAM), but I am not sure.

On Windows, the address 0x00401000 is a pretty standard address for the start of your program's code (if no executable packing or other obfuscation has been done); the entry point itself is usually somewhere inside that code. It is, however, not global to the system: it is a virtual address in your program's own address space, which the OS sets up separately for each process and which, unsurprisingly, starts at 0x00000000.


Also, what does @16 at the end of _MessageBoxA@16 mean?

It is the name decoration of Windows' STDCALL calling convention. A stdcall function name is decorated with a leading underscore, followed by an @ and then the number of bytes of arguments passed on the stack. On 32-bit x86 this number is always a multiple of 4, because every argument occupies at least one 4-byte stack slot.

In your example, @16 means that 16 bytes, i.e. 4 arguments of 4 bytes each, are passed to MessageBoxA on the stack. If you were to disassemble MessageBoxA itself, you would find the return instruction:


ret 16

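which removes the 4 arguments (16 bytes) from the stack as it returns, so with stdcall the callee, not the caller, cleans up the stack. That is also why your listing has "sub esp, 16" right after the call: MessageBoxA has already popped its arguments, and the compiler re-reserves that stack space for the following call to exit.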
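To make the decoration rule concrete, here is a small sketch that builds such a name from a list of argument sizes. It assumes 32-bit x86, where each argument is rounded up to a whole number of 4-byte stack slots. stdcall_decorate is a made-up helper for illustration, not something the compiler or Windows provides.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Build a stdcall-decorated name: "_" + name + "@" + bytes of stack arguments.
   Assumes 32-bit x86, where every argument takes a whole number of 4-byte
   stack slots (a 1-byte char still costs 4 bytes, an 8-byte double costs 8). */
void stdcall_decorate(char *out, size_t outsz, const char *name,
                      const size_t *arg_sizes, size_t nargs)
{
    size_t bytes = 0;
    assert(out != NULL && name != NULL && (nargs == 0 || arg_sizes != NULL));
    for (size_t i = 0; i < nargs; i++)
        bytes += (arg_sizes[i] + 3) / 4 * 4;   /* round up to a 4-byte slot */
    snprintf(out, outsz, "_%s@%lu", name, (unsigned long)bytes);
}
```

MessageBoxA takes four 4-byte arguments (a window handle, two string pointers and a UINT), so this produces _MessageBoxA@16. A function with no arguments, such as GetCommandLineA, gets _GetCommandLineA@0.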

As to the other questions, I can only make educated guesses.

"I would try to find halo source code by bungie best fps engine ever created, u see why call of duty loses speed due to its detail." -- GettingNifty

The big numbers starting with 004 are the memory addresses within the code segment. Note that code can be relocated upon loading.

The smaller numbers like 34DD and 33B6 are offsets. E8 is a call with a relative offset, so those numbers get added to the address of the instruction following the call instruction to get the address of the call's target.
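Here is the same arithmetic, as a sketch, for the two calls in the first listing. e8_call_target is a hypothetical helper. The 4 bytes after E8 are stored little-endian in memory, so objconv's 000034DD is DD 34 00 00 in the code, and the whole call instruction is 5 bytes long. That gives 0040219E + 5 + 34DD = 00405680 for _MessageBoxA@16 and 004021AD + 5 + 33B6 = 00405568 for _exit, presumably the addresses of the small stubs those names refer to.

```c
#include <assert.h>
#include <stdint.h>

/* Target of a near relative call: opcode E8 followed by a 4-byte rel32.
   The rel32 is stored little-endian and is relative to the address of the
   *next* instruction, i.e. the call's own address plus its 5-byte length. */
uint32_t e8_call_target(uint32_t call_addr, const uint8_t bytes[5])
{
    assert(bytes[0] == 0xE8);
    uint32_t rel = (uint32_t)bytes[1]
                 | (uint32_t)bytes[2] << 8
                 | (uint32_t)bytes[3] << 16
                 | (uint32_t)bytes[4] << 24;
    /* unsigned wrap-around makes negative (backward) offsets work too */
    return call_addr + 5u + rel;
}
```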

What does the dot after the opcode mean? No clue :-/

The @16? This is a byproduct of C++ called name mangling. In C++ you can have several functions with the same name which differ only in the number of parameters (or, for methods, in the classes they belong to). To distinguish them, the compiler adds some magic numbers and characters to the function names, usually based on the number and type of parameters. The specifics, however, are different for each compiler.


The @16? This is a byproduct of C++ called name mangling. In C++ you can have several functions with the same name which differ only in the number of parameters (or, for methods, in the classes they belong to). To distinguish them, the compiler adds some magic numbers and characters to the function names, usually based on the number and type of parameters. The specifics, however, are different for each compiler.

This is incorrect, see my post above.
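Both kinds of decoration do appear in the original listing, though. _MessageBoxA@16 carries the stdcall suffix. The function's own label, __Z10ERROR_EXITPcS_, carries genuine C++ name mangling: MinGW's g++ follows the Itanium C++ ABI scheme, and 32-bit Windows adds one more leading underscore to the symbol.

Below is a minimal sketch of a decoder for only the handful of codes that appear here: v, c, i, P, and the S_ back-reference. It is not a general demangler; c++filt from binutils does the real job. Decoded, the name reads ERROR_EXIT(char*, char*). That suggests the compiled version took two char* parameters rather than the single one in the posted source, which is consistent with the listing reading both [esp+20H] and [esp+24H].

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Tiny subset of Itanium C++ ABI mangling, as emitted by MinGW's g++:
   _Z<length><name> followed by parameter types built only from
   v (void), c (char), i (int), P (pointer to) and S_ / S<digit>_
   (back-references to earlier pointer types). Everything else is rejected. */

#define MAX_SUBS 10
#define MAX_TYPE 64

static char subs[MAX_SUBS][MAX_TYPE]; /* substitution table: S_ = subs[0] */
static int nsubs;

static int parse_type(const char **p, char *out, size_t outsz)
{
    char c = **p;
    if (c == 'v' || c == 'c' || c == 'i') {          /* builtins: never substituted */
        snprintf(out, outsz, "%s", c == 'v' ? "void" : c == 'c' ? "char" : "int");
        (*p)++;
        return 1;
    }
    if (c == 'P') {                                  /* pointer: remembered as S_, S0_, ... */
        char inner[MAX_TYPE];
        (*p)++;
        if (!parse_type(p, inner, sizeof inner)) return 0;
        snprintf(out, outsz, "%s*", inner);
        if (nsubs < MAX_SUBS) snprintf(subs[nsubs++], MAX_TYPE, "%s", out);
        return 1;
    }
    if (c == 'S') {                                  /* back-reference */
        int idx = 0;
        (*p)++;
        if (isdigit((unsigned char)**p)) idx = *(*p)++ - '0' + 1;
        if (**p != '_' || idx >= nsubs) return 0;
        (*p)++;
        snprintf(out, outsz, "%s", subs[idx]);
        return 1;
    }
    return 0;
}

/* Writes e.g. "ERROR_EXIT(char*, char*)" into out; returns 0 if unsupported. */
int demangle(const char *sym, char *out, size_t outsz)
{
    const char *p = sym;
    size_t len = 0, used;
    int n;
    assert(sym != NULL && out != NULL && outsz > 0);
    nsubs = 0;
    if (p[0] == '_' && p[1] == '_') p++;             /* drop the Win32 underscore */
    if (strncmp(p, "_Z", 2) != 0) return 0;
    for (p += 2; isdigit((unsigned char)*p); p++) len = len * 10 + (size_t)(*p - '0');
    if (len == 0 || strlen(p) < len) return 0;
    n = snprintf(out, outsz, "%.*s(", (int)len, p);
    if (n < 0 || (size_t)n >= outsz) return 0;
    used = (size_t)n;
    for (p += len; *p; ) {
        char t[MAX_TYPE];
        int first = (out[used - 1] == '(');
        if (!parse_type(&p, t, sizeof t)) return 0;
        if (first && *p == '\0' && strcmp(t, "void") == 0) break; /* f(void) -> f() */
        n = snprintf(out + used, outsz - used, "%s%s", first ? "" : ", ", t);
        if (n < 0 || (size_t)n >= outsz - used) return 0;
        used += (size_t)n;
    }
    if (used + 1 >= outsz) return 0;
    out[used++] = ')';
    out[used] = '\0';
    return 1;
}
```

For example, demangle("__Z10ERROR_EXITPcS_", buf, sizeof buf) fills buf with "ERROR_EXIT(char*, char*)". For the stdcall name _MessageBoxA@16 it returns 0, because that is not a mangled C++ name at all.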

Alright, thanks for the answer.

I still don't understand the 0x00401000, though. Is this just the address in the virtual process space where it will be loaded? And what about the first 4MB of address space being skipped?

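The 0x004XXXXX after the semicolon is the virtual address of that instruction. The 4MB skip is largely tradition: 0x00400000 is the default image base for 32-bit Windows executables, a convention going back to Windows 95. Keeping low memory unmapped may also help guarantee an access violation (the Windows equivalent of a segmentation fault) when you try to access a struct/class member through a null pointer.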
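In other words, an address like 0040217C is the image base plus a relative virtual address (RVA). Here is a minimal sketch of that arithmetic, assuming the default 32-bit image base 0x00400000. The helper names are made up, and the alternative base 0x10000000 in the usage example only serves to show what relocation does.

```c
#include <assert.h>
#include <stdint.h>

/* Default image base used for 32-bit Windows executables. */
#define DEFAULT_IMAGE_BASE 0x00400000u

/* Virtual address = image base + relative virtual address (RVA). */
uint32_t rva_to_va(uint32_t image_base, uint32_t rva)
{
    return image_base + rva;
}

uint32_t va_to_rva(uint32_t image_base, uint32_t va)
{
    assert(va >= image_base);   /* the address must lie inside the image */
    return va - image_base;
}

/* If the loader maps the image somewhere else, every absolute address in it
   moves by the same delta; base relocations are what let the loader patch them. */
uint32_t rebase(uint32_t va, uint32_t old_base, uint32_t new_base)
{
    return new_base + va_to_rva(old_base, va);
}
```

With these helpers, RVA 0x1000 maps to 0x00401000, and the instruction at 0040217C sits at RVA 0x217C. If the image were loaded at 0x10000000 instead, the same instruction would end up at 0x1000217C.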

After the "_" is a hex dump of the instruction:

The bytes before the "." are the main opcode of the instruction. The additional bytes before the next "," (the ModR/M byte and, if present, the SIB byte) specify which operands the instruction works on, e.g. which register to read, or which memory address to dereference (a register plus some offset). The rest after the "," are the additional arguments of the instruction: the displacement (offset) and the immediate (constant) value.

So for:


mov     dword [esp+0CH], 48     ; 0040217F _ C7. 44 24, 0C, 00000030

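0040217F is the address of the instruction; you could use "jmp 0040217F" to make it the next instruction to execute.

"C7." means "mov"; note that there are multiple forms of "mov" with different opcodes for moving different kinds of things around. I don't know why the disassembler puts the "." there, though; C7 by itself is not complete, because the reg field of the following ModR/M byte (0 here) is also part of the opcode (Intel's manuals write this form as C7 /0).

"C7." combined with "44 24," means "move a constant value into the address in esp plus some offset": 44 is the ModR/M byte (it says that an 8-bit offset follows and that a SIB byte follows), and 24 is the SIB byte (base register esp, no index register).

"0C" is the offset (an 8-bit displacement).

"00000030" is the constant value to move into [esp + 0x0C]: 0x30 = 48 = MB_OK | MB_ICONEXCLAMATION.

Everything after the ";" is a comment showing the corresponding machine code for reference. Normally you don't need to read it unless you are doing some really low-level stuff or trying to write an assembler/disassembler.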

If you want to know more about how to translate assembly into machine code you could read the manuals at

http://www.intel.com/content/www/us/en/processors/architectures-software-developer-manuals.html

x86 machine language is really, really complicated; you might not want to go further than assembly.

Or, you can have some fun with

http://www.cppgm.org/

In Programming Assignment 9 you are required to write an x86-64 assembler, which is both fun and painful.



Alright, thanks very much, everything seems clear now.

When I look at the disassembly of my program, one thing that still surprises me is that all the code and data seem to lie in one contiguous chunk at $401000 and above.

At the end of the code there is also something like:


 
_VirtualProtect@16:; Function begin
.text:  jmp     near [imp_VirtualProtect]               ; 00405708 _ FF. 25, 0040A20C(d)
; _VirtualProtect@16 End of function
 
        nop                                             ; 0040570E _ 90
        nop                                             ; 0040570F _ 90
 
ALIGN   16
_GetCommandLineA@0:; Function begin
ALIGN   16
.text:  jmp     near [imp_GetCommandLineA]              ; 00405710 _ FF. 25, 0040A1DC(d)
; _GetCommandLineA@0 End of function
 
        nop                                             ; 00405716 _ 90
        nop                                             ; 00405717 _ 90
 
_GetStartupInfoA@4:; Function begin
.text:  jmp     near [imp_GetStartupInfoA]              ; 00405718 _ FF. 25, 0040A1EC(d)
; _GetStartupInfoA@4 End of function
 
        nop                                             ; 0040571E _ 90
        nop                                             ; 0040571F _ 90
 
ALIGN   16
_EnterCriticalSection@4:; Function begin
ALIGN   16
.text:  jmp     near [imp_EnterCriticalSection]         ; 00405720 _ FF. 25, 0040A1D4(d)
; _EnterCriticalSection@4 End of function
 
        nop                                             ; 00405726 _ 90
        nop                                             ; 00405727 _ 90
 
Does anyone know what this is, and what the (d) at the end of 0040A1D4(d) etc. means?
I also get some strange data that does not belong to the functions I wrote (I'm compiling in C++ mode, but the code is plain C),
like:


___DTOR_LIST__:                                         ; byte
        db 0FFH, 0FFH, 0FFH, 0FFH, 00H, 00H, 00H, 00H   ; 00405858 _ ........
        db 00H, 00H, 00H, 00H, 00H, 00H, 00H, 00H       ; 00405860 _ ........
     ... and about 30 more lines

or
 
SECTION .eh_frame align=4 noexecute                     ; section number 4, const
 
        db 14H, 00H, 00H, 00H, 00H, 00H, 00H, 00H       ; 00408000 _ ........
 ... and about 300 more lines
 
Then there is something like the following (could this stuff be left out if I used some compiler switches?):


imp_ExitProcess:                                        ; import from KERNEL32.dll
__imp__ExitProcess@4:                                   ; byte
.idata$5:                                               ; byte
        db 0A6H, 0A3H, 00H, 00H                         ; 0040A1D8 _ ....
 
imp_GetCommandLineA:                                    ; import from KERNEL32.dll
__imp__GetCommandLineA@0:                               ; byte
.idata$5:                                               ; byte
        db 0B4H, 0A3H, 00H, 00H                         ; 0040A1DC _ ....
 
imp_GetLastError:                                       ; import from KERNEL32.dll
__imp__GetLastError@0:                                  ; byte
.idata$5:                                               ; byte
        db 0C6H, 0A3H, 00H, 00H                         ; 0040A1E0 _ ....
 
imp_GetModuleHandleA:                                   ; import from KERNEL32.dll
__imp__GetModuleHandleA@4:                              ; byte
.idata$5:                                               ; byte
        db 0D6H, 0A3H, 00H, 00H                         ; 0040A1E4 _ ....
It all ends at 0040C1F8.
Another question: is all the code and data really packed into one contiguous block of memory like this, and can some compile-time switches make it load at a memory address other than $401000-40C200?

Microsoft linkers by default load the image at ImageBase (default 0x00400000 for 32-bit executables) and place the code at offset 0x1000 from it, with the data immediately after. So it is not one undifferentiated block, but a series of sections laid out one after another, such as ".text" (code), ".data"/".rdata" (static data) and ".idata" (import tables). You can change the image base via Project Settings->Linker->Advanced->Base Address, which is the /BASE linker option; with MinGW, the GNU linker's equivalent is --image-base (e.g. -Wl,--image-base,0x10000000).
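Here is a sketch of how the sections end up one after another. It assumes a section alignment of 0x1000 (the usual value) and uses made-up section sizes in the usage example. Each section starts at the first aligned address after the previous one ends, which is why section starts in your listing, such as 00401000 for the code and 00408000 for .eh_frame, are multiples of 0x1000.

```c
#include <assert.h>
#include <stdint.h>

/* Round x up to a multiple of align (align must be a power of two). */
uint32_t align_up(uint32_t x, uint32_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (x + align - 1) & ~(align - 1);
}

/* Place sections one after another: the first starts at first_rva, and each
   following one at the next multiple of section_align after the previous one
   ends. Stores the resulting virtual address of each section in va_out. */
void layout_sections(uint32_t image_base, uint32_t first_rva, uint32_t section_align,
                     const uint32_t *virtual_sizes, int count, uint32_t *va_out)
{
    uint32_t rva = first_rva;
    for (int i = 0; i < count; i++) {
        va_out[i] = image_base + rva;
        rva = align_up(rva + virtual_sizes[i], section_align);
    }
}
```

For hypothetical sizes of 0x4A00, 0x1200 and 0x300 bytes with base 0x00400000, this places the sections at 0x00401000, 0x00406000 and 0x00408000.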

But why change it? Because you can? This is something I really don't understand.

When you compile C or C++, the machine code for the code you write is not the only thing you get. The compiler and linker insert all sorts of things for you, like parts of the standard library, OS library stubs and setup code. In a console application generated by MSVC, instead of starting at your main() directly, the program starts in something like mainCRTStartup(), which initializes static variables, calls OS functions like GetCommandLineA (as you see in your disassembly) to put the command-line arguments into argc and argv, and then calls your main().
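As an illustration of the "put command-line arguments into argc and argv" step, here is a simplified sketch that splits a command-line string in place. It is not the real CRT parser: it handles only spaces, tabs and double quotes, while the actual MSVC/MinGW rules also treat backslashes before quotes specially. split_command_line is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified sketch (not the real CRT rules) of what startup code does with
   the string returned by GetCommandLineA(): split it in place into argv[].
   Only spaces/tabs and double quotes are handled. Returns argc. */
int split_command_line(char *cmd, char **argv, int max_args)
{
    char *r = cmd, *w = cmd;   /* read/write positions; w never overtakes unread input */
    int argc = 0;
    assert(cmd != NULL && argv != NULL);
    while (*r != '\0' && argc < max_args) {
        while (*r == ' ' || *r == '\t') r++;        /* skip separators */
        if (*r == '\0') break;
        argv[argc++] = w;
        int quoted = 0;
        while (*r != '\0' && (quoted || (*r != ' ' && *r != '\t'))) {
            if (*r == '"') { quoted = !quoted; r++; continue; }  /* drop quotes */
            *w++ = *r++;
        }
        if (*r != '\0') r++;                        /* step over the separator */
        *w++ = '\0';                                /* terminate this argument */
    }
    return argc;
}
```

With the input prog.exe "a b" c, this yields argc = 3 and the arguments prog.exe, a b and c.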

So a small _GetCommandLineA@0 stub must be linked into your exe:


ALIGN   16
_GetCommandLineA@0:; Function begin
ALIGN   16
.text:  jmp     near [imp_GetCommandLineA]              ; 00405710 _ FF. 25, 0040A1DC(d)
; _GetCommandLineA@0 End of function

The actual GetCommandLineA is implemented in Kernel32.dll, so the address of that function is stored at the location [imp_GetCommandLineA]:


imp_GetCommandLineA:                                    ; import from KERNEL32.dll
__imp__GetCommandLineA@0:                               ; byte
.idata$5:                                               ; byte
        db 0B4H, 0A3H, 00H, 00H                         ; 0040A1DC _ ....

"db 0B4H, 0A3H, 00H, 00H" is the slot for the address of that function. Since the function lives in a DLL, its real address is not known at link time, so this is placeholder data that gets overwritten with the actual value once kernel32.dll is loaded. (Read as a little-endian 32-bit value it is 0x0000A3B4, the RVA of the hint/name entry for "GetCommandLineA" in the import data, which the loader uses to look the function up.)

When you call GetCommandLineA, the address stored there is retrieved by dereferencing imp_GetCommandLineA (0040A1DC(d)) and jumped to. As you can find on page 13 of http://www.agner.org/optimize/objconv-instructions.pdf, (d) means a direct address; sometimes you will get (rel), which means an offset from the current instruction address.
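Assume each slot initially holds the RVA of an IMAGE_IMPORT_BY_NAME record: a 2-byte hint followed by the NUL-terminated function name, padded to an even size. Then the four values in the listing should be spaced exactly by those record sizes, and the sketch below checks that. 0xA3A6 + 14 ("ExitProcess") = 0xA3B4, 0xA3B4 + 18 ("GetCommandLineA") = 0xA3C6, and 0xA3C6 + 16 ("GetLastError", padded) = 0xA3D6. The helper names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Read a little-endian 32-bit value, the way "db 0B4H, 0A3H, 00H, 00H" is stored. */
uint32_t read_le32(const uint8_t b[4])
{
    return (uint32_t)b[0] | (uint32_t)b[1] << 8 | (uint32_t)b[2] << 16 | (uint32_t)b[3] << 24;
}

/* Size of one hint/name record: 2-byte hint, the name, a NUL, padded to even. */
uint32_t hint_name_size(const char *name)
{
    assert(name != NULL);
    uint32_t n = 2u + (uint32_t)strlen(name) + 1u;
    return (n + 1u) & ~1u;
}
```

Added to the default image base, RVA 0xA3B4 is 0x0040A3B4. That falls inside the import data, a little after the slots at 0040A1D8 to 0040A1E4.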

___DTOR_LIST__ seems to be related to destructors of static objects; C++ mode treats everything as C++ and doesn't care that your code is plain C. These are GCC/MinGW implementation details that I think you don't need to pay attention to. Normally when you disassemble C++ you only care about certain functions, e.g. write some code in main() and see how main() goes. The whole exe has a lot of stuff that is not worth your time.

Perhaps you should learn from some Win32 assembly tutorials rather than by disassembling a C++ program; C++ is a real mess for starters.

This is worth the time for sure; it just needs some explanation (maybe a bit hard to find, but then I will see clearly through the disassembled binary of the whole exe, which is very much worth the time). Maybe I'm not such a starter after all: I feel I now understand about 75% of it (of my doubts about the disassembled file), but I still have some knowledge holes, for example with this:

___DTOR_LIST__:

or

SECTION .eh_frame

or

__imp__ExitProcess@4: db 0A6H, 0A3H, 00H, 00H
__imp__GetCommandLineA@0: db 0B4H, 0A3H, 00H, 00H
__imp__GetLastError@0: db 0C6H, 0A3H, 00H, 00H
__imp__GetModuleHandleA@4: db 0D6H, 0A3H, 00H, 00H

I am potentially interested in some 4k, 16k and 64k executable competitions, so it would be interesting to know how to throw away some of the unneeded setup code and tables here.

Maybe the __imp__GetCommandLineA stuff can be understood more easily by rewriting it in C:

The "__imp__GetCommandLineA@0:" is a label, which can be viewed as a pointer to the instruction or data that follows it.

So


ALIGN   16
_GetCommandLineA@0:; Function begin
ALIGN   16
.text:  jmp     near [imp_GetCommandLineA]              ; 00405710 _ FF. 25, 0040A1DC(d)
; _GetCommandLineA@0 End of function

imp_GetCommandLineA:                                    ; import from KERNEL32.dll
__imp__GetCommandLineA@0:                               ; byte
.idata$5:                                               ; byte
        db 0B4H, 0A3H, 00H, 00H                         ; 0040A1DC _ ....

roughly means:


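#include <windows.h>

/* Pointer type matching GetCommandLineA's signature. */
typedef LPSTR (WINAPI *PointerToFunction)(void);

/* The slot: before loading it holds 0x0000A3B4 (db 0B4H, 0A3H, 00H, 00H). */
static PointerToFunction imp_GetCommandLineA_ptr = (PointerToFunction)0x0000A3B4;
static PointerToFunction *imp_GetCommandLineA = &imp_GetCommandLineA_ptr;

/* The stub (renamed so it doesn't clash with the real function); only valid after init_imports(). */
LPSTR WINAPI stub_GetCommandLineA(void)
{
	PointerToFunction ptr = *imp_GetCommandLineA; // [imp_GetCommandLineA]
	return ptr();                                 // jmp near [imp_GetCommandLineA]
}

//and in some initialization code:
void init_imports(void)
{
	HMODULE handle = LoadLibraryA("kernel32.dll");
	*imp_GetCommandLineA = (PointerToFunction)GetProcAddress(handle, "GetCommandLineA");
	//because GetCommandLineA is in a separate DLL, its actual address cannot be known until the DLL is loaded.
	//we update the address after the DLL is loaded.
}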

So the lines you see around __imp__GetCommandLineA are preallocated memory that holds the function address retrieved from the DLL.
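The same mechanism can also be simulated portably, without Windows. In this sketch, a hypothetical table fake_kernel32 stands in for kernel32.dll's exports, and lookup_export plays the role of GetProcAddress. All the names are made up for illustration; the point is only that the stub jumps through a slot that a "loader" fills in before your code runs.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef const char *(*ImportFn)(void);

/* Hypothetical stand-in for the code inside kernel32.dll. */
static const char *dll_GetCommandLineA(void) { return "prog.exe arg1"; }

/* Hypothetical stand-in for the DLL's export table: name -> address. */
struct Export { const char *name; ImportFn fn; };
static const struct Export fake_kernel32[] = {
    { "GetCommandLineA", dll_GetCommandLineA },
};

/* Plays the role of GetProcAddress: find an export by name. */
static ImportFn lookup_export(const char *name)
{
    for (size_t i = 0; i < sizeof fake_kernel32 / sizeof fake_kernel32[0]; i++)
        if (strcmp(fake_kernel32[i].name, name) == 0)
            return fake_kernel32[i].fn;
    return NULL;
}

/* The import slots (like imp_GetCommandLineA) and the names to bind them to. */
#define NIMPORTS 1
static ImportFn iat[NIMPORTS];
static const char *const import_names[NIMPORTS] = { "GetCommandLineA" };

/* What the loader does before your code runs: fill every slot. */
int bind_imports(void)
{
    for (size_t i = 0; i < NIMPORTS; i++) {
        iat[i] = lookup_export(import_names[i]);
        if (iat[i] == NULL) return 0;   /* a real loader would refuse to start */
    }
    return 1;
}

/* The stub: "jmp near [imp_GetCommandLineA]" is a jump through slot 0. */
const char *stub_GetCommandLineA(void)
{
    assert(iat[0] != NULL && "imports not bound yet");
    return iat[0]();
}
```

After bind_imports() succeeds, stub_GetCommandLineA() returns whatever the "DLL" function returns, here "prog.exe arg1".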

___DTOR_LIST__ is an array of pointers to termination functions that should be called when the program ends. Judging by the hex dump (FFFFFFFF followed by 0) it seems to be empty. Optimizing for size (MinimalSize in ProjectProperties->C/C++->Optimization with MSVC, or -Os with MinGW's gcc) might shrink the output somewhat.
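Assume GCC's layout for this list: a -1 header word, then the function pointers, then a null terminator. In that case the FF FF FF FF followed by 00 00 00 00 in the dump is exactly an empty list. Walking it looks roughly like the sketch below. run_dtor_list and count_call are illustrative names, not the actual MinGW runtime code.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*func_ptr)(void);

/* The -1 header word (FF FF FF FF in the dump) that starts a GCC-style list. */
#define DTOR_LIST_HEADER ((func_ptr)(size_t)-1)

/* Walk a destructor list laid out like GCC's __DTOR_LIST__: entry 0 is the
   header (-1 or a count), then the function pointers, then a null pointer
   that ends the list. The functions are called front to back. */
void run_dtor_list(func_ptr *list)
{
    assert(list != NULL);
    for (func_ptr *p = list + 1; *p != NULL; p++)
        (*p)();
}

/* Example "destructor", only here to demonstrate the walk. */
int dtor_calls = 0;
void count_call(void) { dtor_calls++; }
```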

.eh_frame contains data used by exception handling; exceptions can be turned off in ProjectProperties->C/C++->CodeGeneration->EnableC++Exceptions (with MinGW's g++, the corresponding switch is -fno-exceptions).

If you are trying to make a 64k executable, you do not need to know how to throw away setup code.

What you need to know is how to write setup code in assembly from scratch, maybe by referring to something like
http://www.oopweb.com/Assembly/Documents/Win32ASM/Volume/win32asm.htm

Write what you need yourself, instead of having a general-purpose compiler that doesn't have 64k in mind do the work for you.
