fir

disassembly of some function


When I compile
 

#include <windows.h>
#include <stdlib.h>

void ERROR_EXIT(char* text)
{
       MessageBox(NULL, text, "error" ,  MB_OK | MB_ICONEXCLAMATION);
       exit(-1);
}
 
with MinGW and disassemble it with objconv by Agner Fog (I liked it, so I sometimes use it as a disassembler), it gives:
 

 
 


ALIGN   4
 
__Z10ERROR_EXITPcS_:
        sub     esp, 28                 ; 0040217C _ 83. EC, 1C
        mov     dword [esp+0CH], 48     ; 0040217F _ C7. 44 24, 0C, 00000030
        mov     eax, dword [esp+24H]    ; 00402187 _ 8B. 44 24, 24
        mov     dword [esp+8H], eax     ; 0040218B _ 89. 44 24, 08
        mov     eax, dword [esp+20H]    ; 0040218F _ 8B. 44 24, 20
        mov     dword [esp+4H], eax     ; 00402193 _ 89. 44 24, 04
        mov     dword [esp], 0          ; 00402197 _ C7. 04 24, 00000000
        call    _MessageBoxA@16         ; 0040219E _ E8, 000034DD
        sub     esp, 16                 ; 004021A3 _ 83. EC, 10
        mov     dword [esp], -1         ; 004021A6 _ C7. 04 24, FFFFFFFF
        call    _exit                   ; 004021AD _ E8, 000033B6
 


 
I do not understand these 0040218B values. What do they really stand for? Some previous examples seem to suggest that it is the real process memory address where this code will lie after loading into RAM, but I am not sure.
- I also do not understand what addresses like 000033B6 mean, and why they are different.
 
What the dot after 89. etc. means is also a mystery to me.
 
Could someone explain it a bit?
 

Also, what does the @16 at the end of _MessageBoxA@16 mean?

Edited by fir


I'll try and answer what I know.

 


Quote: I do not understand these 0040218B values. What do they really stand for? (Some previous examples seem to suggest that it is the real process memory address where this code will lie after loading into RAM.)

On Windows, 0x00401000 is a pretty standard address for the start of your program's code (if no executable packing or other obfuscation has been done): by default an EXE is loaded at 0x00400000, and its first section, .text, starts 0x1000 bytes after that. It is, however, not a global address for the whole system; it is an address in your process's own virtual address space, which the OS sets up for each process and which, unsurprisingly, starts at 0x00000000.

 


Quote: Also, what does the @16 at the end of _MessageBoxA@16 mean?

It is the name decoration of Windows' stdcall calling convention. The function name is decorated with a leading underscore, followed by an @ and then the number of bytes of arguments passed on the stack. Since every argument occupies at least one 4-byte stack slot on 32-bit x86, this number is always a multiple of 4.

 

In your example, @16 means that 16 bytes, i.e. 4 arguments, are passed to MessageBoxA on the stack. Under stdcall the called function removes its own arguments, so if you were to disassemble MessageBoxA, you would find the return instruction:

ret 16

which will pop the 4 arguments off the stack again before returning.
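As a rough illustration, the decorated name can be built from the argument sizes. This is a sketch assuming every argument is rounded up to a 4-byte stack slot, as on 32-bit x86; `stdcall_decorate` and `stack_slot_size` are hypothetical helpers written for this post, not part of any compiler or library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Each argument occupies a 4-byte-aligned slot on the 32-bit x86 stack. */
static size_t stack_slot_size(size_t arg_size)
{
    return (arg_size + 3u) & ~(size_t)3u;
}

/* Write "_" + name + "@" + total argument bytes into out (stdcall decoration).
   The same byte count is what the callee removes with "ret N".
   Returns the snprintf result (length of the decorated name). */
int stdcall_decorate(char *out, size_t out_len, const char *name,
                     const size_t *arg_sizes, size_t arg_count)
{
    size_t total = 0;
    assert(out != NULL && name != NULL && strlen(name) > 0);
    for (size_t i = 0; i < arg_count; i++)
        total += stack_slot_size(arg_sizes[i]);
    return snprintf(out, out_len, "_%s@%zu", name, total);
}
```

For MessageBoxA's four 4-byte arguments this yields _MessageBoxA@16, and the same 16 is the operand of the callee's ret 16.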

 

As to the other questions, I can only make educated guesses.


The big numbers starting with 004 are memory addresses within the code segment. Note that code can be relocated upon loading.

The smaller numbers like 34DD and 33B6 are offsets. E8 is a call with a relative offset, so those numbers get added to the address of the instruction following the call instruction to get the address of the call target.
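Here is a small sketch of that calculation, using the two calls from the disassembly. It assumes the 5-byte E8 rel32 form and that the displacement is stored little-endian in memory (objconv prints it as the 32-bit number 000034DD, while the actual bytes are DD 34 00 00). `le32` and `call_rel32_target` are hypothetical helper names.

```c
#include <assert.h>
#include <stdint.h>

/* Decode a little-endian 32-bit value, the way the bytes sit in memory. */
uint32_t le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Target of "E8 rel32" (call near, relative): the 32-bit displacement is added
   to the address of the next instruction (call address + 5 bytes).
   Unsigned arithmetic wraps modulo 2^32, so backward calls work too. */
uint32_t call_rel32_target(uint32_t call_addr, const uint8_t *insn)
{
    assert(insn[0] == 0xE8);
    uint32_t next = call_addr + 5u;   /* 1 opcode byte + 4 displacement bytes */
    return next + le32(insn + 1);
}
```

For the two calls in the listing this gives 0x00405680 and 0x00405568. Because the offset is relative to the call itself, such calls keep working unchanged if the whole image is relocated.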

 

What the dot after the opcode means? No clue :-/

 

The @16? This is a byproduct of C++ called name mangling. In C++ you can have several functions with the same name which differ only in the number of parameters (or, in the case of methods, in the classes they belong to). To distinguish them, the compiler adds some magic numbers and characters to the function names, usually based on the number and type of parameters. The specifics, however, differ for each compiler.



Quote: "The @16? This is a byproduct of C++ called name mangling."

This is incorrect, see my post above: the @16 is the stdcall name decoration, not C++ name mangling.


Alright, thanks for the answers.

I do not understand the 0x00401000 though.

Is this just the address in the virtual process space where it will be loaded? And why are the first 4MB of address space skipped?


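The 0x004XXXXX after the semicolon is the virtual address of that instruction. The 4MB skip is mostly historical: 0x00400000 was the lowest address at which Windows 95 could load an executable, and linkers kept it as the default base. Leaving the low part of the address space unmapped also means that you get a segmentation fault (access violation) when you try to access a struct/class member through a null pointer.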

 

After the "_" is the hex dump of the instruction:

 

The byte(s) before the "." are the main opcode of the instruction. The additional bytes before the "," specify how the instruction accesses its operands, e.g. which register to read, or which memory address (a register plus some offset) to dereference. The rest after the "," are additional arguments of the instruction, such as the offset and the constant value.

 

so for:

mov     dword [esp+0CH], 48     ; 0040217F _ C7. 44 24, 0C, 00000030

0040217F is the address of the instruction; you could use "jmp 0040217F" to make it the next instruction to execute.

"C7." means "mov". Note that there are several forms of "mov", with different opcodes, for moving different things around. I don't know why the disassembler puts the "." there, though; C7 by itself is not complete, because the reg field of the following byte (44) is also part of the opcode (C7 /0 is "mov r/m32, imm32").

"C7." combined with "44 24," means "move a constant value to the address in esp plus an 8-bit offset".

"0C" is the offset.

"00000030" is the const value to move in [esp + 0x0C].

 

Everything after the ";" is a comment showing the corresponding machine code. Normally you don't need to read it unless you are doing some really low-level stuff or trying to write an assembler/disassembler.
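To make the byte roles concrete, here is a sketch that splits the bytes the way the Intel manuals describe them: 44 is the ModR/M byte (mod/reg/rm fields) and 24 is the SIB byte (scale/index/base). In both, register number 4 means ESP, except that an index of 4 means "no index". It only handles this one addressing form, and the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* ModR/M byte: mod (bits 7-6), reg (bits 5-3), rm (bits 2-0). */
struct modrm { unsigned mod, reg, rm; };
/* SIB byte: scale (bits 7-6), index (bits 5-3), base (bits 2-0). */
struct sib { unsigned scale, index, base; };

struct modrm decode_modrm(uint8_t b)
{
    struct modrm m = { (b >> 6) & 3u, (b >> 3) & 7u, b & 7u };
    return m;
}

struct sib decode_sib(uint8_t b)
{
    struct sib s = { (b >> 6) & 3u, (b >> 3) & 7u, b & 7u };
    return s;
}

/* Decode "C7 ModRM SIB disp8 imm32" = mov dword [base+disp8], imm32.
   Only the form from the thread is accepted: reg=0 (/0 selects MOV),
   mod=01 (8-bit displacement) and rm=100 (a SIB byte follows).
   Returns the immediate and stores the displacement in *disp. */
uint32_t decode_mov_disp8_imm32(const uint8_t *insn, int8_t *disp)
{
    struct modrm m = decode_modrm(insn[1]);
    assert(insn[0] == 0xC7 && m.reg == 0);
    assert(m.mod == 1 && m.rm == 4);
    *disp = (int8_t)insn[3];                 /* byte after the SIB byte */
    return (uint32_t)insn[4] | ((uint32_t)insn[5] << 8) |
           ((uint32_t)insn[6] << 16) | ((uint32_t)insn[7] << 24);
}
```

The same decoder applied to the 25 in the FF 25 stubs that appear later gives reg=4 (FF /4 is jmp near indirect) with mod=0 and rm=5, meaning a plain 32-bit address follows.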

 

If you want to know more about how to translate assembly into machine code you could read the manuals at

http://www.intel.com/content/www/us/en/processors/architectures-software-developer-manuals.html

x86 machine language is really, really complicated; you might not want to go further than assembly.

 

Or, you can have some fun with

http://www.cppgm.org/

In Programming Assignment 9 you are required to write an x86-64 assembler, which is both fun and painful.



Alright, thanks very much; everything seems clear now.

 

When I look at the disassembly of my program, one thing that still surprises me is that all the code and data seem to lie in one contiguous chunk at $401000 and above.

 

At the end of the code there is also something like:

 
_VirtualProtect@16:; Function begin
.text:  jmp     near [imp_VirtualProtect]               ; 00405708 _ FF. 25, 0040A20C(d)
; _VirtualProtect@16 End of function
 
        nop                                             ; 0040570E _ 90
        nop                                             ; 0040570F _ 90
 
ALIGN   16
_GetCommandLineA@0:; Function begin
ALIGN   16
.text:  jmp     near [imp_GetCommandLineA]              ; 00405710 _ FF. 25, 0040A1DC(d)
; _GetCommandLineA@0 End of function
 
        nop                                             ; 00405716 _ 90
        nop                                             ; 00405717 _ 90
 
_GetStartupInfoA@4:; Function begin
.text:  jmp     near [imp_GetStartupInfoA]              ; 00405718 _ FF. 25, 0040A1EC(d)
; _GetStartupInfoA@4 End of function
 
        nop                                             ; 0040571E _ 90
        nop                                             ; 0040571F _ 90
 
ALIGN   16
_EnterCriticalSection@4:; Function begin
ALIGN   16
.text:  jmp     near [imp_EnterCriticalSection]         ; 00405720 _ FF. 25, 0040A1D4(d)
; _EnterCriticalSection@4 End of function
 
        nop                                             ; 00405726 _ 90
        nop                                             ; 00405727 _ 90
 
 
Does anyone know what this is, and what the (d) at the end of 0040A1D4(d) etc. means?
 
I also got some strange data that does not belong to the functions I wrote (I'm compiling in C++ mode, but the code is plain C),
like:
 

___DTOR_LIST__:                                         ; byte
        db 0FFH, 0FFH, 0FFH, 0FFH, 00H, 00H, 00H, 00H   ; 00405858 _ ........
        db 00H, 00H, 00H, 00H, 00H, 00H, 00H, 00H       ; 00405860 _ ........
        ... (about 30 more lines)

or
 
SECTION .eh_frame align=4 noexecute                     ; section number 4, const
 
        db 14H, 00H, 00H, 00H, 00H, 00H, 00H, 00H       ; 00408000 _ ........
        ... (about 300 more lines)
 
 
Then something like the following. Could it be left out if I used some switches?
 

imp_ExitProcess:                                        ; import from KERNEL32.dll
__imp__ExitProcess@4:                                   ; byte
.idata$5:                                               ; byte
        db 0A6H, 0A3H, 00H, 00H                         ; 0040A1D8 _ ....
 
imp_GetCommandLineA:                                    ; import from KERNEL32.dll
__imp__GetCommandLineA@0:                               ; byte
.idata$5:                                               ; byte
        db 0B4H, 0A3H, 00H, 00H                         ; 0040A1DC _ ....
 
imp_GetLastError:                                       ; import from KERNEL32.dll
__imp__GetLastError@0:                                  ; byte
.idata$5:                                               ; byte
        db 0C6H, 0A3H, 00H, 00H                         ; 0040A1E0 _ ....
 
imp_GetModuleHandleA:                                   ; import from KERNEL32.dll
__imp__GetModuleHandleA@4:                              ; byte
.idata$5:                                               ; byte
        db 0D6H, 0A3H, 00H, 00H                         ; 0040A1E4 _ ....
 
It all ends at 0040C1F8.
 
One more question: is it really all code and data packed into one contiguous block of memory, and can some compile-time switches make it load at a memory address other than $401000-$40C200?
 

 


Microsoft linkers by default place code at ImageBase (default 0x400000) + BaseOfCode (default 0x1000), with data following it. So it is not one continuous block but separate sections, such as ".text" (code), ".data"/".rdata" (static data) and ".idata" (import tables), that happen to be placed next to each other. The Base Address setting under Project Settings -> Linker -> Advanced (the /BASE linker option) changes the ImageBase; with MinGW, the GNU linker option --image-base does the same (e.g. -Wl,--image-base=0x10000000).
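A small sketch of the arithmetic, assuming the default image base 0x00400000 and a code section at RVA 0x1000. The `rebase` helper (hypothetical, for illustration) shows what base relocations effectively do to absolute addresses when an image ends up at a different base than the one it was linked for.

```c
#include <assert.h>
#include <stdint.h>

/* Virtual address = image base + relative virtual address (RVA). */
uint32_t va_from_rva(uint32_t image_base, uint32_t rva)
{
    return image_base + rva;
}

/* Shift an absolute address from one image base to another.
   Absolute operands (like the 0040A1DC in "jmp near [imp_...]") need this;
   relative rel32 call offsets do not. */
uint32_t rebase(uint32_t va, uint32_t old_base, uint32_t new_base)
{
    assert(va >= old_base);
    return va - old_base + new_base;
}
```

For example, the absolute operand 0040A1DC would become 1000A1DC if the same image used a base of 0x10000000.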

 

But why change it? Because you can? This is something I really don't understand.

 

When compiling C or C++, the machine code for the code you write is not the only thing you get. The compiler and linker insert all sorts of things for you, like the standard library, OS import stubs and setup code. In a console application generated by MSVC, instead of calling your main() directly, the entry point is something like mainCRTStartup(), which first initializes static variables, calls OS functions like GetCommandLineA (as you see in your disassembly) to put the command-line arguments into argc and argv, and then calls your main(). MinGW's runtime does something similar.
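A much simplified sketch of what such a startup routine does before your main() runs. To keep it portable, the command line is passed in as a string instead of being fetched with GetCommandLineA, and it is split only on whitespace; the real CRT also handles quotes and backslashes, initializes static data, and passes main's result to exit(). All names here (`split_command_line`, `run_like_crt`, `count_args_main`) are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

#define MAX_ARGS 32

/* Split a command line in place on whitespace (no quote handling). */
int split_command_line(char *cmdline, char **argv, int max_args)
{
    int argc = 0;
    char *p = cmdline;
    assert(max_args > 0);
    while (*p != '\0' && argc < max_args - 1) {
        while (isspace((unsigned char)*p)) p++;      /* skip blanks */
        if (*p == '\0') break;
        argv[argc++] = p;                             /* argument starts here */
        while (*p != '\0' && !isspace((unsigned char)*p)) p++;
        if (*p != '\0') *p++ = '\0';                  /* terminate it */
    }
    argv[argc] = NULL;                                /* argv[argc] is NULL, as in C */
    return argc;
}

/* Startup in the spirit of a CRT entry point: build argc/argv, call main,
   and return its result (a real one would pass it to exit()). */
int run_like_crt(char *cmdline, int (*user_main)(int, char **))
{
    char *argv[MAX_ARGS];
    int argc = split_command_line(cmdline, argv, MAX_ARGS);
    return user_main(argc, argv);
}

/* Stand-in for the program's main(): returns the argument count. */
int count_args_main(int argc, char **argv)
{
    (void)argv;
    return argc;
}
```

With the command line "game.exe -w 640", run_like_crt calls count_args_main with argc = 3.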

 

So a stub for GetCommandLineA is inserted into your exe:

ALIGN   16
_GetCommandLineA@0:; Function begin
ALIGN   16
.text:  jmp     near [imp_GetCommandLineA]              ; 00405710 _ FF. 25, 0040A1DC(d)
; _GetCommandLineA@0 End of function

The actual GetCommandLineA is implemented in kernel32.dll, so the address of that function is stored at the address imp_GetCommandLineA:

imp_GetCommandLineA:                                    ; import from KERNEL32.dll
__imp__GetCommandLineA@0:                               ; byte
.idata$5:                                               ; byte
        db 0B4H, 0A3H, 00H, 00H                         ; 0040A1DC _ ....

"db 0B4H, 0A3H, 00H, 00H" is the value stored in that slot. Since the function lives in a DLL, its address is not known at link time: on disk the slot holds 0x0000A3B4 (the bytes read little-endian), which is the RVA of an import hint/name entry for "GetCommandLineA", and the Windows loader overwrites it with the function's actual address once kernel32.dll is loaded.

 

When you call GetCommandLineA, the address stored at imp_GetCommandLineA (0040A1DC(d)) is read and jumped to. As you can find on page 13 of http://www.agner.org/optimize/objconv-instructions.pdf, (d) means a direct address; sometimes you will get (rel), which means an offset from the current instruction address.
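A toy model of the difference: with FF 25 the operand is the direct (d) address of a memory slot, and the CPU jumps to whatever 32-bit value is stored in that slot, unlike E8, whose operand is a (rel) offset. The `struct image` memory model and the helper names are hypothetical, and 0x12345678 in the usage is a made-up "loaded" address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy model of a loaded image: 64 KB of bytes mapped at base_va. */
struct image {
    uint32_t base_va;
    uint8_t bytes[0x10000];
};

/* Offset of a virtual address inside the toy image (bounds-checked). */
static uint32_t image_offset(const struct image *img, uint32_t va, uint32_t len)
{
    uint32_t off = va - img->base_va;
    assert(va >= img->base_va && off + len <= sizeof img->bytes);
    return off;
}

void load_bytes(struct image *img, uint32_t va, const uint8_t *src, uint32_t len)
{
    memcpy(&img->bytes[image_offset(img, va, len)], src, len);
}

/* x86 is little-endian: the lowest byte comes first. */
uint32_t read_u32(const struct image *img, uint32_t va)
{
    const uint8_t *p = &img->bytes[image_offset(img, va, 4)];
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

void write_u32(struct image *img, uint32_t va, uint32_t value)
{
    uint8_t b[4] = { (uint8_t)value, (uint8_t)(value >> 8),
                     (uint8_t)(value >> 16), (uint8_t)(value >> 24) };
    load_bytes(img, va, b, 4);
}

/* FF 25 disp32 = jmp near [disp32]: disp32 is the direct address of a slot,
   and the jump goes to the 32-bit value stored in that slot. */
uint32_t jmp_indirect_target(const struct image *img, uint32_t insn_va)
{
    const uint8_t *op = &img->bytes[image_offset(img, insn_va, 6)];
    assert(op[0] == 0xFF && op[1] == 0x25);
    uint32_t slot_va = read_u32(img, insn_va + 2);
    return read_u32(img, slot_va);
}
```

Loading the stub bytes FF 25 DC A1 40 00 at 00405710 and the value 0x0000A3B4 at 0040A1DC makes the jump target 0x0000A3B4; after "the loader" writes 0x12345678 into the slot, the same instruction jumps there instead.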

 

___DTOR_LIST__ seems to be something related to destructors. C++ mode treats everything as C++; it doesn't care whether your code is plain C. These are compiler/runtime implementation details that I think you don't need to pay attention to. Normally when you disassemble C++ you only care about certain functions, e.g. write some stuff in main() and see how main() goes. The whole exe has a lot of stuff that is not worth your time.

 

Perhaps you should learn by looking at some Win32 assembly tutorials rather than by disassembling a C++ program; C++ is a real mess for starters.


This is worth the time for sure; it just needs some explanation.

(Maybe it is a bit hard to find, but then I will clearly understand the whole disassembled binary of the exe, which is very much worth the time.) Maybe I am not such a starter: I feel like I now understand 75% of the disassembled file, but I still have some knowledge holes, for example with this:

___DTOR_LIST__:    

or 

SECTION .eh_frame

 or

 

 
__imp__ExitProcess@4:          db 0A6H, 0A3H, 00H, 00H
__imp__GetCommandLineA@0:      db 0B4H, 0A3H, 00H, 00H
__imp__GetLastError@0:         db 0C6H, 0A3H, 00H, 00H
__imp__GetModuleHandleA@4:     db 0D6H, 0A3H, 00H, 00H

I am potentially interested in some 4k, 16k and 64k executable competitions, so it would be interesting to know how to throw away some unneeded setup code and tables from here.

Edited by fir


Maybe the __imp__GetCommandLineA stuff is easier to understand when rewritten in C:

The "__imp__GetCommandLineA@0:" is a label, which can be viewed as a pointer to the instruction or data after it.

 

So

ALIGN   16
_GetCommandLineA@0:; Function begin
ALIGN   16
.text:  jmp     near [imp_GetCommandLineA]              ; 00405710 _ FF. 25, 0040A1DC(d)
; _GetCommandLineA@0 End of function

imp_GetCommandLineA:                                    ; import from KERNEL32.dll
__imp__GetCommandLineA@0:                               ; byte
.idata$5:                                               ; byte
        db 0B4H, 0A3H, 00H, 00H                         ; 0040A1DC _ ....

means

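#include <windows.h>

typedef LPSTR (WINAPI *PointerToFunction)(void);

// the .idata$5 slot, before loading: db 0B4H, 0A3H, 00H, 00H
PointerToFunction imp_GetCommandLineA_ptr = (PointerToFunction)0x0000A3B4;
// the label imp_GetCommandLineA: the address of that slot
PointerToFunction *imp_GetCommandLineA = &imp_GetCommandLineA_ptr;

// the stub _GetCommandLineA@0 (renamed: _GetCommandLineA is a reserved C identifier)
LPSTR WINAPI stub_GetCommandLineA(void)
{
    PointerToFunction ptr = *imp_GetCommandLineA;  // [imp_GetCommandLineA]
    return ptr();                                  // jmp near [imp_GetCommandLineA]
}

// and in some initialization code (the Windows loader does this for you):
void resolve_imports(void)
{
    HMODULE handle = LoadLibraryA("kernel32.dll");
    // GetCommandLineA is in a separate dll, so its actual address cannot be known
    // until the dll is loaded; we update the slot after the dll is loaded.
    *imp_GetCommandLineA = (PointerToFunction)GetProcAddress(handle, "GetCommandLineA");
}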

So the bytes you see around __imp__GetCommandLineA are preallocated memory to hold the function address retrieved from the DLL.
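The values sitting in those slots before loading (read little-endian from the db lines: 0xA3A6 for ExitProcess, then 0xA3B4, 0xA3C6, 0xA3D6) fit the PE import layout: before the loader patches the import address table, each slot holds the RVA of a hint/name entry, which is a 2-byte hint followed by the NUL-terminated function name, padded to an even size. Assuming the linker placed these entries back to back in the order shown, this sketch reproduces the spacing; the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Size of one hint/name entry: 2-byte hint + name + NUL,
   rounded up so the next entry starts on an even address. */
uint32_t hint_name_entry_size(const char *name)
{
    uint32_t size = 2u + (uint32_t)strlen(name) + 1u;
    return (size + 1u) & ~1u;
}

/* Given the RVA of the first entry, compute where each entry starts
   if the entries are laid out back to back in the given order. */
void hint_name_rvas(uint32_t first_rva, const char *const *names, size_t count,
                    uint32_t *rvas_out)
{
    uint32_t rva = first_rva;
    assert(names != NULL && rvas_out != NULL);
    for (size_t i = 0; i < count; i++) {
        rvas_out[i] = rva;
        rva += hint_name_entry_size(names[i]);
    }
}
```

Starting from ExitProcess at 0xA3A6, the computed starts for GetCommandLineA, GetLastError and GetModuleHandleA are 0xA3B4, 0xA3C6 and 0xA3D6, exactly the values in the db lines, which supports this reading.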

 

___DTOR_LIST__ is an array of pointers to termination functions which should be called when the program ends. By the look of the hex dump it seems to be empty. Setting optimization to Minimize Size in Project Properties -> C/C++ -> Optimization (-Os with GCC/MinGW) might shrink it.
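A sketch of how such a list can be walked, assuming the layout suggested by the hex dump: FF FF FF FF (a -1 marker), then function pointers, then a null terminator. As far as I know this matches how GCC's crtstuff lays out __DTOR_LIST__, but treat that as an assumption. `run_dtor_list`, `DTOR_LIST_MARKER` and `example_cleanup` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef void (*func_ptr)(void);

/* Marker in element 0 of the list (FF FF FF FF in a 32-bit dump). */
#define DTOR_LIST_MARKER ((func_ptr)(intptr_t)-1)

/* Call every function after the marker until the null terminator;
   returns how many functions were called. */
int run_dtor_list(func_ptr *list)
{
    int called = 0;
    assert(list != NULL && list[0] == DTOR_LIST_MARKER);
    for (func_ptr *p = list + 1; *p != NULL; p++) {
        (*p)();
        called++;
    }
    return called;
}

/* Hypothetical termination function used to exercise the walker. */
static int cleanups_run = 0;
void example_cleanup(void)
{
    cleanups_run++;
}
```

An empty list, { marker, NULL }, corresponds to the FF FF FF FF 00 00 00 00 in the dump, and walking it calls nothing.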

 

.eh_frame contains data used by exception handling; exceptions can be turned off in Project Properties -> C/C++ -> Code Generation -> Enable C++ Exceptions (with GCC/MinGW, the -fno-exceptions flag).

 

If you are trying to do some 64k, you do not need to know how to throw away setup code.

What you need to know is how to write setup code in assembly from scratch, maybe by referring to something like
http://www.oopweb.com/Assembly/Documents/Win32ASM/Volume/win32asm.htm

Write what you need; don't let a general-purpose compiler, which doesn't have 64k in mind, do the work for you.


By using GameDev.net, you agree to our community Guidelines, Terms of Use, and Privacy Policy.
