Disassembly of some function



When I compile


#include <windows.h>
#include <stdlib.h>

void ERROR_EXIT(char* text)
{
MessageBox(NULL, text, "error", MB_OK | MB_ICONEXCLAMATION);
exit(-1);
}


with MinGW and disassemble it with objconv by Agner Fog (I like it, so I sometimes use it as a disassembler), it gives:



ALIGN   4

__Z10ERROR_EXITPcS_:
sub     esp, 28                 ; 0040217C _ 83. EC, 1C
mov     dword [esp+0CH], 48     ; 0040217F _ C7. 44 24, 0C, 00000030
mov     eax, dword [esp+24H]    ; 00402187 _ 8B. 44 24, 24
mov     dword [esp+8H], eax     ; 0040218B _ 89. 44 24, 08
mov     eax, dword [esp+20H]    ; 0040218F _ 8B. 44 24, 20
mov     dword [esp+4H], eax     ; 00402193 _ 89. 44 24, 04
mov     dword [esp], 0          ; 00402197 _ C7. 04 24, 00000000
call    _MessageBoxA@16         ; 0040219E _ E8, 000034DD
sub     esp, 16                 ; 004021A3 _ 83. EC, 10
mov     dword [esp], -1         ; 004021A6 _ C7. 04 24, FFFFFFFF
call    _exit                   ; 004021AD _ E8, 000033B6



I do not understand these 0040218B values. What do they really stand for? Some previous examples seem to suggest that they are real process memory addresses where this code will lie after being loaded into RAM, but I am not sure.
Also, I do not understand what values like 000033B6 mean, and why they are different.

Also, what the dot after 89. etc. means is a mystery to me.

Could someone explain it a bit?

Also, what does the @16 at the end of _MessageBoxA@16 mean?

Edited by fir


I'll try and answer what I know.

Quote: "I do not understand these 0040218B values. What do they really stand for? Some previous examples seem to suggest that they are real process memory addresses where this code will lie after being loaded into RAM, but I am not sure."

On Windows, the address 0x00401000 is a pretty standard address for the start of your program's code (if no executable packing or other obfuscation has been done). It is not a system-wide address, however: it is a virtual address within your own process's address space, which the OS sets up separately for each program and which, unsurprisingly, starts at 0x00000000.

Quote: "Also, what does the @16 at the end of _MessageBoxA@16 mean?"

It is the name decoration of Windows' stdcall calling convention. A stdcall function's name is decorated with a leading underscore, followed by an @ and then the number of bytes of arguments passed on the stack. On 32-bit x86 this number is always a multiple of 4, since every argument occupies at least one 4-byte stack slot.

In your example, @16 means that 16 bytes, i.e. four 4-byte arguments, are passed to MessageBoxA on the stack. If you were to disassemble MessageBoxA itself, you would find the return instruction:

ret 16

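which pops those 16 bytes of arguments off the stack as it returns. That is also why your function does sub esp, 16 right after the call: GCC reserved the argument space once with sub esp, 28, and since MessageBoxA removed 16 bytes of it on return, the caller reserves them again before storing the argument for exit.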

As to the other questions, I can only make educated guesses.


The big numbers starting with 004 are the memory addresses of the instructions within the code segment. Note that code can be relocated upon loading.

The smaller numbers like 34DD and 33B6 are offsets. E8 is a call with a relative offset, so those numbers get added to the address of the instruction following the call instruction to get the address of the target of the call. That is also why they differ: each call starts from a different place and goes to a different target.

What the dot after the opcode means? No clue :-/

The @16? This is a byproduct of C++ called name mangling. In C++ you can have several functions with the same name that differ only in their parameters (or, in the case of methods, in the classes they belong to). To distinguish them, the compiler adds some magic numbers and characters to the function names, usually based on the number and type of parameters. The specifics, however, are different for each compiler.


Quote: "The @16? This is a byproduct of C++ called name mangling. In C++ you can have several functions with the same name that differ only in their parameters (or, in the case of methods, in the classes they belong to). To distinguish them, the compiler adds some magic numbers and characters to the function names, usually based on the number and type of parameters. The specifics, however, are different for each compiler."

This is incorrect, see my post above.


I do not understand the 0x00401000, though. Is this just the address in the virtual process space where the code will be loaded? And what about the first 4 MB?


The 0x004XXXXX after the semicolon is the virtual address of that instruction. The 4 MB skip seems to be some kind of tradition, maybe to guarantee that you will get a segmentation fault when you try to access a struct/class member through a null pointer.

After the "_" is a hex dump of the instruction:

The bytes before the "." are the main opcode of the instruction. The additional bytes before the "," specify how the instruction accesses its operands, e.g. which register to read from, or which register plus offset to use as a memory address to dereference. The rest after the "," are additional arguments for the instruction (such as the offset and the constant value).

So for:

mov     dword [esp+0CH], 48     ; 0040217F _ C7. 44 24, 0C, 00000030


0040217F is the address of the instruction; you could use "jmp 0040217F" to make it the next instruction to execute.

"C7." means "mov". Note that there are multiple forms of "mov" with different opcodes to move different things around. I don't know why the disassembler puts the "." there, though; C7 by itself is not a complete instruction.

"C7." combined with "44 24," means "move a constant value into the address in esp plus some offset".

"0C" is the offset.

"00000030" is the constant value to move into [esp + 0x0C].

The stuff after the ";" is a comment showing the corresponding machine code as a reference. Normally you don't need to read it unless you are doing some really low-level stuff or trying to write an assembler/disassembler.

If you want to know more about how to translate assembly into machine code you could read the manuals at

http://www.intel.com/content/www/us/en/processors/architectures-software-developer-manuals.html

x86 machine language is really, really complicated; you might not want to go further than assembly.

Or, you can have some fun with

http://www.cppgm.org/

In Programming Assignment 9 you are required to write an x86-64 assembler, which is both fun and painful.



Alright, thanks very much, everything seems clear now.

When I'm looking at the disassembly of my program, one thing that still surprises me is that all the code and data seem to lie in one contiguous chunk, at 401000 and above. At the end of the code there is also something like:

_VirtualProtect@16:; Function begin
.text:  jmp     near [imp_VirtualProtect]               ; 00405708 _ FF. 25, 0040A20C(d)
; _VirtualProtect@16 End of function
nop                                                     ; 0040570E _ 90
nop                                                     ; 0040570F _ 90
ALIGN   16
_GetCommandLineA@0:; Function begin
ALIGN   16
.text:  jmp     near [imp_GetCommandLineA]              ; 00405710 _ FF. 25, 0040A1DC(d)
; _GetCommandLineA@0 End of function
nop                                                     ; 00405716 _ 90
nop                                                     ; 00405717 _ 90
_GetStartupInfoA@4:; Function begin
.text:  jmp     near [imp_GetStartupInfoA]              ; 00405718 _ FF. 25, 0040A1EC(d)
; _GetStartupInfoA@4 End of function
nop                                                     ; 0040571E _ 90
nop                                                     ; 0040571F _ 90
ALIGN   16
_EnterCriticalSection@4:; Function begin
ALIGN   16
.text:  jmp     near [imp_EnterCriticalSection]         ; 00405720 _ FF. 25, 0040A1D4(d)
; _EnterCriticalSection@4 End of function
nop                                                     ; 00405726 _ 90
nop                                                     ; 00405727 _ 90

Does anyone know what this is, and what the (d) at the end of 0040A1D4(d) etc. means?

I also got some strange data not belonging to the functions I wrote (I'm compiling in C++ mode, but the code is plain C), like

___DTOR_LIST__:                                         ; byte
db 0FFH, 0FFH, 0FFH, 0FFH, 00H, 00H, 00H, 00H           ; 00405858 _ ........
db 00H, 00H, 00H, 00H, 00H, 00H, 00H, 00H               ; 00405860 _ ........

and 30 or so more lines, or

SECTION .eh_frame align=4 noexecute                     ; section number 4, const
db 14H, 00H, 00H, 00H, 00H, 00H, 00H, 00H               ; 00408000 _ ........

and 300 or so more lines. Could this be left out if I used some compiler switches?

Then there is something like:

imp_ExitProcess:                                        ; import from KERNEL32.dll
__imp__ExitProcess@4:                                   ; byte
.idata$5:                                               ; byte
db 0A6H, 0A3H, 00H, 00H                                 ; 0040A1D8 _ ....

imp_GetCommandLineA:                                    ; import from KERNEL32.dll
__imp__GetCommandLineA@0:                               ; byte
.idata$5:                                               ; byte
db 0B4H, 0A3H, 00H, 00H                                 ; 0040A1DC _ ....

imp_GetLastError:                                       ; import from KERNEL32.dll
__imp__GetLastError@0:                                  ; byte
.idata$5:                                               ; byte
db 0C6H, 0A3H, 00H, 00H                                 ; 0040A1E0 _ ....

imp_GetModuleHandleA:                                   ; import from KERNEL32.dll
__imp__GetModuleHandleA@4:                              ; byte
.idata$5:                                               ; byte
db 0D6H, 0A3H, 00H, 00H                                 ; 0040A1E4 _ ....

It all ends at 0040C1F8. One more question: is all the code and data really packed into one contiguous block of memory like this, and can some compile-time switch make it load at a memory address other than 401000-40C200?


Microsoft linkers by default load the executable at the ImageBase (default 0x00400000) and put the code at an offset of 0x1000 from it, i.e. at 0x00401000, with the data in further sections after it. So it is not one contiguous block but several sections, such as ".text" for code and ".idata" for the import data. You can change the image base via Project Settings -> Linker -> Advanced -> Base Address (the /BASE linker option); with MinGW, the GNU linker's --image-base option does the same.

But why change it? Because you can? This is something I really don't understand.

When you compile C or C++, the machine code for the code you wrote is not the only thing you get. The compiler and linker insert all sorts of things for you, like parts of the standard library, OS library stubs and setup code. In a console application generated by MSVC, instead of your main() being called directly, the entry point is something like mainCRTStartup(), which first initializes static variables and calls OS functions like GetCommandLineA (as you see in your disassembly) to put the command-line arguments into argc and argv, and then calls your main().

So the stub _GetCommandLineA@0 has to be included in your exe:

ALIGN   16
_GetCommandLineA@0:; Function begin
ALIGN   16
.text:  jmp     near [imp_GetCommandLineA]              ; 00405710 _ FF. 25, 0040A1DC(d)
; _GetCommandLineA@0 End of function


The actual GetCommandLineA is implemented in kernel32.dll, so the address of that function is stored at the memory location imp_GetCommandLineA:

imp_GetCommandLineA:                                    ; import from KERNEL32.dll
__imp__GetCommandLineA@0:                               ; byte
.idata$5:                                               ; byte
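db 0B4H, 0A3H, 00H, 00H                         ; 0040A1DC _ ....


"db 0B4H, 0A3H, 00H, 00H" is the slot for the address of that function. Since the function lives in a DLL, what is stored there on disk is only a placeholder: read as a little-endian 32-bit number it is 0x0000A3B4, the RVA (offset from the image base) of an entry holding the import's name, and the Windows loader overwrites the slot with the real address of GetCommandLineA once kernel32.dll is loaded.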

When you call GetCommandLineA, the stub reads the address stored at imp_GetCommandLineA (0040A1DC(d)) and jumps to it. As you can find on page 13 of http://www.agner.org/optimize/objconv-instructions.pdf, (d) means a direct address; sometimes you will see (rel) instead, which means an offset from the current instruction address.

___DTOR_LIST__ seems to be related to class destructors. C++ mode treats everything as C++; it doesn't care whether your code is plain C. These are compiler implementation details (here GCC/MinGW rather than MSVC) that I think you don't need to pay attention to. Normally when you disassemble C++, you only care about certain functions, e.g. you write some code in main() and look at how main() is compiled. The rest of the exe is a lot of stuff that is not worth your time.

Perhaps you should learn by looking at some Win32 assembly tutorials rather than by disassembling a C++ program. C++ is a real mess for starters.

This is worth the time for sure; it just needs some explanation (maybe a bit hard to find, but then I will see clearly through the whole disassembled exe, which is well worth the time). Maybe I'm not such a beginner after all: I feel like I now understand about 75% of the disassembled file, but I still have some gaps in my knowledge, for example with this:

___DTOR_LIST__:

or

SECTION .eh_frame

or

__imp__ExitProcess@4:            db 0A6H, 0A3H, 00H, 00H
__imp__GetCommandLineA@0:        db 0B4H, 0A3H, 00H, 00H
__imp__GetLastError@0:           db 0C6H, 0A3H, 00H, 00H
__imp__GetModuleHandleA@4:       db 0D6H, 0A3H, 00H, 00H

I am potentially interested in some 4k, 16k or 64k executable competitions, so it would be interesting to know how to throw away some of the unneeded setup code and tables from here.

Edited by fir


Maybe the __imp__GetCommandLineA stuff can be understood more easily by rewriting it in C:

The "__imp__GetCommandLineA@0:" is a label, which can be viewed as a pointer to the instruction or data after it.

So

ALIGN   16
_GetCommandLineA@0:; Function begin
ALIGN   16
.text:  jmp     near [imp_GetCommandLineA]              ; 00405710 _ FF. 25, 0040A1DC(d)
; _GetCommandLineA@0 End of function

imp_GetCommandLineA:                                    ; import from KERNEL32.dll
__imp__GetCommandLineA@0:                               ; byte
.idata$5:                                               ; byte
db 0B4H, 0A3H, 00H, 00H                         ; 0040A1DC _ ....


means

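typedef char *(*PointerToFunction)(void);   // GetCommandLineA takes no arguments and returns a char*

PointerToFunction imp_GetCommandLineA_ptr = (PointerToFunction)0x0000A3B4;  // db 0B4H, 0A3H, 00H, 00H
PointerToFunction *imp_GetCommandLineA = &imp_GetCommandLineA_ptr;          // the label points at the slot

char *_GetCommandLineA(void) {
    PointerToFunction ptr = *imp_GetCommandLineA;   // [imp_GetCommandLineA]
    return (*ptr)();                                // jmp     near [imp_GetCommandLineA]
}

// and in some initialization code:
// because GetCommandLineA is in a separate DLL, its actual address cannot be known until the DLL is loaded,
// so imp_GetCommandLineA_ptr gets overwritten with the real address at load time.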


So the lines you see around __imp__GetCommandLineA are preallocated memory to hold the function address retrieved from the DLL.

___DTOR_LIST__ is an array of pointers to termination functions that should be called when the program ends. By the look of the hex dump it seems to be empty; setting the optimization to Minimal Size in Project Properties -> C/C++ -> Optimization might shrink it.

.eh_frame contains data used by exception handling; exceptions can be turned off in Project Properties -> C/C++ -> Code Generation -> Enable C++ Exceptions (with GCC/MinGW, the corresponding switch is -fno-exceptions).

If you are trying to make a 64k executable, you do not need to know how to throw away setup code.

What you need to know is how to write setup code with assembly from scratch, maybe by referring to some thing like
http://www.oopweb.com/Assembly/Documents/Win32ASM/Volume/win32asm.htm

Write what you need, rather than having a general-purpose compiler that doesn't have 64k in mind do the work for you.
