[asm] Memory addressing and segments in exe

Started by
8 comments, last by LessBread 17 years, 8 months ago
Thanks to this site for everything I've learned about assembler. My old computer (it was really old) had only two programming environments: BASIC and machine language (hex digits that are treated as computer instructions), so I wanted to refresh my memory of assembler. There are two questions on my mind. I've searched all the documentation installed with MASM and TASM, but I can't find the answers.

The address bus determines how much RAM the computer can have. If you don't have that much, the computer will emulate part of the HDD as RAM. In 16-bit processors (excluding the 8086, which has a 20-bit bus) the address bus is 24 bits wide and can address memory from 0 to 16 MB (FFFFFF hex). In 32-bit processors the address bus is 32 bits wide, so you can store a value in memory at any address between 0 and 4 GB (FFFFFFFF hex).

The strange part is the execution itself. While an instruction is executing, CS and the instruction pointer (IP) point to the next instruction. Together they are 32 bits (two 16-bit registers), but the PC has only a 24-bit address bus. Of course, the segment can be thought of as some memory address, and the offset within the segment is then the offset from that address. The real problem comes with 32-bit processors. The next instruction is pointed to by CS:EIP. EIP and the address bus are both 32 bits long, so one segment actually covers all available RAM. Assume that a program starts execution at address zero of the current segment (EIP equals zero) and finishes at the end of the current segment (EIP reaches its largest possible value, FFFFFFFF hex). That means the program would be 4 GB large and would take all the memory the computer has. What happens if the program tries to access another segment? Or am I mistaken in assuming that the width of a memory address equals the width of the address bus?

My second question is about linkers and exe files. MASM generates an exe file after link.exe is executed.
Opening the exe file in a hex editor, I saw a pattern that applies to all exe files created by the MASM 6.11 linker. The first two bytes are always '4D 5A' (hex); the code segment is placed at address 512 (I've made 4-5 very small programs so far), the data segment at address 1024, and the stack segment at address 2048. These three segments are loaded into three different free segments in memory, and after that (it must be something in the exe header) CS is set to the segment where the code from the exe is loaded and EIP is set to the starting address. How can I create an exe on my own and set the starting address at the proper location? More generally, where can I find information on how to create an EXE header and how to delimit the segments of the exe (so that they are loaded at the proper segment addresses)? And finally, where can I find documentation for the instructions that are not documented in the MASM 6.11 help? Sorry for the long post, but I need help that I can't find anywhere [help]
______________________________Excited of C++Keen on C++Love C++Bulgarian Pirate :)
Whitespaces is the shit, believe me.
Exitus Acta Probat
Well, I'll give it a shot.

First of all:
With protected-mode 32-bit processors (386+), the segment registers (CS, DS, ES, SS, etc.) are no longer used the same way (shift left 4 bits and add the offset). Instead, a segment register holds a selector: the lowest two bits are your privilege level (which determines whether you're in kernel space or user space), bit 2 picks the descriptor table (GDT or LDT), and the remaining bits are an index into that table, which isn't visible to your program. That means you should never change your segment registers (and you'd probably get a general protection fault if you did).
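To make the contrast concrete, here is a minimal C sketch of both interpretations of a segment register value. The real-mode function computes segment * 16 + offset, which shows that CS:IP yields at most a 21-bit address (0x10FFEF), not a 32-bit one; the protected-mode function just splits a selector into the fields described above. The function and struct names are made up for illustration, and the example selector 0x1B is, as far as I know, what 32-bit Windows loads into CS for user-mode code.

```c
#include <stdint.h>

/* Real mode (8086 style): physical address = segment * 16 + offset.
   Even 0xFFFF:0xFFFF only reaches 0x10FFEF, a 21-bit value. */
static uint32_t real_mode_address(uint16_t segment, uint16_t offset)
{
    return ((uint32_t)segment << 4) + offset;
}

/* Protected mode: a segment register holds a selector, not a base address. */
struct selector_fields {
    unsigned rpl;   /* bits 0-1: requested privilege level (0 = kernel, 3 = user) */
    unsigned ti;    /* bit 2: table indicator (0 = GDT, 1 = LDT) */
    unsigned index; /* bits 3-15: which 8-byte descriptor in that table */
};

static struct selector_fields decode_selector(uint16_t selector)
{
    struct selector_fields f;
    f.rpl   = selector & 0x3u;
    f.ti    = (selector >> 2) & 0x1u;
    f.index = selector >> 3;
    return f;
}
```

For example, decode_selector(0x1B) gives RPL 3 (user mode), table GDT, index 3. The base address and limit live in the descriptor the index points to, which only the OS manages.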

Furthermore, memory is isolated between applications, which means that even if you use all your memory (or really all of the address space) you don't actually use all of the computer's memory. The OS maps your contiguous address space onto physical memory in pages (and, as you said, onto the HDD).
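A minimal sketch of how a 32-bit linear address is carved up by classic x86 paging (4 KB pages, no PAE), assuming the OS has already picked a physical frame for the page. split_linear and to_physical are hypothetical helper names, and frame_base stands in for what the page-table entry would hold; the real lookup is done by the CPU using tables the OS maintains, not by your program.

```c
#include <stdint.h>

/* Classic 32-bit x86 paging (no PAE, 4 KB pages):
   bits 31-22 index the page directory, bits 21-12 index a page table,
   bits 11-0 are the byte offset inside the 4 KB page. */
struct linear_parts {
    uint32_t dir_index;   /* 0..1023 */
    uint32_t table_index; /* 0..1023 */
    uint32_t page_offset; /* 0..4095 */
};

static struct linear_parts split_linear(uint32_t linear)
{
    struct linear_parts p;
    p.dir_index   = linear >> 22;
    p.table_index = (linear >> 12) & 0x3FFu;
    p.page_offset = linear & 0xFFFu;
    return p;
}

/* Hypothetical: frame_base is the 4 KB-aligned physical frame the OS chose
   for this page; the offset inside the page carries over unchanged. */
static uint32_t to_physical(uint32_t frame_base, uint32_t linear)
{
    return (frame_base & ~0xFFFu) | (linear & 0xFFFu);
}
```

For example, 0x00401000 splits into directory index 1, table index 1, offset 0. Two processes can use that same linear address and still land in different physical frames, which is why their memory stays isolated.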

The environment the OS provides you with is basically a flat address space (virtual memory) that is (I think) 3GB in Windows (the OS uses the last GB for itself). There is no need to be concerned with the segment registers. You basically have all that memory to play with yourself, and the OS takes care of fiddling with the low-level parts. Plus, CS, DS and the like all point to the same memory, so ds:eip refers to the same location as cs:eip, and writing there would target your own code.

Second:
The first two bytes are the magic 'MZ'. All the other stuff you asked about is in the specification for the executable format, which can be found on the internet (I don't have a link, sorry). The executable format used by Windows is called "PE" (Portable Executable). If you want more information on Intel-type processors and their memory addressing, you can get the "Intel System Programming Guide" from Intel's website.

And a last word on the loading of the exe: the program isn't really loaded into three different free segments in memory (in the sense of the old 8086-style segments); rather, it is loaded into a flat memory space where only the 32-bit offset matters (to you).

The entire x86 instruction set can be found on Intel's website in the Instruction Set Reference, probably under the documentation for the processors.
On the x86, physical addresses are mapped to 64 bits. Here are the chip manuals: IA-32 Intel® Architecture Software Developer's Manual, Volumes 1, 2A, 2B, 3A and 3B, downloadable as PDF files. I recommend getting them all. They should also be available in languages other than English.

Under Windows, the flat address space is 4 GB: the lower 2 GB are for user-mode addresses and the upper 2 GB are for kernel-mode addresses. In some configurations this can be tweaked to 3 GB and 1 GB. The address space of each process is separate; however, the kernel address space is mapped into the upper 2 GB of every process, so by and large that memory is shared. If you really want to get into the low-level details of Windows memory management, check out Memory Management: What Every Driver Writer Needs to Know.
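A tiny sketch of the split described above, assuming 32-bit Windows with either the default 2 GB/2 GB layout or the 3 GB/1 GB configuration (which, if I remember correctly, only applies to executables marked large-address-aware). is_user_address and user_space_bytes are hypothetical helpers, not Windows APIs, and small guard regions near the boundary are ignored.

```c
#include <stdbool.h>
#include <stdint.h>

/* Default split: user mode 0x00000000-0x7FFFFFFF, kernel 0x80000000-0xFFFFFFFF.
   3 GB configuration: the boundary moves up to 0xC0000000. */
static bool is_user_address(uint32_t addr, bool three_gb_user)
{
    uint32_t boundary = three_gb_user ? 0xC0000000u : 0x80000000u;
    return addr < boundary;
}

/* Size of the user-mode part of the address space, in bytes. */
static uint64_t user_space_bytes(bool three_gb_user)
{
    return three_gb_user ? 0xC0000000ull : 0x80000000ull;
}
```

So an exe's code at 0x00401000 is a user address in both layouts, while 0x80000000 belongs to the kernel unless the 3 GB configuration is in effect.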

The operating system handles loading exe files. The wikipedia entry for Portable Executable files is a good place to begin. The external links at the bottom will fill in the details, especially the articles by Matt Pietrek.
"I thought what I'd do was, I'd pretend I was one of those deaf-mutes." - the Laughing Man
Nice about the links; it was 2 AM here and I was a tired panda.

Just a small nitpick about the 64-bit part (because, you know, I can =))

From manual 3A (Section 3.3):
In protected mode, the IA-32 architecture provides a normal physical address space of 4 GBytes (2^32 bytes). This is the address space that the processor can address on its address bus.
...
Starting with the Pentium Pro processor, the IA-32 architecture also supports an extension of the physical address space to 2^36 bytes (64 GBytes).

There's also IA-32e, but I don't know anything about that.

Cheers...
Well, creating a PE exe is not so easy, because all the information in these links is not much help to me. I found two or three identical tables describing the executable's bytes, but none of them actually shows a real executable header.
According to the table shown in pecoff.doc from microsoft.com, the first two bytes are the "Magic" field (I don't know what it is), described as:
Quote:The unsigned integer that identifies the state of the image file. The most common number is 0x10B, which identifies it as a normal executable file. 0x107 identifies it as a ROM image, and 0x20B identifies it as a PE32+ executable.

But the magic is actually 0x4D5E. According to the table it is followed by the minor and major linker version and then the size of code, but whatever is stored there, it is certainly not the size of the code.
I remember my old computer. It had no HDD, only a 5.25'' FDD, but when I wanted to make a low-level program I just wrote it into memory and started it by typing the start address followed by 'G'. It seems everything in 'modern' computers is protected so that you "cannot edit where you have no business".
I'm tired of reading theory that is very difficult (almost impossible) to try in practice.
To make this post more meaningful, I'm asking for an example of a working PE header, not just something that works only in theory.
There is more than one header. 0x4d5a (assuming you mean that instead of 0x4d5e) is the magic number for the original DOS executable header that is still present in modern formats. The 0x010b stuff is the type indicator of one of the optional headers.
-Mike
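Following the chain of headers is easier to see in code. This is a minimal sketch, assuming the whole image file has already been read into memory: it checks the DOS "MZ" signature, reads e_lfanew (at offset 0x3C of the DOS header) to find the "PE\0\0" signature, skips the 20-byte COFF file header, and returns the Magic field of the optional header, which is where the 0x10B/0x20B values from pecoff.doc live. rd16, rd32 and pe_optional_magic are made-up names; real code would also validate SizeOfOptionalHeader and the section table.

```c
#include <stddef.h>
#include <stdint.h>

/* Little-endian readers: x86 stores words and dwords low byte first. */
static uint16_t rd16(const uint8_t *p) { return (uint16_t)(p[0] | (p[1] << 8)); }
static uint32_t rd32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns the optional-header Magic (0x10B = PE32, 0x20B = PE32+,
   0x107 = ROM image), or 0 if the buffer does not look like a PE file. */
static uint16_t pe_optional_magic(const uint8_t *buf, size_t len)
{
    if (len < 0x40 || rd16(buf) != 0x5A4D)  /* "MZ" DOS header */
        return 0;
    uint32_t pe_off = rd32(buf + 0x3C);     /* e_lfanew */
    /* need "PE\0\0" (4) + COFF file header (20) + Magic (2) */
    if (pe_off > len || len - pe_off < 4 + 20 + 2)
        return 0;
    if (rd32(buf + pe_off) != 0x00004550)   /* "PE\0\0" */
        return 0;
    return rd16(buf + pe_off + 4 + 20);     /* first field of the optional header */
}
```

In other words, the table in pecoff.doc starts describing bytes at offset e_lfanew + 24, not at offset 0 of the file.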
Quote:Original post by Prak
Just a small nitpick about the 64-bit part (because, you know, I can =))

From manual 3A (Section 3.3):
In protected mode, the IA-32 architecture provides a normal physical address space of 4 GBytes (2^32 bytes). This is the address space that the processor can address on its address bus.
...
Starting with the Pentium Pro processor, the IA-32 architecture also supports an extension of the physical address space to 2^36 bytes (64 GBytes).

There's also IA-32e, but I don't know anything about that.


I meant to respond to this before. I don't know why I didn't. Anyway.

As far as a user mode program running on an IA32 is concerned, virtual memory addresses will always be 32 bit.

Regarding my remark about 64 bit mappings, I should have qualified my comment as Windows-centric. The PHYSICAL_ADDRESS type is 64 bits long. Consult the sections titled "Virtual Address Space" and "Physical Address Space" in the Memory Management link I dropped.

Quote:
As hardware has evolved, the number of address bits has increased, leading to larger physical address spaces and potentially greater amounts of RAM. Current x86 CPUs use 32, 36, or 40 bits for physical addresses in the modes that Windows supports, although the chipsets that are attached to some 40-bit processors limit the sizes to fewer bits. Current releases of 32-bit Windows support a maximum of 37 bits of physical address for use as general-purpose RAM (more may be used for I/O space RAM), for a maximum physical address space of 128 GB. (These values may increase in the future.) Windows also continues to support older processors that decode only 32 bits of physical address (and thus can address a maximum of 4 GB).
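The address-space sizes in this thread are all powers of two, so a small sketch reproduces them: 24 bits gives 16 MB, 32 bits gives 4 GB, the 36-bit extension from the Pentium Pro gives 64 GB, the 37-bit Windows limit quoted above gives 128 GB, and 40 bits gives 1024 GB. It also shows why a physical address needs a type wider than 32 bits once more than 32 address bits are in play. The helper names are hypothetical.

```c
#include <stdint.h>

/* Bytes addressable with `bits` physical address lines: 2^bits (bits must be < 64). */
static uint64_t phys_space_bytes(unsigned bits)
{
    return (uint64_t)1 << bits;
}

/* Same size expressed in GB (2^30 bytes). */
static uint64_t phys_space_gb(unsigned bits)
{
    return phys_space_bytes(bits) >> 30;
}

/* True if a physical address still fits in a 32-bit variable. */
static int fits_in_32_bits(uint64_t phys_addr)
{
    return phys_addr <= 0xFFFFFFFFull;
}
```

For example, phys_space_gb(36) returns 64, and any physical address at or above 4 GB (0x100000000) fails fits_in_32_bits.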


"I thought what I'd do was, I'd pretend I was one of those deaf-mutes." - the Laughing Man
Quote:Original post by Anon Mike
There is more than one header. 0x4d5a (assuming you mean that instead of 0x4d5e) is the magic number for the original DOS executable header that is still present in modern formats. The 0x010b stuff is the type indicator of one of the optional headers.


4D 5A 90 00 03 00 00 00 04 00 00 00 FF FF...

0x4D5A is already known. Following the PE32 description table, 0x9000 is the major and minor linker version, and the next 4 bytes are the size of code (word and dword values are stored in reverse byte order)... how could the code be 3 bytes long? Next is the size of initialized data, which says 4 bytes, and the table doesn't say what that FF FF is for. Not every PE is exactly the same, but these FF FF values give an error if they are different. I'm asking for documentation describing exactly what must be written in this part of the exe. I need an example to follow, because writing low-level programs in a high-level programming environment is really painful.
Less theory, more practice :)
typedef struct _IMAGE_DOS_HEADER {
	WORD e_magic;
	WORD e_cblp;
	WORD e_cp;
	WORD e_crlc;
	WORD e_cparhdr;
	WORD e_minalloc;
	WORD e_maxalloc;
	WORD e_ss;
	WORD e_sp;
	WORD e_csum;
	WORD e_ip;
	WORD e_cs;
	WORD e_lfarlc;
	WORD e_ovno;
	WORD e_res[4];
	WORD e_oemid;
	WORD e_oeminfo;
	WORD e_res2[10];
	LONG e_lfanew;
} IMAGE_DOS_HEADER, *PIMAGE_DOS_HEADER;


IIRC, the only parts that matter are the MZ at the beginning and the e_lfanew at the end (the file offset of the PE header).
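Decoding the bytes posted earlier (4D 5A 90 00 03 00 00 00 04 00 00 00 FF FF) against this structure, rather than against the PE optional-header table, clears up the confusion. A minimal sketch, assuming the bytes are in a buffer and reading only the first seven WORD fields little-endian; the struct and function names are hypothetical, while the field names come from IMAGE_DOS_HEADER. The comments give the usual meaning of each field.

```c
#include <stdint.h>

/* The first seven WORDs of IMAGE_DOS_HEADER, decoded from raw bytes. */
struct dos_header_start {
    uint16_t e_magic, e_cblp, e_cp, e_crlc, e_cparhdr, e_minalloc, e_maxalloc;
};

static uint16_t le16(const uint8_t *p) { return (uint16_t)(p[0] | (p[1] << 8)); }

static struct dos_header_start decode_dos_start(const uint8_t *b)
{
    struct dos_header_start h;
    h.e_magic    = le16(b + 0);  /* "MZ" -> 0x5A4D */
    h.e_cblp     = le16(b + 2);  /* bytes used on the last 512-byte page */
    h.e_cp       = le16(b + 4);  /* number of 512-byte pages in the DOS image */
    h.e_crlc     = le16(b + 6);  /* number of relocation entries */
    h.e_cparhdr  = le16(b + 8);  /* header size in 16-byte paragraphs */
    h.e_minalloc = le16(b + 10); /* minimum extra paragraphs needed */
    h.e_maxalloc = le16(b + 12); /* maximum extra paragraphs requested */
    return h;
}
```

With those bytes, e_magic is 0x5A4D, e_cblp is 0x90, e_cp is 3, e_crlc is 0, e_cparhdr is 4 and e_maxalloc is 0xFFFF. So the "0x9000 linker version", the "3-byte code size", the "4" and the FF FF are all just fields of the DOS stub header.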

Descriptions of several binary file types can be found here: Wotsits: Binary formats

Other structures similar to IMAGE_DOS_HEADER:

IMAGE_OS2_HEADER, IMAGE_VXD_HEADER

Other related constants:

#define IMAGE_DOS_SIGNATURE 0x5A4D
#define IMAGE_OS2_SIGNATURE 0x454E
#define IMAGE_OS2_SIGNATURE_LE 0x454C
#define IMAGE_VXD_SIGNATURE 0x454C
#define IMAGE_NT_SIGNATURE 0x4550
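These constants look byte-swapped compared with a hex dump because they are the 16-bit little-endian values of two ASCII characters: the first character ('M' = 0x4D) lands in the low byte. A minimal sketch with hypothetical helper names:

```c
#include <stdint.h>

/* Combine two ASCII characters as a little-endian 16-bit load would:
   the first character goes into the low byte. */
static uint16_t sig16(char first, char second)
{
    return (uint16_t)((uint8_t)first | ((uint8_t)second << 8));
}

/* True if the buffer starts with the given 16-bit signature (e.g. 0x5A4D). */
static int starts_with_sig(const uint8_t *buf, uint16_t sig)
{
    return (uint16_t)(buf[0] | (buf[1] << 8)) == sig;
}
```

So sig16('M', 'Z') is 0x5A4D, sig16('N', 'E') is 0x454E, sig16('L', 'E') is 0x454C and sig16('P', 'E') is 0x4550, matching the defines above.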

These values can be found in the header files of many freely available compilers: MinGW, Lcc-win32, Pelles C, ...

This topic is closed to new replies.
