Why isn't the code section in the PE file the same as it is in process memory space?

9 comments, last by laiyierjiangsu 6 years, 9 months ago

Hi guys!

I want to check whether the code section in the file (test.dll) is the same as the one loaded into process memory. Here is the code.

First, I compute the CRC of the code section in the PE file:


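	// Read access is all a checksum needs.
	hFile = CreateFile(
				szFileName,
				GENERIC_READ,
				FILE_SHARE_READ,
				NULL,
				OPEN_EXISTING,
				FILE_ATTRIBUTE_NORMAL,
				NULL);

	if (hFile == INVALID_HANDLE_VALUE)
	{
		printf("I can't access file!\n");
		return FALSE;
	}

	FileSize = GetFileSize(hFile, NULL);
	if (FileSize == INVALID_FILE_SIZE) { CloseHandle(hFile); return FALSE; }
	// Byte buffer (pBuffer is a BYTE*): with UNICODE defined, TCHAR is 2 bytes,
	// so pBuffer + PointerToRawData would point at the wrong offset.
	pBuffer = new BYTE[FileSize];
	BOOL bRead = ReadFile(hFile, pBuffer, FileSize, &szTemp, NULL);
	CloseHandle(hFile);
	if (!bRead || szTemp != FileSize) { delete[] pBuffer; return FALSE; }

	pDosHeader = (PIMAGE_DOS_HEADER)pBuffer;
	// Pointer arithmetic on BYTE*: casting pointers to DWORD truncates them on x64.
	pNtHeader = (PIMAGE_NT_HEADERS32)(pBuffer + pDosHeader->e_lfanew);
	IMAGE_FILE_HEADER *pFileHeader = &pNtHeader->FileHeader;
	pSecHeader = IMAGE_FIRST_SECTION(pNtHeader);
	int i = 0;
	for (; i < pFileHeader->NumberOfSections; i++, pSecHeader++)
	{
		// Name is 8 bytes and is not NUL-terminated when all 8 are used.
		if (strncmp((char*)pSecHeader->Name, ".text", IMAGE_SIZEOF_SHORT_NAME) == 0)
			break;
	}
	if (i == pFileHeader->NumberOfSections) { delete[] pBuffer; return FALSE; } // no .text

	// Only SizeOfRawData bytes exist on disk; VirtualSize may be larger (the rest is
	// zero-filled in memory), so checksum the smaller of the two sizes.
	DWORD cbSection = pSecHeader->Misc.VirtualSize < pSecHeader->SizeOfRawData
		? pSecHeader->Misc.VirtualSize : pSecHeader->SizeOfRawData;
	BYTE* pBuffStart = pBuffer + pSecHeader->PointerToRawData;
	szCRC32 = Crc32_ComputeBuf(pBuffStart, cbSection);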
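Neither snippet shows Crc32_ComputeBuf. For experimenting outside the original project, here is a minimal sketch of a stand-in, assuming the helper computes the common reflected CRC-32 (polynomial 0xEDB88320); the name crc32_bytes is hypothetical. Because every input byte feeds into the result, changing even one byte of the section (for example, a single relocated address) yields a different value.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for Crc32_ComputeBuf (its source is not shown).
// Assumes the standard reflected CRC-32: polynomial 0xEDB88320, initial
// value 0xFFFFFFFF, final XOR 0xFFFFFFFF. Bitwise, so no lookup table needed.
uint32_t crc32_bytes(const unsigned char* data, std::size_t len)
{
    assert(data != nullptr || len == 0);
    uint32_t crc = 0xFFFFFFFFu;
    for (std::size_t i = 0; i < len; ++i) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; ++bit) {
            // Shift right; if the bit shifted out was 1, fold in the polynomial.
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
        }
    }
    return crc ^ 0xFFFFFFFFu;
}
```

The standard check value, the CRC-32 of the ASCII string "123456789", is 0xCBF43926, which is a quick way to confirm that a stand-in agrees with the original helper.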

Second, I compute the CRC of the same section in process memory:


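	PIMAGE_DOS_HEADER pDosHeader = nullptr;
	PIMAGE_NT_HEADERS pNTHeader = nullptr;
	PIMAGE_SECTION_HEADER pSectionHeader = nullptr;
	DWORD OriginalCRC32;
	// Keep the module base as a BYTE*; casting it to DWORD truncates it on x64.
	BYTE* ImageBase = (BYTE*)GetModuleHandleA(pModuleName);
	if (ImageBase == nullptr) return FALSE;
	pDosHeader = (PIMAGE_DOS_HEADER)ImageBase;
	pNTHeader = (PIMAGE_NT_HEADERS)(ImageBase + pDosHeader->e_lfanew);
	// Reads the expected CRC from the 4 bytes just before the NT headers
	// (presumably written there by a separate patching step).
	OriginalCRC32 = *(DWORD*)((BYTE*)pNTHeader - 4);
	printf("Original Crc read from file: %08x\n", OriginalCRC32);
	pSectionHeader = IMAGE_FIRST_SECTION(pNTHeader);
	IMAGE_FILE_HEADER* pFileHeader = &pNTHeader->FileHeader;
	int i = 0;
	for (; i < pFileHeader->NumberOfSections; i++, pSectionHeader++)
	{
		if (strncmp((char*)pSectionHeader->Name, ".text", IMAGE_SIZEOF_SHORT_NAME) == 0)
			break;
	}
	if (i == pFileHeader->NumberOfSections) return FALSE; // no .text section

	printf("ImageCodeSectionCrc32 section name: %.8s\n", (char*)pSectionHeader->Name);
	// In memory the section starts at ImageBase + VirtualAddress, not at PointerToRawData.
	DWORD cbSection = pSectionHeader->Misc.VirtualSize < pSectionHeader->SizeOfRawData
		? pSectionHeader->Misc.VirtualSize : pSectionHeader->SizeOfRawData;
	BYTE* pBuffStart = ImageBase + pSectionHeader->VirtualAddress;
	crc32 = Crc32_ComputeBuf(pBuffStart, cbSection);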

 

The two CRC values don't match. I also can't find the original code bytes from the PE file anywhere in the code section of the loaded DLL image.

Did I make a mistake here? I sincerely hope someone can help me find out why.

Stay hungry, stay foolish!


Even though modern tools make it difficult to see, each module (exe file and dll file) lives in its own memory space and operates alone.

There are many historical reasons for this, which we can get into if you would like, but the short answer is that that's why they're different. There is a boundary between DLLs and your main executable, and between other DLLs.

Keep in mind that the raw bytes on disk take a very complex journey between the disk and memory.

You can learn more about this process by researching the Windows Portable Executable (PE) module loader, but the high-level version is that many transformations can occur while reading a .DLL file and turning it into running executable memory.

For example, files can be stored in a more compact layout than the one they have in memory. The PE header's FileAlignment field (commonly 512 bytes) controls the layout on disk, whereas SectionAlignment governs the loaded module, which is typically laid out in 4 KB chunks corresponding to OS-level memory pages. There is a well-defined procedure for mapping the on-disk sections into memory pages; a keyword to look up is Relative Virtual Address (RVA).
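The translation between the two layouts goes through the section table. Below is a minimal sketch using a hypothetical SectionInfo struct that mirrors four fields of IMAGE_SECTION_HEADER (not the real <winnt.h> type). A byte at a given RVA lives at PointerToRawData + (RVA - VirtualAddress) in the file, which is why the question's file-side code reads from PointerToRawData while the memory-side code reads from ImageBase + VirtualAddress.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical mirror of the IMAGE_SECTION_HEADER fields used here.
struct SectionInfo {
    uint32_t VirtualAddress;    // RVA of the section once loaded
    uint32_t VirtualSize;       // size of the section in memory
    uint32_t SizeOfRawData;     // size on disk (rounded up to FileAlignment)
    uint32_t PointerToRawData;  // file offset of the section's first byte
};

// Translate an RVA to a file offset. Returns false when the RVA lies outside
// every section (headers are not handled in this sketch) or falls in bytes
// that exist in only one layout, such as the zero-filled tail in memory.
bool rva_to_file_offset(const std::vector<SectionInfo>& sections,
                        uint32_t rva, uint32_t& fileOffset)
{
    for (const SectionInfo& s : sections) {
        // Only bytes present both on disk and in memory map one-to-one.
        const uint32_t backed = std::min(s.VirtualSize, s.SizeOfRawData);
        if (rva >= s.VirtualAddress && rva - s.VirtualAddress < backed) {
            fileOffset = s.PointerToRawData + (rva - s.VirtualAddress);
            return true;
        }
    }
    return false;
}
```

For example, a section with VirtualAddress 0x1000 and PointerToRawData 0x400 maps RVA 0x1010 to file offset 0x410.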

Another change that can occur is relocation, which mainly happens to DLLs but can also happen to EXEs (see Address Space Layout Randomization or ASLR). Relocation most often happens when two modules would overlap in memory if loaded at their default addresses. Relocation allows many modules to be loaded into the same process memory space without stomping on each other. Some modules are position-independent and their code bytes will not change when relocated; others are position-dependent and a set of fixups will be applied to their code bytes during loading.
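Position-dependent fixups come from the module's base relocation table (.reloc). Below is a minimal sketch of applying them. It assumes the image has already been copied into a buffer indexed by RVA and that the raw .reloc bytes are available. It handles only IMAGE_REL_BASED_HIGHLOW (type 3, 32-bit) and IMAGE_REL_BASED_DIR64 (type 10, 64-bit) entries and skips every other type, including type-0 padding. The name apply_base_relocations is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Relocation types from the PE/COFF specification that this sketch handles.
constexpr uint16_t kRelBasedHighLow = 3;   // IMAGE_REL_BASED_HIGHLOW: 32-bit address
constexpr uint16_t kRelBasedDir64   = 10;  // IMAGE_REL_BASED_DIR64: 64-bit address

// Hypothetical helper: apply base relocations to `image`, a buffer laid out by
// RVA (image[rva] is the byte at that RVA). `reloc` holds raw .reloc data: a
// series of blocks, each an 8-byte header (page RVA, block size) followed by
// 16-bit entries whose top 4 bits are the type and low 12 bits the page offset.
// `delta` is the actual load address minus the preferred ImageBase.
// Assumes a little-endian host.
void apply_base_relocations(std::vector<uint8_t>& image,
                            const std::vector<uint8_t>& reloc,
                            int64_t delta)
{
    std::size_t pos = 0;
    while (pos + 8 <= reloc.size()) {
        uint32_t pageRva = 0, blockSize = 0;
        std::memcpy(&pageRva, &reloc[pos], 4);
        std::memcpy(&blockSize, &reloc[pos + 4], 4);
        if (blockSize < 8 || pos + blockSize > reloc.size()) break;  // malformed

        for (std::size_t e = pos + 8; e + 2 <= pos + blockSize; e += 2) {
            uint16_t entry = 0;
            std::memcpy(&entry, &reloc[e], 2);
            const uint16_t type = entry >> 12;
            const std::size_t rva = static_cast<std::size_t>(pageRva) + (entry & 0x0FFF);

            if (type == kRelBasedHighLow && rva + 4 <= image.size()) {
                uint32_t value = 0;
                std::memcpy(&value, &image[rva], 4);
                value += static_cast<uint32_t>(delta);  // wraps modulo 2^32
                std::memcpy(&image[rva], &value, 4);
            } else if (type == kRelBasedDir64 && rva + 8 <= image.size()) {
                uint64_t value = 0;
                std::memcpy(&value, &image[rva], 8);
                value += static_cast<uint64_t>(delta);
                std::memcpy(&image[rva], &value, 8);
            }
            // Type 0 (IMAGE_REL_BASED_ABSOLUTE) is padding; other types are skipped.
        }
        pos += blockSize;
    }
}
```

Every patched slot that lies inside .text changes that section's bytes, and therefore its CRC.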

Other parts of your EXE/DLL will probably change during loading as well, such as the Import Address Table (IAT). On disk its entries typically refer to the names or ordinals of the imported functions. When the loader processes them, it looks each one up in the exporting module's Export Address Table and overwrites the entry with the function's actual address.
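The binding step can be modeled without any Windows APIs. In this sketch, ExportTable stands in for the exporting module's exports and each ImportSlot for one IAT entry. Both types and bind_imports are hypothetical simplifications: a real on-disk IAT entry holds an RVA of a hint/name record or an ordinal, not a std::string.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

using Fn = int (*)(int);

// Hypothetical model of an exporting DLL's export table: name -> address.
using ExportTable = std::map<std::string, Fn>;

// Hypothetical model of one IAT entry. Before binding only the name is known;
// the loader fills in `target`.
struct ImportSlot {
    std::string name;
    Fn target;
};

// Mimics the loader's binding step: look up each imported name in the
// exporter's table and overwrite the slot with the resolved address.
bool bind_imports(std::vector<ImportSlot>& iat, const ExportTable& exports)
{
    for (ImportSlot& slot : iat) {
        const auto it = exports.find(slot.name);
        if (it == exports.end()) return false;  // a real loader fails the load here
        slot.target = it->second;
    }
    return true;
}

// Stand-ins for two functions the "DLL" exports.
int add_one(int x) { return x + 1; }
int twice(int x) { return x * 2; }
```

Once bound, calling through a slot's target is an ordinary indirect call.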

Overall, loading is a sophisticated and detailed process, and simply assuming that code will remain intact between disk and RAM is likely to be disappointing at best.

Wielder of the Sacred Wands
[Work - ArenaNet] [Epoch Language] [Scribblings]

46 minutes ago, frob said:

Even though modern tools make it difficult to see, each module (exe file and dll file) lives in its own memory space and operates alone.

There are many historical reasons for this, which we can get into if you would like, but the short answer is that that's why they're different. There is a boundary between DLLs and your main executable, and between other DLLs.

I'm not sure this is strictly true. For example, when loading a DLL into an EXE's running process, you can pass pointers freely between the two, provided they use the same code (or identical code) to allocate/free the memory involved. If you don't allow allocation or deallocation, you can pass quite sophisticated stuff between DLLs and EXEs, cf. the COM implementation.

If DLLs and EXEs did not share a process memory space, this kind of interoperation would be far more difficult if not impossible.

Another key observation is that you can call DLL code in the first place. If you attach a debugger and walk through the disassembly, you can see that nothing magic happens when you invoke a DLL-exported function. Typically you'll see the address of the function loaded via the IAT (see above) and a simple jmp or call instruction will change the Instruction Pointer (RIP on x64, for instance) to start executing in the DLL's loaded code.

If there were a boundary between DLL and EXE code, you'd expect to see a syscall or some other kernel invocation to traverse it. Since that isn't present (and this is trivially verifiable with any Windows debugger), it seems safe to conclude that the boundary is purely conceptual.


As I mentioned, it gets a little complicated for historical reasons. Windows has been around for over 30 years and this type of thing has evolved over the decades.

Stack and heap run differently between the two modules, but many other aspects of the memory space are shared.  Memory ranges are the same, and the work under the hood to cross those boundaries differs slightly depending on the OS version.   What it did years ago is different from the way it works today, and the way it works in the future may be different from today.

DLLs are hosted inside the executable's address space, but the modules remain distinct. Significantly older versions of the OS (decades ago) used shared code for the modules with separate memory for data. Memory was tight, and at the time it was more important to reduce the memory impact of multiple programs each carrying their own copy of common controls and runtime libraries. It was far better in that era to load a single copy of each DLL, with its data memory unique to each hosting process.

Over the years there have been various changes and additions to the tools. For example, the .lib files you link against provide a simple stub that automatically handles loading and binding the function pointers from the DLL into a function you are familiar with, rather than dealing with loading the libraries, querying for the exported function's pointer addresses, and properly calling functions that cross the boundaries.  The tools of today automatically do that work, but the work is still being done.

Passing any objects across DLL boundaries other than basic memory blocks and raw pointers to basic building blocks is potentially dangerous, as there are many tiny variations that can exist between the executable and the DLL. This creates a boundary unless you can guarantee that everything was compiled with exactly the same options. Since many programs build everything from source with the same options, they tend not to have problems with it, but "DLL Hell" and assorted build issues, including subtle incompatibilities between builds, can cause problems. This boundary is part of the reason one of the first fixes for odd memory errors is a full rebuild, which brings all the libraries back in sync.

Each module (DLL or EXE) kinda-sorta has its own memory manager for the heap, although each can be identical; each DLL gets its own stack, which is different but addressable; and (by default) modules automatically share the same process heap if they use the same standard libraries with identical memory management options.

Again I'm not sure all of that is factually accurate. For instance DLLs do not seem to get their own stack space. Threads get a new stack, as do fibers, but modules do not.

Again: trivially easy to verify with a debugger of your choice. Note the value of the stack pointer register rsp in some EXE code. Invoke a DLL function and observe rsp inside it. Modulo a stack frame, it should be obvious that the same stack is shared across the DLL boundary.

The reasons for telling people not to hand complex objects across DLL borders are mostly rooted in CRT debacles. At the OS module loader level, there's actually not much to worry about.


There should be a stack for each execution environment, so threads, fibers and interrupt handlers each get their own. There are also separate stacks when switching to kernel mode, simply for security reasons. Otherwise, I would only switch stacks if I needed a really big one for something, and that hasn't happened yet. We just make regular function calls into DLLs, so everything executes inside the same execution environment. I would assume DLLs also use the standard libraries for memory management: if the code in those DLLs calls new/delete or malloc/free, then that's what's going to happen.

 

4 hours ago, ApochPiQ said:

Again I'm not sure all of that is factually accurate. For instance DLLs do not seem to get their own stack space. Threads get a new stack, as do fibers, but modules do not.

Again: trivially easy to verify with a debugger of your choice. Note the value of the stack pointer register rsp in some EXE code. Invoke a DLL function and observe rsp inside it. Modulo a stack frame, it should be obvious that the same stack is shared across the DLL boundary.

The reasons for telling people not to hand complex objects across DLL borders are mostly rooted in CRT debacles. At the OS module loader level, there's actually not much to worry about.

Hi, ApochPiQ!

If I use the code above to check EXE files, it works correctly.

Is there some difference between the .text section of a DLL and that of an EXE? I am really confused.

I have uploaded the test code. I'll dig deeper to find out why this happens.

codesectionCheck.7z


I'm really pretty sure that this is relocation in action. Without knowing the exact differences you're seeing that is speculative, however.
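One way to test the relocation hypothesis against the actual differences is to diff the section's file bytes against its in-memory bytes and check whether every mismatch falls inside a relocated slot. Below is a minimal sketch. It assumes the slot list has already been built from .reloc (each entry's page RVA plus offset, minus the section's VirtualAddress). RelocSlot and unexplained_diffs are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical: one relocated slot inside the section, as a section-relative
// offset and a width (4 for HIGHLOW, 8 for DIR64).
struct RelocSlot {
    std::size_t offset;
    std::size_t width;
};

// Compare the on-disk bytes of a section with the bytes read from memory and
// return every differing offset that is NOT covered by a relocation slot.
// An empty result means relocation alone explains the mismatch.
std::vector<std::size_t> unexplained_diffs(const std::vector<uint8_t>& onDisk,
                                           const std::vector<uint8_t>& inMemory,
                                           const std::vector<RelocSlot>& slots)
{
    // Mark every byte that some relocation is allowed to change.
    std::vector<bool> covered(onDisk.size(), false);
    for (const RelocSlot& s : slots)
        for (std::size_t i = s.offset; i < s.offset + s.width && i < covered.size(); ++i)
            covered[i] = true;

    std::vector<std::size_t> result;
    const std::size_t n = onDisk.size() < inMemory.size() ? onDisk.size() : inMemory.size();
    for (std::size_t i = 0; i < n; ++i)
        if (onDisk[i] != inMemory[i] && !covered[i])
            result.push_back(i);
    return result;
}
```

If the result is empty, relocation alone explains the mismatch; anything left over points at some other modification, such as a software breakpoint (0xCC) planted by a debugger.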


16 hours ago, ApochPiQ said:

I'm really pretty sure that this is relocation in action. Without knowing the exact differences you're seeing that is speculative, however.

Thank you, it's caused by relocation. I had some misconceptions about the PE format. I really appreciate your help.


This topic is closed to new replies.
