RMarin

Comparing DLLs on file with in-memory

I'm trying to compare certain DLLs in memory with their physical bytes on file. This is what I'm doing:
HMODULE hMemMod = (HMODULE)g_moduleList.at(z).module;
if (IsConcernedModule(hMemMod))
{
	char pFileName[MAX_PATH];
	if (GetModuleFileNameExA(GetCurrentProcess(),hMemMod,pFileName,MAX_PATH) > 0)
	{
		FILE* fpPhysicalModule;
		char szModuleName [ MAX_PATH ]; ( * szModuleName ) = 0;
		long lSize;
		char* pbPhysModule;
		size_t numBytes;

		fpPhysicalModule = fopen(pFileName,"rb");
		if (!fpPhysicalModule)
		{
			return;
		}

		// Read the whole file into a heap buffer.
		fseek(fpPhysicalModule, 0, SEEK_END);
		lSize = ftell(fpPhysicalModule);
		rewind(fpPhysicalModule);

		pbPhysModule = (char*)malloc(sizeof(char)*lSize);
		if (!pbPhysModule)
		{
			fclose(fpPhysicalModule);
			return;
		}
		numBytes = fread(pbPhysModule, 1, lSize,fpPhysicalModule);
		fclose(fpPhysicalModule);

		GetModuleBaseNameA(GetCurrentProcess(),(HMODULE)g_moduleList.at(z).module,szModuleName,sizeof(szModuleName));

		// Keep the handle as a pointer; casting it to DWORD truncates it on 64-bit builds.
		BYTE* moduleAddress = (BYTE*)GetModuleHandleA(szModuleName);
		BYTE* physicalAddress = (BYTE*)pbPhysModule;

		for (int i = 0;i < lSize;i++)
		{
			if (moduleAddress[i] != physicalAddress[i])
			{
				// this gets called all the time, even if i'm not manipulating the module in memory!
				g_intInsecurity.push_back(i);
			}
		}

		// Release the file copy once the comparison is done.
		free(pbPhysModule);
	}
}
The problem: even if I don't manipulate anything in memory, the byte at [i] in memory is always different from the byte at [i] in the file. I know I have to be doing something wrong. Any help would be greatly appreciated.

Does it happen literally for every single byte? Because there's a ton of stuff that's within the PE (EXE/DLL) file space once it gets loaded into memory that will be different:

- Values of global/static variables (these change during the normal course of execution)

- Values of the placeholders in the import address table (these change during load time)

- Any other non-relative addresses that are remapped (possibly switch tables) (these should change during load time)

- Maybe some other stuff; I haven't analyzed DLLs as much as EXEs.

Assuming this is for some sort of hacking protection, you may be better off comparing just the code segments of the DLL, although I'm not sure whether those could change during the load process too. For details of that, you'd want to look into the PE File Format (there may be better sources elsewhere though).
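To narrow the comparison down to code, one option is to look at each section's Characteristics flags. This is only a minimal sketch: the three flag values are the ones winnt.h defines (repeated here so the snippet compiles without windows.h), and isComparableSection is a made-up helper name, not something from the posts above.

```cpp
#include <cstdint>

// Flag values as defined in winnt.h, repeated here so the sketch is portable.
constexpr std::uint32_t kScnCntCode    = 0x00000020; // IMAGE_SCN_CNT_CODE
constexpr std::uint32_t kScnMemExecute = 0x20000000; // IMAGE_SCN_MEM_EXECUTE
constexpr std::uint32_t kScnMemWrite   = 0x80000000; // IMAGE_SCN_MEM_WRITE

// Hypothetical helper: true for sections that hold code and are not writable.
// Relocation fixups can still change bytes inside such a section, so this
// only narrows down what is worth comparing.
bool isComparableSection(std::uint32_t characteristics)
{
    const bool holdsCode = (characteristics & (kScnCntCode | kScnMemExecute)) != 0;
    const bool writable  = (characteristics & kScnMemWrite) != 0;
    return holdsCode && !writable;
}
```

A typical .text section (0x60000020) passes this filter, while a typical .data section (0xC0000040) does not.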

Actually I went and logged it, and here's what I got.

for (int i = 0;i < lSize;i++)
{
	if (moduleAddress[i] != physicalAddress[i])
	{
		writelog("%s : Offset 0x%x Physical: 0x%x Memory: 0x%x\n",szModuleName,i,physicalAddress[i],moduleAddress[i]);

		//g_intInsecurity.push_back(i);
	}
	else
	{
		writelog("%s : Offset 0x%x Match 0x%x\n",szModuleName,i,physicalAddress[i]);
	}
}


And the output log...
http://www.di-labs.com/memorycompare_example.txt

An offset in the DLL file is different from a virtual address. Basically, a PE file (which includes DLLs) consists of sections; each section loads at a different address, and that information can be found in the file header. Here's some information about the PE file format.
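As a rough illustration of reading that information out of the file, the sketch below walks the section table of a raw PE file held in a byte buffer. The field offsets follow the documented PE/COFF header layout; SectionInfo, readU16, readU32 and readSectionTable are names invented for this example, and it avoids windows.h on purpose so it can be compiled anywhere.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

struct SectionInfo {
    std::string   name;             // up to 8 characters, e.g. ".text"
    std::uint32_t virtualSize;      // size of the section once loaded
    std::uint32_t virtualAddress;   // RVA of the section once loaded
    std::uint32_t sizeOfRawData;    // size of the section in the file
    std::uint32_t pointerToRawData; // offset of the section in the file
    std::uint32_t characteristics;  // IMAGE_SCN_* flags
};

// Little-endian readers; the caller guarantees the bytes are in range.
static std::uint16_t readU16(const std::vector<std::uint8_t>& b, std::size_t off)
{
    return static_cast<std::uint16_t>(b[off] | (b[off + 1] << 8));
}

static std::uint32_t readU32(const std::vector<std::uint8_t>& b, std::size_t off)
{
    return  static_cast<std::uint32_t>(b[off])
         | (static_cast<std::uint32_t>(b[off + 1]) << 8)
         | (static_cast<std::uint32_t>(b[off + 2]) << 16)
         | (static_cast<std::uint32_t>(b[off + 3]) << 24);
}

// Returns the section headers of a raw PE file, or an empty list if the
// buffer does not look like a PE file or is truncated.
std::vector<SectionInfo> readSectionTable(const std::vector<std::uint8_t>& file)
{
    std::vector<SectionInfo> sections;
    if (file.size() < 0x40 || file[0] != 'M' || file[1] != 'Z') return sections;

    const std::size_t pe = readU32(file, 0x3C);               // e_lfanew
    if (pe > file.size() || file.size() - pe < 24) return sections;
    if (std::memcmp(file.data() + pe, "PE\0\0", 4) != 0) return sections;

    const std::size_t count   = readU16(file, pe + 6);        // NumberOfSections
    const std::size_t optSize = readU16(file, pe + 20);       // SizeOfOptionalHeader
    std::size_t off = pe + 24 + optSize;                      // first section header

    for (std::size_t i = 0; i < count; ++i, off += 40) {      // 40 bytes per header
        if (off + 40 > file.size()) { sections.clear(); return sections; }
        SectionInfo s;
        std::size_t len = 0;
        while (len < 8 && file[off + len] != 0) ++len;        // name is not always NUL-terminated
        s.name.assign(reinterpret_cast<const char*>(file.data() + off), len);
        s.virtualSize      = readU32(file, off + 8);
        s.virtualAddress   = readU32(file, off + 12);
        s.sizeOfRawData    = readU32(file, off + 16);
        s.pointerToRawData = readU32(file, off + 20);
        s.characteristics  = readU32(file, off + 36);
        sections.push_back(s);
    }
    return sections;
}
```

For each entry, virtualAddress says where the section ends up relative to the module base, and pointerToRawData says where the same bytes start in the file.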

From what I've been reading, the alignment of the virtual addresses and the alignment of the offsets in the file differ. Does anyone have any links to tutorials or research papers on how to properly line up a PE file with its loaded image in memory, in order to get a good comparison?
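To make the alignment difference concrete, here is a tiny sketch. It assumes power-of-two alignments such as 512 bytes in the file and 4096 bytes in memory (the values are examples, the real ones come from FileAlignment and SectionAlignment in the optional header), and alignUp is just an illustrative helper name.

```cpp
#include <cstdint>

// Rounds value up to the next multiple of alignment.
// Assumes alignment is a non-zero power of two.
constexpr std::uint32_t alignUp(std::uint32_t value, std::uint32_t alignment)
{
    return (value + alignment - 1) & ~(alignment - 1);
}
```

With these example alignments, a section holding 0x1234 bytes occupies alignUp(0x1234, 0x200) = 0x1400 bytes in the file but alignUp(0x1234, 0x1000) = 0x2000 bytes in memory, so everything after it sits at a different distance from the start in the two layouts.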

You're going to have to write a PE module parser, because DLLs have relocation sections which modify the code depending on where it is loaded, and even then you can only compare the memory segment.
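As one small piece of such a parser, the sketch below lists which addresses the loader may patch. It assumes you already have a raw copy of the base relocation data (the contents of the .reloc section) in a buffer and only looks at 32-bit fixups (type 3, IMAGE_REL_BASED_HIGHLOW); collectHighLowFixups is a hypothetical name.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Returns the RVA of every 32-bit fixup (type 3, IMAGE_REL_BASED_HIGHLOW)
// found in a raw copy of the base relocation data. Each block is:
//   uint32 PageRVA, uint32 BlockSize, then 16-bit entries where the top
//   4 bits are the type and the low 12 bits are the offset within the page.
std::vector<std::uint32_t> collectHighLowFixups(const std::vector<std::uint8_t>& reloc)
{
    auto u16 = [&](std::size_t o) {
        return std::uint32_t(reloc[o]) | (std::uint32_t(reloc[o + 1]) << 8);
    };
    auto u32 = [&](std::size_t o) { return u16(o) | (u16(o + 2) << 16); };

    std::vector<std::uint32_t> rvas;
    std::size_t pos = 0;
    while (pos + 8 <= reloc.size()) {
        const std::uint32_t pageRva   = u32(pos);
        const std::uint32_t blockSize = u32(pos + 4);
        if (blockSize < 8 || blockSize > reloc.size() - pos) break; // malformed block
        for (std::size_t e = pos + 8; e + 2 <= pos + blockSize; e += 2) {
            const std::uint32_t entry = u16(e);
            if ((entry >> 12) == 3) rvas.push_back(pageRva + (entry & 0x0FFFu));
        }
        pos += blockSize;
    }
    return rvas;
}
```

Each returned RVA marks the first of four bytes that can legitimately differ between the file and the loaded image, so a comparison can skip or adjust them.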

Honestly, if you just want to make sure the file hasn't been modified, you're better off using some kind of machine code obfuscator. It's trivial to reverse engineer the kind of check you're trying to write and make it refer to some other file (like the original dll renamed to "original.dll.org") instead of the file you want it to refer to.

You can't have very good anti-hacker "security" at all in software - it simply isn't possible to thwart talented reverse engineers, and anybody that does reverse engineering is going to be talented because it's not something that everyday people do. The closest thing to an amateur reverse engineer is a student doing something for class, but they're not going to be hacking your program anyway, so you don't need to worry about them.

If you still insist on doing basic checks for whatever reason, then you absolutely must learn how DLLs are loaded and processed by the system, and how reverse engineering and debugging is done. Two starting places: What Goes On Inside Windows 2000: Solving the Mysteries of the Loader; Windows Anti-Debug Reference.

Another simple trick is using the process and thread permission system implemented in Windows. The common trick (used by cheaters far more than by reverse engineers) of calling WriteProcessMemory can be somewhat circumvented by having the process revoke all permissions on its token at the very start of the program, which can make it difficult for the cheater's process to get permission to open the process with write privileges. In order to take advantage of that, though, you'll need to have the main executable launch a copy of itself using CreateProcess and pass the "CREATE_BREAKAWAY_FROM_JOB | DETACHED_PROCESS | CREATE_NEW_PROCESS_GROUP" flags with the appropriate permissions.

Another simple way is to make a number class that stores important numeric values in two separate variables. For example, for a 32-bit int, you'd store one 32-bit int that is a random number and a second one that is the real value XORed with that random number; every access XORs the two to get the final value. That way, searching memory for a specific number will never find it, and actual reverse engineering is required.
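The two-variable number idea could look something like the sketch below. MaskedInt and its members are names made up for illustration, and std::mt19937 seeded from std::random_device is only one way to pick the random half; a new key is drawn on every write so the stored pattern keeps changing.

```cpp
#include <cstdint>
#include <random>

// Stores a 32-bit value as (key, value XOR key) so the plain value never
// sits in memory as-is. Illustrative only; it stops simple memory searches,
// not someone reading the code.
class MaskedInt {
public:
    explicit MaskedInt(std::int32_t value = 0) { set(value); }

    void set(std::int32_t value)
    {
        key_    = nextKey();
        masked_ = static_cast<std::uint32_t>(value) ^ key_;
    }

    std::int32_t get() const { return static_cast<std::int32_t>(masked_ ^ key_); }

    // What a memory scanner would actually see for the value half.
    std::uint32_t rawStored() const { return masked_; }

private:
    static std::uint32_t nextKey()
    {
        static std::mt19937 rng{std::random_device{}()};
        std::uint32_t k;
        do { k = static_cast<std::uint32_t>(rng()); } while (k == 0); // a zero key would leave the value unmasked
        return k;
    }

    std::uint32_t key_    = 0;
    std::uint32_t masked_ = 0;
};
```

Usage would be along the lines of `MaskedInt health(100); health.set(health.get() - 10);`, with neither stored word ever equal to the real value.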

[Edited by - Extrarius on November 25, 2008 11:25:03 AM]

Points to Extrarius for the link to the loader article! (That's the first thing I thought of when I saw the thread title). Points to muse1987 as well. Pietrek owns spelunking the PE File.

There's more to be gleaned via wikipedia: Portable Executable. There are few links here that aren't there. There's info about debugging info here. There are additional links here.

Let's not forget that Windows is going to dynamically rebase DLLs as necessary, plus Vista may apply ASLR to the thing. That's going to alter all sorts of addresses, even within the code segment.

Quote:
Original post by Promit
Let's not forget that Windows is going to dynamically rebase DLLs as necessary, plus Vista may apply ASLR to the thing. That's going to alter all sorts of addresses, even within the code segment.
Is it? Surely only the jump table for DLL imports will be patched, and I didn't think that was in the code segment (Or if it is, it's probably possible to find the address of it and ignore it).

Rebasing Win32 DLLs: The Whole Story under Fixups:
Quote:
the DLL loader will allocate the string "Name" in the DLL's data segment and fill the beginning address of that string into the location that corresponds to the variable lpName. If the string "Name" must be relocated because the DLL could not be loaded at its base address, lpName must be updated accordingly. Note that in this case, every reference to lpName from within the code must also be fixed up.

Objects that can be subject to relocation are literal strings (for example, the string "Name" in the example above), as well as global and static data of every type, including statically allocated C++ objects. Note that especially in C++ there may be many hidden cross-references from one static object to another. Uninitialized data will (trivially) not be fixed up during the relocation process, but references to uninitialized static data will.
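For a comparison, the fixups described in that passage can be reproduced on the file copy before diffing it against memory. This is a sketch for 32-bit fixups only: the offsets are assumed to come from the DLL's relocation data, delta is the actual load address minus the preferred base, and applyFixups32 is a hypothetical helper name.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Adds delta to each 32-bit little-endian value stored at the given offsets,
// mimicking what the loader does when a DLL is not loaded at its preferred
// base. Offsets that do not leave room for 4 bytes are skipped.
void applyFixups32(std::vector<std::uint8_t>& bytes,
                   const std::vector<std::size_t>& offsets,
                   std::uint32_t delta)
{
    for (std::size_t off : offsets) {
        if (off > bytes.size() || bytes.size() - off < 4) continue;
        std::uint32_t v =  static_cast<std::uint32_t>(bytes[off])
                        | (static_cast<std::uint32_t>(bytes[off + 1]) << 8)
                        | (static_cast<std::uint32_t>(bytes[off + 2]) << 16)
                        | (static_cast<std::uint32_t>(bytes[off + 3]) << 24);
        v += delta; // wraps modulo 2^32, which also covers a lower load address
        bytes[off]     = static_cast<std::uint8_t>(v);
        bytes[off + 1] = static_cast<std::uint8_t>(v >> 8);
        bytes[off + 2] = static_cast<std::uint8_t>(v >> 16);
        bytes[off + 3] = static_cast<std::uint8_t>(v >> 24);
    }
}
```

For example, a DLL with preferred base 0x100000 that ends up at 0x400000 has delta 0x300000, so a stored address 0x1051D0 becomes 0x4051D0.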

Quote:
Original post by Promit
Rebasing Win32 DLLs: The Whole Story under Fixups:
Quote:
the DLL loader will allocate the string "Name" in the DLL's data segment and fill the beginning address of that string into the location that corresponds to the variable lpName. If the string "Name" must be relocated because the DLL could not be loaded at its base address, lpName must be updated accordingly. Note that in this case, every reference to lpName from within the code must also be fixed up.

Objects that can be subject to relocation are literal strings (for example, the string "Name" in the example above), as well as global and static data of every type, including statically allocated C++ objects. Note that especially in C++ there may be many hidden cross-references from one static object to another. Uninitialized data will (trivially) not be fixed up during the relocation process, but references to uninitialized static data will.
Interesting, I didn't consider that...

What I'm trying to do is find the .code section of a PE file when it's in its file form. Once a PE file gets loaded into memory there is an effect called padding that throws off the alignment of the PE format, so you can't compare a byte at a given offset in memory to the byte at the same offset in the file.

This is the code I used, and below it is what happened when I tried to compare module[i] with physical[i].


void SecureModules ()
{
	for (int z = 0;(int)g_moduleList.size() > z;z++)
	{
		HMODULE hMemMod = (HMODULE)g_moduleList.at(z).module;
		if (IsConcernedModule(hMemMod))
		{
			char pFileName[MAX_PATH];
			if (GetModuleFileNameExA(GetCurrentProcess(),hMemMod,pFileName,MAX_PATH) > 0)
			{
				if (hMemMod != GetModuleHandleA("V3_GameSecureC_b1.dll"))
					continue;

				PIMAGE_DOS_HEADER pDOSHeader;
				PIMAGE_NT_HEADERS pPEHeader;
				PIMAGE_OPTIONAL_HEADER pOptionalHeader;
				void * l_pCodeBase, * l_pDataBase;
				unsigned long l_nCodeSize, l_nDataSize;

				// Locate the PE headers of the loaded image.
				pDOSHeader = ( PIMAGE_DOS_HEADER ) hMemMod;
				pPEHeader = ( PIMAGE_NT_HEADERS ) ( ( unsigned long ) hMemMod + ( pDOSHeader->e_lfanew ) );
				pOptionalHeader = & pPEHeader->OptionalHeader;

				l_pCodeBase = ( void * ) ( ( ( unsigned long ) hMemMod ) + pOptionalHeader->BaseOfCode );
				l_nCodeSize = pOptionalHeader->SizeOfCode;
				l_pDataBase = ( void * ) ( ( ( unsigned long ) hMemMod ) + pOptionalHeader->BaseOfData );
				l_nDataSize = ( pOptionalHeader->SizeOfUninitializedData + pOptionalHeader->SizeOfInitializedData );

				FILE* fpPhysicalModule;
				char szModuleName [ MAX_PATH ]; ( * szModuleName ) = 0;
				long lSize;
				char* pbPhysModule;
				size_t numBytes;

				// Read the whole file into a heap buffer.
				fpPhysicalModule = fopen(pFileName,"rb");
				if (!fpPhysicalModule)
				{
					g_intInsecurity.push_back(3000+z);
					return;
				}
				fseek(fpPhysicalModule, 0, SEEK_END);
				lSize = ftell(fpPhysicalModule);
				rewind(fpPhysicalModule);
				pbPhysModule = (char*)malloc(sizeof(char)*lSize);
				numBytes = fread(pbPhysModule, 1, lSize,fpPhysicalModule);
				fclose(fpPhysicalModule);
				GetModuleBaseNameA(GetCurrentProcess(),(HMODULE)g_moduleList.at(z).module,szModuleName,sizeof(szModuleName));
				DWORD dwMemModule = (DWORD)GetModuleHandleA(szModuleName);
				BYTE* moduleAddress = (BYTE*)dwMemModule;

				BYTE* physicalAddress = (BYTE*)pbPhysModule;

				// Note: i is used both as a memory offset and as a file offset here.
				for (int i = 0;i < l_nCodeSize;i++)
				{
					if (moduleAddress[i] != physicalAddress[i])
					{
						writelog("%s : Offset 0x%x Physical: 0x%x Memory: 0x%x\n",szModuleName,i,physicalAddress[i],moduleAddress[i]);

						//g_intInsecurity.push_back(i);
					}
					else
					{
						writelog("%s : Offset 0x%x Match 0x%x\n",szModuleName,i,physicalAddress[i]);
					}
				}

				// Release the file copy once the comparison is done.
				free(pbPhysModule);
			}
		}
	}
}


Now the output was pretty disgusting.. http://www.di-labs.com/memorycompare_example.txt

Ok so I went on reading why this happens, and got my answer from... http://msdn.microsoft.com/en-us/library/ms809762.aspx

Apparently it gets aligned differently.. So what I need to do is find the .code address on file and I'm set. I can already get the .code address in memory easily.


l_pCodeBase = ( void * ) ( ( ( unsigned long ) hMemMod ) + pOptionalHeader->BaseOfCode );
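Once the file offset of the section is known, the comparison itself is a matter of starting each side at its own offset. The sketch below assumes both the loaded image and the file are available as byte buffers (in the real process the image bytes would be read from the module base) and that the four section fields come from the section header; diffSection is an illustrative name, and relocated addresses will still show up as mismatches.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Compares one section of a loaded image against the same section in the
// file. The image side starts at the section's RVA, the file side at its
// raw data offset. Returns false if the section does not fit in a buffer;
// otherwise fills mismatches with section-relative offsets that differ.
bool diffSection(const std::vector<std::uint8_t>& image,
                 const std::vector<std::uint8_t>& file,
                 std::uint32_t virtualAddress, std::uint32_t virtualSize,
                 std::uint32_t pointerToRawData, std::uint32_t sizeOfRawData,
                 std::vector<std::uint32_t>& mismatches)
{
    mismatches.clear();
    // Only bytes present on both sides can be compared.
    const std::uint32_t length = std::min(virtualSize, sizeOfRawData);
    if (virtualAddress > image.size() || image.size() - virtualAddress < length) return false;
    if (pointerToRawData > file.size() || file.size() - pointerToRawData < length) return false;

    for (std::uint32_t i = 0; i < length; ++i) {
        if (image[virtualAddress + i] != file[pointerToRawData + i]) {
            mismatches.push_back(i);
        }
    }
    return true;
}
```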


Now I went on reading more and I found this interesting article... http://webster.cs.ucr.edu/Page_TechDocs/pe.txt
If you go down to the section where it says... well hell, I'll post it for you.

Relative Virtual Addresses
--------------------------

The PE format makes heavy use of so-called RVAs. An RVA, aka "relative
virtual address", is used to describe a memory address if you don't know
the base address. It is the value you need to add to the base address to
get the linear address.
The base address is the address the PE image is loaded to, and may vary
from one invocation to the next.

Example: suppose an executable file is loaded to address 0x400000 and
execution starts at RVA 0x1560. The effective execution start will then
be at the address 0x401560. If the executable were loaded to 0x100000,
the execution start would be 0x101560.

Things become complicated because the parts of the PE-file (the
sections) are not necessarily aligned the same way the loaded image is.
For example, the sections of the file are often aligned to
512-byte-borders, but the loaded image is perhaps aligned to
4096-byte-borders. See 'SectionAlignment' and 'FileAlignment' below.

So to find a piece of information in a PE-file for a specific RVA,
you must calculate the offsets as if the file were loaded, but skip
according to the file-offsets.
As an example, suppose you knew the execution starts at RVA 0x1560, and
want to disassemble the code starting there. To find the address in the
file, you will have to find out that sections in RAM are aligned to 4096
bytes and the ".code"-section starts at RVA 0x1000 in RAM and is 16384
bytes long; then you know that RVA 0x1560 is at offset 0x560 in that
section. Find out that the sections are aligned to 512-byte-borders in
the file and that ".code" begins at offset 0x800 in the file, and you
know that the code execution start is at byte 0x800+0x560=0xd60 in the
file.

Then you disassemble and find an access to a variable at the linear
address 0x1051d0. The linear address will be relocated upon loading the
binary and is given on the assumption that the preferred load address is
used. You find out that the preferred load address is 0x100000, so we
are dealing with RVA 0x51d0. This is in the data section which starts at
RVA 0x5000 and is 2048 bytes long. It begins at file offset 0x4800.
Hence, the variable can be found at file offset
0x4800+0x51d0-0x5000=0x49d0.


Now here's where my math dyslexia comes in. Can anybody help me?
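The arithmetic in the quoted passage boils down to: find the section whose RVA range contains the address, subtract the section's RVA, and add the section's file offset. A minimal sketch, where Section and rvaToFileOffset are made-up names and the three fields stand for VirtualAddress, VirtualSize and PointerToRawData from the section header:

```cpp
#include <cstdint>
#include <vector>

struct Section {
    std::uint32_t virtualAddress;   // RVA where the section starts in memory
    std::uint32_t virtualSize;      // size of the section in memory
    std::uint32_t pointerToRawData; // offset where the section starts in the file
};

// Translates an RVA into a file offset. Returns false if no section
// contains the RVA.
bool rvaToFileOffset(const std::vector<Section>& sections,
                     std::uint32_t rva, std::uint32_t& fileOffset)
{
    for (const Section& s : sections) {
        if (rva >= s.virtualAddress && rva - s.virtualAddress < s.virtualSize) {
            fileOffset = s.pointerToRawData + (rva - s.virtualAddress);
            return true;
        }
    }
    return false;
}
```

Plugging in the numbers from the text: with a code section at RVA 0x1000 (16384 bytes, file offset 0x800) and a data section at RVA 0x5000 (2048 bytes, file offset 0x4800), RVA 0x1560 maps to 0x800 + 0x560 = 0xd60 and RVA 0x51d0 maps to 0x4800 + 0x1d0 = 0x49d0. Bytes that exist only in memory (past the section's raw data size) have no file counterpart, which this sketch does not check for.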

Quote:
Original post by RMarin
[...]
Just to let you know, were I so inclined, I'd bypass your module protection by modifying a single byte in the executable, which would change
void SecureModules()
{
//...
//...
//...
}
into
void SecureModules(){}
If you want to thwart anybody at all, you'll need something a bit more complicated, like writing your own loader that decrypts the DLL into memory and then loads it. That would still be simple to circumvent, but not nearly as simple as adding a 'return' to the beginning of a function. If I couldn't modify your program for some reason, I would inject a thread that would remove 'concerned modules' from the module list in the process environment block, or I would inject code to make GetModuleFileNameExA fail with particular modules, or make the file open attempt fail, or I would simply use two copies of the DLL: one original, which would be in the module list (with a modified export table), and one modified, which would be the one everything would actually call. If you also checked the export table, I would patch the import table of the application instead, etc etc etc. This approach is not conducive to much more than wasted time on your part.

One approach that would help your cause would be to use the debug database (the pdb file) to know the boundaries of functions, and then you could compute a checksum of each in-memory function (being careful to account for relocations one way or another) and you could hard-code values into your executable that would calculate constants in your program based on those checksums. Doing so would require patching your executable after each compilation, but would still be very simple.
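A relocation-tolerant checksum could be sketched as below. The post doesn't name a checksum, so 32-bit FNV-1a is used here purely as a stand-in, and checksumIgnoring plus the skip mask (true for every byte covered by a relocation) are assumptions for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// 32-bit FNV-1a over a function's bytes, treating every byte flagged in
// skip as zero so that relocated addresses do not change the result.
std::uint32_t checksumIgnoring(const std::vector<std::uint8_t>& code,
                               const std::vector<bool>& skip)
{
    std::uint32_t hash = 2166136261u;            // FNV offset basis
    for (std::size_t i = 0; i < code.size(); ++i) {
        const bool masked = i < skip.size() && skip[i];
        hash ^= masked ? 0u : code[i];
        hash *= 16777619u;                       // FNV prime
    }
    return hash;
}
```

The same function bytes then hash to the same value whether the DLL was rebased or not, while a patched opcode outside the masked bytes changes the result.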

To get fancy, you could use the debug database along with guard pages and a custom exception handler to ensure that no more than a single page is decrypted at any time. For another layer, you could ensure that all the protected pages are executed at the same address, so simply capturing each page after it is decrypted won't be enough to run the whole function. In order for this to not bring execution to a virtual halt, you'll want to be really careful about loops extending across page boundaries. You might want to make the protection keep two unencrypted pages, for example, to help with large loops.

To do anything that remotely deters anybody, though, you're going to have to study the subject of reverse engineering in depth. It's good that you've started learning the PE format, but you really need to be able to do much more than that. At the very least, you should be able to defeat your own protection measures using (for example) the freeware version of IDA Pro.
