Function addresses are not fixed relative to each other?

Started by
13 comments, last by codingJoe 10 years, 9 months ago

Hello,

while trying to find the reason for some strange behaviour in my library, I stumbled upon this:

cpp file 1 of library:


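#include <stdio.h>
#include <stdint.h>
extern "C" int giveMeAOne();   // defined in cpp file 2
extern "C" int giveMeATwo();   // defined in cpp file 2
void printRelativeFunctionAddresses()
{
    // intptr_t is wide enough to hold a pointer, so these casts also compile on 64-bit targets
    printf("printRelativeFunctionAddresses-giveMeAOne: %ld\n",long(intptr_t(printRelativeFunctionAddresses)-intptr_t(giveMeAOne)));
    printf("printRelativeFunctionAddresses-giveMeATwo: %ld\n",long(intptr_t(printRelativeFunctionAddresses)-intptr_t(giveMeATwo)));
}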

Cpp file 2 of library:


extern "C" __attribute__((visibility("default"))) int giveMeAOne()
{
    return(1);
}
extern "C" __attribute__((visibility("default"))) int giveMeATwo()
{
    return(2);
}

The function printRelativeFunctionAddresses prints the same value every time I load the library from a specific client application A. But when I use a different client application B (which loads the exact same library), the output is different! How can that be? Are functions shifted around during the library load operation?

I am running the library on Ubuntu 32bit, using Qt5.1.0



Are functions shifted around during the library load operation?

Yes. The correct term is 'relocated'.

If libraries couldn't be loaded at arbitrary base addresses, then every library in the world would need a unique base address (and we'd run out of addresses).

Tristam MacDonald. Ex-BigTech Software Engineer. Future farmer. [https://trist.am]

Thank you swiftcoder.

Yes, I understand that the functions should all get the same offset, but is it possible that one function gets a different offset than the others?

Let's say, that I have 2 functions in my library: A() and B().

My client application loads the library and binds the functions. I print the location of function B RELATIVE to the location of function A:

B-A = position of function B relative to function A = X

My question is: will X always be the same for the same library binary? Could it be that the library loader decides one time to put function B before function A (and vice versa)?

My guess would be that this is pretty much up to the implementation of your platform's dynamic linker.

In general, A and B will always have the same offset vis-à-vis one another, but I don't know that behaviour is actually guaranteed anywhere.
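One way to check this on Linux is to print each function's offset from the base address of the module that contains it, rather than comparing raw addresses. The following is a minimal sketch, assuming glibc's `dladdr` extension (older glibc versions need `-ldl` at link time). The two `giveMe...` functions are stand-ins defined locally so the example builds on its own, and `offsetInModule` and `printModuleOffsets` are hypothetical helper names.

```cpp
#include <dlfcn.h>   // dladdr, Dl_info (glibc extension; g++ defines _GNU_SOURCE by default)
#include <cstdint>
#include <cstdio>

// Stand-ins for the library's exported functions, defined here so the sketch is self-contained.
extern "C" int giveMeAOne() { return 1; }
extern "C" int giveMeATwo() { return 2; }

// Returns the offset of `fn` from the base address of the module that
// contains it, or -1 if the dynamic linker cannot identify that module.
template <typename Fn>
std::intptr_t offsetInModule(Fn* fn)
{
    Dl_info info;
    void* addr = reinterpret_cast<void*>(fn);
    if (dladdr(addr, &info) == 0 || info.dli_fbase == nullptr)
        return -1;
    return reinterpret_cast<std::intptr_t>(addr) -
           reinterpret_cast<std::intptr_t>(info.dli_fbase);
}

// Prints module-relative offsets; the absolute addresses may differ per run.
void printModuleOffsets()
{
    std::printf("giveMeAOne at base+%#lx\n",
                static_cast<unsigned long>(offsetInModule(&giveMeAOne)));
    std::printf("giveMeATwo at base+%#lx\n",
                static_cast<unsigned long>(offsetInModule(&giveMeATwo)));
}
```

If two functions that live in the same library report different module-relative offsets under client A and client B, it is worth checking whether both clients really resolve the symbols to the same library file (`info.dli_fname` holds the path).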

Tristam MacDonald. Ex-BigTech Software Engineer. Future farmer. [https://trist.am]

It's normal module loading behavior. Within the file on disk there are tables that list everything that has to be fixed up at load time (in a PE file, the base relocation table and the import table). When the module gets loaded, whether it's a DLL (or whatever) or an EXE, the system maps its sections into memory in a layout that is convenient for runtime use, then recalculates the addresses and updates the references those tables describe so that they point to the actual memory locations within the loaded module.

This article is outdated, but - to my knowledge - there haven't been any really major changes:

http://msdn.microsoft.com/en-us/magazine/cc301727.aspx

void hurrrrrrrr() {__asm sub [ebp+4],5;}

There are ten kinds of people in this world: those who understand binary and those who don't.

Thanks Khatharr,

I had a look at the link, which is very exhaustive but probably too Windows-oriented.

I am using g++ V4.6.3 under Ubuntu. It is invoked under the hood by Qt (Qt V5.1.0).

You are telling me that functions can get shifted around when loaded into memory, and the loader/binder takes care of correcting for that. But it seems that in my case, the loader fails at doing its job correctly:

I have library-internal calls to functions A and B too. And those calls will sometimes call the wrong address (as strange as it seems!). My library and client application run on Windows, Mac and Linux. Windows and Mac run fine. Linux used to run fine until I switched to the newest Qt version (but still the same compiler). Is it possible that Qt appended additional compilation flags that influence the way my library gets loaded?


Are functions shifted around during the library load operation?

Yes. The correct term is 'relocated'.

If libraries couldn't be loaded at arbitrary base addresses, then every library in the world would need a unique base address (and we'd run out of addresses).

This is also a security measure in most OSes: the dynamic base makes it harder to exploit buffer overflows or underflows, because an attacker cannot rely on known addresses. This is not the be-all and end-all of security and is very easily broken, but it is an additional step that makes the rest of the security system better.

Also, it is not a good idea to assume function pointers always sit at a fixed offset from some location in the code itself, as adding additional code could also move this stuff around. It is better to ask the module you are loading for the function's address; this way it will always link to the correct version.
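On Linux, asking the module for the address is done with `dlopen` and `dlsym`. The following is a minimal sketch, not the poster's actual code: `lookupSymbol`, `callAbsThroughLookup` and the `IntFn` alias are hypothetical names. It looks up the C library's `abs` in the already-loaded program image so the example runs without any extra library (older glibc versions need `-ldl` at link time).

```cpp
#include <dlfcn.h>   // dlopen, dlsym, dlerror
#include <cstdio>

using IntFn = int (*)(int);   // signature assumed for the symbol looked up below

// Looks up `name` in the shared object at `path`. Passing nullptr as the path
// searches the running program and the libraries loaded with it at startup.
void* lookupSymbol(const char* path, const char* name)
{
    void* handle = dlopen(path, RTLD_NOW);
    if (!handle) {
        std::fprintf(stderr, "dlopen failed: %s\n", dlerror());
        return nullptr;
    }
    // The handle is deliberately left open: closing it could unload the
    // library and invalidate the address returned here.
    return dlsym(handle, name);
}

// Calls abs() through an address obtained at run time instead of a fixed offset.
// Falls back to returning the input unchanged if the lookup fails.
int callAbsThroughLookup(int value)
{
    auto fn = reinterpret_cast<IntFn>(lookupSymbol(nullptr, "abs"));
    return fn ? fn(value) : value;
}
```

For a real plugin the call would look something like `lookupSymbol("./libmylibrary.so", "giveMeAOne")`, where the library path is a placeholder; the returned pointer is then valid wherever the loader happened to map the library.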

Worked on titles: CMR:DiRT2, DiRT 3, DiRT: Showdown, GRID 2, theHunter, theHunter: Primal, Mad Max, Watch Dogs: Legion

AFAIK, it can be noted that Linux uses ASLR by default (ASLR = Address Space Layout Randomization).

where basically, the address at which the library gets loaded varies from run to run (the loader may invoke a random number generator in its operation). this makes buffer-overflow exploits harder as function addresses are much more random (and thus harder to guess).

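note that you don't just want to randomize absolute addresses, but also the relative addresses between separately loaded modules, since most ISAs use relative addresses for jumps/..., making the relative address fairly relevant as well. offsets inside a single loaded library are not changed by this.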

it is possible that the linker also performs randomization (causing different builds of a program or library to have different layouts). this behavior depends somewhat on the flags used during compilation.

note that a function pointer obtained within C or C++ code isn't necessarily the "true" function address either. sometimes one instead gets what is known as a "trampoline": a pointer to a "jmp" instruction or similar which goes somewhere else (these are sometimes put in as part of the compilation or linking process).


I have library-internal calls to functions A and B too. And those calls will sometimes call the wrong address (as strange as it seems!). My library and client application run on Windows, Mac and Linux. Windows and Mac run fine. Linux used to run fine until I switched to the newest Qt version (but still the same compiler). Is it possible that Qt appended additional compilation flags that influence the way my library gets loaded?

Are you sure you fully recompiled everything?

void hurrrrrrrr() {__asm sub [ebp+4],5;}

There are ten kinds of people in this world: those who understand binary and those who don't.

I'm only familiar with how this works in Windows PE (exe and dll) files:

- The assembler and linker will organize machine code however they like. Typically a compiler will generate functions in the order they appear in the source file, and the linker will lay out a library in the order that objects are passed to it; this is mainly subject to change due to optimizations.

- The output object/library/executable files are partitioned into 'sections'. Each section can be relocated anywhere in RAM when the loader loads it.

- Once the exe or dll is generated, the code within each section of that file will assume that the section will be treated as an immutable chunk. Function calls within one section rely on this.

- A relocation table can be used to remap pointers within the code sections (typically addresses to other sections within the same file), but this never changes the relative offsets of things within the same section.

You should expect code within one section to keep the same offsets relative to each other, but only until you recompile or re-link your exe/dll. Those offsets will not change between runs of the program. You should never rely on cross-section offsets being the same, OR on absolute addresses being the same between runs of the same app, even if you did not recompile.
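The rule above can be illustrated with a toy model. This is only a sketch of the arithmetic, not how a real loader is implemented: `LoadedSection`, `distanceWithin` and `distanceAcross` are hypothetical names, and the base addresses and offsets used with them are made-up numbers.

```cpp
#include <cstdint>

// Toy model of one section after loading: the loader picks `base` for this
// run, and everything inside the section keeps its link-time offset.
struct LoadedSection {
    std::uintptr_t base;

    std::uintptr_t addressOf(std::uintptr_t offsetInSection) const
    {
        return base + offsetInSection;
    }
};

// Signed distance from function A to function B when both live in the same
// section. The base cancels out, so the result is the same on every run.
std::intptr_t distanceWithin(const LoadedSection& s,
                             std::uintptr_t offsetA, std::uintptr_t offsetB)
{
    return static_cast<std::intptr_t>(s.addressOf(offsetB) - s.addressOf(offsetA));
}

// Signed distance between items in two different sections. Both bases take
// part in the result, so it can change whenever either section moves.
std::intptr_t distanceAcross(const LoadedSection& a, std::uintptr_t offsetA,
                             const LoadedSection& b, std::uintptr_t offsetB)
{
    return static_cast<std::intptr_t>(b.addressOf(offsetB) - a.addressOf(offsetA));
}
```

With offsets 0x1200 and 0x1340, `distanceWithin` gives 0x140 whether the section is placed at 0x10000000 or at 0x7f000000, while `distanceAcross` depends on where each of the two sections ended up.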

I've never inspected Linux binaries closely to see if they use this same pattern or not.
