codingJoe

Function addresses are not fixed relative to each other?

14 posts in this topic

Hello,

 

while trying to find out the reason for a strange behaviour of my library, I stumbled upon this:

 

cpp file 1 of library:

void printRelativeFunctionAddresses()
{
    printf("printRelativeFunctionAddresses-giveMeAOne: %i\n",int(printRelativeFunctionAddresses)-int(giveMeAOne));
    printf("printRelativeFunctionAddresses-giveMeATwo: %i\n",int(printRelativeFunctionAddresses)-int(giveMeATwo));
}

Cpp file 2 of library:

extern "C" __attribute__((visibility("default"))) int giveMeAOne()
{
    return(1);
}
extern "C" __attribute__((visibility("default"))) int giveMeATwo()
{
    return(2);
}

The function printRelativeFunctionAddresses prints the same values every time I load the library from a specific client application A. But when I use a different client application B (which loads the exact same library), the output is different! How can that be? Are functions shifted around during the library load operation?

 

I am running the library on Ubuntu 32bit, using Qt5.1.0


Thank you swiftcoder.

 

Yes, I understand that the functions should all get the same offset, but is it possible that one function gets a different offset than the others?

 

Let's say, that I have 2 functions in my library: A() and B().

My client application loads the library and binds the functions. I print the location of function B RELATIVE to the location of function A:

 

A-B=position of function B relative to function A=X

 

My question is: will X always be the same for the same library binary? Could it be that the library loader decides one time to put function B before function A (and vice versa)?


My guess would be that is pretty much up to the implementation of your platform's dynamic linker.

 

In general, A and B will always have the same offset vis-à-vis one another, but I don't know that behaviour is actually guaranteed anywhere.


It's normal module loading behavior. Within the file on the disk there's a table that lists the location within the file for all the relocatable assets (import descriptor table). When the module gets loaded, whether it's a dll (or whatever) or an exe, the system spreads things out in a way that makes access more convenient for runtime use, then it recalculates the addresses and updates all the references in the table to point to the memory location within the module's code page.

 

This article is outdated, but - to my knowledge - there haven't been any really major changes:

 

http://msdn.microsoft.com/en-us/magazine/cc301727.aspx


Thanks Khatharr,

 

I had a look at the link which is very exhaustive, but probably too much Windows oriented.

I am using g++ V4.6.3 under Ubuntu. It is invoked under the hood by Qt (Qt V5.1.0).

You are telling me that functions can get shifted around when loaded into memory, and the loader/binder takes care of correcting for that. But it seems that in my case, the loader fails at doing its job correctly:

 

I have library-internal calls to functions A and B too. And those calls will sometimes call the wrong address (strange as it seems!). My library and client application run on Windows, Mac and Linux. Windows and Mac run fine. Linux used to run fine until I switched to the newest Qt version (but still same compiler). Is it possible that Qt appended additional compilation flags that influence the way my library gets loaded?


 


Are functions shifted around during the library load operation?

Yes. The correct term is 'relocated'.

 

If libraries couldn't be loaded at arbitrary base addresses, then every library in the world would need a unique base address (and we'd run out of addresses).

 

 

This is also a security measure in most OSes: the dynamic base makes it harder for buffer overflows or underflows to be exploited. It is not the be-all and end-all of security and is fairly easily broken, but it is an additional step that makes the rest of the security system better.

 

Also it is not a good idea to assume function pointers are always at a fixed offset from a certain location in the code itself, as adding additional code could also move this stuff around. It is better to ask for the function pointer address from the module you are loading in; this way it will always link to the correct version.
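
As a rough sketch of what "asking the module for the function pointer" can look like on Linux, the helper below wraps dlsym from <dlfcn.h>. The names IntFunction and bindFunction are made up for this illustration and do not come from the library discussed in this thread; on older glibc versions you may also need to link with -ldl.

```cpp
#include <dlfcn.h>   // dlopen, dlsym, dlerror (POSIX)

// Hypothetical signature shared by the exported functions, e.g. giveMeAOne().
using IntFunction = int (*)();

// Look up `name` in an already opened library and return it as a typed
// function pointer, or nullptr if the symbol cannot be found.
IntFunction bindFunction(void* library, const char* name)
{
    dlerror();                              // clear any stale error state
    void* symbol = dlsym(library, name);    // address as seen by this process
    if (dlerror() != nullptr)
        return nullptr;                     // lookup failed
    return reinterpret_cast<IntFunction>(symbol);
}
```

A caller would typically open the library first, for example with `void* lib = dlopen(path, RTLD_LAZY);`, then call `bindFunction(lib, "giveMeAOne")` and check both results for null before using them. Because the address is looked up at run time, it does not matter where the loader placed the library.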
 


AFAIK, it can be noted that Linux uses ASLR by default (ASLR = Address Space Layout Randomization).

 

where basically, not only does the address at which the library gets loaded in RAM vary, but also its exact starting address within a page (the loader may invoke a random number generator in its operation). this makes buffer-overflow exploits harder as function addresses are much more random (and thus harder to guess).

 

note that you don't just want to randomize absolute addresses, but also relative addresses, since most ISAs use relative addresses for jumps/..., making the relative address fairly relevant as well.

 

it is possible that the linker may also perform randomization (causing different builds of a program or library to have different layouts). this behavior depends some on the flags used during compilation.

 

note that it isn't necessarily the case that when getting function pointers within C or C++ code that one will necessarily get the "true" function pointer either. sometimes, a person may instead get what is known as a "trampoline", a function pointer that simply points to a "jmp" instruction or similar which goes somewhere else (sometimes these will be put in as part of the compilation or linking process).
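
One way to see the effect of load-address randomisation for yourself is to ask the loader which image an address belongs to. The sketch below uses dladdr, a glibc extension declared in <dlfcn.h> (a C compiler may need _GNU_SOURCE, and older glibc versions need -ldl). ImageLocation and locateInImage are names invented here, and giveMeAOne/giveMeATwo are local stand-ins for the functions from the first post. With ASLR active, the base address of a shared library usually changes from run to run, while the offset of a function from the base of its own image is fixed at link time and should stay the same for an unchanged binary.

```cpp
#include <dlfcn.h>    // dladdr, Dl_info (glibc extension)
#include <cstdint>

// Where an address lives: which file it came from, where that file was
// mapped in this run, and how far into the mapping the address is.
struct ImageLocation {
    const char*    imagePath;
    const void*    imageBase;
    std::uintptr_t offset;      // address minus imageBase
};

// Fill `out` for `address`; returns false if no loaded image contains it.
bool locateInImage(const void* address, ImageLocation& out)
{
    Dl_info info;
    if (dladdr(address, &info) == 0 || info.dli_fbase == nullptr)
        return false;
    out.imagePath = info.dli_fname;
    out.imageBase = info.dli_fbase;
    out.offset = reinterpret_cast<std::uintptr_t>(address)
               - reinterpret_cast<std::uintptr_t>(info.dli_fbase);
    return true;
}

// Stand-ins for the exported functions from the first post.
int giveMeAOne() { return 1; }
int giveMeATwo() { return 2; }
```

Printing imageBase and offset for the same function in two runs (or from two client applications) shows which part actually moves. Two functions that report different imagePath values are in different images, so the distance between them is not expected to be stable.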


I have library-internal calls to functions A and B too. And those calls will sometimes call the wrong address (strange as it seems!). My library and client application run on Windows, Mac and Linux. Windows and Mac run fine. Linux used to run fine until I switched to the newest Qt version (but still same compiler). Is it possible that Qt appended additional compilation flags that influence the way my library gets loaded?

 

Are you sure you fully recompiled everything?


I'm only familiar with how this works in Windows PE (exe and dll) files:

 

- The assembler and linker will organize machine code however it feels like it.  Typically a compiler will generate each function as they appear in the source file and the linker will generate a library in the order that objects are passed to it - this is mainly subject to change due to optimizations.

- The output object/library/executable files are partitioned into 'sections'.  Each section can be relocated anywhere in RAM when the loader loads it.

- Once the exe or dll is generated, the code within each section of that file will assume that the section will be treated as an immutable chunk.  Function calls within one section rely on this.

- A relocation table can be used to remap pointers within the code sections (typically addresses to other sections within the same file), but this never changes the relative offsets of things within the same section.

 

You should expect code within one section to get the same relative offsets to each other, but only until you recompile or re-link your exe/dll.  It will not change between runs of the program.  You should never rely on cross-section offsets being the same, OR absolute addresses being the same between runs of the same app, even if you did not recompile.

 

 

I've never inspected Linux binaries closely to see if they use this same pattern or not.

Edited by Nypyren

AFAIK, it can be noted that Linux uses ASLR by default (ASLR = Address Space Layout Randomization).

 

where basically, not only does the address at which the library gets loaded in RAM vary, but also its exact starting address within a page (the loader may invoke a random number generator in its operation). this makes buffer-overflow exploits harder as function addresses are much more random (and thus harder to guess).

 

note that you don't just want to randomize absolute addresses, but also relative addresses, since most ISAs use relative addresses for jumps/..., making the relative address fairly relevant as well.

 

it is possible that the linker may also perform randomization (causing different builds of a program or library to have different layouts). this behavior depends some on the flags used during compilation.

 

note that it isn't necessarily the case that when getting function pointers within C or C++ code that one will necessarily get the "true" function pointer either. sometimes, a person may instead get what is known as a "trampoline", a function pointer that simply points to a "jmp" instruction or similar which goes somewhere else (sometimes these will be put in as part of the compilation or linking process).

This is on in Windows 7 and up as well, but ASLR is easy to break when the memory address space is nearly completely used, which is why it is only seen as a first line of defense.


I'm only familiar with how this works in Windows PE (exe and dll) files:

 

- The assembler and linker will organize machine code however it feels like it.  Typically a compiler will generate each function as they appear in the source file and the linker will generate a library in the order that objects are passed to it - this is mainly subject to change due to optimizations.

- The output object/library/executable files are partitioned into 'sections'.  Each section can be relocated anywhere in RAM when the loader loads it.

- Once the exe or dll is generated, the code within each section of that file will assume that the section will be treated as an immutable chunk.  Function calls within one section rely on this.

- A relocation table can be used to remap pointers within the code sections (typically addresses to other sections within the same file), but this never changes the relative offsets of things within the same section.

 

You should expect code within one section to get the same relative offsets to each other, but only until you recompile or re-link your exe/dll.  It will not change between runs of the program.  You should never rely on cross-section offsets being the same, OR absolute addresses being the same between runs of the same app, even if you did not recompile.

 

 

I've never inspected Linux binaries closely to see if they use this same pattern or not.

 

it is the same basic pattern, yes.

 

there are possible differences and things that could be clarified, but I decided against going too much into the specifics of PE/COFF and ELF loaders.

 

 

the main difference is mostly that, as applicable, the compiler (GCC) may shuffle things around and link objects in a pseudo-random order.

this doesn't usually change much between runs of a program though, but may affect things between builds.

 

the assumption in Linux land is generally that people will be regularly recompiling pretty much everything, rather than keeping the same binary around for a decade or more.

 

the rest is mostly randomizing the load address for a given image, so at one time it may be loaded at one address, and at another time another.

if the functions are not necessarily in the same image, then they will vary.

 

also it may happen that if you try to fetch a function pointer to a function in a different library, you will not get its true address.

partly it has to do with lazy linking:

the GOT holds an initialization stub, which when called will replace itself with the appropriate function address.

 

so, it isn't a good idea to return the address directly from the GOT, since if the function hasn't yet been called, it may be the wrong address (pointing to the stub, rather than the target function).

so, instead, what will be returned is a function pointer to a trampoline, which will jump to the address in the GOT.

this way, when the stub is called it can do its thing, and the function pointer will go to the right place.
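
the lazy-binding idea above can be mimicked in ordinary C++ to make it easier to follow. this is only a toy model, not what the ELF loader actually executes: gotSlot, resolverStub, trampoline and realFunction are all invented names standing in for a GOT entry, the loader's resolver, a PLT-style trampoline and the target function.

```cpp
// Signature of the imported function in this toy model.
using ApiFunction = int (*)();

int resolverCalls = 0;              // how often the "loader" had to resolve

int realFunction() { return 1; }    // stand-in for the real target

int resolverStub();                 // defined below

// Simulated GOT slot: initially points at the resolver stub, not the target.
ApiFunction gotSlot = resolverStub;

// The first call lands here: find the real target, patch the slot so later
// calls skip this step, then forward the call.
int resolverStub()
{
    ++resolverCalls;
    gotSlot = realFunction;
    return gotSlot();
}

// Simulated trampoline: its own address never changes, so it is safe to hand
// out as "the" function pointer even before the slot has been resolved.
int trampoline()
{
    return gotSlot();
}
```

taking `&trampoline` gives a pointer that works both before and after the first call, whereas a copy of gotSlot taken too early would keep pointing at the stub. that is the reason given above for handing out the trampoline address instead of the raw GOT contents.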

 

PE/COFF will often also use trampoline stubs, mostly as a means to allow the compiler to more easily use cheap local calls when possible (at a slight added cost for the case where the function turns out to be a DLL import).

 

also, ELF will tend to use the GOT as a means of avoiding the need for explicit relocation tables, allowing libraries to be mapped to arbitrary addresses without needing to be rebased, and allowing more pages to be shared. but, again, this comes at a slight cost to performance (pretty much everything is done indirectly), and typically keeps a register tied up with keeping track of the GOT.

 

in contrast, with PE/COFF, the solution is basically "try when possible to always map the DLL to the same base address in every process".

 

 

I personally like the PE/COFF strategy a little more, FWIW.

also, the addition of RIP-relative addressing on x64 greatly reduces the need for explicit relocations, while still allowing fast/cheap access to local variables and functions.

 

though, ELF has a few things it did well as well, FWIW...

Edited by cr88192

When you cast your function pointers to integers, I'd always precede that with a static assertion testing that the size of your function pointer type is smaller than or equal to the size of your int type.

 

Of course, if I really have to use printf to output any kind of pointer, I'd just use %p in the first place...
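
A minimal sketch of both suggestions, assuming a platform where std::intptr_t exists and a compiler (such as g++) that allows casting a function pointer to void* for printing. The helper names functionDistance and printFunctionAddress are made up for this example; giveMeAOne and giveMeATwo are local stand-ins for the functions from the first post.

```cpp
#include <cstdint>
#include <cstdio>

int giveMeAOne() { return 1; }
int giveMeATwo() { return 2; }

using IntFunction = int (*)();

// Refuse to compile if a function pointer would be truncated by the cast.
static_assert(sizeof(IntFunction) <= sizeof(std::intptr_t),
              "function pointer does not fit into intptr_t");

// Signed distance in bytes from `from` to `to`.
std::intptr_t functionDistance(IntFunction from, IntFunction to)
{
    return reinterpret_cast<std::intptr_t>(to)
         - reinterpret_cast<std::intptr_t>(from);
}

// Print an address with %p instead of squeezing it into an int.
void printFunctionAddress(const char* name, IntFunction function)
{
    std::printf("%s: %p\n", name, reinterpret_cast<void*>(function));
}
```

Compared with `int(someFunction)`, this keeps working on 64-bit targets, where a plain int is usually too small to hold an address.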


Thanks for all the interesting inputs! It was for me a little bit too much in-depth though ;)

 

Returning to my initial problem, I would be very curious if someone finds out what is going on because it is extremely mysterious to me:

 

1) I am not doing any fancy things. Especially not memory manipulation, etc.

2) The strange thing only happens on Linux. Windows and Mac run fine.

3) The application is cross-platform and based on Qt.

4) When working with Qt4.8, the problem doesn't happen on Linux. With Qt5.1.0 it does, even though the compiler is the same (g++ 4.6.3)

5) And here the actual mysterious thing:

 

a) I have a library LIB that exports several API functions. I also have a client application APP that dynamically loads the LIB and binds ALL of its functions at once.

b) Internally, the LIB itself calls some of its exported API functions

c) If APP calls the API function X, or rather just "mentions" a call to the API function X (i.e. the function doesn't need to be executed), then the LIB cannot call the same function, because the function pointer is invalid for the LIB (but working for the APP)

d) if a second APP, say APP2 doesn't call and doesn't "mention" a call to the API function X, then the LIB can call the function X and it works (i.e. no crash with APP2)

e) between c) and d) the LIB has not changed. It is exactly the same file. I verified also that the crash is linked to the function pointer X being invalid internally

 

To summarize the explanations above, here is a typical example:

 

LIB, file1:

void randomFunction()
{
    printHello(); // internally calling the API function that this LIB exports
}

LIB, file2:

extern "C" __attribute__((visibility("default"))) void printHello()
{ // one of the functions that this LIB exports
    printf("Hello\n");
}

APP:

int main(int argc,char* argv[])
{
    // Dynamically load LIB with: lib=dlopen(pathAndFilename,RTLD_LAZY)
    // Bind ALL API functions with: dlsym(lib,funcName);
    for (long long i=0;i<10000000000LL;i++) // long long: the bound does not fit into a 32-bit int
        printf("."); // just spend "almost" infinite time here
    printHello(); // this is actually never executed. But if we remove the loop here above, this function would execute correctly
}

APP2:

int main(int argc,char* argv[])
{
    // Dynamically load LIB with: lib=dlopen(pathAndFilename,RTLD_LAZY)
    // Bind ALL API functions with: dlsym(lib,funcName);
    for (long long i=0;i<10000000000LL;i++) // long long: the bound does not fit into a 32-bit int
        printf("."); // just spend "almost" infinite time here
    // printHello(); 
}

The difference between APP and APP2 is minimal. The LIB that APP and APP2 load and bind is exactly the same. However, only with APP2 can the LIB function 'randomFunction' safely call 'printHello'. With APP, 'randomFunction' executes only to the point where 'printHello' has to be called, then crashes. Verifying the 'printHello' function address inside that 'randomFunction' reveals an illegal address --> it is logical that it crashes, but not logical that the address is illegal with APP and not illegal with APP2.

 

Did I make sense? I have been trying to figure out what is going on for a few days now, and the more I move forward, the more lost I feel.

