Library function also used internally: segmentation fault on Linux

6 comments, last by Matt-D 10 years, 9 months ago

Hello,

I am using Qt 5.1.0, but I guess this is not directly related to the use of Qt.

I have a library which in fact is my application. It exports many API functions, some of which are also called internally (from the library itself). That worked fine on Windows, Mac and Linux, but changing the compiler version on Linux has caused problems. I have the following API functions (example):


extern "C" __attribute__((visibility("default"))) int returnMeAZero()
{
    return(0);
}

extern "C" __attribute__((visibility("default"))) int returnMeAOne()
{
    return(1);
}

I have a client application (an executable) that loads the library and dynamically binds the two functions above with dlopen and dlsym. When started, the client application calls "returnMeAZero", which works fine. Internally, the library sometimes calls "returnMeAOne", which also works fine. But when the library internally calls "returnMeAZero", it crashes with a segmentation fault.

The example above is simplified. There are several hundred functions similar to those above, and only those that are referenced in the client source code cannot be called internally by the library. That seems very strange to me. Is there any logical explanation for that behaviour?


Does it only happen when calling a function that returns zero? It doesn't happen when you call a function that returns non-zero?

Also, can you post a minimal, complete example of both the application and library that will reproduce this issue?

[size=2][ I was ninja'd 71 times before I stopped counting a long time ago ] [ f.k.a. MikeTacular ] [ My Blog ] [ SWFer: Gaplessly looped MP3s in your Flash games ]

Thanks for the quick reply!

Those functions were just examples. Basically, when it crashes, it does so before entering the function (checked with prints). Here is a slightly more elaborate example:

Library code, file 1:


extern "C" __attribute__((visibility("default"))) int returnMeAZero()
{
    return(0);
}
extern "C" __attribute__((visibility("default"))) int returnMeAZero2()
{
    return(0);
}
extern "C" __attribute__((visibility("default"))) int returnMeAOne()
{
    return(1);
}

Library code, file 2:


void randomFunction()
{
    returnMeAOne(); // works and doesn't crash
    returnMeAZero2(); // works and doesn't crash
    returnMeAZero(); // crashes before even entering the 'returnMeAZero' function
}

Client application:


void main()
{
    // load library with dlopen
    // bind all 3 API functions in exactly the same way (with dlsym)
    returnMeAZero(); // works without crash
}

If I remove the 'returnMeAZero' function call in the client application, then the 'randomFunction' of the library never crashes and does exactly what it is supposed to. The whole thing behaves as if just having a function call to that function in the client application changes the physical location of the function in the library!

When you're checking with prints, are you sure you're flushing the stream? Does this happen only with release builds, or with debug builds too? Is your program 32-bit or 64-bit? Is your library 32-bit or 64-bit? Is this x86?

The reason I wanted a complete example was so I could copy 'n' paste the code and run it with my debugger and carefully watch what's happening. If it's not a 100% complete (minimal) example I can't be sure you and I aren't running different code. If you can create a complete, runnable mini sample that consistently crashes for you, I'll test it on my machines.

(also, be sure to use int main() instead of void main(). void main() is invalid C (and C++) and should only ever be used on a freestanding implementation)


Thanks again for your reply.

Well, the project is pretty big, and cutting out everything would take time, but I will consider this.

In the meantime I tested one last thing. With the example code mentioned above, I tried this:

In the library:


printf("returnMeAZero (internal): %p\n",(void*)returnMeAZero);
printf("returnMeAZero2 (internal): %p\n",(void*)returnMeAZero2);
printf("returnMeAOne (internal): %p\n",(void*)returnMeAOne);

In the client application:


printf("returnMeAZero (clientApp): %p\n",(void*)returnMeAZero);
printf("returnMeAZero2 (clientApp): %p\n",(void*)returnMeAZero2);
printf("returnMeAOne (clientApp): %p\n",(void*)returnMeAOne);

And the function addresses printed by the library and by the client application do not match! How can that be?

The library and the client application are both 32-bit, running on 32-bit Ubuntu 12.04 in VirtualBox.

The linker can and will merge functions which have an identical sequence of assembly instructions into one function. This is a good thing because it reduces code size; with templates, that code size reduction can be significant.

However, it sounds like in this case that merging is having an unwanted side effect. You might be able to find a linker setting to switch it off. What are your linker command lines?

Thank you for your explanation.

But the linker does its job right after compilation, doesn't it? Then a library is generated.

In my case, with the exact same library that internally calls returnMeAZero, I can have two different scenarios:

1) client app also calls returnMeAZero --> when the library itself calls returnMeAZero, I get a segmentation fault

2) client app doesn't call returnMeAZero (but binds the function) --> when the library itself calls returnMeAZero, it works!

How can that be??

Could you post stacktrace from gdb?

// http://sourceware.org/gdb/onlinedocs/gdb/Backtrace.html

// http://en.wikipedia.org/wiki/GNU_Debugger#An_example_session

In addition, try stepping through your program (using "start" and "next") to see precisely where the crash occurs.

