Some tips on multi-platform debugging?

This topic is 3845 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

Hi,

There's a bug in our system that I need to squash, but I don't know where to start. If someone could lend some insight it'd be very helpful.

The problem is that the package works perfectly on Windows, Linux, and Unix machines. The only machines it doesn't work on are Sun Solaris workstations.

Are there some common problems that I should begin by looking at? I come from a Java and single-platform C++ background, so it's hard for me to imagine why code that works on one machine won't work on another.

Thanks a lot for your help and expertise.
-Cuppo

Well, specifically, the program is crashing. The error message is generated by one of the files in the package, so it's meaningless unless you have the code.

It's quite an old piece of software, so it's a jumble of languages.
We're using Fortran, C, C++, Perl, SQL, Oracle, and some Unix makefile scripts.

We are going to need more to be able to help you out at all. Basically you are saying that this huge codebase is crashing somewhere and that you get an error message, but it's useless. It's like someone sending a postcard to Sherlock Holmes saying "Mr Sherlock, this chap got murdered. We don't know his name, we can't tell you where he is or what he was doing... please put down your pipe and help us out!"

Try to narrow down the problem. You know where it's crashing, so put a breakpoint in, print some debug variables, and so on. You have to give us more to go on.

Yeah, I realize that my post is very, very vague.
The problem is that I'm having trouble just finding the bug, so I need some help on what to look for.

So what I want to know is:
Are there some common lines of code that cause a program to crash on one architecture but not another?

Thanks for trying to help me.

In my book, if you are getting an error message you are not really crashing. That is a controlled shutdown (although an undesired one). This should give you a starting point, since you know that it's broken at the point where that error message is printed. Work your way backwards from there.

Actually, start by figuring out in what language it's crashing (since you have a whole heap of them). Who is printing that error message? It's impossible to give any tips on what might cause crashes if you don't know the language.

Quote:
So what I want to know is:
Are there some common lines of code that cause a program to crash on one architecture but not another?


Yes.

Every line of code.

In some cases, even valid code can crash not only the application but the system as well, when the compiler generates invalid code.

Even portable languages aren't immune to crashes on various platforms (yes Java, looking at you). The mash-up you have has zero portability in any part.

If this crash is due to buggy code, you're home free. Just fix the bug.
If this crash is due to library, compiler or platform-dependent issues, you'll likely never resolve it yourself and will just need to upgrade the system.
If the crash is due to multi-threading, you could be looking at months of debugging (yes, literally).

Other than that, find the bug, reproduce it, find the location in the source, fix it, test it.


Crashes across platforms can come from various other sources as well.
A different memory model or OS means that a buffer overrun can be harmless on all platforms except one.
Memory alignment can cause weird things.
Undefined behaviour is interpreted slightly differently by each compiler and platform.
Threading libraries, and the race conditions they expose, are generally completely different across platforms.
Timing or time-dependent code will behave differently.
Bugs in standard libraries (just look at boost and how many workarounds it has for just about every major and minor version of compiler and OS).

As a wild stab in the dark: is it an endian issue?
What is the code doing when the error occurs?
Sun Solaris on SPARC is big-endian, whilst the others you mention are normally little-endian.
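
If it helps, here is a minimal sketch of how to check which byte order a machine uses, and how the same four raw bytes turn into different integers depending on it. The function names host_is_big_endian and read_native_u32 are made up for illustration; nothing like them is known to exist in the package being discussed.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Returns 1 on a big-endian host (e.g. SPARC), 0 on a little-endian
 * host (e.g. x86). Hypothetical helper, for illustration only. */
int host_is_big_endian(void)
{
    const uint32_t probe = 0x01020304u;
    unsigned char first;

    /* Copy the lowest-addressed byte of the integer. */
    memcpy(&first, &probe, 1);
    return first == 0x01;
}

/* Reinterprets four raw bytes as a native 32-bit integer.
 * The result depends on the byte order of the host. */
uint32_t read_native_u32(const unsigned char *bytes)
{
    uint32_t value;

    assert(bytes != NULL);
    memcpy(&value, bytes, sizeof value);
    return value;
}
```

Feeding the bytes 01 02 03 04 to read_native_u32 gives 0x01020304 on a big-endian host and 0x04030201 on a little-endian one, which is the kind of difference to look for wherever the code reads binary files or network data.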

Quote:
Original post by CuppoJava
The problem is the package works perfectly on Windows, Linux, and Unix machines.
The only machines that it doesn't work on are Sun Solaris Workstations.

Are there some common problems that I should begin by looking at?


I'm going to go out on a limb here and speculate that your code is working fine on i386-based architecture, but barfing on SPARC. The first thing that comes to mind in that situation is alignment problems. The i386 will work with misaligned data; the SPARC will raise a bus error.

The most likely culprit in this case is a cast between pointer types. If you were working with proper C++ code, you could just grep for reinterpret_cast<>() and you would find the problem. Unfortunately, if you're working with code from C programmers your job will be a little bit tougher. Common places where this happens are unmarshalling code and hand-rolled allocators.
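
To make the pointer-cast problem concrete, here is a rough sketch of the pattern to look for and one common way to fix it. Both function names are hypothetical. The first function is the risky form; the second copies the bytes with memcpy, which has no alignment requirement.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Risky pattern: if p is not 4-byte aligned, i386 tolerates the load
 * but SPARC raises a bus error. It also breaks C's aliasing rules.
 * Shown only as the thing to search for; do not call it on
 * misaligned data. */
uint32_t read_u32_by_cast(const unsigned char *p)
{
    return *(const uint32_t *)p;
}

/* Safer pattern: memcpy works for any address, aligned or not. */
uint32_t read_u32_by_copy(const unsigned char *p)
{
    uint32_t value;

    assert(p != NULL);
    memcpy(&value, p, sizeof value);
    return value;
}
```

In C code the cast form usually shows up as something like (int *) or (long *) applied to a char buffer, so grepping for casts to pointer types in the unmarshalling code is a reasonable first pass.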

Another possible place to look for alignment trouble is any __attribute__((__packed__)) directives (or their equivalent, such as #pragma pack).
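
As an illustration of why packing matters, here is a small sketch that assumes a GCC-compatible compiler (the attribute is a compiler extension, not standard C). The struct and function names are invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Packing removes the padding after 'tag', so 'value' sits at
 * offset 1 and is misaligned for a 32-bit load.
 * Assumes GCC or a compatible compiler. */
struct packed_record {
    unsigned char tag;
    uint32_t value;
} __attribute__((__packed__));

/* Accessing the member through the struct is fine: the compiler
 * knows the struct is packed and emits byte-wise loads where the
 * hardware needs them. */
uint32_t packed_record_value(const struct packed_record *r)
{
    assert(r != NULL);
    return r->value;
}
```

The trouble typically starts when code takes the address of such a member (for example &r->value) and passes it on as a plain uint32_t pointer, because the callee then assumes normal alignment.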

Another likely place for such cross-platform problems is with endian issues. Again, the source of the problem is most likely in unmarshalling code.
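
For the endian side of unmarshalling, a sketch of one common fix is to build the integer from individual bytes with shifts, so the result is the same on every host. This assumes the data on the wire is big-endian, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

/* Decodes a 32-bit big-endian value from four bytes.
 * Works identically on big- and little-endian hosts and has no
 * alignment requirement. */
uint32_t decode_u32_be(const unsigned char *p)
{
    assert(p != NULL);
    return ((uint32_t)p[0] << 24) |
           ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |
            (uint32_t)p[3];
}

/* Encodes a 32-bit value as four big-endian bytes. */
void encode_u32_be(uint32_t v, unsigned char *out)
{
    assert(out != NULL);
    out[0] = (unsigned char)(v >> 24);
    out[1] = (unsigned char)(v >> 16);
    out[2] = (unsigned char)(v >> 8);
    out[3] = (unsigned char)v;
}
```

Code that instead does a raw fread or memcpy straight into an int and uses the result as-is is the pattern that works on one byte order and breaks on the other.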

Finally, a third place to look for problems is with word size issues, especially as they apply to variadic functions (that is, functions using the ellipsis parameter). Those are easier to grep for.
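
A sketch of the variadic case, with a made-up function name: the callee reads each argument with va_arg using a specific type, and the caller must pass exactly that type. Where int and long are both 32 bits a mismatch often goes unnoticed; where long is 64 bits it reads garbage.

```c
#include <assert.h>
#include <stdarg.h>

/* Sums 'count' arguments, each of which must be passed as a long.
 * Passing a plain int (e.g. 5 instead of 5L) is undefined behaviour,
 * and it tends to show up only on platforms where int and long
 * differ in size. */
long sum_longs(int count, ...)
{
    va_list args;
    long total = 0;
    int i;

    assert(count >= 0);
    va_start(args, count);
    for (i = 0; i < count; ++i) {
        total += va_arg(args, long);
    }
    va_end(args);
    return total;
}
```

A correct call looks like sum_longs(3, 1L, 2L, 3L). The same class of bug appears with printf-style calls where the format says %d but the argument is a long, so grepping for "..." in declarations and for format strings is a good starting point.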

The problem is unlikely to be in the Perl, FORTRAN, or scripts, if that helps any.

Thanks for the very informative replies.
This gives me a handle on where to start looking.
-Cuppo

...from the looks of it, this bug is gonna take a while to squash.
