
HELP!! really screwed up showstopping bug

This topic is 3863 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

If you intended to correct an error in the post then please contact us.


So while working away, I hit compile and then debug, and the app crashed at the outset. This is the same app that has been working fine for weeks. The debugger is telling me that it crashes during a call to glDrawArrays; it says it's trying to access address 0x000000. The app has used this particular piece of code a million times before without incident, so I'm certain the problem lies elsewhere.

Assuming I had done something screwy, I commented out the few lines of code I had added since the last successful test run and tried again. It crashed in the same place. I did a clean rebuild. It still crashes in the same place. I manually removed all intermediate/object files and built again: same thing.

Stepping through the program, it passes through the problem area once just fine, then the next time it calls it, it fails. Following the program execution, I'm certain that nothing has happened to alter any of the OpenGL client states or anything else. In fact, the small bits of code that run between the first call to glDrawArrays and the second, failing call are completely unrelated to the crashing code.

This makes no sense whatsoever. What's more annoying is that it was working fine, then I added a few lines and it started crashing. So I removed those lines, and in theory it should be exactly back to where it was, but it isn't.

This is a serious problem, and I have absolutely no idea what could be the cause. I can only assume some kind of memory corruption is happening somewhere, but why is it crashing the app now and not before? Why, when I commented out the lines and returned the program to its last known good state, did the problem not go away?

A 0x000000 address usually means you have a null pointer you are trying to use somewhere in your code. Maybe post the actual class the error falls in, or as much of your code as possible; that will help people help you with your problem.

Hope that helps.

The code that crashes is this:


if (tex != currentTexture)
{
    glBindTexture(GL_TEXTURE_2D, texture[tex]);
    currentTexture = tex;
}
glDrawArrays(GL_TRIANGLE_STRIP, 0, 4);


But like I said, I know that the problem lies elsewhere because this piece of code has been running perfectly for weeks. It's the glDrawArrays line that crashes, so presumably the vertex or UV coordinate pointers are messed up, so I changed it to this as a test:


if (tex != currentTexture)
{
    glBindTexture(GL_TEXTURE_2D, texture[tex]);
    currentTexture = tex;
}
glVertexPointer(2, GL_FLOAT, 0, &vertexArray);
glTexCoordPointer(2, GL_FLOAT, 0, &texArrayQuad);
glDrawArrays(GL_TRIANGLE_STRIP, 0, 4);


and running that causes a crash on the line:

glTexCoordPointer(2, GL_FLOAT, 0, &texArrayQuad);

again, saying it's trying to access 0x00000. However, the variable texArrayQuad is showing up as fine in the debugger: it's non-NULL and has all its values as expected. The functions that these pieces of code reside in are global functions, so there's no way a NULL class pointer could be at fault.

Like I say, I'm certain that code is not the problem. Something must be causing OpenGL (or maybe something related to OGL) to reset a pointer, but as I said in my first post, I undid the changes I made between the app running fine and the app crashing, so I don't understand why it's still doing this. Then again, those few lines were completely unrelated to OpenGL, so maybe it's not something in OGL being reset. I am totally lost on this one.

Perhaps OGL is losing context? I'm not really sure what could cause that, especially right at the start of the program, but I guess it's something to look into at least.

I'm not an OpenGL expert, but I'll throw some general advice out there that might give you a few things to try at least.

Are you using source control? If so, and I were in your situation, I'd try getting an older revision that you know works, confirm it works, and then check the diffs between the builds for anything obvious.

If an older build doesn't work, maybe you updated a supporting library and its interface changed subtly?

glGetError is returning no error on every gl call between the working run through that code and the failing run.

Unfortunately I'm an idiot and don't use source control. The last backup I have is about 5 days old, so there'll be tons of changes, so it could take another 5 days to go through them all and there'd still be no guarantee that I find the problem. As I said, I made only very minor changes between the app working fine and then crashing regularly, and those changes were this:

In a completely unrelated class, I added the lines:


if (curUnit)
{
    for (vector<Unit*>::iterator u = mapp->units.begin(); u != mapp->units.end(); ++u)
    {
        if ((*u)->UnitIsAlliedWith(ET_Player))
        {
            FOWtiles.push_back(curUnit->pos);
        }
    }
}


I'm 99.9% positive that has nothing to do with the problem though - that code doesn't even get reached before the crash happens. And I tried commenting it out and still the crash happens.

And I haven't made any driver updates or installed any new libraries, or modified existing libraries.

This bug is just insane. It just doesn't make sense that it can make it through the first pass, but not the second. Between those two passes there is absolutely nothing that could affect the variables in use. And it's been working perfectly for weeks, so I would surely have to do something pretty stupid to get it to mess up this badly. Not that I'm not capable of such stupidity :)

I think you're going to have to track it down the tedious way by slowly commenting out "unrelated" bits of the program and boiling it down to the smallest possible case. If it really is as subtle as it sounds we're not going to be able to guess what it is from tiny code snippets.

Quote:
Original post by Damocles
Unfortunately I'm an idiot and don't use source control.

You're planning on learning from this mistake, right? I recommend SVN.

I'd recommend doing a diff between your backup and your current version, then making the listed modifications 1:1 -- should go much faster than the original coding -- until you track down the change that brings things crashing down.

Is there any software out there (preferably free) that can parse two sets of C++ files and list the differences in terms of different functions, places where function contents don't match, variable names that don't exist in both sets, etc.?

Thanks for the recommendation of Beyond Compare. It may not be a C++ parser, but it seems pretty good so far. It should at least speed up the hunt considerably.

Quote:
Have you even tried running this in a debugger? Set any debug watches to find out when the variable goes to NULL?


That's the problem - I have no clue what variable is being set to NULL. I think it's something in OpenGL's memory space being corrupted by the code somewhere, so when I make a call to certain GL functions, its pointers are messed up and it crashes.

Have you laid some asserts in there to test vs. NULL? I will almost guarantee you that your texture pointer is NULL, or the "tex" variable is set to something insane.

Step through it in a debugger. What happens when you break the program during the crash?
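
As a rough sketch of the kind of check meant here, assuming the texture names live in a std::vector (the helper name isValidTextureIndex is made up for illustration and is not from the posted code), an index test could look something like this:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of the original code.
// Returns true when `tex` is inside the texture table and the stored
// OpenGL texture name is non-zero (0 is the default texture, so a
// generated texture never has that name).
bool isValidTextureIndex(std::size_t tex, const std::vector<unsigned int>& textures)
{
    return tex < textures.size() && textures[tex] != 0;
}
```

Putting assert(isValidTextureIndex(tex, textures)) right before the glBindTexture call would stop the debugger at the first bad index instead of somewhere inside the driver.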

Stepping through it shows me nothing - all the variables local to the program are showing up exactly as expected. I'm assuming that, because it crashes on an OpenGL call, something has gone awry with one of the variables GL uses. I just don't know what could be affecting it.

Do you have the stack trace of the crash? The debugger should show you exactly which assembly instruction tried to access 0x00000000. Do you have symbol information for OpenGL?

The stack output for the moment of the crash is:


atioglxx.dll!690e142c()
[Frames below may be incorrect and/or missing, no symbols loaded for atioglxx.dll]
atioglxx.dll!691e36e5()
atioglxx.dll!691ea5f6()
atioglxx.dll!691ea27e()
>engine.exe!renderFloorTile(const float & xpos=1104.0000, const float & ypos=-168.00000, const unsigned int & tex=666, const int & rotation=0) Line 735 + 0xe bytes C++


The engine.exe!renderFloorTile frame is the function the glDrawArrays call sits in, and like I said, all variables involved in the operation are showing normal values.

I've never had to use symbol info for OpenGL before - can I just get the files I need from opengl.org, or do I need specific files for the card/drivers I'm using?

Stupid question, have you done a rebuild all/reboot? Sometimes cobwebs can crop up. I would plop asserts around each and every parameter just to make sure they're in the range you expect, while you're at it.

Lastly, is there anything in your Output window?

Yep, done a reboot/rebuild all and the output window is not showing anything unexpected. So yeah, looks like I'll have to put asserts around everything that might possibly be affecting it, which is gonna take me forever. I do so love random, seemingly pointless bugs.

You know, it's funny. You always read about devs complaining about how multi-core programming is a real bitch and requires learning of loads of new habits/techniques, but I never really realised just what a bitch it can be, until now.

Yes, I have found the problem, and yes, it is related to hardware threading. I used glIntercept to track every single GL call made from startup to crash, and I noticed a weird handful of commands that shouldn't have been there. I now realize the program was trying to enter 3D mode before it had finished with the 2D rendering, and in the middle of setting the 3D GL states, the program tried to draw in 2D. I had forgotten to set the 3D enabling function to omp critical, and through some weird twist of fate, it had been working fine for ages. Now the timing must suddenly have changed by a fraction, and one thread was running ahead of schedule.
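
For anyone who lands here with the same symptom, here is a minimal sketch of the fix described above. It does not touch real OpenGL: FakeGlState, enter3D, draw2D and runFrame are hypothetical stand-ins, and the section name gl_state is arbitrary. The point is only that the mode switch and the 2D draw sit in the same named omp critical section, so a draw can never land halfway through the switch. Build with OpenMP enabled (for example -fopenmp on GCC); without it the pragmas are ignored and everything simply runs serially.

```cpp
// Stand-in for the GL state machine (an assumption for illustration):
// mode 0 = 2D, 1 = halfway through the switch, 2 = 3D.
struct FakeGlState
{
    int mode = 0;
    int draws = 0;     // total 2D draws issued
    int badDraws = 0;  // draws that saw the half-switched state
};

// Switches to 3D. The whole switch is one critical section, so no other
// thread inside a gl_state section can observe mode == 1.
void enter3D(FakeGlState& s)
{
    #pragma omp critical(gl_state)
    {
        s.mode = 1;  // states partially set
        s.mode = 2;  // switch complete
    }
}

// Issues one 2D draw under the same named critical section.
void draw2D(FakeGlState& s)
{
    #pragma omp critical(gl_state)
    {
        if (s.mode == 1)
            ++s.badDraws;
        ++s.draws;
    }
}

// Runs the 2D drawing and the 3D switch in parallel sections and
// returns how many draws hit the half-switched state.
int runFrame(FakeGlState& s, int quads)
{
    #pragma omp parallel sections
    {
        #pragma omp section
        {
            for (int i = 0; i < quads; ++i)
                draw2D(s);
        }
        #pragma omp section
        {
            enter3D(s);
        }
    }
    return s.badDraws;
}
```

Note that this only keeps a draw from interleaving with the switch; it does not decide which of the two happens first, so the ordering between finishing 2D and entering 3D still has to be enforced separately.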

Thanks for your help guys. It was a stressful couple of days, and I'm definitely a lot balder than I used to be, but I should be grateful that I found this show-stopping bug now instead of after release.

I've had those types of bugs before, and they usually did boil down to a threading issue. Good to see you've resolved it.
