Tracking invalid memory accesses

Started by
25 comments, last by eq 18 years, 4 months ago
Hey everybody, can anyone suggest a good application that can track accesses to invalid memory locations (e.g. writes through uninitialized pointers), even in mapped memory? We have an extremely obscure bug where something in our code corrupts data directly in the video card's memory. The bug is very hard to reproduce, and we're basically stuck because we can't watch a memory location in mapped VRAM from within a debugger. The bug manifested itself once in system RAM as well, but of course the application wasn't running under a debugger at that point, and we were never able to reproduce it. BoundsChecker doesn't find anything. gDEBugger doesn't find anything. Could anyone suggest some alternatives? The language is C/C++ (no .NET), the IDE is MSVS 2003 (or alternatively 2005, if required). Thanks!
Hi,

there is a WinAPI function, IsBadReadPtr, that can determine whether a pointer is valid or not.

http://msdn.microsoft.com/library/default.asp?url=/library/en-us/memory/base/isbadreadptr.asp

Maybe that helps.

georg
I don't think that'll be of much help, as Yann's problem is access to memory that's, say, outside the bounds of an array, but still mapped to the program. This is tricky, as it raises no exception.

Yann, forgive me if I misunderstood you, but if it's heap corruption occurring in VRAM, then... good luck. You're gonna need it by the truckload.

If it's in system RAM, on the other hand, there are a few ways to track that. I assume this is via some sort of array initialized by new[]. If so, you can do the following trick (a PAINFUL trick, if I may add, but it's a last resort):

Make a template class, say watched_array<class T> (yeah, there are millions of better names; back when I did that I was having a breakdown from the bug I was chasing). Give it a static map from void* to unsigned int (the unsigned int is the bounds of that pointer). Overload the type's new[] to record the number of items and the beginning pointer in that map. watched_array's operator[] then checks that the passed index is within bounds (by looking it up in the global map[this]). This is crude, and classified as a second-degree evil hack :D.
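A rough sketch of how the watched_array idea might look, assuming C++11 or later. This is not the original code from the post: every name apart from watched_array is made up for illustration, and instead of overloading the element type's new[] it simply registers the allocation in the constructor, which is a simplification of the trick described above.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <stdexcept>

// Illustrative reconstruction only; names other than watched_array are hypothetical.
template <class T>
class watched_array {
public:
    explicit watched_array(std::size_t count) : data_(new T[count]()) {
        bounds()[data_] = count;  // record start pointer -> element count
    }

    ~watched_array() {
        assert(bounds().count(data_) == 1);  // the allocation must still be registered
        bounds().erase(data_);
        delete[] data_;
    }

    // Copying would register the same pointer twice, so it is disabled here.
    watched_array(const watched_array&) = delete;
    watched_array& operator=(const watched_array&) = delete;

    // Every indexed access is checked against the recorded element count.
    T& operator[](std::size_t index) {
        auto it = bounds().find(data_);
        if (it == bounds().end() || index >= it->second) {
            throw std::out_of_range("watched_array: index out of bounds");
        }
        return data_[index];
    }

    // Number of live arrays of this element type currently being tracked.
    static std::size_t tracked_count() { return bounds().size(); }

private:
    // One static table per element type, mapping start pointer to element count.
    static std::map<const void*, std::size_t>& bounds() {
        static std::map<const void*, std::size_t> table;
        return table;
    }

    T* data_;
};
```

Throwing on a bad index is just one choice. In a real hunt you would probably break into the debugger at that point instead, so the stack is still intact.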

Again, forgive me for any mistakes, and if you'd like me to *gulp* dig the old code up, I'd be more than glad to.
There is a commercial tool from Rational called Purify. An open source equivalent called 'Checker' is available from GNU. The basic idea is to write your own heap management routines (new/malloc/free) and check on each access whether it falls within an allocated block. I think Purify and Checker work the same way.

I think the Visual Studio 2005 heavy duty editions have such tools for static (source code) and runtime analysis. Not sure what they are called though.

I don't think there is any tool that detects that your own memory space has been written to, since that is a perfectly valid operation.

Perhaps Yann, you'll have to convert your code so that you can log each write operation by a pointer, perhaps overloading a pointer class?

Then you can map memory, and if any pointer writes to an area you marked as do-not-write, you can trigger an event...
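As a rough illustration of the pointer-class idea, here is a minimal sketch assuming C++11. The names write_guard and checked_ptr are hypothetical, and the "event" is just an exception. It only catches writes that go through the wrapper, so it does nothing for raw pointers elsewhere in the code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical registry of address ranges that must not be written to.
class write_guard {
public:
    static void protect(const void* start, std::size_t bytes) {
        assert(bytes > 0);
        const std::uintptr_t begin = reinterpret_cast<std::uintptr_t>(start);
        ranges().push_back(range{begin, begin + bytes});
    }

    // True if [p, p + bytes) overlaps any protected range.
    static bool overlaps(const void* p, std::size_t bytes) {
        const std::uintptr_t begin = reinterpret_cast<std::uintptr_t>(p);
        const std::uintptr_t end = begin + bytes;
        for (const range& r : ranges()) {
            if (begin < r.end && end > r.begin) return true;
        }
        return false;
    }

private:
    struct range { std::uintptr_t begin; std::uintptr_t end; };
    static std::vector<range>& ranges() {
        static std::vector<range> list;
        return list;
    }
};

// Hypothetical pointer wrapper: every write is checked before it happens.
template <class T>
class checked_ptr {
public:
    explicit checked_ptr(T* p) : p_(p) {}

    void write(const T& value) const {
        if (write_guard::overlaps(p_, sizeof(T))) {
            throw std::logic_error("checked_ptr: write into protected range");
        }
        *p_ = value;
    }

    const T& read() const { return *p_; }

    checked_ptr operator+(std::ptrdiff_t n) const { return checked_ptr(p_ + n); }

private:
    T* p_;
};
```

The explicit write() call is a deliberate choice in this sketch: overloading operator* would have to return a proxy object to tell reads from writes, which is more machinery than the idea needs.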

I have some SoftICE debugging experience from my driver writing days, but I don't remember even that powerful tool having the ability to detect arbitrary memory writes.

The last thing you can try is a runtime opcode-replacement run, where you replace each memory-writing opcode with a call to a specific function whose task it is to monitor such things, but this will probably kill any performance you have.

Hope it helped...
I thought Purify was more like Valgrind, which is great but sadly Linux only. It might be possible to run under Valgrind+Wine, but I'd be surprised if you got much useful output (for a start, you'd only get the memory address of the instruction rather than a file/line number). It does track use of uninitialized pointers. If both of those are out of the question, there may be one other possibility: on Windows you can use VirtualProtect to set guard status on a page. That may let you track every single memory access into the range you're trying to nail (I'm assuming you can find that range), so you could track down the source of the problem that way. Then you'd want to figure out whether it's coming from inside your program or from the graphics drivers, which will need some range checking on the accessing instruction or something; I'm not sure how you'd do that.
Thanks for the suggestions so far. I will be a little more specific about what happens.

The bug happens (sometimes) after a certain combination of user operations in the application. The result is always the same type of corruption, if you stay on the same test machine. If we modify (unrelated) parts of the application code, the corruption changes slightly, but stays within the same memory region.

So far so good. Since I know when it happens (approximately), and where the corruption occurs (exactly), I could just set a write-on-memory watch, that will break execution when the address is written to. A short stack backtrace, and I'd have the bug.

But there is a big problem: the corruption happens in the VRAM that is currently mapped to my process. I have no direct monitoring access to this memory through the debugger, I don't even know where in virtual address space it is being mapped to (since that is the business of the driver).

An example of the bug in action: after performing the critical operation chain, a VBO is suddenly partially corrupted (i.e. the vertices fly away in arbitrary directions). After modifying the code, the corruption suddenly kills off parts of a texture instead. If the code is not changed, the bug will always corrupt the same video resource (which is a little strange by itself, since those resources are dynamically handled by the driver, i.e. they will theoretically end up in a different location in VRAM every time the program is started). It's not a driver issue, since it happens on some ATI and nVidia setups.

So in essence, we're stuck. That is the strangest bug I have ever encountered. We already use a custom memory manager that will track simple bounds overwrites, but not arbitrary accesses to parts well outside the allocated region, yet still mapped to the process. We can't overload new, * and [] for every possible type either; that would be a complete nightmare (we're talking about around 600k LOC here!)

The best option would be a compiler plugin that adds a check in front of every memory dereference in the compiled ASM, and tests the pointer address against the list of allocated blocks. Of course that would completely kill performance, but this is not an issue as long as we find the reason for the bug.
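The check such a plugin would insert might look roughly like this. It is only a sketch of the lookup, assuming C++11; allocation_registry and its members are hypothetical names, and nothing here hooks into a compiler or into an existing memory manager.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>

// Hypothetical table of live allocations, keyed by start address.
class allocation_registry {
public:
    void add(const void* start, std::size_t bytes) {
        assert(bytes > 0);
        blocks_[addr(start)] = bytes;
    }

    void remove(const void* start) { blocks_.erase(addr(start)); }

    // True if the access [p, p + bytes) lies entirely inside one known block.
    bool contains(const void* p, std::size_t bytes) const {
        const std::uintptr_t a = addr(p);
        auto it = blocks_.upper_bound(a);  // first block starting after a
        if (it == blocks_.begin()) return false;
        --it;                              // last block starting at or before a
        const std::uintptr_t offset = a - it->first;
        return offset <= it->second && bytes <= it->second - offset;
    }

private:
    static std::uintptr_t addr(const void* p) {
        return reinterpret_cast<std::uintptr_t>(p);
    }

    std::map<std::uintptr_t, std::size_t> blocks_;
};
```

An ordered map keeps each lookup logarithmic in the number of live blocks, which matters if the check really runs in front of every dereference.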

I'll check out Purify. Also, VirtualProtect sounds interesting. Could it be used to monitor VRAM mapped into the process's address space as well, assuming I find a way to determine where the exact address range is located?
Maybe you could set a breakpoint condition on registers having that exact memory location? Might not work, but worth a shot...

Also, I am confused, you say that you know the exact memory location but don't know the virtual address?

Maybe you can try Process Explorer from www.sysinternals.com which can give you a map of the memory of the process (I think). Also, Windows probably has memory profilers which you can use for looking at the memory usage and virtual memory map.
Purify looks pretty good. I will just buy it, and see if it can find something. I'm really desperate, any more suggestions for other applications are more than welcome !

Thanks !

Quote:
Also, I am confused, you say that you know the exact memory location but don't know the virtual address?

Sorry, I wasn't precise here. I know the exact relative offset where the corruption occurs within the VRAM itself (or at least, within the resource, but I can find out where the resource is located). But I don't know where the VRAM is being mapped to, so I don't know the exact memory address from the point of view of my application's address space.
Have you tried Paul Nettle's memory checker?
Also try running the app without VBOs or DLs, i.e. with standard VAs or immediate mode.
I've had some strange errors like this -> "a VBO is suddenly partially corrupted (i.e. the vertices fly away in arbitrary directions)"
Though if you're seeing it with nVidia + ATI, it seems unlikely that it's a driver bug.

