Memory Stomp

Started by
18 comments, last by _the_phantom_ 13 years, 2 months ago
Could your compiler be optimizing your loops into a different form?

My guess is that, to some extent, the compiler rewrites the code before turning it into machine code in order to get maximum efficiency.

Anyway, that is an unverified guess.

Here are a couple of things I would try. Investigate whether something is being naughty with the heap, particularly around the blocks pointed to by source and destination. Use the debugger to monitor that part of RAM. Also change the allocation code to something like the below and see what happens:


int size = 256;
/* Allocate twice the needed size as a diagnostic. */
char* source = (char*)malloc(size * 2);
char* destination = (char*)malloc(size * 2);
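A variation on the same idea, as a rough sketch rather than anything from the original code: pad each buffer with guard bytes and check them later, so an overrun from a neighbouring block shows up as a damaged guard. The names `guarded_alloc`, `guards_intact`, `guarded_free`, `GUARD_SIZE` and `GUARD_BYTE` are all made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define GUARD_SIZE 16      /* hypothetical padding on each side, in bytes */
#define GUARD_BYTE 0xA5    /* arbitrary fill pattern for the guards */

/* Allocate `size` usable bytes with a filled guard region on each side. */
static char *guarded_alloc(size_t size)
{
    unsigned char *raw = malloc(size + 2 * GUARD_SIZE);
    if (raw == NULL)
        return NULL;
    memset(raw, GUARD_BYTE, GUARD_SIZE);
    memset(raw + GUARD_SIZE + size, GUARD_BYTE, GUARD_SIZE);
    return (char *)(raw + GUARD_SIZE);
}

/* Return 1 if both guard regions still hold the fill pattern, 0 otherwise. */
static int guards_intact(const char *user, size_t size)
{
    const unsigned char *raw = (const unsigned char *)user - GUARD_SIZE;
    for (size_t i = 0; i < GUARD_SIZE; ++i) {
        if (raw[i] != GUARD_BYTE || raw[GUARD_SIZE + size + i] != GUARD_BYTE)
            return 0;
    }
    return 1;
}

/* Release a block obtained from guarded_alloc. */
static void guarded_free(char *user)
{
    if (user != NULL)
        free((unsigned char *)user - GUARD_SIZE);
}
```

Calling `guards_intact` before and after the suspect code narrows down when the damage happens. It only notices writes that land inside the guards, so a write that jumps far out of bounds will slip past it.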
Thanks again everyone for the ideas.

@Antheus - it's on various models of [$#%#]'s, using [$#%#$]'s compiler/OS.
[edit] names removed to protect the guilty party [/edit]

@SiCrane, Nofootbird - the compiler may be doing something strange with that loop (I haven't checked), but the loop itself isn't the problem. We were experiencing inexplicable graphical bugs, and eventually I tracked them down to our index-buffer allocations being corrupted in VRAM (I used other tools to find this out). After pinpointing where the corruption was happening, I added those "validation" loops around the place, which (apparently) indicate that the memory overwrite can occur at a high frequency (i.e. it fails multiple iterations of the 'attempt' loop).
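For anyone following along, here is a minimal sketch of what such a validation loop might look like. It is not the actual code from the project: `copy_and_validate` and `count_mismatches` are hypothetical names, and the read-back goes through a `volatile` pointer on the assumption that this stops the compiler from folding the check away.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Count bytes in dst that differ from src. The volatile qualifier forces
   the destination to be re-read from memory on every comparison. */
static size_t count_mismatches(const volatile unsigned char *dst,
                               const unsigned char *src, size_t size)
{
    size_t bad = 0;
    for (size_t i = 0; i < size; ++i) {
        if (dst[i] != src[i])
            ++bad;
    }
    return bad;
}

/* Copy the data and read it back, `max_attempts` times over.
   Returns how many attempts found at least one mismatching byte. */
static int copy_and_validate(unsigned char *dst, const unsigned char *src,
                             size_t size, int max_attempts)
{
    int failures = 0;
    for (int attempt = 0; attempt < max_attempts; ++attempt) {
        memcpy(dst, src, size);
        if (count_mismatches(dst, src, size) != 0)
            ++failures;
    }
    return failures;
}
```

On healthy memory the failure count stays at zero. A non-zero count across several attempts suggests something else is writing to the destination between the copy and the read-back.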

@swiftcoder - ooh, that's unlikely, but possible.

@_Unicron_ - I wish it was something that crazy!

@smasherprog - yeah, if it is that, then something is going waaay out of bounds though, like myArray[-453245] = 42;

@Tachikoma - I did actually find/fix some bugs in the heap allocator, but it didn't fix my problem :/
I've used the debugger to monitor the RAM, but it does not detect any changes being made to it (except when I deliberately set it to the correct value), which indicates that the corruption is being caused by another device / a DMA write (the debugger is unable to monitor DMA writes).
Are you talking about VRAM or GPU-mapped system RAM? What kind of address are we talking about relative to the start of that memory type? If it's the mapped area then I seem to recall you are not supposed to use the first 4k of that memory after the command buffer space due to a possible overrun in the command buffer (but you probably account for that already).

Something that indicates it's not DMA to me is that DMAs need to be 16-byte aligned, so I don't see how unaligned single bytes could be modified. To be sure, you could wrap your SPU DMA puts in a function that checks the destination range of each write, to confirm the corruption isn't coming from there.
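A rough sketch of that wrapper idea, with the caveat that everything here is a stand-in: `ProtectedRange`, `DmaPutFn`, `checked_dma_put` and `dma_put_stub` are hypothetical names, and the real platform DMA call would be passed in where the stub is used.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of the region that keeps getting stomped. */
typedef struct {
    uintptr_t begin;  /* first protected address */
    uintptr_t end;    /* one past the last protected address */
} ProtectedRange;

/* Return 1 if [dst, dst + size) overlaps the protected range, else 0. */
static int write_hits_range(const ProtectedRange *r, uintptr_t dst, size_t size)
{
    return size != 0 && dst < r->end && dst + size > r->begin;
}

/* Hypothetical signature standing in for the platform's real DMA put. */
typedef void (*DmaPutFn)(const void *local_src, uintptr_t remote_dst, size_t size);

/* Issue the put only if it stays clear of the protected range.
   Returns 1 if the put was issued, 0 if it was blocked. */
static int checked_dma_put(const ProtectedRange *r, DmaPutFn put,
                           const void *local_src, uintptr_t remote_dst, size_t size)
{
    if (write_hits_range(r, remote_dst, size))
        return 0;  /* a good place for a breakpoint or a log line */
    put(local_src, remote_dst, size);
    return 1;
}

/* Stub used in place of real hardware: it only counts calls. */
static int stub_put_calls = 0;
static void dma_put_stub(const void *local_src, uintptr_t remote_dst, size_t size)
{
    (void)local_src; (void)remote_dst; (void)size;
    ++stub_put_calls;
}
```

If no put is ever blocked while the corruption still appears, the DMA path from this code is probably not the culprit. The overlap test ignores address wrap-around, which seemed acceptable for a diagnostic.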

Another question - what is the erroneous value written into memory? Is it always the same?
Sure is a crazy issue. You have to let us know if you get to the bottom of it!

What's the first thing you suspect may be the problem?
What's the craziest thing you think the problem could be caused by?


a) memory stomp
b) pointer aliasing

Is the pattern that is overwritten consistent?
Is it overwritten at a consistent address, or at a consistent offset from somewhere else?

I'll put $2.50 on a memory stomp *after* the memcpy (or before, and then cleaned up by the memcpy when the assert doesn't fire).


Bit late to the party but we had a problem where the particle system was causing the game to crash but only in game, not in our test bed.

In a moment of crazy I suggested that the AI was accessing a 'dead' pointer, the value of which happened to point into the particle system, where it would then write data, causing the particle system to explode and appear to be the problem.

So, pretty much what MattD said really.

(For the record, it turned out my crazy idea was pretty much spot on, don't recall if it was AI to blame but it was a game play side thing causing the problem)
Sure is a crazy issue. You have to let us know if you get to the bottom of it!
Thanks for all the inspirations, people.

Turns out there were actually two memory stomps:
  • One was actually caused by faulty RAM chips affecting all dev kits in a particular manufacturing run...
  • The other bug was a memory allocator that was returning less than the requested number of bytes. When this occurred with a render-target, then the GPU's ROP writes would stomp other allocations in VRAM...
I had assumed there was only one bug, which made hunting that much harder -- I actually did disable our buggy module at one stage, but the symptoms of the stomp still remained due to the faulty RAM, so I had dismissed this module as a suspect...
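One way the second kind of bug could be caught earlier, sketched under assumptions: record the size the caller asked for (not what the allocator claims to have handed out) and flag any new block that overlaps a live one. `AllocTracker`, `TrackedBlock`, `tracker_add` and `MAX_TRACKED` are hypothetical names, and frees are left out to keep it short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_TRACKED 64   /* hypothetical table size for this sketch */

typedef struct {
    uintptr_t begin;      /* address returned by the allocator */
    size_t    requested;  /* bytes the caller asked for */
} TrackedBlock;

typedef struct {
    TrackedBlock blocks[MAX_TRACKED];
    size_t       count;
} AllocTracker;

/* Record a new allocation.
   Returns 1 if it is disjoint from every recorded block,
   0 if it overlaps one, and -1 if the table is full. */
static int tracker_add(AllocTracker *t, uintptr_t begin, size_t requested)
{
    for (size_t i = 0; i < t->count; ++i) {
        const TrackedBlock *b = &t->blocks[i];
        if (begin < b->begin + b->requested && b->begin < begin + requested)
            return 0;  /* the allocator handed out overlapping memory */
    }
    if (t->count == MAX_TRACKED)
        return -1;
    t->blocks[t->count].begin = begin;
    t->blocks[t->count].requested = requested;
    t->count++;
    return 1;
}
```

An allocator that returns fewer bytes than requested will sooner or later place the next block inside the range the previous caller believes it owns, and that is the case where `tracker_add` returns 0. It would not help with the faulty RAM, of course.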
I don't envy you, trying to deal with such bugs. Glad you figured it out though.


Out of all the insane bugs I've figured out in working on platforms like 360, Wii, PS2 and DS, thankfully I've never had to deal with faulty hardware.

Much respect is awarded to you and your people for working that particular piece of insanity out.
wow, that bad memory is just unlucky when it came to complicating the issue :(

Nice job on finding it however :D

This topic is closed to new replies.