How is it possible for the output register in a pixel shader to differ from PS output?

15 comments, last by relaxok 12 years, 1 month ago
One other tidbit of information: it 'improves' performance and reduces artifacts if I directly set the meshNum in the vertex shader to 251 as well?! So it's getting corrupted in the vertex shader too. If I debug any of the vertices in a PIX frame capture, the meshNum is correctly 251 in all of them, even without doing that override. It's like PIX doesn't know what's going on.

The interesting thing is that with this override, the corruption is much more infrequent - in fact, when I tried to Fraps it and the framerate went down to 60 fps, I couldn't get it to happen - but it happens every 0.25 seconds or so at 1000 fps.

Granted, this doesn't really matter for fixing the problem, because in the end I can't force a value; it needs to be the one that comes from the vertex format.

I wish I could take more video - but anyway, the video I posted is pretty similar to what is happening.

This sort of thing makes you want to give up programming altogether.


This sort of thing is usually a question of some little bug in the program. Never give up. Narrow down your problem and find a solution.
Can you post the HLSL shader and the code where you set your textures, constants, etc.?
Have you made sure that there aren't any NaNs/INFs being passed to your shader?

Cheers!
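
For what it's worth, here is a minimal sketch of such a NaN/INF check in HLSL - the MESHNUM semantic is from this thread, but the function name and debug color are placeholders, not the actual shader:

[code]
// Minimal NaN/INF sanity check in a pixel shader (sketch, not the thread's real code).
float4 PS_Debug(float4 pos : SV_Position, float meshNum : MESHNUM) : SV_Target
{
    // Paint any pixel whose interpolated input is NaN or INF solid magenta.
    if (isnan(meshNum) || isinf(meshNum))
        return float4(1.0, 0.0, 1.0, 1.0);

    return float4(0.0, 0.0, 0.0, 1.0);
}
[/code]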

[quote name='relaxok' timestamp='1330425453' post='4917358']
This sort of thing makes you want to give up programming altogether.


This sort of thing is usually a question of some little bug in the program. Never give up. Narrow down your problem and find a solution.
Can you post the HLSL shader and the code where you set your textures, constants, etc.?
Have you made sure that there aren't any NaNs/INFs being passed to your shader?

Cheers!
[/quote]

Thanks for reminding me not to give up :)

I will post a bunch more relevant info later when I'm at home again; however, here's the problem in a nutshell that I can't seem to get my mind around...

If I look at the PostVS vertices in PIX, there is a MESHNUM semantic which is one of the vertex elements, and every single one is correctly 251. But the pixel output is somewhat corrupted later, as described.

However, I can get rid of most of the artifacts by explicitly setting input.meshNum to 251 in the vertex shader.
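
For illustration, here is roughly what that override looks like - the struct layouts are assumptions, not the thread's actual code:

[code]
// Sketch of the vertex-shader override described above (struct names assumed).
struct VSInput  { float3 pos : POSITION;    float meshNum : MESHNUM; };
struct VSOutput { float4 pos : SV_Position; float meshNum : MESHNUM; };

VSOutput VS(VSInput input)
{
    VSOutput output;
    output.pos = float4(input.pos, 1.0); // real code would apply the WVP transform here
    output.meshNum = 251.0f;             // forced override: masks most of the artifacts
    // output.meshNum = input.meshNum;   // normal pass-through: artifacts appear
    return output;
}
[/code]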

Disregarding everything else, this boils down to a very specific problem: how is it possible for that to happen if the debugger is correct?

The only thing I can think of is that the slowwww framerate when you capture a frame in PIX keeps it from happening.
Two things:

  1. The PIX debugger is not 100% reliable. It doesn't debug on the hardware, nor does it use the actual microcode executed by the hardware. Consequently, it's very possible to get different results in the debugger vs. what actually gets written to memory. Nvidia's Parallel Nsight is capable of hardware debugging, but you need either a second GPU or another PC for remote debugging, and it's Nvidia-only. AMD has a debugging tool which may give you better results than PIX, but it does not use hardware debugging.
  2. Driver bugs come up all the time, in all kinds of weird ways. Sometimes you get weird results from an optimization, or sometimes your app might just crash from some bad combination of shader instructions. The best way to narrow down a driver bug is to try using the REF device. If you get correct output from the REF device, then it's very likely that you hit a driver bug. If you have hit a bug, it can be hard to work around... often your best bet is either to start disabling things until you find the problem spot, or to try writing your code in a different way (for instance, compiling in a constant value instead of pulling it from a constant buffer, as in the sketch below) to see if you start getting the correct behavior.
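
As a concrete illustration of the workaround in point 2, here is a hedged HLSL sketch; the cbuffer layout and names are placeholders, not anything from this thread:

[code]
// Normal path: value fetched from a constant buffer at runtime.
// cbuffer PerFrame : register(b0)
// {
//     float g_meshNum;
// };

// Driver-bug test: bake the value in at compile time instead, then
// check whether the corruption changes or disappears.
static const float g_meshNum = 251.0f;
[/code]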

Thanks, that makes me feel so much better. I wish I had a 2nd card to use Nsight shader debugging with.

I'll check REF when I get a chance later. I *am* using beta drivers, since they're supposedly required to use Nsight at all, so maybe I'll try downgrading to the latest stable driver if REF seems fine.
Also, this might be unrelated, but are you actually using alpha to coverage? If so, you could try disabling it to see if it helps.

OK -- just tried the REFERENCE device and it's still happening (albeit at 0.5 fps!) - so a driver bug is not my savior =(

I am using alpha to coverage, yeah; however, I have tried disabling it and it had no effect.

It's really weird to me that it's happening even with the reference device, and that PIX debugging is actually incorrect as far as the values it's showing.

I really can't imagine what the issue is, other than some kind of memory corruption/scribbling (but by whom?!) or something related to multithreading? I don't think the deferred context is doing anything, though...

OK, I've VERY much simplified the shader code to show the precise problem now...

Here is the current, very trimmed-down HLSL:

http://codepad.org/1LgyyBhh
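
In case the codepad paste disappears, here is a rough reconstruction of the pattern in question, pieced together from the descriptions in this thread - the function signature and the stand-in color are placeholders, not the real shader:

[code]
// Rough sketch of the trimmed-down pixel shader (placeholder names).
float4 PS(float4 pos : SV_Position, float meshNum : MESHNUM) : SV_Target
{
    // Default to fully transparent so any bad pixels show the background.
    float4 color = float4(0.0, 0.0, 0.0, 0.0);

    // Truncating cast -- this later turns out to be the culprit.
    int index = (int)meshNum;
    if (index == 251)
        color = float4(1.0, 1.0, 1.0, 1.0); // stand-in for the real texture fetch

    return color;
}
[/code]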

Here is PIX showing that the MESHNUM variable in the vertex structure is 251 (it's 251 all the way down, but this is just a still shot of the top few):

Pre-VS:
pixelshader_wtf10.png

Post-VS:
pixelshader_wtf10b.png


As you can see, for this test I default the pixel color to 'see-through', basically (0.0, 0.0, 0.0, 0.0), so you can see the bad pixels show through to the background. Here is a video:

http://bh.polpo.org/simplify_badpixels.wmv

If you simply comment out lines 108-110 of the shader there and uncomment line 106, the result SHOULD be identical. A video of this method shows it's perfect:

http://bh.polpo.org/simplify_goodpixels.wmv

So, to my eyes the only possible things that can be happening are:

-- The bytes of memory that contain the last float in the vertex structure are getting corrupted in memory. It can't be something specific to the video card, because it happens with the REFERENCE device - DirectX is doing it somehow. And it's not affecting the rest of the vertex structure, otherwise there'd be issues with the vertices, texture coords, etc.

OR

-- PIX isn't really showing what the real vertex structure values are, and it's not really 251. If not, why not, and what could possibly be overwriting it... and is there anything else I can do besides getting a setup that can use Nsight and seeing what the hardware shows?

UPDATE:

Solved!! The person who mentioned the floating-point comparisons was on the right track. I needed to be round()-ing the floats, not just casting to int. There were apparently lots of 250.999999 values in there (showing up as 251.000 in PIX), and the cast to int was truncating them down to 250, which selects a different set of textures in TexIndices... and so on for other areas of the terrain.
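
In HLSL terms, the fix is a one-liner; a minimal sketch, with input.meshNum being the interpolated float discussed throughout the thread:

[code]
// The bug: a plain cast truncates toward zero, so 250.999999 becomes 250
// and indexes the wrong entry in TexIndices.
int indexBad  = (int)input.meshNum;

// The fix: round to the nearest integer first, so 250.999999 becomes 251.
int indexGood = (int)round(input.meshNum);
[/code]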

Thanks to MaulingMonkey on GameDev IRC.

And thanks to all for your suggestions - I learned a lot along the way.

This topic is closed to new replies.
