Can't Store 4096 float4's in Constant Buffer

8 comments, last by 51mon 16 years, 5 months ago
I was trying to pack some data into a constant buffer. The documentation says that you can store as many as 4096 float4’s, but when testing in code I can only store a maximum of 3000. The application crashes without any message, and the same behavior appears no matter which buffer gets the extra components once the total number of constants reaches a certain level. It appears to me that all the constant buffers have to share the same 4096x128-bit space. Has anyone else experienced this? Can I make more room for a certain buffer by temporarily disabling the other ones somehow? Thanks
This may be a driver problem. D3D10 drivers crash easily when you try to push them to their limits. Have you checked against the ref device?
Quote:Original post by Demirug
This may be a driver problem. D3D10 drivers crash easily when you try to push them to their limits. Have you checked against the ref device?


Yeah, I thought of that too. According to several sources you can use 16 sets of 4096x128-bit buffers at the same time. The ref device gave the same no-message error. Since the error only occurs when the total number of constants reaches a certain level, and the application runs fine otherwise, I don’t think it has to do with implementation mistakes.
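
For reference, the arithmetic behind those figures can be sketched in a few lines of C++. This is only an illustration: the constant and function names are made up, and the numbers (4096 float4 elements of 128 bits each, 16 buffers) are the ones quoted in this thread, not values read from the D3D10 headers.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical names; the values are the ones quoted in this thread.
const std::size_t kBytesPerFloat4 = 16;    // one float4 = 4 x 32 bits = 128 bits
const std::size_t kMaxFloat4PerCB = 4096;  // per-buffer limit from the documentation
const std::size_t kQuotedCBCount  = 16;    // "16 sets" as quoted above (assumption)

// Size in bytes of a constant buffer holding float4Count float4 elements.
inline std::size_t cbBytes(std::size_t float4Count) {
    assert(float4Count <= kMaxFloat4PerCB);  // stay within the documented limit
    return float4Count * kBytesPerFloat4;
}

// Combined size in bytes if every quoted buffer were filled to the limit.
inline std::size_t allCBBytes() {
    return kQuotedCBCount * cbBytes(kMaxFloat4PerCB);
}
```

With these numbers, `cbBytes(4096)` is 65536 bytes (64 KB) and `cbBytes(3000)` is 48000 bytes, so the point where things start to fail is roughly 73% of a single buffer's documented capacity.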
The only implementation difference I've read about (an earlier thread on this forum IIRC) is that the available CB's dropped from 16 to 15 because the IHV's wanted a "system CB". Or something like that.

I've not read of or seen any suggestion that would explain your observations.

Have you done any further analysis of your error condition?
  • Is it exactly 3000 Vec4's? Which line is it crashing out on?
  • Is it your bad code that's not catching the error condition, or is it a hard GPF/BSOD?
  • Are you checking ALL HRESULT's?
  • D3D10 debug output is opt-in, have you definitely done this? Sure you're not missing any debug messages?
  • Have you tried stepping through the code to manually validate all intermediaries?
  • What code are you using to create the CB? Have you verified this against any SDK samples?
  • Have you tried modifying the code in any of the SDK samples to reproduce this behaviour? A short and concise repro will be essential if you want AMD/Nvidia/MS to look into it.


Answers to the above should hopefully help us get to the bottom of this [smile]

Cheers,
Jack


Quote:
  • Is it exactly 3000 Vec4's? Which line is it crashing out on?

It’s not exactly 3000 Vec4’s, it’s more like 3200. It doesn’t matter in which CB the variables are put; it appears that they all share the same memory. I do use other variables than float4 but the total number of variables is certainly less than 4096. The fx compiler crashes on a line that is unrelated to the extra CB (all I do is add the new CB at the top of the source code and nothing more). On that line the compiler usually warns about a for-loop that needs to be unrolled. If I unroll it manually, the code compiles without errors and later crashes when I try to get a reference to a technique.

Quote:
  • Is it your bad code that's not catching the error condition, or is it a hard GPF/BSOD?

  • I don’t fully understand what you mean by GPF/BSOD :) but I was fully satisfied with the code before the CB was added; everything was running stably. I read an article about constant instancing in DX10 and thought that I would like to give it a try.

    Quote:
  • Are you checking ALL HRESULT's?

  • I am checking all the HRESULT's.

    Quote:
  • D3D10 debug output is opt-in, have you definitely done this? Sure you're not missing any debug messages?

  • I output the debug messages with these lines:
    ID3D10Blob *pErrors = NULL;
    hr = D3DX10CreateEffectFromFile( g_Str, NULL, NULL, "fx_4_0", dwShaderFlags, 0, pd3dDevice, NULL, NULL, &g_pEffect10, &pErrors, NULL);
    if( pErrors)
    {
       DXUTOutputDebugStringA( (LPCSTR)pErrors->GetBufferPointer());
       V_RETURN( hr);
    }
    SAFE_RELEASE( pErrors);

    I think it has worked fine as far as I know?

    Quote:
  • Have you tried stepping through the code to manually validate all intermediaries?

  • PIX doesn’t work on my code. It’s a rather large pipeline.

    Quote:
  • What code are you using to create the CB? Have you verified this against any SDK samples?

  • To begin with I use very simple code:
    cbuffer cInstanceData
    {
       float4 t[3200];
    }


    Quote:
  • Have you tried modifying the code in any of the SDK samples to reproduce this behaviour? A short and concise repro will be essential if you want AMD/Nvidia/MS to look into it.

  • I did a very simple test by modifying the “SimpleSample” from the SDK and I was able to create several 4096 CB’s. As soon as I add too many variables to my engine it crashes, even though they are far from the limit.


    (If you want to check out the engine before the error, here's a link. I think it's stable; I never noticed any oddities at run time.)
    Just a quick reply as I'm off to bed soon [smile]

    Quote:I do use other variables than float4 but the total number of variables is certainly less than 4096.
    Quote:modifying the “SimpleSample” from the SDK and I was able to create several 4096 CB’s
    Have you considered alignment issues? The compiler should be pretty good at this, but my understanding is that a CB has 4096 float4 slots, but you don't necessarily get 100% utilization (e.g. 16,384 float's) - for example I don't think it can pack a float3 across a boundary. So two adjacent float3's will use up two of the 4096 slots with only 75% utilization.
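
    The packing rule described above can be sketched roughly as follows. This is a simplified model, not the compiler's actual implementation: it only handles scalar and vector members (1 to 4 32-bit components each), ignores arrays, matrices and structs, and the function names are made up for illustration. The one rule it encodes is that a member may not straddle a float4 (16-byte) boundary.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Counts the float4 slots used by a list of scalar/vector members.
// Each entry is the member's component count (1 = float ... 4 = float4).
// Simplified sketch: arrays, matrices and structs are not modelled.
inline std::size_t registersUsed(const std::vector<int>& componentCounts) {
    std::size_t offset = 0;  // position in 32-bit components from the buffer start
    for (int n : componentCounts) {
        assert(n >= 1 && n <= 4);
        std::size_t usedInSlot = offset % 4;
        if (usedInSlot + static_cast<std::size_t>(n) > 4) {
            offset += 4 - usedInSlot;  // pad to the next float4 boundary
        }
        offset += static_cast<std::size_t>(n);
    }
    return (offset + 3) / 4;  // round up to whole float4 slots
}

// Fraction of the occupied slots that actually holds data (0 if nothing is stored).
inline double utilization(const std::vector<int>& componentCounts) {
    std::size_t slots = registersUsed(componentCounts);
    if (slots == 0) return 0.0;
    std::size_t components = 0;
    for (int n : componentCounts) components += static_cast<std::size_t>(n);
    return static_cast<double>(components) / static_cast<double>(4 * slots);
}
```

    Under this model two adjacent float3's, `registersUsed({3, 3})`, take two slots at 75% utilization, while interleaving a float after each float3, `registersUsed({3, 1, 3, 1})`, still takes two slots but fills them completely.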

    Running your effect file via fxc.exe and outputting to HTML should give you a break down of how it packed your CB's, which may give you more of a clue...

    Debug messages :- fire up the DX control panel from the start menu, navigate to the D3D10 tab and hit the "edit list" button. You need to add the path to your executable here - your \debug\??.exe for example. The code you posted only returns the errors from the compiler, not debug info.


    hth
    Jack


    I actually was able to compile the code by enabling the release build. The code still doesn’t work. I’m copying an SO buffer into the CB but the contents are completely wrong. I don't know if the error is due to my own implementation mistakes or something else. At the moment I have a very simplified setup.

    ID3D10Buffer* g_pInstanceData = NULL;
    V( g_pEffect10->GetConstantBufferByName( "cInstanceData")->AsConstantBuffer()->GetConstantBuffer( &g_pInstanceData));

    // This way
    pd3dDevice->CopyResource( g_pInstanceData, g_pIndirectLightData);

    // Or this way
    D3D10_BOX updateBox;
    ZeroMemory( &updateBox, sizeof(D3D10_BOX));
    updateBox.left = 0;
    updateBox.right = updateBox.left + iInstanceCount * iStride;
    updateBox.top = 0;
    updateBox.bottom = 1;
    updateBox.front = 0;
    updateBox.back = 1;
    pd3dDevice->CopySubresourceRegion( g_pInstanceData, 0, 0, 0, 0, g_pIndirectLightData, 0, &updateBox);


    Quote:Have you considered alignment issues? The compiler should be pretty good at this, but my understanding is that a CB has 4096 float4 slots, but you don't necessarily get 100% utilization (e.g. 16,384 float's) - for example I don't think it can pack a float3 across a boundary. So two adjacent float3's will use up two of the 4096 slots with only 75% utilization.

    That’s probably true, but the total number of variables is definitely under 4096, it's more like 3200.

    Quote:Running your effect file via fxc.exe and outputting to HTML should give you a break down of how it packed your CB's, which may give you more of a clue...

    The debug build didn't compile and therefore didn't output anything, and the release build just gave the expected results.

    Quote:Debug messages :- fire up the DX control panel from the start menu, navigate to the D3D10 tab and hit the "edit list" button. You need to add the path to your executable here - your \debug\??.exe for example. The code you posted only returns the errors from the compiler, not debug info.

    I played around with the settings a bit. I don't understand where the debug info is output. How do I obtain it?


    Thanks for the help so far by the way :)
    I’m just going to write a few lines about how I solved this problem in case anyone else runs into the same thing. In order to compile the fx I divided it into two files and declared the variables they had in common with the shared keyword. When copying data from the VB to the CB only float4’s (or multiples thereof) were allowed, and the operation had to be carried out by ID3D10Device::CopySubresourceRegion.
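
    The copy constraint described here can be sketched as a small helper. This is an illustration only, under stated assumptions: `CopyBox` is a hypothetical stand-in that mirrors the fields of `D3D10_BOX` so the snippet compiles without the D3D10 headers, the helper names are made up, and the float4-multiple and 4096-element checks simply encode what was observed and quoted in this thread. For buffers the box is one-dimensional, with `left` and `right` given in bytes.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in with the same fields as D3D10_BOX.
struct CopyBox {
    unsigned left, top, front, right, bottom, back;
};

const unsigned kFloat4Bytes = 16;                   // one float4 = 16 bytes
const unsigned kMaxCBBytes  = 4096 * kFloat4Bytes;  // 4096 float4's, as quoted in this thread

// True if a size or stride in bytes is a non-zero whole number of float4's.
inline bool isFloat4Multiple(unsigned bytes) {
    return bytes != 0 && bytes % kFloat4Bytes == 0;
}

// Fills `box` for copying instanceCount elements from the start of a source
// buffer. Returns false (leaving `box` untouched) if the stride is not a
// multiple of a float4 or the region would not fit in one constant buffer.
inline bool makeBufferCopyBox(unsigned instanceCount, unsigned strideBytes, CopyBox& box) {
    if (instanceCount == 0 || !isFloat4Multiple(strideBytes)) return false;
    unsigned long long total =
        static_cast<unsigned long long>(instanceCount) * strideBytes;
    if (total > kMaxCBBytes) return false;
    box.left = 0;
    box.right = static_cast<unsigned>(total);  // byte range [left, right)
    box.top = 0;   box.bottom = 1;             // unused dimensions span 0..1
    box.front = 0; box.back = 1;
    return true;
}
```

    A box produced this way would then be handed to `ID3D10Device::CopySubresourceRegion` as in the earlier snippet. For example, 100 instances with a 32-byte stride give a byte range of 0 to 3200, whereas a 12-byte (float3) stride is rejected.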
    Great idea, using 2 fx's to enable compiling!
    Pls check my prior post about constbuffer: I could use c_buf[4096] with the April07 SDK, but not with the Aug07 SDK now. I have to reduce it to c_buf[3072] or c_buf[2048].
    The crazy shader in my prior post takes 20 sec to compile. @@

    Quote:Original post by 51mon
    I’m just going to write a few lines about how I solved this problem in case anyone else runs into the same thing. In order to compile the fx I divided it into two files and declared the variables they had in common with the shared keyword. When copying data from the VB to the CB only float4’s (or multiples thereof) were allowed, and the operation had to be carried out by ID3D10Device::CopySubresourceRegion.


    Quote:Original post by yk_cadcg
    Great idea, using 2 fx's to enable compiling!
    Pls check my prior post about constbuffer: I could use c_buf[4096] with the April07 SDK, but not with the Aug07 SDK now. I have to reduce it to c_buf[3072] or c_buf[2048].
    The crazy shader in my prior post takes 20 sec to compile. @@


    Sure, I can give it a try, but I didn't find the post you're referring to. Can you give me a link?
