Need help with HLSL/Direct Compute


I am working on a custom movie-format player accelerated via DX10.1/11. I have a VS2010 solution that builds and runs, but I am fairly new to DX coding (I've only been playing with it for a few days), though I've been doing C#/C++ for 10 years. I *think* my problem is in my shader code, specifically the compute shader (CS). The source image/movie data is in a Bayer format, which requires several interpolations per pixel to compute all RGB values, because each "pixel" contains only one color sample; the other two are interpolated from nearest-neighbor pixels. You can check the Wikipedia article or the source for more details. I have a "software" version in C# that uses only the CPU and works; it "just" needs a 4-core 3+ GHz system to run at the full 24-30 fps.

The problem I have with the DX version is that the rendered result is only a black screen. If I hard-code the PS to solid colors, I get what I expect, but using the results of the CS leads to a black render area. Downloading the output buffer from the CS after it runs shows that all but the first pixel value is zero, and the first is practically zero (~1e-38). This explains the render results, but I don't know why the CS produces it. I assume from all the examples I've seen that the PS uses a normalized color range of 0.0 to 1.0 floating point, and I am doing that normalization on the CPU before sending the data to the GPU. The output doesn't change regardless of the input's normalization status. The source format is a 16-bit unsigned int value per pixel.

My software version will suffice for my current needs, but I am working toward learning DX in general so I can do more advanced things like overlaying symbols or highlighting specific areas of the images as they play.

I started trying to learn DX/Compute from this PDC lab; my DX code is based primarily on it.

Are there any good debugging tools I can use to check my CS code? Anyone want to take a look at my code and offer feedback?

I could not attach the files here, so I put them on my sky drive: http://cid-255987229e4d13e3.office.live.com/self.aspx/.Documents/DxViewerNET.zip (updated link to latest code). Edited by seljo


I assume from all the examples I've seen that the PS uses a normalized color range of 0.0 to 1.0 floating point, and I am doing that normalization on the CPU before sending the data to the GPU. The output doesn't change regardless of the input's normalization status. The source format is a 16-bit unsigned int value per pixel.


How the shader interprets a value from a texture or buffer depends on the DXGI_FORMAT used when creating the texture and the shader resource view. So, for instance, if you use a UNORM format, the actual raw texture memory will be an unsigned integer, but when you sample it in the shader it will be converted to a 0.0 - 1.0 value. The full list of conversion rules is here: http://msdn.microsoft.com/en-us/library/dd607323(v=vs.85).aspx. What you can take away from this is that you should never need to do any manual normalization on the CPU side when setting the data for textures or buffers; you set them in their raw memory format and the GPU converts them as needed in the shaders. The same applies to values output from a pixel shader, or from a compute shader into a UAV.
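To make that concrete, here is a minimal HLSL sketch of the shader side, assuming the host code creates the texture and its SRV with DXGI_FORMAT_R16_UNORM (the resource and function names here are just placeholders):

// The raw texels are 16-bit unsigned ints, but because the view format is
// DXGI_FORMAT_R16_UNORM the hardware converts each load to a float in [0, 1].
Texture2D<float> rawBayerFrame : register(t0);

float LoadRawSample(uint2 pixelCoord)
{
    // No manual scaling is needed; the value arrives already normalized.
    return rawBayerFrame.Load(int3(pixelCoord, 0));
}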


Are there any good debugging tools I can use to check my CS code? Anyone want to take a look at my code and offer feedback?


The DX SDK comes with PIX, which can be used for debugging vertex/geometry/pixel shaders, but unfortunately it seems MS has basically stopped updating it, and it still doesn't support hull, domain, or compute shaders. However, you can debug compute shaders with the vendor-specific tools: GPU PerfStudio for AMD and Parallel Nsight for Nvidia (note that Parallel Nsight requires an extra GPU or a remote debugging PC to debug shaders).


I could not attach the files here, so I put them on my sky drive.


I'll take a look later and see if anything looks out of whack.

What you can take away from this is that you should never need to do any manual normalization on the CPU side when setting the data for textures or buffers; you set them in their raw memory format and the GPU converts them as needed in the shaders.

Good, I knew the GPU was the place to do this, but I'm a noob, so I didn't know it was automatic. The conversion from ushort to float on the CPU before loading the input buffer is most likely the culprit here (in conjunction with the format specification). The wrong format spec would also seem to explain why the output is the same regardless of whether I normalize on the CPU.

If you use a UNORM format, the actual raw texture memory will be an unsigned integer, but when you sample it in the shader it will be converted to a 0.0 - 1.0 value.

2 points:
1) My source data is in ushort data elements, but it is not normalized. Each frame has to be normalized based on its max and min values. Ex: the raw data for one frame may have one pixel with value 900 (the max) and another with 234 (the min). That is now the "range" for all pixels in that frame and must be mapped to the "color range format". Ex: for an 8-bit color range, 900 = 255 and 234 = 0... (see my CPU-rendering C# code for the 24-bit RGB conversion).
2) How do I represent a ushort (unsigned 16-bit int) in an HLSL buffer? I see no integer type smaller than 32 bits (which is why I converted to float on the CPU in the first place... and then assumed the normalization was needed there too; a slippery slope).

If I understand the format conversion link you gave, each pixel still has to be normalized to a standard range (0 to 255, for example) even if I pack my 16- or 8-bit uints into a 32-bit uint buffer?

Thanks for the feedback! It gives me a direction to investigate, at least. Please let me know if you find anything in my code, or if you have other tips!


2 points:
1) My source data is in ushort data elements, but it is not normalized. Each frame has to be normalized based on its max and min values. Ex: the raw data for one frame may have one pixel with value 900 (the max) and another with 234 (the min). That is now the "range" for all pixels in that frame and must be mapped to the "color range format". Ex: for an 8-bit color range, 900 = 255 and 234 = 0... (see my CPU-rendering C# code for the 24-bit RGB conversion).
2) How do I represent a ushort (unsigned 16-bit int) in an HLSL buffer? I see no integer type smaller than 32 bits (which is why I converted to float on the CPU in the first place... and then assumed the normalization was needed there too; a slippery slope).

If I understand the format conversion link you gave, each pixel still has to be normalized to a standard range (0 to 255, for example) even if I pack my 16- or 8-bit uints into a 32-bit uint buffer?


There are DXGI_FORMAT_R16_UNORM and DXGI_FORMAT_R16_UINT available. If you use the former, the shader will read it as a 0.0 - 1.0 floating-point value; if you use the latter, the shader will read it as a 32-bit UINT. I would say just use the UINT format for now, so that you can do the conversions manually the same way you do in your CPU code. Later you can experiment with the normalized formats to see if you can optimize the performance a bit.
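As a rough HLSL sketch of the two options (assuming typed Buffer views created with those formats on the C++ side; the names and the source of frameMin/frameMax are just placeholders):

Buffer<uint>  rawPixelsUint : register(t0); // SRV format DXGI_FORMAT_R16_UINT  -> read as a uint (0..65535)
Buffer<float> rawPixelsNorm : register(t1); // SRV format DXGI_FORMAT_R16_UNORM -> read as a float in [0, 1]

// Manual per-frame normalization with the UINT path, mirroring the CPU code.
float NormalizeSample(uint index, float frameMin, float frameMax)
{
    return (rawPixelsUint[index] - frameMin) / (frameMax - frameMin);
}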

I would say just use the UINT format for now, so that you can do the conversions manually the same way you do in your CPU code.

OK, but how? Just put my 16-bit values into 32-bit buckets in the buffer and let the compiler pad appropriately? Currently I am setting DXGI_FORMAT_UNKNOWN for all of my UAVs and SRVs.

Also, it should be noted that I am taking the resulting output buffer from the CS and using it directly in the PS; I do not create a "texture" per se. That was to come later (so I could put the texture on a rectangle and move the rectangle based on the window size). Right now the CS does all the computing and the render window is a fixed size.

Here is my PS:

// definition of a Bayer pixel color buffer element
struct BayerRGBPixelBufType
{
    float r;
    float g;
    float b;
};

struct BayerPixelBufTypeF
{
    float val;
};

// a read-only view of the point color buffer resource
StructuredBuffer<BayerRGBPixelBufType> BayerRGBFrameBuffer;

float4 PS( float4 Pos : SV_POSITION ) : SV_Target
{
    vector <float, 4> pixOut = vector <float, 4>(.0f, 0.f, 1.f, 1.f);//(1.f,1.f,1.f,1.f);

    //get the current pixel index in the converted bitmap
    uint ipix = Pos.x + (Pos.y * 1280);

    //leave the constant alpha for all pixels from the init above
    //pixOut.a = 1.0f;

    pixOut.r = BayerRGBFrameBuffer[ipix].r;
    pixOut.g = BayerRGBFrameBuffer[ipix].g;
    pixOut.b = BayerRGBFrameBuffer[ipix].b;

    return pixOut;
}

Yeah, I'm sorry: I (incorrectly) assumed that you were using textures instead of structured buffers. With structured buffers there is absolutely no conversion performed by the GPU at all. It simply reads the raw buffer data and interprets it according to the types in your HLSL structure definition.
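In other words, the HLSL struct has to match the CPU-side element layout byte for byte. A quick sketch using your existing element type (the stride is just what three floats add up to):

// Each element is three 32-bit floats, so the C++ buffer description needs
// StructureByteStride = 12 and the CPU must upload plain floats; the GPU will
// not convert packed 16-bit ints (or anything else) for a structured buffer.
struct BayerRGBPixelBufType
{
    float r;
    float g;
    float b;
};
StructuredBuffer<BayerRGBPixelBufType> BayerRGBFrameBuffer : register(t0);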

So, any other ideas as to why the CS is outputting the same result no matter the input? I am dispatching 1280x800 thread groups with one thread per group, because I ran into the 768-threads-per-group limit when trying to dispatch a single group with a thread per pixel. Is there a limit on the total number of groups as well?


So, any other ideas as to why the CS is outputting the same result no matter the input? I am dispatching 1280x800 thread groups with one thread per group, because I ran into the 768-threads-per-group limit when trying to dispatch a single group with a thread per pixel. Is there a limit on the total number of groups as well?


I'm pretty sure this is an inefficient way to dispatch the threads you need. Try launching 32 by 16 by 1 threads per group and dispatching 1280/32 by 800/16 by 1 thread groups.
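A minimal sketch of that layout (the resource name is a placeholder; the host-side call is shown in the comment):

RWStructuredBuffer<float> output : register(u0); // placeholder output resource

// 32 x 16 = 512 threads per group (within the 768-threads-per-group limit),
// launched from the host as: myContext->Dispatch(1280 / 32, 800 / 16, 1);
[numthreads(32, 16, 1)]
void CS(uint3 DTid : SV_DispatchThreadID)
{
    // SV_DispatchThreadID covers every pixel of the 1280x800 frame exactly once.
    uint outputIndex = DTid.x + DTid.y * 1280;
    output[outputIndex] = 1.0f; // the per-pixel work goes here
}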

As to why you are getting a black screen: looking at your PS code, it seems to me that your BayerRGBFrameBuffer isn't filled with any input. I think you need to replace StructuredBuffer<BayerRGBPixelBufType> BayerRGBFrameBuffer; with StructuredBuffer<BayerRGBPixelBufType> BayerRGBFrameBuffer : register(t0); and fill that t0 slot with the desired information on the CPU side.

EDIT: fixed a mistake in my suggested code.

I'm pretty sure this is an inefficient way to dispatch the threads you need. Try launching 32 by 16 by 1 threads per group and dispatching 1280/32 by 800/16 by 1 thread groups.

I figured it wasn't optimal, but I'm just trying to get *something* to render right now. I'll keep this in mind when I get to the optimization part (hopefully). :huh:


As to why you are getting a black screen: looking at your PS code, it seems to me that your BayerRGBFrameBuffer isn't filled with any input. I think you need to replace StructuredBuffer<BayerRGBPixelBufType> BayerRGBFrameBuffer; with StructuredBuffer<BayerRGBPixelBufType> BayerRGBFrameBuffer : register(t0); and fill that t0 slot with the desired information on the CPU side.

I thought the registers were assigned by the compiler according to declaration order if not explicitly set as you suggest. I copied the outline of the shader code from the PDC lab code I referenced above, and that works, but I'll check it out. It definitely seems that my CS output buffer may not be getting passed to the PS input buffer. I am downloading the CS output to the CPU to take a look, and it is all zero, so something is wrong there before the PS is even a factor. The entire solution is available for download; I can help if there are issues getting it to build.

Here is the CS code:

//This is the Bayer compute shader file.

//must match the calling code...
#define rawPixW 1282
#define rawPixH 802
#define outPixW 1280
#define outPixH 800

// definition of a Bayer color pixel buffer element
struct BayerRGBPixelBufType
{
    float r;
    float g;
    float b;
};

struct BayerPixelBufTypeF
{
    float PixVal;
};

//changes per frame
cbuffer CB0
{
    float frameMax;
    float frameMin;
};

//Output RGB frame data...
RWStructuredBuffer<BayerRGBPixelBufType> BayerRGBFrameBuffer;

// a read-only view of the raw bayer format pixel input buffer
StructuredBuffer<BayerPixelBufTypeF> sourcePixels;


//[numthreads(outPixW, outPixH, 1)]//this is 1 group per frame: threads limited to 768 per group...
[numthreads(1, 1, 1)]// one thread per group; one group per output pixel...
void CS( uint3 Gid : SV_GroupID,
         uint3 DTid : SV_DispatchThreadID,
         uint3 GTid : SV_GroupThreadID,
         uint GI : SV_GroupIndex )
{
    //vector <float, 4> pixOut = vector <float, 4>(1.f,1.f,1.f,1.f);

    ////get the current pixel index in the SOURCE buffer
    //uint ipix = DTid.x + 1 + ((DTid.y+1) * rawPixW);
    ////pixel index in output buffer
    //uint desti = DTid.x + DTid.y;
    //bool evenRow = (uint((DTid.y+1) % 2) == 0);
    //bool evenCol = (uint((DTid.x+1) % 2) == 0);

    //get the current pixel index in the SOURCE buffer (add 1 since the coords are for the output, which excludes the outer ring of pixels)
    uint ipix = Gid.x + 1 + ((Gid.y+1) * rawPixW);
    //pixel index in output buffer
    uint desti = GI;//Gid.x + Gid.y;
    bool evenRow = (uint((Gid.y+1) % 2) == 0);
    bool evenCol = (uint((Gid.x+1) % 2) == 0);

    //leave the constant alpha for all pixels from init
    //pixOut.a = 1.0f;

    //**TODO: normalize...? assume normalized already by CPU for now...
    uint left = ipix - 1;
    uint right = ipix + 1;
    uint above = ipix - rawPixW;
    uint below = ipix + rawPixW;
    uint topLeft = 0;
    uint bottomLeft = 0;
    uint topRight = 0;
    uint bottomRight = 0;

    //check what row we're on (even: GR)
    if(evenRow)
    {
        //check which col we're on
        if(evenCol)
        {
            //even col: green pixel

            // GREEN IN CENTER
            //
            // X B X
            // R G R
            // X B X
            //
            BayerRGBFrameBuffer[desti].r = (sourcePixels[left].PixVal
                                          + sourcePixels[right].PixVal) * 0.5f;
            BayerRGBFrameBuffer[desti].g = sourcePixels[ipix].PixVal;
            BayerRGBFrameBuffer[desti].b = (sourcePixels[above].PixVal
                                          + sourcePixels[below].PixVal) * 0.5f;
        }
        else
        {
            //odd: red pixel
            topLeft = above - 1;
            bottomLeft = below - 1;
            topRight = above + 1;
            bottomRight = below + 1;

            // RED IN CENTER
            //
            // B G B G
            // G R G R
            // B G B G
            //
            BayerRGBFrameBuffer[desti].r = sourcePixels[ipix].PixVal;
            BayerRGBFrameBuffer[desti].g = (sourcePixels[left].PixVal
                                          + sourcePixels[right].PixVal
                                          + sourcePixels[above].PixVal
                                          + sourcePixels[below].PixVal) * 0.25f;
            BayerRGBFrameBuffer[desti].b = (sourcePixels[topLeft].PixVal
                                          + sourcePixels[bottomLeft].PixVal
                                          + sourcePixels[topRight].PixVal
                                          + sourcePixels[bottomRight].PixVal) * 0.25f;
        }
    }
    else //(odd row): GB
    {
        //check which col we're on
        if(evenCol)
        {
            //even: G
            // GREEN IN CENTER
            //
            // X R X
            // B G B
            // X R X
            //
            BayerRGBFrameBuffer[desti].r = (sourcePixels[above].PixVal
                                          + sourcePixels[below].PixVal) * 0.5f;
            BayerRGBFrameBuffer[desti].g = sourcePixels[ipix].PixVal;
            BayerRGBFrameBuffer[desti].b = (sourcePixels[left].PixVal
                                          + sourcePixels[right].PixVal) * 0.5f;
        }
        else
        {
            //odd: B
            topLeft = above - 1;
            bottomLeft = below - 1;
            topRight = above + 1;
            bottomRight = below + 1;
            // BLUE IN CENTER
            //
            // R G R
            // G B G
            // R G R
            //
            BayerRGBFrameBuffer[desti].r = (sourcePixels[topLeft].PixVal
                                          + sourcePixels[bottomLeft].PixVal
                                          + sourcePixels[topRight].PixVal
                                          + sourcePixels[bottomRight].PixVal) * 0.25f;
            BayerRGBFrameBuffer[desti].g = (sourcePixels[left].PixVal
                                          + sourcePixels[right].PixVal
                                          + sourcePixels[above].PixVal
                                          + sourcePixels[below].PixVal) * 0.25f;
            BayerRGBFrameBuffer[desti].b = sourcePixels[ipix].PixVal;
        }
    }
}


Here is the C++ code between CS and PS calls:

void DxViewerNET::Native::BayerIDx::DrawBayerFrame( BayerPixelF* frameBuf,
                                                    FrameRange range,
                                                    UINT bufLen /*= 1282*802*/ )
{
    HRESULT hr = S_OK;

    //create the raw bayer image buffer (input)
    CreateBufferResourceAndViews( sizeof( BayerPixelF ),
                                  bufLen,
                                  frameBuf,
                                  &myBayerInBuffer,
                                  &myBayerInBufferSRV,
                                  NULL );

    //**TODO: update the range for normalization... --assume normalized already for now

    //**TODO: call the Compute Shader
    //set up
    myContext->CSSetShader(DxInterface::myComputeShader, NULL, 0);

    // give the CS read access to the input buffer
    myContext->CSSetShaderResources( 0, 1, &myBayerInBufferSRV );

    //give the CS write access to the BMP output buffer
    UINT UAVinitCounts = 0;
    myContext->CSSetUnorderedAccessViews(0, 1, &myBayerBmpPixUAV, &UAVinitCounts);

    //run CS; the HLSL will spawn 1 thread per pixel (trivially parallel)
    //1 group per pixel, one thread per group.
    myContext->Dispatch(1280, 800, 1);

    //**TODO: wait here for completion?
    D3D11_QUERY_DESC pQryDsc;
    pQryDsc.Query = D3D11_QUERY_EVENT;
    pQryDsc.MiscFlags = 0;
    ID3D11Query* pEventQry;
    myDevice->CreateQuery(&pQryDsc, &pEventQry);
    //insert a fence into the push buffer
    myContext->End(pEventQry);
    //spin until the event returns
    while(myContext->GetData(pEventQry, NULL, 0, 0) == S_FALSE )
    {
        System::Threading::Thread::Sleep(0);
    }

#ifdef _DEBUG
    //copy data back from GPU mem to main mem
    ID3D11Buffer* tmpBuff = NULL;
    D3D11_BUFFER_DESC desc; ZeroMemory(&desc, sizeof(desc));
    RgbPixelF* bmpBuff;
    D3D11_MAPPED_SUBRESOURCE mapRsrc;

    //make a copy of the dx buffer we want to download back to system mem (on gpu)
    myBayerOutBuffer->GetDesc(&desc);
    //modify the buffer desc for read/download
    desc.CPUAccessFlags = D3D11_CPU_ACCESS_READ;
    desc.Usage = D3D11_USAGE_STAGING;//this is a temporary buffer
    desc.BindFlags = 0;
    desc.MiscFlags = 0;
    myDevice->CreateBuffer(&desc, NULL, &tmpBuff);

    myContext->CopyResource(tmpBuff, myBayerOutBuffer);//copy the data
    myContext->Map(tmpBuff, 0, D3D11_MAP_READ, 0, &mapRsrc);//map the staging copy for CPU reads
    bmpBuff = (RgbPixelF*)mapRsrc.pData;//get a ptr to the mapped data

    myContext->Unmap(tmpBuff, 0);//unmap the same staging buffer that was mapped above
    SAFE_RELEASE(tmpBuff);
#endif

    SAFE_RELEASE(pEventQry);

    this->UnbindCSresources();

    //release old input buffers
    SAFE_RELEASE(myBayerInBufferSRV);
    SAFE_RELEASE(myBayerInBuffer);

    //pass the resource view for the produced BMP data (CS output) to the base class PS input SRV...
    DxInterface::myPixelInputSRV = myBayerBmpPixSRV;

    //draw the frame using the new frame data.
    Draw();
}


Edit: fixed the pixel dimensions to be 800/802, not 840/842. This is not the main problem, since it would only have mangled the image, not zeroed/blacked it. Edited by seljo

Unfortunately it didn't compile out of the box for me (I don't have the PDC project on my machine), but I can give you a few pointers or ideas to try.

First, ensure that your input data is being read in properly. Try just passing the data through to the output structure and see if you get something expected on the other end. This will prove that your input and output addressing works, and that your input and output resources are being created and bound in the proper order.

The point that jumps out at me is that your output address calculation ('desti') uses the GI value, which is SV_GroupIndex. That system value gives you a flat index for the current thread within its thread group, and since you are using 1x1x1 thread groups it is the same for every thread. This means all your thread results are writing to the same output, probably the first element of your buffer. That would fit your description, so try checking that out. Better in your case would be to use SV_DispatchThreadID and manually compute an input/output index. (Ideally you would use texture-based resources and take advantage of the special functionality and addressing advantages of 2D resources, but it should still work for your starter case.) I think this will get you started on the right path...
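A sketch of that change against the posted shader (keeping the current one-thread-per-group Dispatch(1280, 800, 1) and the existing rawPixW/outPixW defines; with 1x1x1 groups SV_DispatchThreadID equals SV_GroupID):

[numthreads(1, 1, 1)]
void CS( uint3 DTid : SV_DispatchThreadID )
{
    // One thread per pixel of the 1280x800 output.
    uint ipix  = DTid.x + 1 + ((DTid.y + 1) * rawPixW); // source index (skips the outer ring)
    uint desti = DTid.x + (DTid.y * outPixW);           // unique output index, instead of GI
    // ... the existing interpolation then writes to BayerRGBFrameBuffer[desti] ...
}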

The other point is that while PIX doesn't directly support debugging the new shader models, it should still allow you to take a capture of a frame that uses them. Then you can inspect the contents of the resources before and after a pipeline execution (either rendering or computation based). This will also help to ensure that your resources are properly being bound to the pipeline.
