Need help with HLSL/Direct Compute

Unfortunately it didn't compile out of the box for me [/quote]

You shouldn't need the PDC project, as I copied/modified the code into my DxViewerNET project. I am using the DX11 SDK, so that may be an issue. The PDC-modified example code is in VoronoiLabDx.h/.cpp. To run that instead of my DX code (BayerIdx.h/.cpp), change the myIdx var from a BayerIDx type to the VoronoiLabIDx type in DxViewerCtl.h (line 76 is commented out), un-comment the other Voronoi-related methods (lines ~99-115), then go to Form1 in DxViewerTester (C#) and un-comment the lines related to the VoronoiXXX methods (lines ~135, 82, 53). If you give me the build errors, I can probably figure out what your issues are; my guess is that they are related to different file paths.

First is to ensure that your input data is being properly read in. Try to just pass the data through to the output structure and see if you get something expected on the other end. [/quote]
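For what it's worth, a pass-through kernel for that kind of sanity check can be as small as the sketch below. This is only an illustration of the suggestion, reusing the struct names that appear later in this thread; it assumes the dispatch covers the output pixel count, e.g. Dispatch(outPixW * outPixH / 64, 1, 1) = Dispatch(16000, 1, 1). If the values show up unchanged on the other end, the upload/readback plumbing is sound and any remaining problem is in the kernel logic.

// Sanity-check kernel: copy every raw value straight through to the output
// (struct and buffer names borrowed from the code later in this thread).
#define outPixW 1280
#define outPixH 800

struct BayerPixelBufTypeF   { float PixVal; };
struct BayerRGBPixelBufType { float r; float g; float b; };

StructuredBuffer<BayerPixelBufTypeF>     sourcePixels;        // raw Bayer input
RWStructuredBuffer<BayerRGBPixelBufType> BayerRGBFrameBuffer; // RGB output

[numthreads(64, 1, 1)]
void PassThroughCS( uint3 DTid : SV_DispatchThreadID )
{
    uint i = DTid.x;
    if (i >= outPixW * outPixH)
        return;                               // stay inside the output buffer

    float v = sourcePixels[i].PixVal;         // read the raw value...
    BayerRGBFrameBuffer[i].r = v;             // ...and copy it into every channel
    BayerRGBFrameBuffer[i].g = v;
    BayerRGBFrameBuffer[i].b = v;
}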

Duh. Thanks, I'll try that; not sure why I didn't think of that.


Your output address calculation (called 'desti') is using the GI value, which is SV_GroupIndex. That system value gives you a flat index within the current thread group, and you are using 1x1x1 thread groups.[/quote]

I thought I was using 1280x800 thread groups in my Dispatch call. Maybe the code on SkyDrive is older; the quoted code above shows myContext->Dispatch(1280, 800, 1);
I *thought* this dispatch call would create 1280*800 thread groups with one thread each (the [numthreads(1, 1, 1)] line in my CS), and that SV_GroupIndex would then correspond to the 1-D array index of the buffer. Am I confused?

Thanks for the feedback!


Yes, I am/was confused. Changing my CS code to the following gave me something resembling the image I expected :D though it is monochrome (green). There is still something off somewhere, but it looks like an indexing issue in either the CS or the PS. Once I figure that out, I will work on correcting the color, then... performance! The CS was only outputting to index 0 of the output buffer. Is there a good explanation of the threads/groups somewhere? The explanation in the PDC lab download I linked confused me, I guess.

//This is the Bayer compute shader file.

//must match the calling code...
#define rawPixW 1282
#define rawPixH 802
#define outPixW 1280
#define outPixH 800

// definition of a Bayer color pixel buffer element
struct BayerRGBPixelBufType
{
float r;
float g;
float b;
};

struct BayerPixelBufTypeF
{
float PixVal;
};

//changes per frame
cbuffer CB0
{
float frameMax;
float frameMin;
};

//Output RGB frame data...
RWStructuredBuffer<BayerRGBPixelBufType> BayerRGBFrameBuffer;

// a read-only view of the raw bayer format pixel input buffer
StructuredBuffer<BayerPixelBufTypeF> sourcePixels;



[numthreads(1, 1, 1)]// execute one thread per pixel in group; groups = pixels per frame...
void CS( uint3 Gid : SV_GroupID,
uint3 DTid : SV_DispatchThreadID,
uint3 GTid : SV_GroupThreadID,
uint GI : SV_GroupIndex )
{
//get the current pixel index in the SOURCE buffer ( add 1 since the co-ords are for output; less the outer ring of pixels)
uint ipix = ((Gid.x + 1) * rawPixW)+ Gid.y+1;
//pixel index in output buffer
uint desti = Gid.x* outPixW + Gid.y;
bool evenRow = (uint((Gid.y+1) % 2) == 0);
bool evenCol = (uint((Gid.x+1) % 2) == 0);


//pass-through: copy the raw Bayer value into each output channel for now
BayerRGBFrameBuffer[desti].r = sourcePixels[ipix].PixVal;
BayerRGBFrameBuffer[desti].g = sourcePixels[ipix].PixVal;
BayerRGBFrameBuffer[desti].b = sourcePixels[ipix].PixVal;
}
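For reference, here is how those system values actually behave under Dispatch(1280, 800, 1) with [numthreads(1, 1, 1)]. This is just an annotated sketch of the semantics, not code from the project, but it shows why indexing the output with GI only ever touched element 0:

// With myContext->Dispatch(1280, 800, 1) and [numthreads(1, 1, 1)]:
[numthreads(1, 1, 1)]
void ExampleCS( uint3 Gid  : SV_GroupID,          // (0..1279, 0..799, 0): one group per pixel
                uint3 GTid : SV_GroupThreadID,    // always (0,0,0): the group has a single thread
                uint3 DTid : SV_DispatchThreadID, // Gid * numthreads + GTid, so equal to Gid here
                uint  GI   : SV_GroupIndex )      // GTid flattened WITHIN its group, so always 0
{
    // SV_GroupIndex only enumerates the threads inside one group; with 1x1x1
    // groups every thread gets GI == 0, so every pixel wrote to element 0.
    // A flat per-pixel index has to be built from Gid (or DTid) instead:
    uint flat = DTid.y * 1280 + DTid.x;           // row-major index into a 1280x800 buffer
}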
Ok, so I got it almost working. The only problem I have now is that my rendered image is split in half vertically, and the left and right halves are transposed. Any ideas? My only guess is that it has something to do with Gid vs. DTid vs. GTid and how I calculate my buffer indexes. I will update the code on my SkyDrive tonight. Here is my current CS code:


//This is the Bayer compute shader file.

//must match the calling code...
#define rawPixW 1282
#define rawPixH 802
#define outPixW 1280
#define outPixH 800

// definition of a Bayer color pixel buffer element
struct BayerRGBPixelBufType
{
float r;
float g;
float b;
};

struct BayerPixelBufTypeF
{
float PixVal;
};

//changes per frame
cbuffer CB0
{
float frameMax;
float frameMin;
};

//Output RGB frame data...
RWStructuredBuffer<BayerRGBPixelBufType> BayerRGBFrameBuffer;

// a read-only view of the raw bayer format pixel input buffer
StructuredBuffer<BayerPixelBufTypeF> sourcePixels;


//[numthreads(outPixW, outPixH, 1)]//this is 1 group per frame: threads limited to 768 per group...
[numthreads(1, 1, 1)]// execute one thread per pixel in group; groups = pixels per frame...?
void CS( uint3 Gid : SV_GroupID,
uint3 DTid : SV_DispatchThreadID,
uint3 GTid : SV_GroupThreadID,
uint GI : SV_GroupIndex )
{
//get the current pixel index in the SOURCE buffer ( add 1 since the co-ords are for output; less the outer ring of pixels)
uint ipix = Gid.x + 1 + ((Gid.y+1)* rawPixW);
//pixel index in output buffer
uint desti = Gid.x + Gid.y * outPixW;
bool evenRow = (uint((Gid.y+1) % 2) == 0);
bool evenCol = (uint((Gid.x+1) % 2) == 0);

//leave set const alpha for all pixels from init
//pixOut.a = 1.0f;

//**TODO: normalize...? assume normalized already by CPU for now...
uint left = ipix - 1;
uint right = ipix + 1;
uint above = ipix - rawPixW;
uint below = ipix + rawPixW;
uint topLeft = 0;
uint bottomLeft = 0;
uint topRight = 0;
uint bottomRight = 0;

//check what row we're on (even: GR)
if(evenRow)
{
//check which col we're on
if(evenCol)
{
//even col: green pixel

// GREEN IN CENTER
//
// X B X
// R G R
// X B X
//
BayerRGBFrameBuffer[desti].r = (sourcePixels[left].PixVal
+ sourcePixels[right].PixVal) * 0.5f;
////
BayerRGBFrameBuffer[desti].g = sourcePixels[ipix].PixVal;
////
BayerRGBFrameBuffer[desti].b = (sourcePixels[above].PixVal
+ sourcePixels[below].PixVal) * 0.5f;

}
else
{
//odd: red pixel
topLeft = above - 1;
bottomLeft = below - 1;
topRight = above + 1;
bottomRight = below + 1;

// RED IN CENTER
//
// B G B G
// G R G R
// B G B G
//
BayerRGBFrameBuffer[desti].r = sourcePixels[ipix].PixVal;
//
BayerRGBFrameBuffer[desti].g = (sourcePixels[left].PixVal
+ sourcePixels[right].PixVal
+ sourcePixels[above].PixVal
+ sourcePixels[below].PixVal) * 0.25f;
////
BayerRGBFrameBuffer[desti].b = (sourcePixels[topLeft].PixVal
+ sourcePixels[bottomLeft].PixVal
+ sourcePixels[topRight].PixVal
+ sourcePixels[bottomRight].PixVal) * 0.25f;
}
}
else //(odd row): GB
{
//check which col we're on
if(!evenCol)
{
//even: G
// GREEN IN CENTER
//
// X R X
// B G B
// X R X
//
BayerRGBFrameBuffer[desti].r = (sourcePixels[above].PixVal
+ sourcePixels[below].PixVal) * 0.5f;
////
BayerRGBFrameBuffer[desti].g = sourcePixels[ipix].PixVal;
////
BayerRGBFrameBuffer[desti].b = (sourcePixels[left].PixVal
+ sourcePixels[right].PixVal) * 0.5f;
}
else
{
//odd: B
topLeft = above - 1;
bottomLeft = below - 1;
topRight = above + 1;
bottomRight = below + 1;
// BLUE IN CENTER
//
// R G R
// G B G
// R G R
//
BayerRGBFrameBuffer[desti].r = (sourcePixels[topLeft].PixVal
+ sourcePixels[bottomLeft].PixVal
+ sourcePixels[topRight].PixVal
+ sourcePixels[bottomRight].PixVal) * 0.25f;

BayerRGBFrameBuffer[desti].g = (sourcePixels[left].PixVal
+ sourcePixels[right].PixVal
+ sourcePixels[above].PixVal
+ sourcePixels[below].PixVal) * 0.25f;
//
BayerRGBFrameBuffer[desti].b = sourcePixels[ipix].PixVal;
}
}
}

OK, so I updated my code on SkyDrive and in the original link above. I managed to fix the left-right inversion, but I don't understand why I have to add/subtract 640 (half of an entire row) from the x coordinate in the PS. Can anyone explain? I also changed my Dispatch and numthreads to improve performance to ~60 fps. There is probably more that can be done there, but that's pretty good for 5 minutes of "tuning".
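The exact numthreads/Dispatch values behind the ~60 fps figure aren't shown here, but for anyone following along, a typical tiling looks something like the sketch below (reusing the #defines from the shader above): a 2D thread group (8x8 is an arbitrary example), a dispatch rounded up to cover 1280x800, and SV_DispatchThreadID used as the per-pixel coordinate.

// One possible tiling (illustrative values, not necessarily the ones used above):
// host side: myContext->Dispatch((outPixW + 7) / 8, (outPixH + 7) / 8, 1);  // 160 x 100 groups
[numthreads(8, 8, 1)]
void CS( uint3 DTid : SV_DispatchThreadID )
{
    if (DTid.x >= outPixW || DTid.y >= outPixH)
        return;                                            // skip threads past the image edge

    // same index math as before, driven by DTid instead of Gid
    uint ipix  = (DTid.x + 1) + (DTid.y + 1) * rawPixW;    // source index (skip the 1-pixel border)
    uint desti =  DTid.x      +  DTid.y      * outPixW;    // destination index
    // ...demosaic body unchanged...
}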
Ok, so I have the basic image conversion done in the CS, and now I want to be able to resize my host window. To do so, I think I need to transfer the output buffer data into a texture so that the PS can sample it at the correct size on the desired surface. How do I do that without copying the data back to system memory? That would be an expensive proposition. I have seen some examples of setting a texture from memory, but the one linked would require copying Gmem->SysMem->Gmem, which would be too expensive. Is there a way I can create a texture resource/view from the RWStructuredBuffer<BayerRGBPixelBufType> BayerRGBFrameBuffer; that my CS outputs? Maybe I need to modify that buffer to include the coordinate data a texture requires, so that it essentially is a texture? If so, where can I find an example texture structure?
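One way to avoid the copy entirely is to have the CS write into a texture UAV instead of the structured buffer: on the C++ side, create a 1280x800 ID3D11Texture2D once with both D3D11_BIND_UNORDERED_ACCESS and D3D11_BIND_SHADER_RESOURCE (in a format that supports UAV writes, such as DXGI_FORMAT_R8G8B8A8_UNORM), bind its UAV for the compute pass and its SRV for the draw, and the data never leaves video memory; the pixel shader then just samples it at whatever size the window is. Below is a rough sketch of the shader side (names and register slots are mine, not from the project, and it reuses the outPixW/outPixH defines from above). Note that the UAV has to be unbound from the CS stage before the SRV is bound for drawing, or the runtime will null out the conflicting view.

// --- compute side: write the demosaiced pixel straight into a texture ---
RWTexture2D<float4> OutputTex : register(u0);   // UAV on a 1280x800 texture created with
                                                // BIND_UNORDERED_ACCESS | BIND_SHADER_RESOURCE
[numthreads(8, 8, 1)]
void CS( uint3 DTid : SV_DispatchThreadID )
{
    if (DTid.x >= outPixW || DTid.y >= outPixH)
        return;

    float3 rgb = float3(0.0f, 0.0f, 0.0f);      // <- the demosaic result computed as above
    OutputTex[DTid.xy] = float4(rgb, 1.0f);     // per-pixel store, no CPU round trip
}

// --- pixel shader side: sample the very same texture through its SRV ---
Texture2D    FrameTex   : register(t0);
SamplerState LinearSamp : register(s0);

float4 PS( float4 pos : SV_POSITION, float2 uv : TEXCOORD0 ) : SV_Target
{
    // bilinear-filtered fetch, so the image scales with the window size
    return FrameTex.Sample(LinearSamp, uv);
}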
I want to thank those who have replied so far. If anyone else has done something similar, I'd appreciate any pointers you may have.

