seljo.myeri

Member
  1. seljo.myeri

    Need help with HLSL/Direct Compute

    I want to thank those who have replied so far. If anyone else has done something similar, I'd appreciate any pointers you may have.
  2. seljo.myeri

    Need help with HLSL/Direct Compute

    OK, so I have the basic image conversion done in the CS, and now I want to be able to resize my host window. To do so, I think I need to transfer the output buffer data into a texture so that the PS can sample it at the correct size on the desired surface. How do I do that without copying the data back to system memory? That would be an expensive proposition. I have seen some examples of setting a texture from memory, but the one linked would require copying Gmem->SysMem->Gmem, which would be too expensive. Is there a way I can create a texture resource/view from the RWStructuredBuffer<BayerRGBPixelBufType> BayerRGBFrameBuffer; that my CS outputs? Or maybe I need to modify that buffer to include the coordinate data a texture requires, so that it essentially is a texture? If so, where can I find an example texture structure?
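    One way to avoid the Gmem->SysMem->Gmem round trip (a sketch of a possible approach, not something from this thread) is to give the CS a texture it can write directly and the PS a view of the same texture to sample. The names myRGBTex, myRGBTexUAV and myRGBTexSRV below are placeholders.

// Sketch only: create one GPU-resident texture the CS can write (UAV) and the PS can sample (SRV),
// so the data never leaves video memory.
D3D11_TEXTURE2D_DESC td = {};
td.Width            = 1280;                      // outPixW
td.Height           = 800;                       // outPixH
td.MipLevels        = 1;
td.ArraySize        = 1;
td.Format           = DXGI_FORMAT_R32G32B32A32_FLOAT;
td.SampleDesc.Count = 1;
td.Usage            = D3D11_USAGE_DEFAULT;
td.BindFlags        = D3D11_BIND_UNORDERED_ACCESS | D3D11_BIND_SHADER_RESOURCE;

ID3D11Texture2D*           myRGBTex    = NULL;
ID3D11UnorderedAccessView* myRGBTexUAV = NULL;
ID3D11ShaderResourceView*  myRGBTexSRV = NULL;
myDevice->CreateTexture2D(&td, NULL, &myRGBTex);
myDevice->CreateUnorderedAccessView(myRGBTex, NULL, &myRGBTexUAV);   // bound to the CS (u0)
myDevice->CreateShaderResourceView(myRGBTex, NULL, &myRGBTexSRV);    // bound to the PS (t0)

// In the CS the output declaration would become e.g.
//   RWTexture2D<float4> OutTex;       // write: OutTex[uint2(x, y)] = float4(r, g, b, 1);
// and the PS can then Sample() it with a linear sampler, which lets the window resize freely.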
  3. seljo.myeri

    Need help with HLSL/Direct Compute

    OK, so I updated my code on SkyDrive and the original link above. I managed to fix the left-right inversion, but I don't understand why I have to add/subtract 640 (half of an entire row) from the x coordinate in the PS. Can anyone explain? I also changed my Dispatch and numthreads to improve performance to ~60 fps. Probably more can be done there, but that's pretty good for 5 minutes of "tuning".
  4. seljo.myeri

    Need help with HLSL/Direct Compute

    Ok, so I got it almost working. The only problem I have now is that my rendered image is split in half vertically and the left and right sides are transposed. Any ideas? My only guess is that it has something to do with Gid vs. DTid vs. GTid and how I calculate my buffer indexes. I will update the code on my skydrive tonight. Here is my current CS code:

//This is the Bayer compute shader file.
//must match the calling code...
#define rawPixW 1282
#define rawPixH 802
#define outPixW 1280
#define outPixH 800

// definition of a Bayer color pixel buffer element
struct BayerRGBPixelBufType
{
    float r;
    float g;
    float b;
};

struct BayerPixelBufTypeF
{
    float PixVal;
};

//changes per frame
cbuffer CB0
{
    float frameMax;
    float frameMin;
};

//Output RGB frame data...
RWStructuredBuffer<BayerRGBPixelBufType> BayerRGBFrameBuffer;

// a read-only view of the raw bayer format pixel input buffer
StructuredBuffer<BayerPixelBufTypeF> sourcePixels;

//[numthreads(outPixW, outPixH, 1)]//this is 1 group per frame: threads limited to 768 per group...
[numthreads(1, 1, 1)]// execute one thread per pixel in group; groups = pixels per frame...?
void CS( uint3 Gid : SV_GroupID, uint3 DTid : SV_DispatchThreadID, uint3 GTid : SV_GroupThreadID, uint GI : SV_GroupIndex )
{
    //get the current pixel index in the SOURCE buffer (add 1 since the co-ords are for output; less the outer ring of pixels)
    uint ipix = Gid.x + 1 + ((Gid.y + 1) * rawPixW);
    //pixel index in output buffer
    uint desti = Gid.x + Gid.y * outPixW;

    bool evenRow = (uint((Gid.y + 1) % 2) == 0);
    bool evenCol = (uint((Gid.x + 1) % 2) == 0);

    //leave set const alpha for all pixels from init
    //pixOut.a = 1.0f;

    //**TODO: normalize...? assume normalized already by CPU for now...

    uint left  = ipix - 1;
    uint right = ipix + 1;
    uint above = ipix - rawPixW;
    uint below = ipix + rawPixW;
    uint topLeft = 0;
    uint bottomLeft = 0;
    uint topRight = 0;
    uint bottomRight = 0;

    //check what row we're on (even: GR)
    if(evenRow)
    {
        //check which col we're on
        if(evenCol)
        {
            //even col: green pixel
            // GREEN IN CENTER
            //  X B X
            //  R G R
            //  X B X
            BayerRGBFrameBuffer[desti].r = (sourcePixels[left].PixVal + sourcePixels[right].PixVal) * 0.5f;
            BayerRGBFrameBuffer[desti].g = sourcePixels[ipix].PixVal;
            BayerRGBFrameBuffer[desti].b = (sourcePixels[above].PixVal + sourcePixels[below].PixVal) * 0.5f;
        }
        else
        {
            //odd: red pixel
            topLeft = above - 1;
            bottomLeft = below - 1;
            topRight = above + 1;
            bottomRight = below + 1;
            // RED IN CENTER
            //  B G B G
            //  G R G R
            //  B G B G
            BayerRGBFrameBuffer[desti].r = sourcePixels[ipix].PixVal;
            BayerRGBFrameBuffer[desti].g = (sourcePixels[left].PixVal + sourcePixels[right].PixVal + sourcePixels[above].PixVal + sourcePixels[below].PixVal) * 0.25f;
            BayerRGBFrameBuffer[desti].b = (sourcePixels[topLeft].PixVal + sourcePixels[bottomLeft].PixVal + sourcePixels[topRight].PixVal + sourcePixels[bottomRight].PixVal) * 0.25f;
        }
    }
    else //(odd row): GB
    {
        //check which col we're on
        if(!evenCol)
        {
            //even: G
            // GREEN IN CENTER
            //  X R X
            //  B G B
            //  X R X
            BayerRGBFrameBuffer[desti].r = (sourcePixels[above].PixVal + sourcePixels[below].PixVal) * 0.5f;
            BayerRGBFrameBuffer[desti].g = sourcePixels[ipix].PixVal;
            BayerRGBFrameBuffer[desti].b = (sourcePixels[left].PixVal + sourcePixels[right].PixVal) * 0.5f;
        }
        else
        {
            //odd: B
            topLeft = above - 1;
            bottomLeft = below - 1;
            topRight = above + 1;
            bottomRight = below + 1;
            // BLUE IN CENTER
            //  R G R
            //  G B G
            //  R G R
            BayerRGBFrameBuffer[desti].r = (sourcePixels[topLeft].PixVal + sourcePixels[bottomLeft].PixVal + sourcePixels[topRight].PixVal + sourcePixels[bottomRight].PixVal) * 0.25f;
            BayerRGBFrameBuffer[desti].g = (sourcePixels[left].PixVal + sourcePixels[right].PixVal + sourcePixels[above].PixVal + sourcePixels[below].PixVal) * 0.25f;
            BayerRGBFrameBuffer[desti].b = sourcePixels[ipix].PixVal;
        }
    }
}
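    For reference, a common way to lay out the indexing so that each thread maps to exactly one output pixel (a sketch only, not the code actually used in the thread; the 8x8 group size is an assumption):

// Sketch: one thread per output pixel, addressed by SV_DispatchThreadID, with 8x8 threads per group.
// Host side this would pair with roughly: myContext->Dispatch(outPixW / 8, outPixH / 8, 1);
[numthreads(8, 8, 1)]
void CS(uint3 DTid : SV_DispatchThreadID)
{
    if (DTid.x >= outPixW || DTid.y >= outPixH)
        return;                                          // guard the edge when sizes don't divide evenly

    uint ipix  = (DTid.x + 1) + (DTid.y + 1) * rawPixW;  // source index, skipping the 1-pixel border
    uint desti =  DTid.x      +  DTid.y      * outPixW;  // row-major index into the output buffer
    // ...demosaic exactly as above, using ipix/desti...
}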
  5. seljo.myeri

    Need help with HLSL/Direct Compute

    your output address calculation (called 'desti') is using the GI value, which is SV_GroupIndex. That system value gives you a flat index within the current thread group - and you are using 1x1x1 thread groups.[/quote]
    I thought I was using 1280x800 thread groups in my dispatch call. Maybe the code on skydrive is older... the quoted code above shows myContext->Dispatch(1280, 800, 1); Using this dispatch call I *thought* would make 1280*800 thread groups with one thread each (the [numthreads(1, 1, 1)] line in my CS), and then SV_GroupIndex would correlate to the 1-D array index of the buffer... Am I confused?[/quote]

    Yes, I am/was confused. Changing my CS code to the following gave me something resembling the image I expected... though it is monochrome (green). There is still something off somewhere, but it looks like an indexing issue either in the CS or the PS. Once I figure that out, I will work on correcting the color, then... performance! (With the old code the CS was only outputting to index 0 in the output buffer.) Is there a good explanation of the threads/groups somewhere? The explanation in the PDC lab download I linked confused me, I guess.

//This is the Bayer compute shader file.
//must match the calling code...
#define rawPixW 1282
#define rawPixH 802
#define outPixW 1280
#define outPixH 800

// definition of a Bayer color pixel buffer element
struct BayerRGBPixelBufType
{
    float r;
    float g;
    float b;
};

struct BayerPixelBufTypeF
{
    float PixVal;
};

//changes per frame
cbuffer CB0
{
    float frameMax;
    float frameMin;
};

//Output RGB frame data...
RWStructuredBuffer<BayerRGBPixelBufType> BayerRGBFrameBuffer;

// a read-only view of the raw bayer format pixel input buffer
StructuredBuffer<BayerPixelBufTypeF> sourcePixels;

[numthreads(1, 1, 1)]// execute one thread per pixel in group; groups = pixels per frame...
void CS( uint3 Gid : SV_GroupID, uint3 DTid : SV_DispatchThreadID, uint3 GTid : SV_GroupThreadID, uint GI : SV_GroupIndex )
{
    //get the current pixel index in the SOURCE buffer (add 1 since the co-ords are for output; less the outer ring of pixels)
    uint ipix = ((Gid.x + 1) * rawPixW) + Gid.y + 1;
    //pixel index in output buffer
    uint desti = Gid.x * outPixW + Gid.y;

    bool evenRow = (uint((Gid.y + 1) % 2) == 0);
    bool evenCol = (uint((Gid.x + 1) % 2) == 0);

    BayerRGBFrameBuffer[desti].r = sourcePixels[ipix].PixVal;
    BayerRGBFrameBuffer[desti].g = sourcePixels[ipix].PixVal;
    BayerRGBFrameBuffer[desti].b = sourcePixels[ipix].PixVal;
}
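    For reference, the standard relationship between the compute-shader system values, written out as comments (not part of the posted shader):

// How the compute-shader system values relate (standard D3D11 semantics):
//
//   SV_DispatchThreadID = SV_GroupID * numthreads + SV_GroupThreadID
//   SV_GroupIndex       = GTid.z * (dimX * dimY) + GTid.y * dimX + GTid.x   // flat index WITHIN one group
//
// So with Dispatch(1280, 800, 1) and [numthreads(1, 1, 1)]:
//   Gid  ranges over (0..1279, 0..799, 0)         -- one group per pixel
//   GTid is always (0, 0, 0) and GI is always 0   -- which is why 'desti = GI' wrote only index 0
//   DTid equals Gid, so either can be used to build the per-pixel index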
  6. seljo.myeri

    Need help with HLSL/Direct Compute

    Unfortunately it didn't compile out of the box for me[/quote]
    You shouldn't need the PDC project, as I copied/modified the code into my DxViewerNET project. I am using the DX11 SDK, so that may be an issue. The PDC-modified example code is "VoronoiLabDx.h/cpp". To run that instead of my DX code (BayerIdx.h/cpp), change the myIdx var from a BayerIDx type to the VoronoiLabIDx type in DxViewerCtl.h (line 76 is commented out). Un-comment the other Voronoi-related methods (lines ~99-115). Go to Form1 in DxViewerTester (C#) and un-comment the lines related to the VoronoiXXX methods (lines ~135, 82, 53). If you give me the build errors, I can probably figure out what your issues are. I would guess it is related to different file paths.

    First is to ensure that your input data is being properly read in. Try to just pass the data through to the output structure and see if you get something expected on the other end.[/quote]
    Duh. Thanks, I'll try that. Not sure why I didn't think of that.

    your output address calculation (called 'desti') is using the GI value, which is SV_GroupIndex. That system value gives you a flat index within the current thread group - and you are using 1x1x1 thread groups.[/quote]
    I thought I was using 1280x800 thread groups in my dispatch call. Maybe the code on skydrive is older... the quoted code above shows myContext->Dispatch(1280, 800, 1); Using this dispatch call I *thought* would make 1280*800 thread groups with one thread each (the [numthreads(1, 1, 1)] line in my CS), and then SV_GroupIndex would correlate to the 1-D array index of the buffer... Am I confused?

    Thanks for the feedback!
  7. seljo.myeri

    Need help with HLSL/Direct Compute

    I'm pretty sure this is an inefficient way to dispatch the threads you need. Try launching 32 by 16 by 1 threads per group and dispatching 1280/32 by 800/16 by 1 thread groups.[/quote]
    I figured it wasn't optimal, but I'm just trying to get *something* to render right now. I'll keep this in mind when I get to the optimization part (hopefully).

    As to why you are getting a black screen, well, looking at your PS code it seems to me that your BayerRGBFrameBuffer isn't filled with any input. I think you need to replace StructuredBuffer<BayerRGBPixelBufType> BayerRGBFrameBuffer; with StructuredBuffer<BayerRGBPixelBufType> BayerRGBFrameBuffer : register(t0); and fill that t0 slot with the desired information CPU side.[/quote]
    I thought the registers were set according to order of declaration by the compiler if not explicitly set, as you suggest. I copied the outline of the shader code from that PDC lab code I referenced above, and that works, but I'll check that out. It definitely seems that my CS output buffer may not be passed to the PS input buffer. I am downloading the CS output to the CPU to take a look and it is all zero, so something is wrong there before the PS is even a factor. The entire solution is available for download; I can help if there are issues getting it to build.

    Here is the CS code:

//This is the Bayer compute shader file.
//must match the calling code...
#define rawPixW 1282
#define rawPixH 802
#define outPixW 1280
#define outPixH 800

// definition of a Bayer color pixel buffer element
struct BayerRGBPixelBufType
{
    float r;
    float g;
    float b;
};

struct BayerPixelBufTypeF
{
    float PixVal;
};

//changes per frame
cbuffer CB0
{
    float frameMax;
    float frameMin;
};

//Output RGB frame data...
RWStructuredBuffer<BayerRGBPixelBufType> BayerRGBFrameBuffer;

// a read-only view of the raw bayer format pixel input buffer
StructuredBuffer<BayerPixelBufTypeF> sourcePixels;

//[numthreads(outPixW, outPixH, 1)]//this is 1 group per frame: threads limited to 768 per group...
[numthreads(1, 1, 1)]// execute one thread per pixel in group; groups = pixels per frame...
void CS( uint3 Gid : SV_GroupID, uint3 DTid : SV_DispatchThreadID, uint3 GTid : SV_GroupThreadID, uint GI : SV_GroupIndex )
{
    //vector <float, 4> pixOut = vector <float, 4>(1.f,1.f,1.f,1.f);

    ////get the current pixel index in the SOURCE buffer
    //uint ipix = DTid.x + 1 + ((DTid.y+1) * rawPixW);
    ////pixel index in output buffer
    //uint desti = DTid.x + DTid.y;
    //bool evenRow = (uint((DTid.y+1) % 2) == 0);
    //bool evenCol = (uint((DTid.x+1) % 2) == 0);

    //get the current pixel index in the SOURCE buffer (add 1 since the co-ords are for output; less the outer ring of pixels)
    uint ipix = Gid.x + 1 + ((Gid.y + 1) * rawPixW);
    //pixel index in output buffer
    uint desti = GI;//Gid.x + Gid.y;

    bool evenRow = (uint((Gid.y + 1) % 2) == 0);
    bool evenCol = (uint((Gid.x + 1) % 2) == 0);

    //leave set const alpha for all pixels from init
    //pixOut.a = 1.0f;

    //**TODO: normalize...? assume normalized already by CPU for now...

    uint left  = ipix - 1;
    uint right = ipix + 1;
    uint above = ipix - rawPixW;
    uint below = ipix + rawPixW;
    uint topLeft = 0;
    uint bottomLeft = 0;
    uint topRight = 0;
    uint bottomRight = 0;

    //check what row we're on (even: GR)
    if(evenRow)
    {
        //check which col we're on
        if(evenCol)
        {
            //even col: green pixel
            // GREEN IN CENTER
            //  X B X
            //  R G R
            //  X B X
            BayerRGBFrameBuffer[desti].r = (sourcePixels[left].PixVal + sourcePixels[right].PixVal) * 0.5f;
            BayerRGBFrameBuffer[desti].g = sourcePixels[ipix].PixVal;
            BayerRGBFrameBuffer[desti].b = (sourcePixels[above].PixVal + sourcePixels[below].PixVal) * 0.5f;
        }
        else
        {
            //odd: red pixel
            topLeft = above - 1;
            bottomLeft = below - 1;
            topRight = above + 1;
            bottomRight = below + 1;
            // RED IN CENTER
            //  B G B G
            //  G R G R
            //  B G B G
            BayerRGBFrameBuffer[desti].r = sourcePixels[ipix].PixVal;
            BayerRGBFrameBuffer[desti].g = (sourcePixels[left].PixVal + sourcePixels[right].PixVal + sourcePixels[above].PixVal + sourcePixels[below].PixVal) * 0.25f;
            BayerRGBFrameBuffer[desti].b = (sourcePixels[topLeft].PixVal + sourcePixels[bottomLeft].PixVal + sourcePixels[topRight].PixVal + sourcePixels[bottomRight].PixVal) * 0.25f;
        }
    }
    else //(odd row): GB
    {
        //check which col we're on
        if(evenCol)
        {
            //even: G
            // GREEN IN CENTER
            //  X R X
            //  B G B
            //  X R X
            BayerRGBFrameBuffer[desti].r = (sourcePixels[above].PixVal + sourcePixels[below].PixVal) * 0.5f;
            BayerRGBFrameBuffer[desti].g = sourcePixels[ipix].PixVal;
            BayerRGBFrameBuffer[desti].b = (sourcePixels[left].PixVal + sourcePixels[right].PixVal) * 0.5f;
        }
        else
        {
            //odd: B
            topLeft = above - 1;
            bottomLeft = below - 1;
            topRight = above + 1;
            bottomRight = below + 1;
            // BLUE IN CENTER
            //  R G R
            //  G B G
            //  R G R
            BayerRGBFrameBuffer[desti].r = (sourcePixels[topLeft].PixVal + sourcePixels[bottomLeft].PixVal + sourcePixels[topRight].PixVal + sourcePixels[bottomRight].PixVal) * 0.25f;
            BayerRGBFrameBuffer[desti].g = (sourcePixels[left].PixVal + sourcePixels[right].PixVal + sourcePixels[above].PixVal + sourcePixels[below].PixVal) * 0.25f;
            BayerRGBFrameBuffer[desti].b = sourcePixels[ipix].PixVal;
        }
    }
}

    Here is the C++ code between the CS and PS calls:

void DxViewerNET::Native::BayerIDx::DrawBayerFrame( BayerPixelF* frameBuf, FrameRange range, UINT bufLen /*= 1282*842*/ )
{
    HRESULT hr = S_OK;

    //create the raw bayer image buffer (input)
    CreateBufferResourceAndViews(sizeof( BayerPixelF ), bufLen, frameBuf, &myBayerInBuffer, &myBayerInBufferSRV, NULL );

    //**TODO: update the range for normalization... -- assume normalized already for now

    //**TODO: call the Comp Shader
    //set up
    myContext->CSSetShader(DxInterface::myComputeShader, NULL, 0);

    // give the CS read access to the input buffer
    myContext->CSSetShaderResources( 0, 1, &myBayerInBufferSRV );

    //give the CS write access to the BMP output buff
    UINT UAVinitCounts = 0;
    myContext->CSSetUnorderedAccessViews(0, 1, &myBayerBmpPixUAV, &UAVinitCounts);

    //run CS; the HLSL will spawn 1 thread per pix (trivially parallel)
    //1 group per pix, one thread per group.
    myContext->Dispatch(1280, 800, 1);

    //**TODO: wait here for completion?
    D3D11_QUERY_DESC pQryDsc;
    pQryDsc.Query = D3D11_QUERY_EVENT;
    pQryDsc.MiscFlags = 0;
    ID3D11Query* pEventQry;
    myDevice->CreateQuery(&pQryDsc, &pEventQry);

    //insert a fence into the push buffer
    myContext->End(pEventQry);
    //spin until the event returns
    while(myContext->GetData(pEventQry, NULL, 0, 0) == S_FALSE )
    {
        System::Threading::Thread::Sleep(0);
    }

#ifdef _DEBUG
    //copy data back from gpu mem to main mem
    ID3D11Buffer* tmpBuff = NULL;
    D3D11_BUFFER_DESC desc;
    ZeroMemory(&desc, sizeof(desc));
    RgbPixelF* bmpBuff;
    D3D11_MAPPED_SUBRESOURCE mapRsrc;

    //make a copy of the dx buffer we want to download back to system mem (on gpu)
    myBayerOutBuffer->GetDesc(&desc);
    //modify the buffer desc for read/download
    desc.CPUAccessFlags = D3D11_CPU_ACCESS_READ;
    desc.Usage = D3D11_USAGE_STAGING;//this is a temporary buff
    desc.BindFlags = 0;
    desc.MiscFlags = 0;
    myDevice->CreateBuffer(&desc, NULL, &tmpBuff);

    myContext->CopyResource(tmpBuff, myBayerOutBuffer);//copy the data
    myContext->Map(tmpBuff, 0, D3D11_MAP_READ, 0, &mapRsrc);//get an address to the data
    bmpBuff = (RgbPixelF*)mapRsrc.pData;//get a ptr to the address
    myContext->Unmap(tmpBuff, 0);
    SAFE_RELEASE(tmpBuff);
#endif

    SAFE_RELEASE(pEventQry);
    this->UnbindCSresources();

    //release old input buffers
    SAFE_RELEASE(myBayerInBufferSRV);
    SAFE_RELEASE(myBayerInBuffer);

    //pass the resource view for the produced BMP data (CS output) to the base class PS input SRV...
    DxInterface::myPixelInputSRV = myBayerBmpPixSRV;

    //draw the frame using the new frame data.
    Draw();
}

    Edit: fixed the pixel height to be 800/802, not 840/842. This is not the main problem, as that would only have mangled the image, not zeroed/blacked it.
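    For reference, a sketch of the explicit register bindings the quoted advice describes; the slot numbers are assumptions that have to line up with the CPU-side Set* calls shown above (CSSetShaderResources(0,...), CSSetUnorderedAccessViews(0,...), and the PS's PSSetShaderResources slot):

// compute shader side:
StructuredBuffer<BayerPixelBufTypeF>     sourcePixels        : register(t0); // raw Bayer input (SRV slot 0)
RWStructuredBuffer<BayerRGBPixelBufType> BayerRGBFrameBuffer : register(u0); // RGB output (UAV slot 0)
cbuffer CB0 : register(b0) { float frameMax; float frameMin; };              // per-frame range

// pixel shader side (the same GPU buffer, now bound read-only through an SRV):
// StructuredBuffer<BayerRGBPixelBufType> BayerRGBFrameBuffer : register(t0);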
  8. seljo.myeri

    Need help with HLSL/Direct Compute

    So, any other ideas as to why the CS is outputting the same result no matter the input? I am dispatching 1280x800 thread groups with one thread per group because I ran into the 768 threads-per-group limit when trying to dispatch a single group with a thread per pixel. Is there a limit on the total number of groups as well?
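    For reference, the relevant D3D11 limits, plus a hedged sketch of a dispatch that stays inside them (the 32x16 group size is the one suggested earlier in the thread, not something prescribed):

// D3D11 compute limits relevant here (constants from d3d11.h):
//   D3D11_CS_DISPATCH_MAX_THREAD_GROUPS_PER_DIMENSION = 65535   // per Dispatch() dimension
//   D3D11_CS_4_X_THREAD_GROUP_MAX_THREADS_PER_GROUP   = 768     // cs_4_0 / cs_4_1 (the limit hit above)
//   D3D11_CS_THREAD_GROUP_MAX_THREADS_PER_GROUP       = 1024    // cs_5_0
//
// So Dispatch(1280, 800, 1) with [numthreads(1,1,1)] is legal (both 1280 and 800 are below 65535),
// but 1-thread groups leave most of each hardware wavefront idle. A middle ground:
myContext->Dispatch(1280 / 32, 800 / 16, 1);   // 40 x 50 groups, 512 threads per group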
  9. seljo.myeri

    Need help with HLSL/Direct Compute

    I would say just use the UINT format for now, and that way you can just do the conversions manually like you would in your CPU code.[/quote]
    Ok, but how? Just put my 16-bit values into 32-bit buckets in the buffer and let the compiler pad appropriately? Currently, I am setting DXGI_FORMAT_UNKNOWN for all UAVs and SRVs.

    Also, it should be noted that I am taking the resulting output buffer from the CS and using it directly in the PS. I do not create a "texture" per se. That was to come later (so I could put the texture on a rectangle and move the rectangle based on the window size). Right now the CS does all the computing and the render window is a fixed size. Here is my PS:

// definition of a Bayer pixel color buffer element
struct BayerRGBPixelBufType
{
    float r;
    float g;
    float b;
};

struct BayerPixelBufTypeF
{
    float val;
};

// a read-only view of the point color buffer resource
StructuredBuffer<BayerRGBPixelBufType> BayerRGBFrameBuffer;

float4 PS( float4 Pos : SV_POSITION ) : SV_Target
{
    vector <float, 4> pixOut = vector <float, 4>(.0f, 0.f, 1.f, 1.f);//(1.f,1.f,1.f,1.f);

    //get the current pixel index in the converted bitmap
    uint ipix = Pos.x + ((Pos.y) * 1280);

    //leave set const alpha for all pixels from init
    //pixOut.a = 1.0f;

    pixOut.r = BayerRGBFrameBuffer[ipix].r;
    pixOut.g = BayerRGBFrameBuffer[ipix].g;
    pixOut.b = BayerRGBFrameBuffer[ipix].b;

    return pixOut;
}
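    For reference, a sketch of what DXGI_FORMAT_UNKNOWN means when creating the SRV for a structured buffer such as the CS output; the variable names reuse the ones from the C++ posted earlier, and the element count is an assumption:

// Sketch: for structured buffers the SRV format must be DXGI_FORMAT_UNKNOWN;
// the element layout comes from the buffer's StructureByteStride instead.
D3D11_SHADER_RESOURCE_VIEW_DESC srvd = {};
srvd.Format              = DXGI_FORMAT_UNKNOWN;            // mandatory for structured buffers
srvd.ViewDimension       = D3D11_SRV_DIMENSION_BUFFER;
srvd.Buffer.FirstElement = 0;
srvd.Buffer.NumElements  = 1280 * 800;                     // one BayerRGBPixelBufType per output pixel

HRESULT hr = myDevice->CreateShaderResourceView(myBayerOutBuffer, &srvd, &myBayerBmpPixSRV);
// The buffer itself must have been created with D3D11_RESOURCE_MISC_BUFFER_STRUCTURED and a
// StructureByteStride of sizeof(float) * 3 for this to succeed.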
  10. seljo.myeri

    Need help with HLSL/Direct Compute

    So what you can take away from this is that you should never need to do any manual normalization on the CPU side when setting the data for textures or buffers; you need to set them in their raw memory format and the GPU will convert them as needed in the shaders.[/quote]
    Good, I knew the GPU was the place to do this, but I'm a noob, so I didn't know it was automatic. This conversion from ushort to float on the CPU before loading the input buffer is most likely the culprit in this case (in conjunction with the PS format specification). The wrong format spec would seem to explain why the output is the same regardless of my CPU normalization (or lack of it).

    if you use a UNORM format, the actual raw texture memory will be an unsigned integer but when you sample it in the shader it will get converted to a 0.0 - 1.0 value.[/quote]
    Two points:
    1) My source data is in ushort data elements, but it is not normalized. Each frame has to be normalized based on the max and min values. For example, the raw data for one frame may have one pixel with value 900 (max) and another with 234 (min). This is now the "range" for all pixels in that frame, and it must be converted to the color-range format; e.g., for an 8-bit color range, 900 = 255 and 234 = 0... (See my CPU rendering C# code for the 24-bit RGB conversion.)
    2) How do I represent a ushort (unsigned 16-bit int) in an HLSL buffer? I see no integer type smaller than 32 bits (which is why I converted to float on the CPU in the first place... and then assumed the normalization was needed there too... a slippery slope). If I understand the format conversion link you gave, each pixel still has to be normalized to a standard range (0 to 255, for example) even if I pack my 16- or 8-bit uints into a 32-bit uint buffer?

    Thanks for the feedback! It gives me a direction to investigate at least. Please let me know if there is anything you can find in my code, or other tips!
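    One possible way to handle the 16-bit source data without converting to float on the CPU (a sketch only; the packing order and the buffer name packedPixels are assumptions, and the CPU-side fill would have to match):

// Sketch: pack two raw 16-bit samples into each 32-bit uint on the CPU and unpack in the CS.
StructuredBuffer<uint> packedPixels;   // bufLen/2 uints, each holding two raw 16-bit samples

float LoadRawPixel(uint ipix)
{
    uint word   = packedPixels[ipix >> 1];                       // two samples per uint
    uint rawVal = (ipix & 1) ? (word >> 16) : (word & 0xFFFF);   // pick the high or low half
    // per-frame normalization into 0..1 using the CB0 values already declared in the shader
    return saturate((float(rawVal) - frameMin) / (frameMax - frameMin));
}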
  11. seljo.myeri

    Need help with HLSL/Direct Compute

    I am working on a custom movie-format player accelerated via DX10.1/11. I have a VS2010 solution that builds and runs, but I am fairly new to DX coding (only been playing with it for a few days); I've been doing C#/C++ for 10 yrs. I *think* my problem is in my shader code, the CS.

    The source image/movie data is in a Bayer format, which requires several interpolations per pixel to compute all RGB values because each "pixel" only contains one color sample; the other two are interpolated from nearest-neighbor pixels. You can check Wikipedia or the source for more details. I have a "software" version that uses only the CPU in C# and works; it "just" needs a 4-core 3+ GHz system to run at the full 24-30 fps.

    The problem I have with the DX version is that the rendered result is only a black screen. If I hard-code the PS to solid colors, I get what I expect, but using the results of the CS leads to a black render area. Downloading the output buffer from the CS after it runs shows that all but the first pixel value is zero, and the first is practically zero (~1e-38). This explains the render results, but I don't know why this is the result of the CS. I assume from all the examples I've seen that the PS uses a normalized color range of 0 to 1.0 floating point, and I am doing that normalization on the CPU before sending the data to the GPU. The output doesn't change regardless of the input's normalization status. The source format is a 16-bit unsigned int value per pixel.

    My software version will suffice for my current needs, but I am working toward learning DX in general so I can do more advanced things like overlaying symbols/highlighting specific areas of the images as they play, etc. I started trying to learn DX/Compute from this PDC lab, and my DX code is based primarily on it. Are there any good debugging tools I can use to check my CS code? Anyone want to take a look at my code and offer feedback? I could not attach the files here, so I put them on my [url="http://cid-255987229e4d13e3.office.live.com/self.aspx/.Documents/DxViewerNET.zip"]sky drive[/url]. (updated link to latest code)
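    For reference, a sketch of the per-frame normalization path the TODOs in the posted code hint at, with the frame's min/max uploaded through CB0 instead of normalizing on the CPU; myCB0 and the FrameRange member names are assumptions:

// Sketch: upload the frame's min/max into the CB0 constant buffer and divide in the CS.
// myCB0 is a placeholder for an ID3D11Buffer created with D3D11_BIND_CONSTANT_BUFFER and a
// 16-byte-multiple size; 'range' is the FrameRange parameter already passed to DrawBayerFrame.
struct CB0Data { float frameMax; float frameMin; float pad[2]; };   // padded to 16 bytes

CB0Data cb = { range.Max, range.Min, { 0.0f, 0.0f } };              // assumes FrameRange exposes Max/Min
myContext->UpdateSubresource(myCB0, 0, NULL, &cb, 0, 0);
myContext->CSSetConstantBuffers(0, 1, &myCB0);                      // slot b0, read by cbuffer CB0 in the CS

// The CS can then do, e.g.:
//   float norm = saturate((sourcePixels[ipix].PixVal - frameMin) / (frameMax - frameMin));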