Migi0027

DX11 - HLSL - Billboarding

Hi guys!

 

Here's my current issue: I'm trying to create 3D billboards, and this is how I think I should do it:

 

(Example use case: rendering thousands of grass instances)

 

Create an instance buffer (per-instance position and rotation)

Send the mesh data plus the instance buffer

 

Shader:

{
    float4x4 rMatrix = generateRotationMatrix(instanceRot);
    position = mul(position, rMatrix);
    position += instancePos;

    // ...apply the World, View and Projection matrices here
}

But is this the best way? I don't think passing a constant buffer with thousands of rotation matrices would perform well; am I wrong?

 

What does your experience tell you to do here?

 

So the real question is: how can I individually rotate thousands of instances to face the camera?

 

Thank you, as always!


You don't rotate billboards. Billboards always face the camera because they are just 2D sprites. The only matrix you need to store per instance in your instance buffer is the world matrix. This matrix holds the position and scale of each billboard; the perspective projection then takes care of distant billboards appearing smaller.

 

Anyway, for instancing models this is what I do:

#define NUM_MAX_INSTANCES 128

struct InstanceBuffer
{
	XMFLOAT4X4 InstanceWorld[NUM_MAX_INSTANCES];
};

Then my shaders (note this is a normal model, not a billboard) look something like this:

PS_INPUT VS( VS_INPUT input, uint iid : SV_InstanceID )
{
    PS_INPUT output = (PS_INPUT)0;
    output.Pos = mul( input.Pos, InstanceWorld[iid] );
    output.Pos = mul( output.Pos, View );
    output.Pos = mul( output.Pos, Projection );
    output.Norm = mul( float4(input.Norm, 0), InstanceWorld[iid] ).xyz;
    output.Tex = input.Tex;

    return output;
}
#define NUM_MAX_INSTANCES 128

cbuffer InstanceBuffer : register(b1)
{
    matrix InstanceWorld[NUM_MAX_INSTANCES];
}

For reference, a D3D11 constant buffer is capped at 4096 float4 constants (64 KB), so an array of float4x4 matrices tops out at 1024 entries; 128 here is just a conservative guess.

 

Any time I need to render a batch of instanced models I call UpdateSubresource() with my new instance buffer and then DrawInstanced(). For drawing thousands of models I would probably have multiple DrawInstanced() calls, but I don't know if there is a better way.
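The batching arithmetic behind those multiple DrawInstanced() calls can be sketched in plain C++ (the D3D11 calls themselves are elided; `kMaxInstancesPerDraw` is an assumption mirroring the NUM_MAX_INSTANCES define above):

```cpp
#include <algorithm>
#include <vector>

// Assumed cap, matching the NUM_MAX_INSTANCES cbuffer array size above.
constexpr int kMaxInstancesPerDraw = 128;

// Split a total instance count into per-DrawInstanced() batch sizes.
// Each batch would be preceded by an UpdateSubresource() that copies
// that batch's world matrices into the instance cbuffer.
std::vector<int> BatchSizes(int totalInstances)
{
    std::vector<int> batches;
    for (int done = 0; done < totalInstances; done += kMaxInstancesPerDraw)
        batches.push_back(std::min(kMaxInstancesPerDraw, totalInstances - done));
    return batches;
}
```

For example, 300 instances would become three draws of 128, 128 and 44 instances.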


You don't even need to store a matrix in your per-instance buffer (for which I'd use a dynamic vertex buffer, with D3D11_INPUT_PER_INSTANCE_DATA in the corresponding input layout, so that you won't run into cbuffer space limits). You can just store a combined MVP in your per-frame cbuffer, and the position of each instance in your per-instance buffer.

 

In your per-frame cbuffer you also store a couple of vectors (the camera's right and up) which you can extract from your view matrix, and that gives you everything you need to do billboarding, at a significantly reduced data cost and a much higher ceiling on how many instances you can draw per call.
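A minimal software sketch of that extraction, assuming a row-major matrix in the row-vector convention (v * M) used by the HLSL above; the names here are mine, not from the thread:

```cpp
#include <array>

// Row-major 4x4 matrix, row-vector convention (v * M).
using Mat4 = std::array<std::array<float, 4>, 4>;
struct Vec3 { float x, y, z; };

// In a D3D-style view matrix the upper-left 3x3 block is the transpose of
// the camera's orientation, so the camera's right and up vectors sit in
// the first and second *columns* of that block.
Vec3 CameraRight(const Mat4& view) { return { view[0][0], view[1][0], view[2][0] }; }
Vec3 CameraUp(const Mat4& view)    { return { view[0][1], view[1][1], view[2][1] }; }
```

In the shader you would read the same elements out of your View matrix once per frame on the CPU and upload them alongside the combined MVP.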

 

A discussion of the technique (implemented in software, but easy to convert to shader code) is available here: http://www.mvps.org/directx/articles/view_oriented_billboards.htm


Thank you mhagain and menohack!

 

mhagain, I followed your link and it's actually working now. I'm translating it into shader code, but here's my worry:

 

How can I translate the vertices individually, since each vertex has its own location?

 

I guess I could use SV_VertexID and then some ifs, but would that be slow?

 

I'm not asking for you to write the shader code, just guide me.


The easiest way would be to use a geometry shader.  Input one point, output a 4-vert tristrip, and just lift it from the software version.  You don't need to use instancing with this method, although having the GS stage active will introduce some (small) additional overhead.  If that's acceptable, then go do it.

 

Another method would be to use an additional per-vertex buffer, containing 4 verts.  This buffer can be static, and each vertex is 2 floats: {{-1, -1}, {1, -1}, {-1, 1}, {1, 1}} works well for one tristrip ordering (other tristrip orderings will be different, of course).  Let's assume that this is called "offset" in your VS input struct, that "position" is a float3, and you get the following:

vs_out.position = mul (float4 (vs_in.position + right * offset.x + up * offset.y, 1.0f), globalMVP);

If you want each billboard to have a variable scale ("scale" in your input struct) modify like so:

vs_out.position = mul (float4 (vs_in.position + (right * offset.x + up * offset.y) * vs_in.scale, 1.0f), globalMVP);

This isn't exactly the same as the example in the link I posted; that one produces a diamond shape, this one is a square.

 

These "offsets" can alternatively be indexed via SV_VertexID and also be reused for texcoords, by the way.
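The corner expansion above, lifted into a plain C++ sketch (this mirrors the VS line with scale; `right` and `up` are the camera vectors from the per-frame cbuffer, and the function name is mine):

```cpp
struct Vec3 { float x, y, z; };

Vec3 operator+(Vec3 a, Vec3 b) { return { a.x + b.x, a.y + b.y, a.z + b.z }; }
Vec3 operator*(Vec3 v, float s) { return { v.x * s, v.y * s, v.z * s }; }

// Expand one billboard centre into the 4 tristrip corners described above:
// offsets {-1,-1},{1,-1},{-1,1},{1,1} along the camera's right/up vectors,
// scaled per billboard.
void BillboardCorners(Vec3 centre, Vec3 right, Vec3 up, float scale, Vec3 out[4])
{
    const float offsets[4][2] = { {-1,-1}, {1,-1}, {-1,1}, {1,1} };
    for (int i = 0; i < 4; ++i)
        out[i] = centre + (right * offsets[i][0] + up * offsets[i][1]) * scale;
}
```

With centre at the origin, right = (1,0,0), up = (0,1,0) and scale 2 this yields a 4x4 camera-facing quad from (-2,-2,0) to (2,2,0).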

Edited by mhagain

