  • Similar Content

    • By Jason Smith
      While working on a project using D3D12, I was getting an exception thrown while trying to get a D3D12_CPU_DESCRIPTOR_HANDLE. The project is plain C, so it uses the COBJMACROS. The following application replicates the problem happening in the project:

      #define COBJMACROS
      #pragma warning(push, 3)
      #include <Windows.h>
      #include <d3d12.h>
      #include <dxgi1_4.h>
      #pragma warning(pop)

      IDXGIFactory4 *factory;
      ID3D12Device *device;
      ID3D12DescriptorHeap *rtv_heap;

      int WINAPI wWinMain(HINSTANCE hinst, HINSTANCE pinst, PWSTR cline, int cshow)
      {
          (hinst), (pinst), (cline), (cshow);  /* silence unused-parameter warnings */

          HRESULT hr = CreateDXGIFactory1(&IID_IDXGIFactory4, (void **)&factory);
          hr = D3D12CreateDevice(0, D3D_FEATURE_LEVEL_11_0, &IID_ID3D12Device, (void **)&device);

          D3D12_DESCRIPTOR_HEAP_DESC desc;
          desc.NumDescriptors = 1;
          desc.Type = D3D12_DESCRIPTOR_HEAP_TYPE_RTV;
          desc.Flags = D3D12_DESCRIPTOR_HEAP_FLAG_NONE;
          desc.NodeMask = 0;
          hr = ID3D12Device_CreateDescriptorHeap(device, &desc, &IID_ID3D12DescriptorHeap, (void **)&rtv_heap);

          /* The exception is thrown inside this call. */
          D3D12_CPU_DESCRIPTOR_HANDLE rtv = ID3D12DescriptorHeap_GetCPUDescriptorHandleForHeapStart(rtv_heap);
          (rtv);
          return 0;
      }

      The call to ID3D12DescriptorHeap_GetCPUDescriptorHandleForHeapStart throws the exception. Stepping into the disassembly for ID3D12DescriptorHeap_GetCPUDescriptorHandleForHeapStart shows that the error occurs on the instruction

      mov  qword ptr [rdx],rax

      which seems odd, since rdx doesn't appear to be used. Any help would be greatly appreciated. Thank you.
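      (For context on why RDX matters here: on x64, MSVC returns structs from member functions through a hidden pointer that the caller passes in RDX, and older d3d12.h headers declare the C binding of GetCPUDescriptorHandleForHeapStart without that out-parameter, so the callee ends up storing through whatever garbage happens to be in RDX. Below is a minimal sketch of the usual vtable-cast workaround, assuming an affected header version; the GetHeapStartFn typedef name is made up here.)

      typedef D3D12_CPU_DESCRIPTOR_HANDLE *(STDMETHODCALLTYPE *GetHeapStartFn)(
          ID3D12DescriptorHeap *self, D3D12_CPU_DESCRIPTOR_HANDLE *out);

      D3D12_CPU_DESCRIPTOR_HANDLE rtv;
      /* Call through the vtable with the signature the ABI actually uses:
       * this in RCX, pointer to the return value in RDX. */
      ((GetHeapStartFn)rtv_heap->lpVtbl->GetCPUDescriptorHandleForHeapStart)(rtv_heap, &rtv);

      Newer Windows SDK headers reportedly changed the C prototype to take the output pointer explicitly, so updating the SDK should also resolve it.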
       
    • By lubbe75
      As far as I understand, there is no real random or noise function in HLSL.
      I have a big water polygon, and I'd like to fake water wave normals in my pixel shader. I know it's not efficient and the standard way is really to use a pre-calculated noise texture, but anyway...
      Does anyone have any quick and dirty HLSL shader code that fakes water normals, and that doesn't look too repetitious? 
    • By turanszkij
      Hi,
      I finally managed to get the DX11-emulating Vulkan device working, but everything is flipped vertically now because Vulkan has a different clip space. What are the best practices for keeping these implementations consistent? I tried using a vertically flipped viewport, and while it works on an Nvidia 1050, the Vulkan debug layer throws errors saying this is not supported by the spec, so it might not work on other hardware. There is also the possibility of flipping the clip-space Y coordinate before writing it out from the vertex shader, but that requires changing and recompiling every shader. I could also bake it into the camera projection matrices, though I want to avoid that because then I'd need to track down everywhere in the engine where matrices are uploaded... Any chance of an easy extension or something? If not, I will probably go with changing the vertex shaders.
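      For reference, a minimal sketch of the flipped-viewport route, assuming VK_KHR_maintenance1 (promoted to core in Vulkan 1.1) is enabled: that extension is what makes a negative viewport height legal, which would explain the validation errors. Here command_buffer, surface_width and surface_height stand in for the application's own variables.

      /* Flip the viewport: start at the bottom edge and give it a negative
       * height, so Vulkan's clip space matches the D3D conventions. */
      VkViewport viewport;
      viewport.x        = 0.0f;
      viewport.y        = (float)surface_height;   /* bottom edge, not top    */
      viewport.width    = (float)surface_width;
      viewport.height   = -(float)surface_height;  /* negative height flips Y */
      viewport.minDepth = 0.0f;
      viewport.maxDepth = 1.0f;
      vkCmdSetViewport(command_buffer, 0, 1, &viewport);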
    • By NikiTo
      Some people say "discard" has no positive effect on optimization. Others say it will at least spare the texture fetches.
       
      if (color.A < 0.1f) {
          //discard;
          clip(-1);
      }
      // tons of reads of textures following here
      // and loops too
      Some people say that "discard" will only mask out the output of the pixel shader, while still evaluating all the statements after the "discard" instruction.

      From MSDN:
      discard: Do not output the result of the current pixel.
      clip: Discards the current pixel.

      As usual it is unclear, but it suggests that "clip" could discard the whole pixel (maybe stopping execution too).

      I think that, at least for thermal and energy reasons, the GPU should not evaluate the statements after "discard", but some people on the internet say the GPU computes the statements anyway. What I am more worried about are the texture fetches after discard/clip.

      (What if, after a discard, I have an expensive branch decision that makes the approved cheap-branch neighbor pixels stall for nothing? That would be crazy.)
    • By NikiTo
      I have a problem. My shaders are huge, meaning they have a lot of code inside. Many of my pixels should be completely discarded. I could use a comparison and discard at the very beginning of the shader, but as far as I understand, the discard statement does not save workload at all, as the pixel has to stall until its long-running neighbor shaders complete.
      Initially I wanted to use stencil to discard pixels before the execution flow even enters the shader, before the GPU distributes/allocates resources for it, avoiding the stall of the pixel shader execution flow. I assumed that the depth/stencil test discards pixels before the pixel shader, but I see now that it happens in the very last Output Merger stage. Rendering a little mirror in a scene with a big viewport that way seems extremely inefficient. Why did they put the stencil test in the Output Merger anyway? Handling of stencil is so limited compared to other resources. Do people use stencil functionality at all for games, or do they prefer discard/clip?

      Will the GPU stall the pixel if I issue a discard at the very beginning of the pixel shader, or will it immediately start using the freed-up resources to render another pixel?
DX12 Are there some memory limits to consider when writing a shader?

Recommended Posts

I remember doing safe branching in SIMD with Intel's instructions that take an extra vector for the decision making. But it is only good for a few situations; I don't remember exactly which situations I used it in.
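If I recall correctly it was something like this; a minimal sketch using SSE4.1's _mm_blendv_ps, whose third operand is that extra "decision" vector (select_by_sign is just an illustrative name):

#include <smmintrin.h>  /* SSE4.1: _mm_blendv_ps */

/* Branch-free per-lane select: result[i] = (x[i] < 0) ? a[i] : b[i]. */
static __m128 select_by_sign(__m128 x, __m128 a, __m128 b)
{
    __m128 mask = _mm_cmplt_ps(x, _mm_setzero_ps()); /* all-ones where x < 0 */
    return _mm_blendv_ps(b, a, mask);                /* a where mask is set  */
}

Note that both arms still have to be computed; the mask only chooses which result survives, which matches the "only good for a few situations" caveat.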

20 minutes ago, l0calh05t said:

Many vector instruction sets offer masked operations nowadays. And with instructions like movemask you can make sure that only those branches that are in use are evaluated. So it really is more of a programming model thing than actual differences in hardware.

You could implement SIMT with this functionality, but it's still a level below SIMT. It is a difference in hardware, since a GPU handles branch divergence in hardware, AFAIK. Then again, I've never hacked GPU assembly, so you never know, but I think I've read that GPUs handle this for you automatically.
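To illustrate the movemask idea from the quote, here is a rough sketch in SSE (cheap_path and expensive_path are hypothetical placeholders, not real APIs):

#include <xmmintrin.h>  /* SSE: _mm_movemask_ps and friends */

__m128 cheap_path(__m128 x);      /* placeholder */
__m128 expensive_path(__m128 x);  /* placeholder */

/* Skip a branch entirely when no lane (or every lane) needs it. */
__m128 shade(__m128 x)
{
    __m128 need = _mm_cmpgt_ps(x, _mm_set1_ps(0.5f));
    int bits = _mm_movemask_ps(need);  /* one sign bit per lane */
    if (bits == 0x0)                   /* no lane takes the expensive branch */
        return cheap_path(x);
    if (bits == 0xF)                   /* every lane takes it */
        return expensive_path(x);
    /* Divergent case: evaluate both and blend by the mask. */
    __m128 e = expensive_path(x);
    __m128 c = cheap_path(x);
    return _mm_or_ps(_mm_and_ps(need, e), _mm_andnot_ps(need, c));
}

This is essentially what a GPU's divergence handling does for you implicitly, just spelled out by hand.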

On 22.9.2017 at 12:36 AM, NikiTo said:

Would it be a problem to create in HLSL ~50 uninitialized arrays of ~300000 cells each and then use them for my algorithm

On the GPU you have only 32 KB of fast LDS memory; that's not enough for you, so you need to use global device memory. But if you launch e.g. 1000 thread groups, each consisting of 64 threads (can be anywhere from 32 up to 1024), you would need to allocate 1000 * 64 * 300000 cells if you need a unique 300000 cells for each thread (because they may all run in parallel). That's the main limitation you need to think about.
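To put a number on that, a quick back-of-the-envelope check (the 4-bytes-per-cell figure is an assumption, not something from the original question):

/* 1000 thread groups * 64 threads * 300000 cells * 4 bytes per cell */
unsigned long long bytes = 1000ULL * 64ULL * 300000ULL * 4ULL;
/* = 76,800,000,000 bytes, roughly 76.8 GB: far beyond the device
 * memory of any current GPU, so the per-thread arrays would have to
 * shrink, be shared, or be reused across groups. */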

55 minutes ago, JoeJ said:

so you need to use global device memory.

I knew I forgot something in my reply... the original question was about shaders, not the CPU implementation.

On 9/23/2017 at 4:16 AM, Infinisearch said:

You could implement SIMT with this functionality, but it's still a level below SIMT. It is a difference in hardware, since a GPU handles branch divergence in hardware, AFAIK. Then again, I've never hacked GPU assembly, so you never know, but I think I've read that GPUs handle this for you automatically.

Sorry for bringing up an old topic, but I realized I was in error at the time of writing this. Even if you could lane-mask in a SIMD architecture, the two still differ: in a SIMT architecture a vector register, let's say r5, refers to a different register per lane, while in a SIMD architecture the r5 in an instruction refers to the same register for every lane. So in a SIMT architecture a vector register refers to as many different registers as the SIMT width, while in SIMD it all refers to the same register. I suppose if you performed some sort of gather/scatter operation on a SIMD register, it would be possible to simulate having a different register in each component, but that seems like an inefficient way of doing things.

Edited for clarity.


Sorry for bumping this again, but I think I was wrong once more, this time in my post above. Each element of a SIMD vector would be the same register for a different thread. So vR5 + vR6 would operate on the same two registers of n threads for an n-wide SIMD. For some reason I got confused and thought that the layout of the registers had to be t1vr1, t1vr2, t1vr3, ... and forgot about the possibility of an organization like t1vr1, t2vr1, t3vr1. My mistake... sorry for any confusion.
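In other words, a toy sketch of the corrected layout, using 4-wide SSE as a stand-in (the names are illustrative only):

#include <xmmintrin.h>

/* One architectural register = one SIMD vector; lane i holds thread i's
 * copy. A single vector add then performs "r5 + r6" for all 4 threads. */
__m128 vR5;  /* lane i = thread i's r5 */
__m128 vR6;  /* lane i = thread i's r6 */

__m128 add_r5_r6_for_all_threads(void)
{
    return _mm_add_ps(vR5, vR6);  /* t1..t4 each compute r5 + r6 at once */
}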

