Member Since 06 Sep 2004
Last Active Mar 21 2015 12:02 PM

Posts I've Made

In Topic: Weird Direct Compute performance issue

01 March 2015 - 12:42 PM

Thank you for your reply.

Each pass sorts 2 bits at a time and issues 3 dispatches: the 1st builds the block-sums buffer mentioned in the paper, the 2nd does a prefix sum over it, and the 3rd scatters and sorts. So 12-bit keys need 6 * 3 = 18 dispatches, and 14-bit keys need 7 * 3 = 21. Regardless of how many bits the keys have, every pass uses the same kernels for those 3 dispatches.
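The pass structure described above can be sketched as a CPU reference in C++. This is a hypothetical illustration, not the code from the Bitbucket repository: each iteration of the outer loop stands in for one pass, and its three phases correspond to the three dispatches (block sums, prefix sum over the block sums, scatter). The 256-key block size is an assumed stand-in for a thread group.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr unsigned kBitsPerPass = 2;
constexpr unsigned kRadix = 1u << kBitsPerPass;  // 4 buckets per pass
constexpr std::size_t kBlockSize = 256;          // assumed thread-group size

struct SortStats { unsigned passes = 0; unsigned dispatches = 0; };

// LSD radix sort on the low `keyBits` bits, 2 bits per pass.
SortStats radixSort(std::vector<uint32_t>& keys, unsigned keyBits) {
    assert(keyBits <= 32);
    SortStats stats;
    const std::size_t n = keys.size();
    const std::size_t numBlocks = (n + kBlockSize - 1) / kBlockSize;
    std::vector<uint32_t> tmp(n);
    for (unsigned shift = 0; shift < keyBits; shift += kBitsPerPass) {
        // "Dispatch" 1: per-block digit counts (block sums), stored digit-major
        // so one exclusive scan yields the output offset of each (digit, block).
        std::vector<std::size_t> blockSums(kRadix * numBlocks, 0);
        for (std::size_t i = 0; i < n; ++i) {
            unsigned digit = (keys[i] >> shift) & (kRadix - 1);
            ++blockSums[digit * numBlocks + i / kBlockSize];
        }
        // "Dispatch" 2: exclusive prefix sum over the block sums.
        std::size_t running = 0;
        for (std::size_t& s : blockSums) {
            std::size_t count = s;
            s = running;
            running += count;
        }
        // "Dispatch" 3: stable scatter of every key to its sorted position.
        for (std::size_t i = 0; i < n; ++i) {
            unsigned digit = (keys[i] >> shift) & (kRadix - 1);
            tmp[blockSums[digit * numBlocks + i / kBlockSize]++] = keys[i];
        }
        keys.swap(tmp);
        ++stats.passes;
        stats.dispatches += 3;
    }
    return stats;
}
```

Calling `radixSort(keys, 12)` reports 6 passes and 18 dispatches; `radixSort(keys, 14)` reports 7 passes and 21 dispatches. The work per pass is the same for every key width.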


Here are some more benchmarks.


Sorting time by key width (milliseconds):

Key width    50,000 values    10,000 values
 2 bits         ~0.18            ~0.10
 4 bits         ~0.37            ~0.19
 6 bits         ~0.55            ~0.27
 8 bits         ~0.74            ~0.36
10 bits         ~0.92            ~0.45
12 bits         ~1.09            ~0.54
14 bits         ~8.92            ~8.08
16 bits        ~10.00            ~9.47
32 bits        ~11.45           ~11.46
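Dividing each timing by the number of passes (key bits / 2) makes the jump easier to see: up to 12 bits every pass costs roughly the same (about 0.18 ms at 50,000 values, about 0.09 ms at 10,000), while at 14 bits the per-pass cost is more than six times higher. The numbers in this small C++ sketch are the measurements from the benchmarks above; the 2x outlier threshold is an arbitrary choice.

```cpp
#include <cassert>
#include <cstdio>
#include <vector>

struct Sample { unsigned keyBits; double ms; };

// Passes needed when each pass handles 2 bits: 12 bits -> 6, 14 bits -> 7.
unsigned passesFor(unsigned keyBits) { return (keyBits + 1) / 2; }

// Timings from the benchmarks above.
const std::vector<Sample> k50k = {{2, 0.18}, {4, 0.37}, {6, 0.55}, {8, 0.74}, {10, 0.92},
                                  {12, 1.09}, {14, 8.92}, {16, 10.00}, {32, 11.45}};
const std::vector<Sample> k10k = {{2, 0.10}, {4, 0.19}, {6, 0.27}, {8, 0.36}, {10, 0.45},
                                  {12, 0.54}, {14, 8.08}, {16, 9.47}, {32, 11.46}};

// First key width whose per-pass cost exceeds `factor` times the per-pass cost
// of the first sample, or 0 if there is none.
unsigned firstOutlier(const std::vector<Sample>& s, double factor) {
    assert(!s.empty());
    const double base = s.front().ms / passesFor(s.front().keyBits);
    for (const Sample& x : s)
        if (x.ms / passesFor(x.keyBits) > factor * base) return x.keyBits;
    return 0;
}

// Prints passes and milliseconds per pass for each key width.
void printPerPass(const char* label, const std::vector<Sample>& s) {
    std::printf("%s\n", label);
    for (const Sample& x : s)
        std::printf("  %2u-bit keys: %2u passes, %.3f ms/pass\n",
                    x.keyBits, passesFor(x.keyBits), x.ms / passesFor(x.keyBits));
}
```

With a 2x threshold, `firstOutlier` returns 14 for both array sizes, i.e. the slowdown starts exactly where a 7th pass is added.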



If you're interested, I can provide the source code (it's already on Bitbucket) or upload the executable so you can benchmark it on your own computer.

In Topic: Tiled Resources & Large RWByteAddressBuffers

05 September 2014 - 10:32 AM

Quote:
I haven't used tiled resources yet, but I think you have to perform any calculations that depend on the tiled resource's address mapping model on the CPU and send the results to your shader, treating them as relative to the start of the region of active tiles. I also suspect that the RWByteAddressBuffer you access in your shader is not the whole buffer but only the active tiles, so address 0 of the RWByteAddressBuffer corresponds to the pDestTileRegionStartCoordinate value set with ID3D11DeviceContext2::UpdateTiles, and RWByteAddressBuffer::GetDimensions returns only the size of the active tiles...

Also, there's no such thing as a negative uint. :)


Hey, thanks for the reply. You're right about uints not being negative; that was silly of me. I wrote that because I was storing the values and reading them back on the CPU as signed integers. Regardless, the returned values are incorrect once the tiled-resource RWByteAddressBuffer is larger than what can be addressed with a 32-bit uint.
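Both symptoms can be reproduced on the CPU. HLSL's RWByteAddressBuffer methods take a 32-bit uint byte address, so a uint of 2^31 or more read back into a signed int appears negative, and a byte offset past 2^32 - 1 silently wraps when it is held in a uint. The helper names in this sketch are made up for illustration.

```cpp
#include <cstdint>

// What the CPU sees when a uint written by the shader is read back into an int:
// values >= 2^31 wrap to negative numbers (two's complement; guaranteed since
// C++20 and what mainstream compilers do before that).
constexpr int32_t asSignedReadback(uint32_t gpuValue) {
    return static_cast<int32_t>(gpuValue);
}

// What happens to a byte address that does not fit in a 32-bit uint:
// only the low 32 bits survive, so the access lands a multiple of 4 GiB too low.
constexpr uint32_t wrapped32(uint64_t byteAddress) {
    return static_cast<uint32_t>(byteAddress);
}
```

For example, `asSignedReadback(3000000000u)` yields -1294967296, and `wrapped32(4294967312)` (that is, 2^32 + 16) yields 16.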

With tiled resources, though, indexing into buffers remains the same. If you hit a sparse area, the behavior differs depending on the "tier level" supported by your GPU. In my case, I map a single "dummy" physical tile to every sparse tile of the buffer. It's inefficient, but any store to a sparse area then lands in the dummy tile.
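The dummy-tile scheme can be pictured as a lookup table from buffer tiles to tile-pool tiles. This sketch only builds that table on the CPU (it does not call UpdateTiles); using pool tile 0 as the dummy and numbering active tiles sequentially are assumptions. Tiles in D3D11 tiled resources are 64 KB, and because every buffer tile ends up mapped, no access ever touches an unmapped tile, whatever the tier.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <vector>

constexpr uint64_t kTileBytes = 64 * 1024;  // D3D11 tiled-resource tile size
constexpr uint32_t kDummyTile = 0;          // assumption: pool tile 0 is the shared dummy

// Hypothetical buffer-tile -> pool-tile table. Active tiles get their own pool
// tile (1, 2, 3, ... in ascending tile order); every sparse tile aliases the dummy.
std::vector<uint32_t> buildTileMap(uint64_t bufferBytes, const std::set<uint64_t>& activeTiles) {
    assert(bufferBytes > 0);
    const uint64_t numTiles = (bufferBytes + kTileBytes - 1) / kTileBytes;
    std::vector<uint32_t> table(numTiles, kDummyTile);
    uint32_t nextPoolTile = 1;
    for (uint64_t tile : activeTiles)
        if (tile < numTiles) table[tile] = nextPoolTile++;
    return table;
}
```

For a four-tile buffer with tiles 1 and 3 active, the table is {0, 1, 0, 2}: the two sparse tiles share pool tile 0.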


I'm pretty sure the code is correct, since I tested the kernel on a smaller buffer. The problem is that the API lets you create really huge tiled resources (since memory isn't allocated until you actually call UpdateTiles(...)), but doesn't let you access areas of byte-address buffers beyond 2^32 bytes. The only solutions I currently see are to bind multiple buffers to the kernel and implement logic that spills over to the next buffer once you go past the addressable range, or to rethink my algorithm as a whole :(.
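The spill-over approach boils down to splitting a 64-bit byte offset into a buffer index and a 32-bit offset within that buffer. A minimal sketch, assuming every bound buffer has the same size; since Shader Model 5 has no 64-bit integer type, the second helper takes the address as a (hi, lo) pair of uints and requires power-of-two buffer sizes, which is roughly what the shader side would have to do. All names are hypothetical.

```cpp
#include <cstdint>

struct BufferAddress {
    uint32_t bufferIndex;  // which of the bound buffers
    uint32_t byteOffset;   // offset inside that buffer
};

// Split a 64-bit global byte offset across equally sized buffers of
// `bytesPerBuffer` bytes each (bytesPerBuffer <= 2^32).
constexpr BufferAddress splitAddress(uint64_t globalByte, uint64_t bytesPerBuffer) {
    return {static_cast<uint32_t>(globalByte / bytesPerBuffer),
            static_cast<uint32_t>(globalByte % bytesPerBuffer)};
}

// Same split for an address carried as two 32-bit words (hi:lo), with buffers of
// 2^log2Bytes bytes; requires 0 < log2Bytes < 32 and an index that fits in 32 bits.
constexpr BufferAddress splitAddressHiLo(uint32_t hi, uint32_t lo, unsigned log2Bytes) {
    return {(hi << (32u - log2Bytes)) | (lo >> log2Bytes),
            lo & ((1u << log2Bytes) - 1u)};
}
```

With 2 GiB buffers (log2Bytes = 31), byte 2^32 + 16 maps to buffer 2, offset 16, whichever helper is used.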

In Topic: Run DirectX 11 stream output without drawing

07 June 2014 - 04:03 PM

Hey unbird, thanks for your reply.



unbird wrote:
Edit: Is this for vanilla DX11? Because 11.1 allows writing to UAVs from every shader stage. You wouldn't even need stream-out functionality.


You know, I recently bought a GTX 770 that claims to support DirectX 11.2. Scattered read/write from any shader stage was one feature I really wanted on my new GPU. It turns out NVIDIA does not support the full feature set. I do find it a bit misleading that the specifications don't mention that this isn't full support, only being "capable" of the 11.2 feature set.

In Topic: Run DirectX 11 stream output without drawing

06 June 2014 - 04:17 PM

I have a further question about SO. 

I was going to post a new topic but since this thread is "sort of" similar, I might as well ask here.


I basically want to voxelize my geometry. I was trying to bypass the rasterization pipeline so as to avoid implementing conservative rasterization in the geometry shader. Instead, I take all my objects' vertices and indices and, for each object, create an associated buffer of triangles. I pass all these buffers into a kernel that builds my voxel grid data structure.
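A CPU sketch of the voxelization step, assuming a dense grid of occupancy flags. For brevity it marks every voxel overlapped by each triangle's axis-aligned bounding box, which is a conservative over-approximation and not an exact triangle/box overlap test. All type and function names are hypothetical, and a compute kernel would more likely process one triangle per thread instead of looping.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Vec3 { float x, y, z; };
struct Triangle { Vec3 a, b, c; };

// Dense grid covering [origin, origin + dim * voxelSize) on each axis.
struct VoxelGrid {
    Vec3 origin; float voxelSize; int dim;
    std::vector<unsigned char> occupied;  // dim^3 flags, x fastest
    VoxelGrid(Vec3 o, float s, int d)
        : origin(o), voxelSize(s), dim(d), occupied(std::size_t(d) * d * d, 0) {
        assert(s > 0.0f && d > 0);
    }
    bool at(int x, int y, int z) const { return occupied[(std::size_t(z) * dim + y) * dim + x] != 0; }
};

// Marks every voxel touched by each triangle's bounding box (over-approximation).
void voxelizeAabb(VoxelGrid& g, const std::vector<Triangle>& tris) {
    const float origin[3] = {g.origin.x, g.origin.y, g.origin.z};
    for (const Triangle& t : tris) {
        const float mins[3] = {std::min({t.a.x, t.b.x, t.c.x}), std::min({t.a.y, t.b.y, t.c.y}),
                               std::min({t.a.z, t.b.z, t.c.z})};
        const float maxs[3] = {std::max({t.a.x, t.b.x, t.c.x}), std::max({t.a.y, t.b.y, t.c.y}),
                               std::max({t.a.z, t.b.z, t.c.z})};
        int lo[3], hi[3];
        bool overlapsGrid = true;
        for (int k = 0; k < 3; ++k) {
            lo[k] = static_cast<int>(std::floor((mins[k] - origin[k]) / g.voxelSize));
            hi[k] = static_cast<int>(std::floor((maxs[k] - origin[k]) / g.voxelSize));
            if (hi[k] < 0 || lo[k] >= g.dim) overlapsGrid = false;  // box misses the grid
            lo[k] = std::max(lo[k], 0);
            hi[k] = std::min(hi[k], g.dim - 1);
        }
        if (!overlapsGrid) continue;
        for (int z = lo[2]; z <= hi[2]; ++z)
            for (int y = lo[1]; y <= hi[1]; ++y)
                for (int x = lo[0]; x <= hi[0]; ++x)
                    g.occupied[(std::size_t(z) * g.dim + y) * g.dim + x] = 1;
    }
}
```

Swapping the bounding-box loop for an exact triangle/box overlap test would tighten the result without changing the grid layout.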


Instead of pre-creating these buffers of triangles, I was thinking of rendering my scene normally but streaming out the triangles to an SO resource. What do I need to do to be able to use this resource in a compute kernel? Also, is there a way to know how many elements are in an SO buffer?


Thank you.

In Topic: Frustum culling using a KD-tree

14 June 2013 - 03:04 PM

That makes more sense... Thank you, Bacterius.