
Member Since 06 Sep 2004
Offline Last Active Mar 21 2015 12:02 PM

Topics I've Started

Weird Direct Compute performance issue

28 February 2015 - 09:25 PM



I implemented a radix sort algorithm in Direct Compute (Dx 11) based off this paper: www.sci.utah.edu/~csilva/papers/cgf.pdf


I created a simple application that uses the algorithm and benchmarks its efficiency. I am, however, seeing very weird results.

My GPU is a GTX 770.


Sorting 100,000 values:

2-bit integers:   ~0.38 ms

4-bit integers:   ~0.75 ms

6-bit integers:   ~1.12 ms

8-bit integers:   ~1.48 ms

10-bit integers: ~1.84 ms

12-bit integers: ~2.21 ms

14-bit integers: ~10.46 ms

16-bit integers: ~11.12 ms

32-bit integers: ~12.74 ms


I'm having a hard time understanding the drastic increase when the keys are wider than 12 bits. The algorithm processes 2 bits per pass, so 12-bit keys require 6 passes and 14-bit keys require 7. Up to 12 bits each extra pass costs roughly 0.37 ms, but the 7th pass adds more than 8 ms. Can anyone point me in the right direction in figuring out why this would happen?
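For reference, the pass arithmetic above can be checked against a small CPU version. This is only a sketch of a plain LSD radix sort with 2 bits per pass, not the GPU algorithm from the paper, and the names `passCount` and `radixSort2Bit` are made up for illustration. It does not explain the slowdown; it just gives a baseline where each pass does the same amount of work.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Number of passes needed for keys of keyBits bits (rounded up).
int passCount(int keyBits, int bitsPerPass = 2) {
    assert(keyBits > 0 && bitsPerPass > 0);
    return (keyBits + bitsPerPass - 1) / bitsPerPass;
}

// CPU reference: least-significant-digit radix sort, 2 bits (4 buckets) per pass.
void radixSort2Bit(std::vector<std::uint32_t>& keys, int keyBits) {
    assert(keyBits > 0 && keyBits <= 32);
    std::vector<std::uint32_t> scratch(keys.size());
    const int passes = passCount(keyBits);
    for (int pass = 0; pass < passes; ++pass) {
        const int shift = pass * 2;

        // Histogram of the current 2-bit digit.
        std::size_t count[4] = {0, 0, 0, 0};
        for (std::uint32_t k : keys) ++count[(k >> shift) & 3u];

        // Exclusive prefix sum gives each bucket's start offset.
        std::size_t offset[4];
        std::size_t sum = 0;
        for (int d = 0; d < 4; ++d) { offset[d] = sum; sum += count[d]; }

        // Stable scatter into the scratch buffer, then swap buffers.
        for (std::uint32_t k : keys) scratch[offset[(k >> shift) & 3u]++] = k;
        keys.swap(scratch);
    }
}
```

Calling `radixSort2Bit(keys, 12)` runs 6 passes and `radixSort2Bit(keys, 14)` runs 7, so on the CPU the cost grows linearly with the pass count.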


Thank you.

Need help understanding how to offset rays by specific angles

31 December 2014 - 11:57 AM

Hello, I have created a hybrid renderer in DX11 that is able to shoot rays from screen-space into the scene. I am currently trying to implement soft shadows but am having problems understanding how to offset my shadow ray samples towards the area light.


Let's say I have a shadow ray at the origin (0, 0, 0) that is directed straight up at the center of an area light (0, 1, 0). Knowing this shadow ray and the radius of the area light, I want to create further ray samples that head towards the area light.


The way I need to do it is based on angles. All I have is the normalized ray direction going to the center of the area light and the "light size". Basically, the light size parameter controls how wide the cone around the original shadow ray should be for the additional samples. So a maximum light size of 180 would mean that an original shadow ray (0, 1, 0) could have ray samples going anywhere on the upper hemisphere of the unit sphere.


So far, I have a light size that ranges from 0 to 180, Poisson disc samples that range from (0.f, 0.f, 0.f) to (1.f, 1.f, 1.f), and a normalized shadow ray direction towards the center of the area light. I scale the Poisson disc sample to the range (-lightSize / 2) to (lightSize / 2). Based on these angles, how do I "jitter" the original vector?


I found Rodrigues' Formula, however it requires an axis of rotation. How do I get that axis? Do the angles I calculate actually correspond to Euler angles? Should I just make a rotation matrix from that? I'm a little confused and just need someone to point me in the correct direction.
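One possible way to get around the missing axis, sketched here under assumptions: build two unit vectors perpendicular to the shadow ray, then tilt the ray by a polar angle and spin it around the original direction. The names `buildBasis` and `jitterDirection` are hypothetical, and the sketch assumes two sample values `u` and `v` in [0, 1] and treats the light size as the full cone angle in degrees.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { double x, y, z; };

static double dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
static Vec3 cross(Vec3 a, Vec3 b) {
    return { a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x };
}
static Vec3 normalize(Vec3 v) {
    const double len = std::sqrt(dot(v, v));
    return { v.x / len, v.y / len, v.z / len };
}

// Builds two unit vectors t and b perpendicular to n (n must be normalized).
static void buildBasis(Vec3 n, Vec3& t, Vec3& b) {
    // Pick a helper axis that is not nearly parallel to n.
    const Vec3 helper = (std::fabs(n.x) > 0.9) ? Vec3{0.0, 1.0, 0.0} : Vec3{1.0, 0.0, 0.0};
    t = normalize(cross(helper, n));
    b = cross(n, t);
}

// dir: normalized direction to the light center.
// u, v: sample values in [0, 1] (assumed, e.g. two components of a sample).
// lightSizeDeg: full cone angle in degrees, 0..180 (assumed interpretation).
Vec3 jitterDirection(Vec3 dir, double u, double v, double lightSizeDeg) {
    assert(lightSizeDeg >= 0.0 && lightSizeDeg <= 180.0);
    const double pi = 3.14159265358979323846;
    const double halfAngle = 0.5 * lightSizeDeg * pi / 180.0;
    const double theta = u * halfAngle;   // tilt away from dir
    const double phi = v * 2.0 * pi;      // rotation around dir
    Vec3 t, b;
    buildBasis(dir, t, b);
    const double s = std::sin(theta), c = std::cos(theta);
    const double ct = s * std::cos(phi), cb = s * std::sin(phi);
    return normalize({ t.x * ct + b.x * cb + dir.x * c,
                       t.y * ct + b.y * cb + dir.y * c,
                       t.z * ct + b.z * cb + dir.z * c });
}
```

With a light size of 180 and `u = 1`, the jittered ray lies in the plane perpendicular to the original direction, which matches the hemisphere case described above. Note that mapping `u` linearly to the tilt angle does not distribute samples uniformly over the cone's solid angle.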


Thank you.

Tiled Resources & Large RWByteAddressBuffers

04 September 2014 - 12:35 PM

Hello DirectX community,
I have run into a problem in one of my compute kernels.
I am using tiled resources to map tiles from a tile pool into an RWByteAddressBuffer. Since the buffer is created with the tiled resource flag, its size can be humongous (greater than what can be byte-addressed with a 32-bit uint). And this is the exact problem I am having in my kernel.

#include "HR_Globals.h"

#define GRP_DIM 1024

cbuffer cbConstants : register(b0)
{
	unsigned int	gNumVoxelsPerLength;
	unsigned int	gNumTilesPerVoxel;
	float2			pad;
};

RWByteAddressBuffer	gRayGridOut	: register(u0); // flattened indexed buffer of the ray grid

/* This kernel inits the tiles' header of the ray grid. */
[numthreads(GRP_DIM, 1, 1)]
void InitRayGrid(uint3 Gid : SV_GroupID, uint3 DTid : SV_DispatchThreadID, uint3 GTid : SV_GroupThreadID, uint GI : SV_GroupIndex)
{
	unsigned int tile_index = ((Gid.x * GRP_DIM) + GI);
	unsigned int tile_head_node_offset = tile_index * BYTES_PER_TILE * gNumTilesPerVoxel;

	unsigned int total_size = 0;
	gRayGridOut.GetDimensions(total_size); // buffer size in bytes, returned as a 32-bit uint
	if (tile_head_node_offset < total_size)
	{
		// 1st int of header represents the offset to the next node in the tile
		gRayGridOut.Store(tile_head_node_offset, -1);
		// 2nd int provides a counter for how many rays are in the node
		gRayGridOut.Store(tile_head_node_offset + 4, 0);
	}
}

"total_size" comes back with a bad value because the HLSL function that queries the buffer size returns it as a 32-bit uint.
"tile_head_node_offset" can also be out of range if the byte address is >= 2^32, and there is no buffer load or store function that takes a 64-bit address. From the documentation, the only 64-bit type is double, into which you can pack 2 uints.


Please advise on how I can get around this restriction.
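To make the restriction concrete, here is a host-side sketch of the same offset arithmetic in 32 and 64 bits. The 64 KB tile size is an assumption (the value of BYTES_PER_TILE comes from HR_Globals.h, which is not shown), and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Assumed tile size of 64 KB; the real BYTES_PER_TILE is defined in HR_Globals.h.
constexpr std::uint32_t kBytesPerTile = 64u * 1024u;

// Same arithmetic as the kernel: every operand is 32 bits wide, so the
// product wraps around modulo 2^32 once it exceeds the 4 GB range.
std::uint32_t headerOffset32(std::uint32_t tileIndex, std::uint32_t tilesPerVoxel) {
    return tileIndex * kBytesPerTile * tilesPerVoxel;
}

// The offset the kernel actually means, computed without wrap-around.
std::uint64_t headerOffset64(std::uint32_t tileIndex, std::uint32_t tilesPerVoxel) {
    assert(tilesPerVoxel > 0);
    return static_cast<std::uint64_t>(tileIndex) * kBytesPerTile * tilesPerVoxel;
}

// True when a byte offset can be expressed as a 32-bit buffer address.
bool isAddressable32(std::uint64_t byteOffset) {
    return byteOffset <= UINT32_MAX;
}
```

Under these assumptions, tile 70000 with one tile per voxel sits at byte 4,587,520,000, while the 32-bit computation yields 292,552,704, so the store would silently land in the wrong tile rather than fail the bounds check.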


Thank you in advance for your help.


- David

Frustum culling using a KD-tree

07 June 2013 - 04:22 PM

Hello all,


I just finished a ray-tracing class and have become more familiar with kd-trees.  We used a kd-tree to acquire the nearest neighbours for a photon mapping technique.  I have read from many places that a kd-tree can be used for frustum culling and am trying to understand how this is done.


Say I use a kd-tree implementation similar to the ANN library (http://www.cs.umd.edu/~mount/ANN/).  With such a library, you provide it your points and you can query for the N nearest neighbours of a specific point, or search for the neighbours within a radius of a specific point.  The thing is, how is such a structure useful for frustum culling?  The kd-tree stores points and can find nearest neighbours.  To do frustum culling, wouldn't you have to store AABB bounds with each node of the tree and do some sort of intersection test with the frustum while traversing the tree structure?  Wouldn't that step away from the purpose of a kd-tree, which is to efficiently find near neighbours in a data set of k dimensions?


ANN uses "indices" into a vector of points.  So technically, I could store the AABBs in another vector with matching indices and pass the center point of each AABB to create the kd-tree.  But I still fail to see how that would help. I'm assuming that the traversal logic would have to be quite different from the one used to look for nearest neighbours.
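The idea of storing bounds per node and testing them during traversal could look roughly like the sketch below. This is not how ANN works; it assumes a hypothetical tree where every node carries an AABB that encloses everything beneath it, and a frustum given as six planes whose normals point inward.

```cpp
#include <array>
#include <cassert>
#include <memory>
#include <vector>

struct Vec3 { float x, y, z; };
struct Plane { Vec3 n; float d; };   // a point p is inside when dot(n, p) + d >= 0
struct AABB { Vec3 min, max; };

struct Node {
    AABB bounds;                     // encloses every object in this subtree
    std::vector<int> objects;        // object indices (filled in leaves)
    std::unique_ptr<Node> left, right;
};

enum class Cull { Outside, Intersecting, Inside };

static float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// Classifies a box against the frustum using its two extreme corners per plane.
Cull classify(const AABB& b, const std::array<Plane, 6>& frustum) {
    bool allInside = true;
    for (const Plane& p : frustum) {
        // Corner farthest along the plane normal, and the opposite corner.
        const Vec3 farCorner{ p.n.x >= 0 ? b.max.x : b.min.x,
                              p.n.y >= 0 ? b.max.y : b.min.y,
                              p.n.z >= 0 ? b.max.z : b.min.z };
        const Vec3 nearCorner{ p.n.x >= 0 ? b.min.x : b.max.x,
                               p.n.y >= 0 ? b.min.y : b.max.y,
                               p.n.z >= 0 ? b.min.z : b.max.z };
        if (dot(p.n, farCorner) + p.d < 0) return Cull::Outside;
        if (dot(p.n, nearCorner) + p.d < 0) allInside = false;
    }
    return allInside ? Cull::Inside : Cull::Intersecting;
}

// Collects objects from every subtree that is not fully outside the frustum.
// Once a node is fully inside, its children are accepted without further tests.
void collectVisible(const Node* node, const std::array<Plane, 6>& frustum,
                    bool parentInside, std::vector<int>& out) {
    if (!node) return;
    bool inside = parentInside;
    if (!inside) {
        const Cull c = classify(node->bounds, frustum);
        if (c == Cull::Outside) return;   // skip the whole subtree
        inside = (c == Cull::Inside);
    }
    out.insert(out.end(), node->objects.begin(), node->objects.end());
    collectVisible(node->left.get(), frustum, inside, out);
    collectVisible(node->right.get(), frustum, inside, out);
}
```

The gain comes from the early return: one failed box test rejects every object below that node, which is the same spatial partitioning that makes nearest-neighbour queries fast, just with a different traversal.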


I'm not sure if the above makes any sense, but in the end, I'd appreciate if someone could point me in the right direction to understand how a kd-tree can help with frustum culling. 


Thank you.

Questions about batching static geometry

27 March 2013 - 06:18 PM

Hello all,


I have hit a roadblock with my current project, whose goal is to test my rendering engine.  The problem occurs when attempting to draw a multitude of small models (fewer than 50 triangles each).  Since each of those models has its own VB and IB, they are each drawn individually.  Of course this drives up the number of draw calls, and each call has very few vertices to output, leaving me CPU-bound.


The solution is to batch these suckers up, but I have some doubts about how to proceed.

Note that all the models are static.



My concerns:

- Since each model has its own transformation matrix, do I have to pre-transform each vertex before adding it to my batch vertex buffer? 

- How do I still take advantage of per-model frustum culling?

- How efficient is it to re-create the batch every frame after the frustum cull checks?


The way I am currently thinking about doing this is like so:

- On startup, allocate enough memory for a static geometry batch VB & IB.

- Do the frustum cull checks on each model...

- Static models that can be batched (they share the same textures, materials, etc.) have each of their vertices transformed and added to the batch.

- Draw

- Flush batch, rebuild it for other models that share common textures, materials...

- Draw

and so on....


How efficient would it be to transform vertices and recreate the batches (multiple times) per frame?  Does anyone have any insights?
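The pre-transform step described in the list above could be sketched on the CPU like this. All types and names here are hypothetical, positions are the only vertex attribute, and the matrix layout (row-major with the translation in the last column) is an assumption.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <vector>

struct Vertex { float x, y, z; };
using Mat4 = std::array<float, 16>;   // row-major, translation in m[3], m[7], m[11] (assumed)

struct Model {
    std::vector<Vertex> vertices;        // model-space positions
    std::vector<std::uint32_t> indices;  // indices local to this model
    Mat4 world;                          // model-to-world transform
};

struct Batch {
    std::vector<Vertex> vertices;        // world-space positions
    std::vector<std::uint32_t> indices;  // indices into the batch vertex array
};

Mat4 translation(float tx, float ty, float tz) {
    return { 1, 0, 0, tx,
             0, 1, 0, ty,
             0, 0, 1, tz,
             0, 0, 0, 1 };
}

// Transforms a position by the upper three rows of m (w is assumed to be 1).
Vertex transformPoint(const Mat4& m, const Vertex& v) {
    return { m[0] * v.x + m[1] * v.y + m[2]  * v.z + m[3],
             m[4] * v.x + m[5] * v.y + m[6]  * v.z + m[7],
             m[8] * v.x + m[9] * v.y + m[10] * v.z + m[11] };
}

// Appends one model to the batch: vertices go to world space, and indices
// are shifted by the number of vertices already stored in the batch.
void appendToBatch(Batch& batch, const Model& model) {
    const auto base = static_cast<std::uint32_t>(batch.vertices.size());
    assert(batch.vertices.size() + model.vertices.size() <= UINT32_MAX);
    for (const Vertex& v : model.vertices)
        batch.vertices.push_back(transformPoint(model.world, v));
    for (std::uint32_t i : model.indices)
        batch.indices.push_back(base + i);
}
```

Because the models are static, the transformed vertices do not change between frames, so a batch built this way only needs rebuilding when its set of models changes.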


Thank you.