

Member Since 06 Sep 2004
Offline Last Active Nov 19 2014 06:13 PM

Topics I've Started

Tiled Resources & Large RWByteAddressBuffers

04 September 2014 - 12:35 PM

Hello DirectX community,
I have come into a problem within one of my compute kernels.
I am using tiled resources to map tiles from a tile pool into an RWByteAddressBuffer. Since the buffer is created with the tiled resource flag, its size can be humongous (greater than what can be byte-addressed with a 32-bit uint). And this is the exact problem I am having in my kernel.

#include "HR_Globals.h"

#define GRP_DIM 1024

cbuffer cbConstants : register(b0)
{
	unsigned int	gNumVoxelsPerLength;
	unsigned int	gNumTilesPerVoxel;
	float2			pad;
};

RWByteAddressBuffer	gRayGridOut	: register(u0); // flattened indexed buffer of the ray grid

/* This kernel inits the tiles' header of the ray grid. */
[numthreads(GRP_DIM, 1, 1)]
void InitRayGrid(uint3 Gid : SV_GroupID, uint3 DTid : SV_DispatchThreadID, uint3 GTid : SV_GroupThreadID, uint GI : SV_GroupIndex)
{
	unsigned int tile_index = ((Gid.x * GRP_DIM) + GI);
	unsigned int tile_head_node_offset = tile_index * BYTES_PER_TILE * gNumTilesPerVoxel;

	unsigned int total_size = 0;
	gRayGridOut.GetDimensions(total_size); // out parameter is a 32-bit uint

	if (tile_head_node_offset < total_size)
	{
		// 1st int of header represents the offset to the next node in the tile
		gRayGridOut.Store(tile_head_node_offset, -1);
		// 2nd int provides a counter for how many rays are in the node
		gRayGridOut.Store(tile_head_node_offset + 4, 0);
	}
}

"total_size" gets a bad value because GetDimensions returns the size through a 32-bit uint.
"tile_head_node_offset" can also be out of range once the byte address exceeds 2^32, and there isn't any Load/Store overload on buffers that takes a 64-bit type. From the documentation, the only 64-bit type is the double type, within which you can pack 2 uints.
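One idea I've been sketching (purely illustrative, all names made up): keep every shader-visible offset under 2^32 by splitting the full 64-bit byte address on the host into a window index plus a 32-bit local offset, binding the matching window of the tiled resource per dispatch and letting the kernel address only within it:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: instead of one 64-bit byte address, carry a
// (window, local) pair where each part fits in a 32-bit uint. Each
// dispatch would bind the matching 4 GiB window of the tiled resource
// (e.g. via a base offset in a constant buffer) and the shader would
// then use only the 32-bit local offset in its Load/Store calls.
struct SplitAddress {
    uint32_t window; // which 4 GiB window of the huge buffer
    uint32_t local;  // byte offset inside that window, always < 2^32
};

SplitAddress splitByteAddress(uint64_t byteOffset) {
    SplitAddress s;
    s.window = static_cast<uint32_t>(byteOffset >> 32);
    s.local  = static_cast<uint32_t>(byteOffset & 0xFFFFFFFFull);
    return s;
}
```

So an offset just past the 32-bit limit, say `(1ull << 32) + 16`, would land at window 1, local offset 16.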


Please advise on how I can get around this restriction.


Thank you in advance for your help.


- David

Frustum culling using a KD-tree

07 June 2013 - 04:22 PM

Hello all,


I just finished a ray-tracing class and have become more familiar with kd-trees.  We used a kd-tree to acquire the nearest neighbours for a photon mapping technique.  I have read from many places that a kd-tree can be used for frustum culling and am trying to understand how this is done.


Say I use a kd-tree implementation similar to the ANN library (http://www.cs.umd.edu/~mount/ANN/).  With such a library, you provide it your points and you can query for the N nearest neighbours about a specific point, or do a search for the neighbours within a radius of a specific point.  The thing is, how is such a structure useful for frustum culling?  The kd-tree stores points and can acquire nearest neighbours....  To do frustum culling, wouldn't you have to store AABB bounds with each node of the tree and do some sort of intersection with the frustum while traversing the tree structure?  Wouldn't that step away from the purpose of a kd-tree, which is to efficiently acquire near neighbours for a given data set of k dimensions?


ANN uses "indices" to a vector of points.  So technically, I could somehow store AABB's in another vector with respective indices and pass the center point of each AABB to create the kd-tree.  But I still fail to see how that would help.... I'm assuming that the traversal logic would have to be much different from looking for nearest neighbours.
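Here is roughly what I imagine the traversal would have to look like (a minimal sketch of my own, not the ANN API): every node of a top-down traversal covers an axis-aligned region, so a single region-vs-frustum test can reject a whole subtree of objects at once.

```cpp
#include <vector>

// Hypothetical kd-tree node carrying explicit per-node bounds; a kd-tree
// already defines these regions implicitly through its splitting planes.
struct Vec3  { float x, y, z; };
struct Plane { Vec3 n; float d; };        // n.p + d >= 0 means "inside"
struct AABB  { Vec3 lo, hi; };

struct KdNode {
    AABB bounds;                 // region covered by this subtree
    int axis = -1;               // split axis, -1 for a leaf
    float split = 0.0f;
    const KdNode* child[2] = { nullptr, nullptr };
    std::vector<int> objects;    // object indices stored at a leaf
};

// Conservative AABB-vs-plane test using the "positive vertex" trick.
bool outsidePlane(const AABB& b, const Plane& p) {
    Vec3 v{ p.n.x >= 0 ? b.hi.x : b.lo.x,
            p.n.y >= 0 ? b.hi.y : b.lo.y,
            p.n.z >= 0 ? b.hi.z : b.lo.z };
    return p.n.x * v.x + p.n.y * v.y + p.n.z * v.z + p.d < 0;
}

void cull(const KdNode* n, const std::vector<Plane>& frustum,
          std::vector<int>& visible) {
    if (!n) return;
    for (const Plane& p : frustum)
        if (outsidePlane(n->bounds, p)) return;   // whole subtree rejected
    if (n->axis < 0) {                            // leaf: emit its objects
        visible.insert(visible.end(), n->objects.begin(), n->objects.end());
        return;
    }
    cull(n->child[0], frustum, visible);
    cull(n->child[1], frustum, visible);
}
```

If that's the idea, then storing only points (as ANN does) really is insufficient for culling, and the per-node bounds are the missing piece.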


I'm not sure if the above makes any sense, but in the end, I'd appreciate if someone could point me in the right direction to understand how a kd-tree can help with frustum culling. 


Thank you.

Questions about batching static geometry

27 March 2013 - 06:18 PM

Hello all,


I have come into a road block with my current project, whose goal is to test my rendering engine.  The problem occurs when attempting to draw a multitude of small models (less than 50 triangles).  Since each of those models has its own VB and IB, they are each drawn individually.  Of course this drives up the draw-call count, and each call outputs very few vertices, leaving me bound by CPU speed.


The solution is to batch these suckers up, but I have some doubts about how to proceed.

Note that all the models are static.



My concerns:

- Since each model has its own transformation matrix, do I have to pre-transform each vertex before adding it to my batch vertex buffer? 

- How do I still take advantage of per-model frustum culling?

- How efficient is it to re-create the batch every frame post frustum cull checks?


The way I am currently thinking about doing this is like so:

- On startup, allocate enough memory for a static geometry batch VB & IB.

- Do the frustum cull checks on each model...

- Static models that can be batched (share the same textures, materials, etc...) have each of their vertices transformed and added to the batch.

- Draw

- Flush batch, rebuild it for other models that share common textures, materials...

- Draw

and so on....
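The loop I have in mind might be sketched like this (all types and names here are hypothetical; the point is just the shape of the data flow: cull, group by material, pre-transform into one big VB, one draw per group):

```cpp
#include <cstddef>
#include <vector>

struct Vertex { float pos[3]; };
struct Mat4   { float m[16] = {}; };          // stand-in for a 4x4 matrix

struct Model {
    std::vector<Vertex> vertices;
    Mat4 world;                               // per-model transform
    int materialId = 0;
    bool visible = false;                     // result of the frustum cull
};

// Pre-transform a vertex by the model's world matrix. Translation-only
// here to keep the sketch short; a real version does the full multiply.
Vertex transform(const Vertex& v, const Mat4& w) {
    Vertex out = v;
    out.pos[0] += w.m[12];
    out.pos[1] += w.m[13];
    out.pos[2] += w.m[14];
    return out;
}

// Rebuild the batch VB for one material; returns the vertex count.
size_t buildBatch(const std::vector<Model>& models, int materialId,
                  std::vector<Vertex>& batchVB) {
    batchVB.clear();
    for (const Model& m : models) {
        if (!m.visible || m.materialId != materialId) continue;
        for (const Vertex& v : m.vertices)
            batchVB.push_back(transform(v, m.world));
    }
    return batchVB.size();
}
```

Per frame I'd call buildBatch once per material group, copy the result into a dynamic VB (map with discard), and issue one draw per group. That fixes the draw-call count but spends CPU time transforming vertices every frame, which is exactly the trade-off I'm unsure about.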


How efficient would it be to transform vertices and recreate the batches (multiple times) per frame?  Does anyone have any insights?


Thank you.

Humus' code (Help with Geometry Shader)

10 August 2012 - 01:48 PM

I am going over Humus' deferred shading code. I'm a bit confused about the geometry shader during his deferred lighting pass.

So he basically calculates the lights' bounds in the geometry shader and passes constants ex, ey, and zw to do his calculations. I don't really understand the purpose of these values, more specifically the ratio he ends up sending over to the shader.

float ex = tanf(fov * 0.5f);
float ey = ex * height / width;
renderer->setShaderConstant1f("ex", 0.5f / ex);
renderer->setShaderConstant1f("ey", 0.5f / ey);
renderer->setShaderConstant2f("zw", projection.rows[2].zw());

Here is the box calculation function in the geometry shader:
[EDIT]: I have attached the shader file instead since for some reason a lot of the code was getting cut out using the code tags.

float ex = tanf(fov * 0.5f);
float ey = ex * height / width;

At this point ex is 1/2 the near plane's width and ey is 1/2 the near plane's height, right?

But why does he do:
renderer->setShaderConstant1f("ex", 0.5f / ex);
renderer->setShaderConstant1f("ey", 0.5f / ey);
Wouldn't that be 1/w and 1/h?
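To check my reading of it, I tried this quick numeric test (my assumption about what the constants mean, not his documented intent): with ex = tan(fov/2) being the half view-plane width at distance 1, multiplying a view-space slope x/z by 0.5/ex gives the offset from the screen centre, so a point on the frustum edge lands exactly at 0.5.

```cpp
#include <cmath>

// Half view-plane width at z = 1 for a given field of view (radians).
float halfPlane(float fov) { return std::tan(fov * 0.5f); }

// The constant he uploads, applied to a view-space slope x/z: the
// resulting value is an offset from the screen centre in [0, 0.5].
float centreOffset(float slopeXZ, float fov) {
    return slopeXZ * (0.5f / halfPlane(fov));
}
```

A point on the view axis (slope 0) stays at the centre, and a point with slope exactly halfPlane(fov) comes out at 0.5, i.e. the screen edge. That would explain the 0.5/ex rather than 1/w.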

Also what does a, b, f represent exactly in this section:
// Compute extents in X
float Lxz = dot(lightPos.xz, lightPos.xz); // This gives me the square of the magnitude
float iLxz = 1.0 / Lxz;
float a = radius * lightPos.x * iLxz;
float b = (radiusSqr - lightPos.z * lightPos.z) * iLxz;
float f = -b + a * a;

In the geometry shader, how does this work exactly?
zn = saturate(zw.x + zw.y / z0);
zf = saturate(zw.x + zw.y / z1);
zw.x and zw.y represent proj[2][2] and proj[2][3]. How do those elements of the matrix help figure out zn and zf? The 3rd row isn't part of the translation....

This is a proj mat in dx:
2*zn/w 0 0 0
0 2*zn/h 0 0
0 0 zf/(zf-zn) 1
0 0 zn*zf/(zn-zf) 0
So isn't proj[2][3] always 1?
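Playing with the numbers, if I assume zw actually holds proj[2][2] and proj[3][2] from the matrix above (and not proj[2][3], which as I said is always 1 and is what ends up in w, making the divide work), then post-projection depth is z_ndc = (z*A + B) / z = A + B/z, which is exactly the shader expression:

```cpp
#include <cmath>

// Depth remap for the row-major DX projection above, under my assumption
// that zw = (A, B) with A = proj[2][2] = zf/(zf-zn) and
// B = proj[3][2] = zn*zf/(zn-zf). Then depth(z) = A + B/z.
float projDepth(float z, float zn, float zf) {
    const float A = zf / (zf - zn);
    const float B = zn * zf / (zn - zf);
    return A + B / z;
}
```

With zn = 1 and zf = 100, projDepth maps the near plane to 0 and the far plane to 1, which would explain how those two matrix elements alone recover zn and zf in the shader.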

And how does the whole ex, ey make sense?
float x0 = saturate(0.5 - N0 * ex);
float x1 = saturate(0.5 - N1 * ex);
float y0 = saturate(0.5 - N0 * ey);
float y1 = saturate(0.5 - N1 * ey);

Also, is there a reason he insists on using multiplications for his divisions? Is x * 0.5f more efficient than x / 2.0f?

Thank you for your help.

Need help organizing my code

19 April 2010 - 11:19 AM

Hello, I wasn't too sure where to put this, so I opted for the beginner section. I am currently writing my own 3D game engine (for learning purposes) and I am having a hard time organizing my classes.

Currently, my setup is like so (using DXUT). I have 3 main base classes:

- Renderer base class

- Input base class

- Debugging base class

Each has derived classes for more specific objects... For example, Renderer has a child Camera which implements different cameras (1st person, 3rd person, etc...). Input has children for joystick, keyboard, and mouse implementation.

Now my question is: how can I keep those classes completely separated? For example, the keyboard derived class will have methods used to move the camera, so it would need multiple inheritance from the Camera and Input classes. This doesn't sit right with me, as the Input and Renderer base classes should be isolated from each other. Is there a way to communicate between derived classes of different base parents without using multiple inheritance? This is hard to explain in words, but I hope someone understands and can point me in the right direction. Perhaps a callback mechanism would be ideal?

Thank you.

[Edited by - french_hustler on April 19, 2010 5:58:39 PM]
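Here is roughly the kind of callback mechanism I have in mind (completely made-up names, just a sketch): the keyboard class knows nothing about cameras; it only fires events, and whoever wants to react (a camera, the debug layer, ...) subscribes to them.

```cpp
#include <functional>
#include <vector>

enum class Key { W, A, S, D };

// Input side: publishes key events to whoever subscribed. No knowledge
// of Renderer types anywhere in here.
class Keyboard {
public:
    using Handler = std::function<void(Key)>;
    void subscribe(Handler h) { handlers_.push_back(std::move(h)); }
    void keyPressed(Key k) {               // called by the input layer
        for (auto& h : handlers_) h(k);
    }
private:
    std::vector<Handler> handlers_;
};

// Renderer side: a camera with its own movement API, no Input knowledge.
class Camera {
public:
    void moveForward() { ++forwardSteps_; }
    int forwardSteps() const { return forwardSteps_; }
private:
    int forwardSteps_ = 0;
};
```

The wiring then happens in one place (say, the application layer), so neither hierarchy depends on the other and no multiple inheritance is needed:

```cpp
// keyboard.subscribe([&](Key k) { if (k == Key::W) camera.moveForward(); });
```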