

Profile: french_hustler
Rank: Member · Content count: 80 · Community reputation: 479 (Neutral)
DX12 GPU support for D3D12_CROSS_NODE_SHARING_TIER
french_hustler replied to french_hustler's topic in Graphics and GPU Programming
https://forums.geforce.com/default/topic/820781/series900gtx960970and980willnothaveallthedirectx12features/ The last post by "ManuelG" shows the Win10 DX caps viewer for various different GPUs. However, caps beyond feature level 11.3 aren't shown... so who knows whether D3D12_CROSS_NODE_SHARING_TIER is supported or not. I guess it's still too early.
DX12 GPU support for D3D12_CROSS_NODE_SHARING_TIER
french_hustler replied to french_hustler's topic in Graphics and GPU Programming
Well... GPU manufacturers tend to be sketchy. "Fully supports" can mean supporting only the lower tiers. I'd say a GPU that is D3D12_CROSS_NODE_SHARING_TIER_1_EMULATED may still claim DirectX 12 support. It'd be nice to get a clear list of supported tier levels for the D3D12_FEATURE_DATA_D3D12_OPTIONS structure (https://msdn.microsoft.com/en-us/library/windows/desktop/Dn770364(v=VS.85).aspx). I've made the mistake in the past of buying a GPU that claimed full support of DX 11.2, only to get burned when I saw that the features I wanted were only supported at higher tier levels.
DX12 GPU support for D3D12_CROSS_NODE_SHARING_TIER
french_hustler posted a topic in Graphics and GPU Programming
Hi, I'm looking to start a new side project that will leverage the new node-sharing capabilities of DX12. I came across this in the documentation: https://msdn.microsoft.com/en-us/library/windows/desktop/dn914408(v=vs.85).aspx. I tried to do some Google research to see which GPU architectures support D3D12_CROSS_NODE_SHARING_TIER_2 but came up empty-handed. Is this feature even supported by current GPUs? Thanks.
Weird Direct Compute performance issue
french_hustler replied to french_hustler's topic in Graphics and GPU Programming
Thank you for your reply. Each pass sorts 2 bits at a time and issues 3 dispatches: the 1st creates the block-sums buffer mentioned in the paper, the 2nd does a prefix sum on it, and the 3rd scatters and sorts. So 12-bit keys need 6 * 3 = 18 dispatches, and 14-bit keys need 21. Regardless of the number of bits in the keys, every pass uses the same kernels for those 3 dispatches.

Here are some more benchmarks.

Sorting 50,000 values:
- 2-bit integers: ~0.18ms
- 4-bit integers: ~0.37ms
- 6-bit integers: ~0.55ms
- 8-bit integers: ~0.74ms
- 10-bit integers: ~0.92ms
- 12-bit integers: ~1.09ms
- 14-bit integers: ~8.92ms
- 16-bit integers: ~10.00ms
- 32-bit integers: ~11.45ms

Sorting 10,000 values:
- 2-bit integers: ~0.10ms
- 4-bit integers: ~0.19ms
- 6-bit integers: ~0.27ms
- 8-bit integers: ~0.36ms
- 10-bit integers: ~0.45ms
- 12-bit integers: ~0.54ms
- 14-bit integers: ~8.08ms
- 16-bit integers: ~9.47ms
- 32-bit integers: ~11.46ms

If you're interested, I could provide the source code (already on Bitbucket) or upload the executable so that you can benchmark on your own computer.
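For reference, the per-pass structure described above (histogram, prefix sum over the sums, scatter) can be sketched serially in Python. This is a CPU stand-in for the three dispatches, not the actual HLSL kernels, and it collapses the per-block counts of the paper into one global histogram:

```python
# CPU reference for a 2-bits-per-pass radix sort. Each pass mirrors
# the three GPU dispatches: (1) histogram of the current 2-bit digit,
# (2) exclusive prefix sum over the counts, (3) stable scatter.
def radix_sort_2bit(keys, key_bits):
    for shift in range(0, key_bits, 2):      # one pass per 2 bits
        counts = [0, 0, 0, 0]                # dispatch 1: histogram
        for k in keys:
            counts[(k >> shift) & 3] += 1
        offsets, total = [], 0               # dispatch 2: exclusive prefix sum
        for c in counts:
            offsets.append(total)
            total += c
        out = [0] * len(keys)                # dispatch 3: stable scatter
        for k in keys:
            d = (k >> shift) & 3
            out[offsets[d]] = k
            offsets[d] += 1
        keys = out
    return keys
```

With this structure, the pass count (and hence dispatch count) grows linearly with the key width, which is why the sudden jump past 12 bits in the benchmarks is surprising.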
Weird Direct Compute performance issue
french_hustler posted a topic in Graphics and GPU Programming
Hello, I implemented a radix sort algorithm in DirectCompute (DX 11) based on this paper: www.sci.utah.edu/~csilva/papers/cgf.pdf

I created a simple application that uses the algorithm and benchmarks its efficiency. I am, however, seeing very weird results. My GPU is a GTX 770.

Sorting 100,000 values:
- 2-bit integers: ~0.38ms
- 4-bit integers: ~0.75ms
- 6-bit integers: ~1.12ms
- 8-bit integers: ~1.48ms
- 10-bit integers: ~1.84ms
- 12-bit integers: ~2.21ms
- 14-bit integers: ~10.46ms
- 16-bit integers: ~11.12ms
- 32-bit integers: ~12.74ms

I'm having a hard time understanding the drastic increase when using keys of more than 12 bits. The algorithm processes 2 bits per pass... so 12 bits requires 6 passes, and 14 requires 7. Can anyone point me in the right direction in figuring out why this would happen? Thank you.
DX11 Need help understanding how to offset rays by specific angles
french_hustler posted a topic in Graphics and GPU Programming
Hello, I have created a hybrid renderer in DX11 that is able to shoot rays from screen space into the scene. I am currently trying to implement soft shadows but am having problems understanding how to offset my shadow-ray samples towards the area light.

Let's say I have a shadow ray at the origin (0, 0, 0) that is directed straight up at the center of an area light (0, 1, 0). Knowing this shadow ray and the radius of the area light, I want to create further ray samples that head towards the area light. The way I need to do it is based on angles. All I have is the normalized ray direction going to the center of the area light and the "light size". Basically, the light-size parameter controls how wide the hemisphere around the original shadow ray should be for the further samples. So a maximum light size of 180 would mean that an original shadow ray (0, 1, 0) could have ray samples going towards anywhere on the upper half of the unit sphere.

So far, I have a light size that ranges from 0 to 180, Poisson-disc samples that range from (0.f, 0.f, 0.f) to (1.f, 1.f, 1.f), and a normalized shadow-ray direction towards the center of the area light. I scale the Poisson-disc samples to the range -(lightSize / 2) to +(lightSize / 2). Based on these angles, how do I "jitter" the original vector? I found Rodrigues' formula, but it requires an axis of rotation. How do I get that axis? Do the angles I calculate actually correspond to Euler angles? Should I just make a rotation matrix from them? I'm a little confused and just need someone to point me in the correct direction. Thank you.
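One common way to get the missing axis: build a tangent basis perpendicular to the shadow-ray direction, pick a rotation axis in that tangent plane from the sample's azimuth, and then tilt the ray with Rodrigues' formula. A rough Python sketch of the idea (function names and the basis-construction trick are mine, not from any particular library):

```python
import math

def normalize(v):
    l = math.sqrt(sum(c * c for c in v))
    return tuple(c / l for c in v)

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def rodrigues(v, axis, theta):
    # v*cos(t) + (axis x v)*sin(t) + axis*(axis . v)*(1 - cos(t))
    c, s = math.cos(theta), math.sin(theta)
    kxv = cross(axis, v)
    kdv = sum(a * b for a, b in zip(axis, v))
    return tuple(v[i] * c + kxv[i] * s + axis[i] * kdv * (1 - c)
                 for i in range(3))

def jitter_ray(direction, tilt_deg, azimuth_deg):
    # Tilt `direction` by tilt_deg around an axis perpendicular to it;
    # azimuth_deg picks where in the tangent plane that axis points.
    d = normalize(direction)
    helper = (1.0, 0.0, 0.0) if abs(d[0]) < 0.9 else (0.0, 1.0, 0.0)
    t = normalize(cross(helper, d))   # tangent
    b = cross(d, t)                   # bitangent
    phi = math.radians(azimuth_deg)
    axis = tuple(t[i] * math.cos(phi) + b[i] * math.sin(phi)
                 for i in range(3))
    return rodrigues(d, axis, math.radians(tilt_deg))
```

Mapping the two scaled Poisson-disc coordinates to (tilt, azimuth) gives a cone of sample rays around the original direction; a tilt range of ±90 degrees covers the full upper hemisphere, matching a light size of 180.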
Tiled Resources & Large RWByteAddressBuffers
french_hustler replied to french_hustler's topic in Graphics and GPU Programming
Hey, thanks for the reply. You're right about uints not being negative... that was silly of me. I wrote that because I was storing the values and reading them back on the CPU as integers. Regardless, the returned values are incorrect once the tiled-resource RWByteAddressBuffer is larger than what can be addressed with a 32-bit uint.

With tiled resources, indexing into buffers remains the same. If you hit a sparse area, though, the behavior will differ depending on the "tier level" supported by your GPU. In my case, I map a single "dummy" physical tile to every sparse tile of the buffer. Though inefficient, any time a store occurs to a sparse area, it will map to the dummy tile. I'm pretty sure the code is correct, since I tested the kernel on a smaller buffer.

The problem is that the API allows you to create really huge tiled resources (since memory isn't allocated until you actually use UpdateTiles(...)), but doesn't let you access areas of byte-addressable buffers beyond 2^32 bytes. The only solution I currently see is to either bind multiple buffers to the kernel and implement some sort of logic that spills over to the next buffer once you pass the addressable range, or rethink my algorithm as a whole :(.
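For what it's worth, the multi-buffer spill-over idea boils down to splitting a 64-bit byte address into a buffer index plus a 32-bit offset. A minimal Python sketch of that arithmetic (the per-buffer size is an arbitrary power of two chosen for the example, not a D3D requirement):

```python
# Split a 64-bit byte address into (buffer index, offset within buffer).
# Each bound buffer covers BUFFER_BYTES bytes; 2**31 is an example size
# safely below the 2**32 byte-addressing limit of a single buffer.
BUFFER_BYTES = 1 << 31

def split_address(addr64):
    buffer_index = addr64 // BUFFER_BYTES   # which bound UAV to use
    offset = addr64 % BUFFER_BYTES          # fits in a 32-bit uint
    return buffer_index, offset
```

In HLSL the same split would be a shift and a mask, followed by a branch (or indexed resource access) to pick the right UAV for the Load/Store.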
Tiled Resources & Large RWByteAddressBuffers
french_hustler posted a topic in Graphics and GPU Programming
Hello DirectX community, I have come into a problem within one of my compute kernels. I am using tiled resources to map tiles from a tile pool into an RWByteAddressBuffer. Since the buffer is created with the tiled resource flag, its size can be humongous (greater than what can be byte-addressed with a 32-bit uint). And this is the exact problem I am having in my kernel.

#include "HR_Globals.h"

#define GRP_DIM 1024

cbuffer cbConstants : register(b0)
{
    unsigned int gNumVoxelsPerLength;
    unsigned int gNumTilesPerVoxel;
    float2       pad;
};

RWByteAddressBuffer gRayGridOut : register(u0); // flattened indexed buffer of the ray grid

/* This kernel inits the tiles' header of the ray grid. */
[numthreads(GRP_DIM, 1, 1)]
void InitRayGrid(uint3 Gid : SV_GroupID, uint3 DTid : SV_DispatchThreadID,
                 uint3 GTid : SV_GroupThreadID, uint GI : SV_GroupIndex)
{
    unsigned int tile_index = ((Gid.x * GRP_DIM) + GI);
    unsigned int tile_head_node_offset = tile_index * BYTES_PER_TILE * gNumTilesPerVoxel;

    unsigned int total_size = 0;
    gRayGridOut.GetDimensions(total_size);

    if (tile_head_node_offset < total_size)
    {
        // 1st int of header represents the offset to the next node in the tile
        gRayGridOut.Store(tile_head_node_offset, 1);
        // 2nd int provides a counter for how many rays are in the node
        gRayGridOut.Store(tile_head_node_offset + 4, 0);
    }
}

"total_size" returns a bad value because the HLSL function uses a 32-bit uint. "tile_head_node_offset" can also be out of range if the byte address is > 2^32, and there isn't any load/store function on buffers that takes a 64-bit type. From the documentation, the only 64-bit type is double, within which you can pack 2 uints. Please advise on how I can get around this restriction. Thank you in advance for your help. - David
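To illustrate the overflow in plain terms, here's a small Python simulation of the kernel's offset math under 32-bit wraparound. The tile index is a made-up example; 65536 bytes is the standard D3D tile size, and the per-voxel factor from the kernel is omitted for simplicity:

```python
# Simulate 32-bit uint wraparound of tile_head_node_offset.
BYTES_PER_TILE = 65536            # standard D3D tile size
MASK32 = (1 << 32) - 1

tile_index = 70000                                # hypothetical tile index
true_offset = tile_index * BYTES_PER_TILE         # exceeds 2**32 bytes
hlsl_offset = true_offset & MASK32                # what 32-bit math yields
```

So once the buffer grows past 4 GiB, the kernel silently addresses the wrong (wrapped) location rather than the intended tile header.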
Run DirectX 11 stream output without drawing
french_hustler replied to 3DModelerMan's topic in Graphics and GPU Programming
Hey unbird, thanks for your reply. You know, I recently bought a GTX 770 card that claims to support 11.2. Scattered R/W from any stage was one feature I really wanted on my new GPU. It turns out NVIDIA does not support the full feature set. I find it a bit misleading that the specifications fail to mention it isn't full support, merely that the card is "capable" of the 11.2 feature set.
Run DirectX 11 stream output without drawing
french_hustler replied to 3DModelerMan's topic in Graphics and GPU Programming
I have a further question about SO. I was going to post a new topic, but since this thread is "sort of" similar, I might as well ask here. I basically want to voxelize my geometry. I was trying to bypass the rasterization pipeline to do so, to avoid doing conservative rasterization with the geometry shader. Instead, I take all my objects' vertices and indices, and for each object I create an associated buffer of triangles. I pass all these buffers into a kernel that creates my voxel-grid data structure. Instead of pre-creating these buffers of triangles, I was thinking of rendering my scene regularly but streaming out the triangles into an SO resource. What do I need to do to be able to use this resource in a compute kernel? Also, is there a way to know how many elements are in an SO buffer? Thank you.
Frustum culling using a KDtree
french_hustler replied to french_hustler's topic in Graphics and GPU Programming
Makes more sense... Thank you Bacterius. 
Frustum culling using a KDtree
french_hustler replied to french_hustler's topic in Graphics and GPU Programming
Hey, thanks for the reply. I am still confused. Say I build the kd-tree from scratch. I have N entities, each with its own AABB bound. Would I construct the kd-tree using the positions of the entities? Or should each corner point of the bounding boxes be used? To me, a BVH makes much more sense for frustum culling, as its nodes represent AABBs. The same goes for structures like octrees: each node represents an "area" in space. With a kd-tree, a node represents a split in one dimension.
Frustum culling using a KDtree
french_hustler posted a topic in Graphics and GPU Programming
Hello all, I just finished a ray-tracing class and have become more familiar with kd-trees. We used a kd-tree to acquire the nearest neighbours for a photon-mapping technique. I have read in many places that a kd-tree can be used for frustum culling and am trying to understand how this is done.

Say I use a kd-tree implementation similar to the ANN library (http://www.cs.umd.edu/~mount/ANN/). With such a library, you provide it your points, and you can query for the N nearest neighbours of a specific point or search for the neighbours within a radius of a specific point. The thing is, how is such a structure useful for frustum culling? The kd-tree stores points and can acquire nearest neighbours... To do frustum culling, wouldn't you have to store AABB bounds with each node of the tree and do some sort of intersection with the frustum while traversing the tree structure? Wouldn't that step away from the purpose of a kd-tree, which is to efficiently acquire near neighbours for a given data set of k dimensions?

ANN uses "indices" into a vector of points. So technically, I could somehow store AABBs in another vector with matching indices and pass the center point of each AABB to create the kd-tree. But I still fail to see how that would help... I'm assuming the traversal logic would have to be much different than for looking for nearest neighbours. I'm not sure if the above makes any sense, but in the end, I'd appreciate it if someone could point me in the right direction to understand how a kd-tree can help with frustum culling. Thank you.
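To make the idea concrete: a kd-tree node does implicitly own an AABB (the root bounds narrowed by every split above it), so frustum culling can test that box per node, just like a BVH or octree. A rough Python sketch, with illustrative names not taken from ANN or any other library; the frustum is given as inward-facing planes (normal, d):

```python
# Frustum culling over a kd-tree: each node's AABB is derived on the
# fly from the parent's box and the split plane, then tested against
# the frustum planes.
class Node:
    def __init__(self, axis=None, split=None, left=None, right=None, items=None):
        self.leaf = items is not None
        self.axis, self.split = axis, split
        self.left, self.right = left, right
        self.items = items or []

def box_outside_plane(lo, hi, normal, d):
    # "p-vertex" test: take the box corner farthest along the plane
    # normal; if even that corner is behind the plane, the box is out.
    p = [hi[i] if normal[i] >= 0 else lo[i] for i in range(3)]
    return sum(normal[i] * p[i] for i in range(3)) + d < 0

def cull(node, lo, hi, planes, visible):
    if node is None:
        return
    if any(box_outside_plane(lo, hi, n, d) for n, d in planes):
        return                                    # whole subtree culled
    if node.leaf:
        visible.extend(node.items)
        return
    axis, split = node.axis, node.split
    left_hi = list(hi); left_hi[axis] = split     # child boxes come from
    right_lo = list(lo); right_lo[axis] = split   # splitting the parent box
    cull(node.left, lo, left_hi, planes, visible)
    cull(node.right, right_lo, hi, planes, visible)
```

The traversal is indeed different from a nearest-neighbour query: instead of a priority search toward a query point, it's a top-down box-vs-frustum rejection, which is why point-only libraries like ANN don't expose it directly.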

That looks really awesome!

What should I do?
french_hustler replied to workisnotfun's topic in General and Gameplay Programming
Sounds like you are interested in the actual game mechanics. I don't think that directly deals with engine programming; engine programming deals much more with the field of software architecture. Game mechanics like the ones you've mentioned deal more with algorithms. There are countless aspects of game programming that deal with computer science. Just thinking of the different specializations offered at my school, here are the ones I can think of:
- Software architecture
- Computer graphics
- Computer vision
- User interface programming
- Algorithms
- Sound engineering
- AI / Machine learning
