
Community Reputation

479 Neutral

About french_hustler

  1. DX12 GPU support for D3D12_CROSS_NODE_SHARING_TIER

    https://forums.geforce.com/default/topic/820781/series-900-gtx-960-970-and-980-will-not-have-all-the-directx-12-features-/   The last post by "ManuelG" shows the Win10 DX caps viewer for various GPUs. However, caps beyond feature level 11.3 aren't shown... so who knows whether D3D12_CROSS_NODE_SHARING_TIER is supported or not. I guess it's still too early.
  2. DX12 GPU support for D3D12_CROSS_NODE_SHARING_TIER

    Well... GPU manufacturers tend to be sketchy. "Full" support can mean supporting only the lower tiers. I'd say a GPU that is D3D12_CROSS_NODE_SHARING_TIER_1_EMULATED may still claim DirectX 12 support. It'd be nice to get a clear list of supported tier levels for the D3D12_FEATURE_DATA_D3D12_OPTIONS structure (https://msdn.microsoft.com/en-us/library/windows/desktop/Dn770364(v=VS.85).aspx). I've made the mistake in the past of buying a GPU that claimed full DX11.2 support, only to feel shortchanged when I saw that the features I wanted were only available at higher tier levels.
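    For what it's worth, the tier can at least be queried at runtime via CheckFeatureSupport. A minimal sketch, assuming an already-created ID3D12Device* named "device" (error handling omitted):

```cpp
// Query the D3D12 options caps and inspect the cross-node sharing tier.
D3D12_FEATURE_DATA_D3D12_OPTIONS options = {};
HRESULT hr = device->CheckFeatureSupport(
    D3D12_FEATURE_D3D12_OPTIONS, &options, sizeof(options));
if (SUCCEEDED(hr))
{
    switch (options.CrossNodeSharingTier)
    {
    case D3D12_CROSS_NODE_SHARING_TIER_NOT_SUPPORTED: /* no sharing   */ break;
    case D3D12_CROSS_NODE_SHARING_TIER_1_EMULATED:    /* tier 1 (SW)  */ break;
    case D3D12_CROSS_NODE_SHARING_TIER_1:             /* tier 1       */ break;
    case D3D12_CROSS_NODE_SHARING_TIER_2:             /* tier 2       */ break;
    }
}
```

    That only tells you about the hardware you happen to own, of course, not the tier list across vendors that we'd actually want.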
  3. Hi, I'm looking to start a new side project that'll leverage the new node-sharing capabilities of DX12. I came across this in the documentation: https://msdn.microsoft.com/en-us/library/windows/desktop/dn914408(v=vs.85).aspx. I tried to do some Google research to see which GPU architectures support D3D12_CROSS_NODE_SHARING_TIER_2 but came up empty-handed. Is this feature even supported by current GPUs? Thanks.
  4. Weird Direct Compute performance issue

    Thank you for your reply. Each pass sorts 2 bits at a time and issues 3 dispatches: the 1st creates the block sums buffer mentioned in the paper, the 2nd does a prefix sum on it, and the 3rd scatters and sorts. So 12-bit keys need 6 * 3 = 18 dispatches, and 14-bit keys need 21. Regardless of the number of bits in the keys, the same kernels are used for those 3 dispatches per pass.

    Here are some more benchmarks.

    Sorting 50,000 values:
    2-bit integers:   ~0.18ms
    4-bit integers:   ~0.37ms
    6-bit integers:   ~0.55ms
    8-bit integers:   ~0.74ms
    10-bit integers:  ~0.92ms
    12-bit integers:  ~1.09ms
    14-bit integers:  ~8.92ms
    16-bit integers:  ~10.00ms
    32-bit integers:  ~11.45ms

    Sorting 10,000 values:
    2-bit integers:   ~0.10ms
    4-bit integers:   ~0.19ms
    6-bit integers:   ~0.27ms
    8-bit integers:   ~0.36ms
    10-bit integers:  ~0.45ms
    12-bit integers:  ~0.54ms
    14-bit integers:  ~8.08ms
    16-bit integers:  ~9.47ms
    32-bit integers:  ~11.46ms

    If interested, I could provide the source code (which is already on Bitbucket), or upload the executable so that you can benchmark on your own computer.
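    The pass/dispatch arithmetic above can be sketched as follows (NumPasses and NumDispatches are hypothetical helper names, not from the actual code). It also shows why the jump at 14 bits is so surprising: the dispatch count only grows linearly with key width.

```cpp
// A 2-bit LSD radix sort needs ceil(bits / 2) passes, and each pass
// issues 3 dispatches (block sums, prefix sum, scatter/sort).
int NumPasses(int keyBits)     { return (keyBits + 1) / 2; }
int NumDispatches(int keyBits) { return 3 * NumPasses(keyBits); }
// NumDispatches(12) == 18, NumDispatches(14) == 21 -- a ~17% increase in
// work that somehow costs ~8x more wall time in the benchmarks above.
```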
  5. Hello,

    I implemented a radix sort algorithm in DirectCompute (DX11) based on this paper: www.sci.utah.edu/~csilva/papers/cgf.pdf

    I created a simple application that uses the algorithm and benchmarks its efficiency. I am, however, seeing very weird results. My GPU is a GTX 770.

    Sorting 100,000 values:
    2-bit integers:   ~0.38ms
    4-bit integers:   ~0.75ms
    6-bit integers:   ~1.12ms
    8-bit integers:   ~1.48ms
    10-bit integers:  ~1.84ms
    12-bit integers:  ~2.21ms
    14-bit integers:  ~10.46ms
    16-bit integers:  ~11.12ms
    32-bit integers:  ~12.74ms

    I'm having a hard time understanding the drastic increase when using keys of more than 12 bits. The algorithm processes 2 bits per pass... so 12 bits requires 6 passes, and 14 requires 7. Can anyone point me in the right direction in figuring out why this would happen?

    Thank you.
  6. Hello, I have created a hybrid renderer in DX11 that is able to shoot rays from screen space into the scene. I am currently trying to implement soft shadows but am having trouble understanding how to offset my shadow-ray samples towards the area light.

    Let's say I have a shadow ray at the origin (0, 0, 0) directed straight up at the center of an area light (0, 1, 0). Knowing this shadow ray and the radius of the area light, I want to create further ray samples that head towards the area light.

    The way I need to do it is based on angles. All I have is the normalized ray direction going to the center of the area light and the "light size". Basically, the light size parameter controls how wide the cone of further samples around the original shadow ray should be. So a maximum light size of 180 would mean that an original shadow ray (0, 1, 0) could have ray samples going anywhere on the upper half of the unit sphere.

    So far, I have a light size ranging from 0 to 180, Poisson disc samples ranging from (0.f, 0.f, 0.f) to (1.f, 1.f, 1.f), and a normalized shadow-ray direction towards the center of the area light. I scale the Poisson disc samples to the range (-lightSize / 2) to (lightSize / 2). Based on these angles, how do I "jitter" the original vector?

    I found Rodrigues' formula, but it requires an axis of rotation. How do I get that axis? Do the angles I calculate actually correspond to Euler angles? Should I just build a rotation matrix from them? I'm a little confused and just need someone to point me in the right direction.

    Thank you.
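    One common way to get the rotation axis Rodrigues' formula needs, without Euler angles: build a tangent frame around the shadow-ray direction and rotate around an axis lying in that tangent plane. A minimal C++ sketch under those assumptions (all names here are illustrative, not from the renderer above):

```cpp
#include <cmath>

struct Vec3 { float x, y, z; };

Vec3 Cross(const Vec3& a, const Vec3& b) {
    return { a.y*b.z - a.z*b.y, a.z*b.x - a.x*b.z, a.x*b.y - a.y*b.x };
}
Vec3 Normalize(const Vec3& v) {
    float l = std::sqrt(v.x*v.x + v.y*v.y + v.z*v.z);
    return { v.x/l, v.y/l, v.z/l };
}

// Tilt the unit direction n away from the light centre by "theta" radians,
// around an axis in the tangent plane selected by "phi" (both could come
// from the scaled Poisson disc samples).
Vec3 JitterDirection(const Vec3& n, float theta, float phi) {
    // Pick a helper vector not parallel to n to build the tangent frame.
    Vec3 helper = (std::fabs(n.y) < 0.99f) ? Vec3{0,1,0} : Vec3{1,0,0};
    Vec3 t = Normalize(Cross(helper, n));
    Vec3 b = Cross(n, t);
    // Rotation axis in the tangent plane (always perpendicular to n).
    Vec3 axis = { t.x*std::cos(phi) + b.x*std::sin(phi),
                  t.y*std::cos(phi) + b.y*std::sin(phi),
                  t.z*std::cos(phi) + b.z*std::sin(phi) };
    // Rodrigues' formula; the dot(axis, n) term vanishes since they are
    // perpendicular, leaving n*cos(theta) + cross(axis, n)*sin(theta).
    Vec3 axn = Cross(axis, n);
    return { n.x*std::cos(theta) + axn.x*std::sin(theta),
             n.y*std::cos(theta) + axn.y*std::sin(theta),
             n.z*std::cos(theta) + axn.z*std::sin(theta) };
}
```

    The same math drops straight into HLSL with float3/cross/normalize. Note the angles here are cone angles around the original ray, not Euler angles.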
  7. Tiled Resources & Large RWByteAddressBuffers

      Hey, thanks for the reply. You're right about uints not being negative... that was silly of me. I wrote that because I was storing the values and reading them back on the CPU as integers. Regardless, the returned values are incorrect once the tiled-resource RWByteAddressBuffer is larger than what can be addressed with a 32-bit uint. With tiled resources, indexing into buffers remains the same; if you hit a sparse area, though, the behavior differs depending on the "tier level" supported by your GPU. In my case, I map a single "dummy" physical tile to every sparse tile of the buffer. Though inefficient, any store to a sparse area will map to the dummy tile.

    I'm pretty sure the code is correct since I tested the kernel on a smaller buffer. The problem is that the API lets you create really huge tiled resources (since memory isn't allocated until you actually use UpdateTiles(...)), but doesn't let you access areas of byte-addressable buffers beyond 2^32 bytes. The only solutions I currently see are to either bind multiple buffers to the kernel with some logic that spills over to the next buffer once you pass the addressable range, or rethink my algorithm as a whole :(.
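    The "spill over to the next buffer" logic can at least be kept very small. A hypothetical sketch of the address split, shown here in C++ (the same arithmetic would be done with uint math in the kernel; kBytesPerBuffer and the names are assumptions):

```cpp
#include <cstdint>

// Treat N separate byte-address buffers (each well under 4 GiB) as one
// large virtual buffer: split a 64-bit byte address into a buffer index
// plus a 32-bit local offset.
const uint64_t kBytesPerBuffer = 1ull << 31; // e.g. 2 GiB per bound buffer

struct SplitAddress { uint32_t buffer; uint32_t offset; };

SplitAddress Split(uint64_t byteAddress) {
    return { static_cast<uint32_t>(byteAddress / kBytesPerBuffer),
             static_cast<uint32_t>(byteAddress % kBytesPerBuffer) };
}
```

    In HLSL (SM 5.0) there is no 64-bit uint, so the kernel would carry the address as a uint2 (hi/lo) and do the split with shifts and masks instead, with kBytesPerBuffer a power of two.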
  8. Hello DirectX community,

    I have run into a problem in one of my compute kernels. I am using tiled resources to map tiles from a tile pool into an RWByteAddressBuffer. Since the buffer is created with the tiled-resource flag, its size can be humongous (greater than what is byte-addressable with a 32-bit uint). And this is the exact problem I am having in my kernel.

    #include "HR_Globals.h"

    #define GRP_DIM 1024

    cbuffer cbConstants : register(b0)
    {
        unsigned int gNumVoxelsPerLength;
        unsigned int gNumTilesPerVoxel;
        float2 pad;
    };

    RWByteAddressBuffer gRayGridOut : register(u0); // flattened indexed buffer of the ray grid

    /* This kernel inits the tiles' header of the ray grid. */
    [numthreads(GRP_DIM, 1, 1)]
    void InitRayGrid(uint3 Gid : SV_GroupID, uint3 DTid : SV_DispatchThreadID,
                     uint3 GTid : SV_GroupThreadID, uint GI : SV_GroupIndex)
    {
        unsigned int tile_index = ((Gid.x * GRP_DIM) + GI);
        unsigned int tile_head_node_offset = tile_index * BYTES_PER_TILE * gNumTilesPerVoxel;
        unsigned int total_size = 0;
        gRayGridOut.GetDimensions(total_size);
        if (tile_head_node_offset < total_size)
        {
            // 1st int of header represents the offset to the next node in the tile
            gRayGridOut.Store(tile_head_node_offset, -1);
            // 2nd int provides a counter for how many rays are in the node
            gRayGridOut.Store(tile_head_node_offset + 4, 0);
        }
    }

    "total_size" returns a bad value because the HLSL function uses a 32-bit uint. "tile_head_node_offset" can also be out of range if the byte address is > 2^32, and there isn't any function for loading from or storing to buffers that takes a 64-bit type. From the documentation, the only 64-bit type is the double, within which you can pack 2 uints.

    Please advise on how I can get around this restriction.

    Thank you in advance for your help.

    - David
  9. Run DirectX 11 stream output without drawing

    Hey unbird, thanks for your reply.

    You know, I recently bought a GTX 770 card that claims to support 11.2. Scattered reads/writes from any stage was one feature I really wanted on my new GPU. It turns out NVIDIA does not support the full feature set. I find it a bit misleading when the specifications fail to mention that it isn't full support, merely being "capable" of the 11.2 feature set.
  10. Run DirectX 11 stream output without drawing

    I have a further question about SO. I was going to post a new topic, but since this thread is "sort of" similar, I might as well ask here.

    I basically want to voxelize my geometry. I was trying to bypass the rasterization pipeline so as to avoid doing conservative rasterization in the geometry shader. Instead I take all my objects' vertices and indices, and for each object create an associated buffer of triangles. I pass all these buffers into a kernel that creates my voxel-grid data structure.

    Instead of pre-creating these buffers of triangles, I was thinking of rendering my scene regularly but streaming out the triangles using an SO resource. What do I need to do to be able to use this resource in a compute kernel? Also, is there a way to know how many elements are in an SO buffer?

    Thank you.
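    In case it helps frame the question, here is a sketch of the host-side pieces as I understand them (assuming "device", "context", and "soBufferBytes" already exist; error handling omitted): the buffer gets both bind flags so it can be read as an SRV in a compute kernel, and a D3D11_QUERY_SO_STATISTICS query reports how many primitives the SO stage actually wrote.

```cpp
// Buffer usable both as a stream-output target and as a shader resource.
D3D11_BUFFER_DESC desc = {};
desc.ByteWidth = soBufferBytes;
desc.Usage     = D3D11_USAGE_DEFAULT;
desc.BindFlags = D3D11_BIND_STREAM_OUTPUT | D3D11_BIND_SHADER_RESOURCE;
ID3D11Buffer* soBuffer = nullptr;
device->CreateBuffer(&desc, nullptr, &soBuffer);

// Query that counts primitives written by the SO stage.
D3D11_QUERY_DESC qdesc = { D3D11_QUERY_SO_STATISTICS, 0 };
ID3D11Query* soQuery = nullptr;
device->CreateQuery(&qdesc, &soQuery);

context->Begin(soQuery);
// ... SOSetTargets(...), then the draw that streams out ...
context->End(soQuery);

D3D11_QUERY_DATA_SO_STATISTICS stats = {};
while (context->GetData(soQuery, &stats, sizeof(stats), 0) != S_OK) { /* wait */ }
// stats.NumPrimitivesWritten = triangles streamed out (CPU-side readback;
// it stalls, so you'd normally poll it a frame later).
```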
  11. Frustum culling using a KD-tree

    Makes more sense... Thank you Bacterius.
  12. Frustum culling using a KD-tree

    Hey, thanks for the reply. I am still confused.

    Say I build the kd-tree from scratch. I have N entities, each with its own AABB bound. Would I construct the kd-tree using the positions of the entities? Or should each corner of the bounding boxes be used? To me, a BVH makes much more sense for frustum culling, as its nodes represent AABBs. The same goes for structures like octrees: each node represents an "area" in space. With a kd-tree, a node represents a split in one dimension.
  13. Hello all,

    I just finished a ray-tracing class and have become more familiar with kd-trees. We used a kd-tree to acquire the nearest neighbours for a photon-mapping technique. I have read in many places that a kd-tree can be used for frustum culling and am trying to understand how this is done.

    Say I use a kd-tree implementation similar to the ANN library (http://www.cs.umd.edu/~mount/ANN/). With such a library, you provide it your points and you can query for the N nearest neighbours around a specific point, or search for the neighbours within a radius of a specific point. The thing is, how is such a structure useful for frustum culling? The kd-tree stores points and can acquire nearest neighbours... To do frustum culling, wouldn't you have to store AABB bounds with each node of the tree and do some sort of intersection with the frustum while traversing the tree structure? Wouldn't that step away from the purpose of a kd-tree, which is to efficiently acquire near neighbours for a given data set of k dimensions?

    ANN uses "indices" into a vector of points. So technically, I could store AABBs in another vector with matching indices and pass the center point of each AABB to create the kd-tree. But I still fail to see how that would help... I'm assuming the traversal logic would have to be much different from looking for nearest neighbours.

    I'm not sure if the above makes any sense, but in the end, I'd appreciate it if someone could point me in the right direction to understand how a kd-tree can help with frustum culling.

    Thank you.
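    For anyone finding this thread later, a rough sketch of the standard answer: for culling, a kd-tree node is treated as an axis-aligned region of space rather than a point set. The root covers the scene's AABB, and each split plane cuts the parent's box into two child boxes, so the boxes never need to be stored; they are derived during traversal, and a whole subtree is rejected when its box is outside the frustum. All names below are illustrative.

```cpp
#include <vector>

struct AABB { float min[3], max[3]; };

struct KdNode {
    int axis;               // split axis (0/1/2), or -1 for a leaf
    float split;            // split position along the axis
    int left, right;        // child indices into the node array, or -1
    std::vector<int> items; // entity indices (leaves only)
};

// Any predicate classifying a box against the frustum works here.
enum class Cull { Outside, Intersects };

template <typename FrustumTest>
void CollectVisible(const std::vector<KdNode>& nodes, int idx, AABB box,
                    FrustumTest test, std::vector<int>& visible) {
    if (idx < 0 || test(box) == Cull::Outside)
        return;                       // entire subtree culled at once
    const KdNode& n = nodes[idx];
    if (n.axis < 0) {                 // leaf: report its entities
        visible.insert(visible.end(), n.items.begin(), n.items.end());
        return;
    }
    AABB lbox = box, rbox = box;      // derive child boxes from the split
    lbox.max[n.axis] = n.split;
    rbox.min[n.axis] = n.split;
    CollectVisible(nodes, n.left, lbox, test, visible);
    CollectVisible(nodes, n.right, rbox, test, visible);
}
```

    This is essentially the same recursion a BVH uses; the kd-tree just encodes the boxes implicitly through its split planes. Entities straddling a split are usually referenced from both leaves or handled with a loose fit.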
  14. rain

    That looks really awesome!
  15. What should I do?

    Sounds like you are interested in the actual game mechanics. I don't think that directly deals with engine programming; engine programming has much more to do with software architecture. Game mechanics like the ones you've mentioned deal more with algorithms.

    There are countless aspects of game programming that involve computer science. Just thinking of the different specializations offered at my school, here are the ones I can think of:
    - Software architecture
    - Computer graphics
    - Computer vision
    - User interface programming
    - Algorithms
    - Sound engineering
    - AI / Machine learning