

Member Since 27 Jan 2014
Offline Last Active Dec 14 2014 01:18 PM

Posts I've Made

In Topic: Getting around non-connected vertex gaps in hardware tessellation displacemen...

06 November 2014 - 03:43 PM

Off the top of my head: each individual face of your cube is tessellated and then displaced.

You need to ensure that the edge vertices are shared between adjacent (subdivided) faces, or else these seams will occur: all vertices on the top face are displaced only along the up axis, while all vertices on the front face are displaced only along the depth axis.

A simple solution is to displace along the vertex normals and, wherever you have overlapping vertices (such as at the corners of a cube), set the normal of all such vertices to the average of the "actual" vertex normals at that position. This makes the edges a bit more bulky but keeps the faces connected.
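To illustrate the averaging step, here is a minimal sketch in Python (for clarity only; the actual code in question would be C++/HLSL, and the function name and data layout here are made up):

```python
# Average the normals of all vertices that share a position, so that
# displacement along the normal keeps faces connected at edges/corners.
from collections import defaultdict

def average_shared_normals(positions, normals):
    """positions/normals: lists of (x, y, z) tuples, one per vertex."""
    groups = defaultdict(list)
    for i, p in enumerate(positions):
        groups[p].append(i)  # indices of vertices occupying the same point
    out = list(normals)
    for indices in groups.values():
        if len(indices) < 2:
            continue
        # Sum the normals of the overlapping vertices...
        sx = sum(normals[i][0] for i in indices)
        sy = sum(normals[i][1] for i in indices)
        sz = sum(normals[i][2] for i in indices)
        # ...and renormalize the result before assigning it back.
        length = (sx * sx + sy * sy + sz * sz) ** 0.5
        avg = (sx / length, sy / length, sz / length)
        for i in indices:
            out[i] = avg
    return out
```

In a real mesh pipeline the position comparison would need an epsilon rather than exact tuple equality; exact matching is used here only to keep the sketch short.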


My previous post in this thread (just above yours) describes how I solved this in a relatively simple way in more detail.

In Topic: Geometry shader-generated camera-aligned particles seemingly lacking Z writing

30 September 2014 - 02:31 PM

Hi, sorry for the late reply, I thought this post had long since faded.


The (used) array size (ElemSize) grows by a factor of two for each pass, so it is simply 2 << pass.

The ArraySize value is just the allocated element count of the buffer; it ensures that any thread IDs in excess of it won't write anything to the buffer. Since there are 64 threads per group and only the number of groups can be set at dispatch time, there will usually be a few threads too many, to guarantee there are enough to handle everything.
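A sketch of that dispatch arithmetic (Python, for clarity; the real code is a D3D11 compute shader), assuming 64 threads per group as stated:

```python
THREADS_PER_GROUP = 64

def elem_size(pass_index):
    # The processed span doubles each pass: 2, 4, 8, ...
    return 2 << pass_index

def group_count(element_count):
    # Ceiling division: enough groups so every element gets a thread.
    return (element_count + THREADS_PER_GROUP - 1) // THREADS_PER_GROUP

def excess_threads(element_count):
    # Threads launched past the end of the buffer; these are the ones
    # the ArraySize bounds check prevents from writing.
    return group_count(element_count) * THREADS_PER_GROUP - element_count
```

For example, 1000 elements need 16 groups (1024 threads), leaving 24 excess threads masked out by the bounds check.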

In Topic: What kind of performance to expect from real-time particle sorting?

25 September 2014 - 03:06 PM

Ah, yes, that makes better sense then (and proves I don't understand it at all).

Won't that mean you cannot cram more than 1024 elements into a single thread group, though? And as far as I know, you cannot synchronize access between multiple thread groups in any other way than to make several dispatch calls.
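For a sense of scale on those dispatch calls: a full bitonic sort of n = 2^k elements consists of k(k+1)/2 compare-exchange steps, and each step whose compare distance spans more than one thread group needs its own dispatch, since that is the only cross-group synchronization point. A rough sketch (Python; assumes n is a power of two):

```python
def bitonic_step_count(n):
    # Total compare-exchange steps in a bitonic sorting network for n
    # elements (n must be a power of two): k stages of 1..k steps each.
    k = n.bit_length() - 1   # k = log2(n)
    return k * (k + 1) // 2
```

So 1024 elements need 55 steps in total, though steps with small compare distances can be batched inside a single thread group using group shared memory.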


Here's an explanation of the algorithm from a GPU point of view, but it's old, so their example implementation is a pixel shader, not a compute shader.

Yes, I've read through that, but I wrote it off as probably unnecessarily complicated since he mentions making use of the rasterizer and vertex shader stages as optimizations in the final step (the idea was never demonstrated without those rather specific considerations, as far as I saw, which made it quite hard to follow). I'll give it another read though.


I would assume that the nVidia and AMD sample collections would probably include a compute-shader implementation somewhere.

There is an Nvidia example I've found, but I think it has been tampered with because it won't even compile as-is. Not to mention that it lacks comments and contains really funky bitwise manipulations on group shared memory that just don't seem to make sense to me. I suppose I can see where the shared memory comes from now that you mention synchronization, though; I'll have another look at it. Also, I didn't think about looking for CUDA examples, so cheers for pointing that out.


As for the bucket idea - you could try a counting sort / radix sort, which can also be parallelized.

Ah I see, thanks for the names.

In Topic: What kind of performance to expect from real-time particle sorting?

24 September 2014 - 05:51 PM

Yeah, I imagined as much... that bitonic sort keeps popping up, yet it seems eerily lacking in proper descriptions (mostly it's things like "for n = a..b where n *= 2 ... end for").


Anyway, from what I can gather, it seems to essentially be a kind of merge sort where your two sub-lists are arranged so that one decreases from left to right while the other increases. As such, I imagine it might save one or two branch instructions per thread, but the overall problem domain (the required number of dispatch calls and elements to traverse, including the fact that the entire list eventually has to be put together by a single thread in the last dispatch) seems to remain the same.
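For reference, the whole network fits in a few lines (a single-threaded Python sketch; the length must be a power of two). Each iteration of the inner `j` loop corresponds to one parallel compare-exchange pass on the GPU, and notably no single thread ever merges the whole list, which is where it differs from a plain merge sort:

```python
def bitonic_sort(a):
    """In-place bitonic sort; len(a) must be a power of two."""
    n = len(a)
    k = 2
    while k <= n:                # size of the bitonic sequences being built
        j = k // 2
        while j >= 1:            # compare distance within each sequence
            for i in range(n):   # on a GPU: one thread per index i
                partner = i ^ j
                if partner > i:
                    ascending = (i & k) == 0
                    if (a[i] > a[partner]) == ascending:
                        a[i], a[partner] = a[partner], a[i]
            j //= 2
        k *= 2
    return a
```

All compare-exchanges within one `j` step are independent, so every index can be handled by its own thread, with a barrier (or a new dispatch) between steps.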

Is this a correct assessment of the algorithm? And if so, will it really yield significantly better performance?

I guess I'll try to implement it like I outlined above, but it feels like something is missing; I cannot see it being more than maybe a few tenths faster than a standard merge sort if that is all there is to it.


I was also contemplating arranging the data into equally-sized buckets so that each bucket holds values in a certain range (all values in bucket 2 are greater than or equal to any element in bucket 1, and so on) and then merge sorting the buckets in parallel. Might this be efficient? It would most likely require a large single-threaded operation at the beginning though.
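The counting sort mentioned above can be sketched like this (Python, integer keys assumed for illustration); each of the three phases maps to a parallel GPU pass — histogram via atomic adds, prefix sum via a parallel scan, then scatter — so no large single-threaded step is required:

```python
def counting_sort(keys, key_range):
    """Counting sort for integer keys in [0, key_range)."""
    counts = [0] * key_range
    for k in keys:                 # phase 1: histogram (GPU: atomic adds)
        counts[k] += 1
    starts, total = [], 0
    for c in counts:               # phase 2: exclusive prefix sum (GPU: scan)
        starts.append(total)
        total += c
    out = [None] * len(keys)
    for k in keys:                 # phase 3: scatter into final position
        out[starts[k]] = k
        starts[k] += 1
    return out
```

For sorting by a float depth key, the key would first be quantized into the `[0, key_range)` bucket index, which is essentially the bucket idea described above.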



As for that other clipping / alpha blending approach: it does seem quite interesting, but I don't believe it is applicable to my current situation, as I have semi-transparent particles of different colours that in some situations need to be visible through other particles (i.e. they're not all white or some other consistent colour, and not a solid colour in the middle, which sounds like what those approaches are for). Thanks for the links nonetheless :)

In Topic: Geometry shader-generated camera-aligned particles seemingly lacking Z writing

10 September 2014 - 11:12 AM

True, doing that gets rid of the clear colour forming a rectangle around the individual particles, and everything looks fine on a per-frame basis.

Because of that, showing an image doesn't help much; the individual frame captures look just fine. However, because of the way my particles are updated, their draw order varies from frame to frame, and that is what causes issues: in one frame particle A is drawn before particle B, and in the next frame particle B is drawn before particle A. This causes quite noticeable flickering when particles overlap. The problem wouldn't be very apparent if all particles used the same single colour, but since this is a test to ensure I'll get proper results with multiple colours, each of my particles is blended with a random colour.
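The order dependence is easy to demonstrate numerically: with standard over-blending (src * alpha + dst * (1 - alpha)), compositing two translucent colours in opposite orders gives different results. A minimal sketch (Python; the colours are made up for illustration):

```python
def blend(dst, src, alpha):
    # Standard "over" blend: SRC_ALPHA / INV_SRC_ALPHA.
    return tuple(s * alpha + d * (1.0 - alpha) for s, d in zip(src, dst))

background = (0.0, 0.0, 0.0)
red, green = (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)

# Draw red then green vs. green then red at 50% alpha:
a_then_b = blend(blend(background, red, 0.5), green, 0.5)
b_then_a = blend(blend(background, green, 0.5), red, 0.5)
# The two orders give different pixels, hence the per-frame flicker
# when the draw order of overlapping particles varies.
```

This is why order-dependent alpha blending generally requires sorting translucent geometry back to front (or an order-independent technique).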

Fraps still refuses to record anything besides a black screen with its FPS watermark on top, so unfortunately I cannot produce a video of the issue either. I guess I could upload an executable if you like?


Edit: my blend states are

SrcBlend       = D3D11_BLEND_SRC_ALPHA

DstBlend       = D3D11_BLEND_INV_SRC_ALPHA

SrcAlphaBlend  = D3D11_BLEND_ONE

DstAlphaBlend  = D3D11_BLEND_ZERO

BlendOp        = D3D11_BLEND_OP_ADD

AlphaBlendOp   = D3D11_BLEND_OP_ADD


by the way, in case that would affect anything.