sebjf

Member
  • Content count

    24
  • Joined

  • Last visited

Community Reputation

187 Neutral

About sebjf

  • Rank
    Member

Personal Information

  • Interests
    Education
    Programming
  1. Thanks for your insights both! Nothing other than I'd hoped there was something more performant. This is in fact what I have written for now: a ray-triangle intersection for CCD with closest-point-on-triangle for DCD.

    Interesting system! It sounds similar to Qian et al's circumsphere. I was thinking of trying something like this.

    I suspect this as well. When considering collision detection techniques I always mentally compare them against the ideal (and most correct) triangle-based system. The problem is that this is always hypothetical (and I am bad at estimating performance), so I have decided to build a triangle-based system for benchmarking. If it turns out fast enough to use, that's a big bonus, as triangle meshes have some advantages over sphere/point based representations, such as better friction emulation.

    It didn't occur to me before that what is really needed for a penalty-force based response (which I think is the nicest approach, because it automatically handles numerical precision error and cloth offset/arbitrary particle size) is a moving-sphere-triangle test, as opposed to a point-triangle test, so I've been reading about these. Since there doesn't seem to be a consensus on the best, I decided to do some performance tests.

    I picked a few algorithms and put them in a test rig. They were refactored to remove most of the early returns to better emulate behaviour on a GPU. The test rig was configured to generate 100000 rays or so cast into a volume containing a triangle, then across multiple repeats with multiple particle sizes their execution times were measured. The algorithms I tested are below. Not all of them were 100% working as this is preliminary, but they were working enough that I am comfortable using the measurements as a guideline.

    1. Moller-Trumbore with a distance offset. Used as the benchmark. The intersection distance is adjusted so it's always a set distance above the surface of the triangle. Not a proper implementation, because the ray must still hit the triangle. (A sketch of this variant follows below.)
    2. Flip Code. Implementation I found here: http://www.flipcode.com/archives/Moving_Sphere_VS_Triangle_Collision.shtml
    3. Fauerby. Fauerby's intersection test. This didn't work as written, possibly a porting mistake.
    4. Eberly. Eberly's intersection test (https://www.geometrictools.com/Documentation/IntersectionMovingSphereTriangle.pdf). This didn't work as written, as some of the pseudocode functions are missing.
    5. Geometric. My own implementation. Uses geometric tests against primitives approximating the Minkowski sum of the radius and the triangle, as opposed to finding roots like most of the others (though the cylinder intersection still requires it).

    The results (in microseconds, per ray):

    Algo                                     Avg (us)   StdErr (us)
    _______________________________________  ________   ___________
    MollerTriangleCollisionDiagnostics       0.76056    0.0019454
    GeometricTriangleCollisionDiagnostics    2.29       0.0094226
    FlipCodeTriangleCollisionDiagnostics     2.8112     0.010876
    FauerbyTriangleCollisionDiagnostics      1.4425     0.0035103
    EberlyTriangleCollisionDiagnostics       6.3413     0.019937

    Interesting to start with. I am glad the Geometric solution is competitive, because conceptually it's simple and, I feel, more hackable than the root-finding methods (I can sort of imagine how DCD could be integrated with it already). I think it's worth trying to fix Fauerby's, being only about twice as slow as a basic ray-triangle intersection despite doing considerably more.

    Eberly's I won't pursue, given that I want a GPGPU implementation eventually and it's not a fair comparison after I've gutted all its optimisations!
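    For reference, a minimal sketch of the kind of Moller-Trumbore-with-offset variant used as the benchmark (not the exact benchmarked code; the function name, the radius parameter and the epsilon values are illustrative, and for clarity this keeps the early returns that the benchmarked versions had removed):

        // Ray-triangle intersection with the hit distance pulled back so the
        // sphere centre stops a set distance (the particle radius) above the
        // triangle's plane. Returns true only if the ray itself hits the triangle.
        bool RayTriangleOffset(float3 origin, float3 dir,
                               float3 v0, float3 v1, float3 v2,
                               float radius, out float t)
        {
            t = 0;
            float3 e1 = v1 - v0;
            float3 e2 = v2 - v0;
            float3 p  = cross(dir, e2);
            float det = dot(e1, p);
            if (abs(det) < 1e-8) return false;          // ray parallel to the triangle plane

            float invDet = 1.0 / det;
            float3 s = origin - v0;
            float u = dot(s, p) * invDet;
            if (u < 0 || u > 1) return false;

            float3 q = cross(s, e1);
            float v = dot(dir, q) * invDet;
            if (v < 0 || u + v > 1) return false;

            t = dot(e2, q) * invDet;                    // distance along the ray to the plane

            // offset the travel distance so the hit point sits 'radius' above the surface
            float3 n = normalize(cross(e1, e2));
            float cosTheta = max(abs(dot(dir, n)), 1e-4);
            t -= radius / cosTheta;

            return t >= 0;
        }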
  2. Hello, I would like to implement continuous point-triangle collision detection as part of a cloth simulation. What robust implementations of this are used in practice?

    For example, it is simple to use a ray-triangle intersection with q (particle start) and q1 (particle end) defining the ray relative to the triangle. I find this is not robust, though, as the simulation can force the particles into an implausible state which, once it occurs even by the slightest margin, is unrecoverable. Over-correcting the particles helps somewhat, but introduces oscillations. This could be combined with a closest-point-to-triangle test that runs if the ray intersection fails, but that seems very expensive to me and is essentially running two collision detection algorithms in one. Is there not a better way to combine them?

    I have searched for a long time for this, but most resources are concerned with improved culling, or improved primitive tests. I've found only one paper that specifically addresses combining CCD & DCD for point-triangle tests (Wang, Y., Meng, Y., Du, P., & Zhao, J. (2014). Fast Collision Detection and Response for Vertex/Face Coupling Notations. Journal of Computational Information Systems, 10(16), 7101-7108. http://doi.org/10.12733/jcis11492), which uses an iterative search with a thickness parameter. Is there anything beyond what I have described for this?

    Sj

    EDIT: I am familiar with Bridson, Du, Tang, et al. and the vertex-face penalty force computation, but I don't see how this is fundamentally different from the ray + closest-point test, other than that the cubic formulation allows both the triangle and the vertex to change in time. Though Du et al's Dilated Continuous Contact Detection seems like it should do both, so maybe I need to read it again...
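    A minimal sketch of the closest-point-to-triangle fallback I mention above (the standard barycentric-region approach, e.g. as given in Ericson's Real-Time Collision Detection; the names are illustrative):

        // Closest point on triangle abc to point p, by classifying p against the
        // triangle's vertex, edge and face Voronoi regions.
        float3 ClosestPointOnTriangle(float3 p, float3 a, float3 b, float3 c)
        {
            float3 ab = b - a;
            float3 ac = c - a;
            float3 ap = p - a;
            float d1 = dot(ab, ap);
            float d2 = dot(ac, ap);
            if (d1 <= 0 && d2 <= 0) return a;                       // vertex region a

            float3 bp = p - b;
            float d3 = dot(ab, bp);
            float d4 = dot(ac, bp);
            if (d3 >= 0 && d4 <= d3) return b;                      // vertex region b

            float vc = d1 * d4 - d3 * d2;
            if (vc <= 0 && d1 >= 0 && d3 <= 0)                      // edge region ab
                return a + (d1 / (d1 - d3)) * ab;

            float3 cp = p - c;
            float d5 = dot(ab, cp);
            float d6 = dot(ac, cp);
            if (d6 >= 0 && d5 <= d6) return c;                      // vertex region c

            float vb = d5 * d2 - d1 * d6;
            if (vb <= 0 && d2 >= 0 && d6 <= 0)                      // edge region ac
                return a + (d2 / (d2 - d6)) * ac;

            float va = d3 * d6 - d5 * d4;
            if (va <= 0 && (d4 - d3) >= 0 && (d5 - d6) >= 0)        // edge region bc
                return b + ((d4 - d3) / ((d4 - d3) + (d5 - d6))) * (c - b);

            float denom = 1.0 / (va + vb + vc);                     // face region
            return a + ab * (vb * denom) + ac * (vc * denom);
        }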
  3. Hi richardurich, I think that's it. I first put a DeviceMemoryBarrier() call between the InterlockedMin() and a re-read of the value. That didn't work, though it may be to do with clip() (I recall reading about the effect this has on UAVs and will see if I can find the docs). Then I removed the test entirely and wrote a second shader to draw the contents of the depth buffer, and that appears to be very stable. I will see if I can get it to work for a test in a single shader, though I could probably refactor my project to just use a single UAV, which would be more efficient. Thank you very much for your insights. I have been working at that for 2 days! Sj
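    For context, the second shader is just a fullscreen pass that reads the buffer the first pass wrote and maps the stored value to a colour. A rough sketch, assuming the same buffer layout as in my example project (the names and the 0-100 value range are illustrative):

        // read-only view of the buffer the first pass wrote with InterlockedMin
        StructuredBuffer<uint> depth;
        float2 screenparams;

        float4 fragVisualise(float4 screenpos : SV_Position) : SV_Target
        {
            uint2 upos = (uint2)screenpos.xy;
            uint offset = upos.y * (uint)screenparams.x + upos.x;

            uint d = depth[offset];
            if (d == 0xFFFFFFFF)
                return float4(0, 0, 0, 1);      // texel never written, show as black

            float v = saturate(d / 100.0);      // map the stored range back to greyscale
            return float4(v, v, v, 1);
        }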
  4. The colour values are always <= 0 or >= 1; I make sure of that by setting them when I made the test data (I also check it in RenderDoc). Though currently the shader is written as

        float c_norm = clamp(col.x, 0, 1) * 100;
        uint d_uint = (uint)c_norm;
        uint d_uint_original = 0;

    just to be sure.

    I am using the colour values to make this minimal example as they are easy to control. In my real project the masked value is more complex, but as can be seen the bug occurs even with something as simple as vertex colours.

    Yes, that's right - it has three possible values: 0, 1 (from the fragments) or 0xFFFFFFFF, which is the initialisation value. I have confirmed this is the case using the conditional as well. That's why I suspect it's a timing issue rather than, say, reading the wrong part of memory or not binding anything, even though I can't fully trust the debugger. This is meant to be the absolute simplest case I can come up with that still shows the issue.
  5. Hi samoth, Yes it does, in this narrow case anyway - usually I use asuint() or as you say multiply by a large number then cast. Above I did a direct cast because it was easy to see which triangle wrote each value when checking the memory in RenderDoc. (I've tried all sorts of casts and scales to see if that was causing this issue however and none have an effect.)
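    For completeness, the two mappings I mean are along these lines (a sketch; the function names and the scale factor are illustrative). For depth-like values that are never negative, asuint() preserves ordering directly, so the result is safe to feed to InterlockedMin:

        uint DepthToUintBits(float d)       // d assumed >= 0
        {
            // IEEE-754 bit patterns of non-negative floats sort the same as uints
            return asuint(d);
        }

        uint DepthToUintScaled(float d)     // fixed-precision alternative
        {
            // map 0..1 to 0..1000000 and truncate
            return (uint)(saturate(d) * 1000000.0);
        }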
  6. @richardurich I originally tried to upload it to the forum but kept receiving errors. I've added the relevant code though; as you say, it's actually not too long. I can't see anything I could change in it - e.g. calling InterlockedMin like a method, as one would on a RWByteAddressBuffer, just results in a compiler error.
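    For reference, these are the two call forms I mean (a sketch; the buffer names and registers are illustrative). The free-function intrinsic is the one that works on a RWStructuredBuffer element, while the method form only exists on RWByteAddressBuffer:

        RWStructuredBuffer<uint> depthStructured : register(u1);
        RWByteAddressBuffer      depthRaw        : register(u2);

        void WriteMin(uint index, uint value)
        {
            uint originalA, originalB;

            // intrinsic form, destination is a structured buffer element
            InterlockedMin(depthStructured[index], value, originalA);

            // method form, destination is a byte offset into the raw buffer
            depthRaw.InterlockedMin(index * 4, value, originalB);
        }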
  7. Hi,

    I am working in Unity, trying to create depth-buffer-like functionality using atomics on a UAV in a pixel shader. I find though that it does not behave as expected: it appears as if the InterlockedMin call is not behaving atomically.

    I say appears, because all I can see is that the conditional based on the original memory value returned by InterlockedMin does not behave correctly. Whatever causes incorrect values to be returned from InterlockedMin also occurs in the frame debugger - Unity's and RenderDoc - so when debugging a pixel it changes from under me! By changing this conditional I can see that InterlockedMin is not returning random data. It returns values that the memory feasibly would contain, just not what should be the minimum.

    Here is a video showing what I mean: https://vid.me/tUP8
    Here is a video showing the same behaviour for a single capture in RenderDoc: https://vid.me/4Fir
    (In that video the pixel shader is trying to use InterlockedMin to draw only the fragments with the lowest vertex colours encountered so far, and discard all others.)

    Things I have tried:
    - RWByteAddressBuffer instead of RWStructuredBuffer
    - Different creation flags for ComputeBuffer (though since it's Unity the options are limited and opaque)
    - Using a RenderTexture instead of a ComputeBuffer
    - Using the globallycoherent prefix
    - Clearing the buffer in the pixel shader then syncing with a DeviceMemoryBarrier() call
    - Clearing the buffer in the pixel shader every other frame with a CPU-set flag
    - Using a different atomic (InterlockedMax())
    - Using a different slot and/or binding calls

    Here is the minimum working example that created those videos: https://www.dropbox.com/s/3z2g85vcqw75d1a/Atomics%20Bug%20Minimum%20Working%20Example.zip?dl=0

    I can't think of what else to try. I don't see how the issue could be anything other than the InterlockedMin call, and I don't see what else in my code could affect it...

    Below is the relevant fragment shader:

        float4 frag (v2f i) : SV_Target
        {
            // sample the interpolated vertex colour
            float4 col = i.colour;
            float c_norm = clamp(col.x, 0, 1);    // one triangle is <= 0 and the other is >= 1
            uint d_uint = (uint)c_norm;
            uint d_uint_original = 0;

            uint2 upos = i.screenpos * screenparams;
            uint offset = (upos.y * screenparams.x) + upos.x;

            InterlockedMin(depth[offset], d_uint, d_uint_original);

            if (d_uint > d_uint_original)
            {
                clip(-1);    // we haven't updated the depth buffer (or at least shouldn't have) so don't write the pixel
            }

            return col;
        }

    With the declaration of the buffer being:

        RWStructuredBuffer<uint> depth : register (u1);

    And here is how the buffer is being bound and used:

        // Use this for initialization
        void Start ()
        {
            int length = Camera.main.pixelWidth * Camera.main.pixelHeight;
            depthbufferdata = new uint[length];
            for (int i = 0; i < length; i++)
            {
                depthbufferdata[i] = 0xFFFFFFFF;
            }
            depthbuffer = new ComputeBuffer(length, sizeof(uint));
        }

        // Called after the camera has rendered the scene
        void OnRenderObject ()
        {
            depthbuffer.SetData(depthbufferdata); // clears the mask. in my actual project this is done with a compute shader.
            Graphics.SetRandomWriteTarget(1, depthbuffer);
            material.SetVector("screenparams", Camera.main.pixelRect.size);
            material.SetPass(0);
            Graphics.DrawMeshNow(mesh, transform.localToWorldMatrix);
        }

    Sj
  8. Hi,

    I am working in Unity on pieces of shader code to convert between a memory address and a coordinate in a uniform grid. To do this I use the modulo operator, but find odd behaviour I cannot explain.

    Below is a visualisation of the grid. It simply draws a point at each grid point. The locations for each vertex are computed from the offset into the fixed-size uniform grid, i.e. the vector cell is computed from the vertex shader instance ID, and this is in turn converted into NDCs and rendered.

    I start with the naive implementation:

        uint3 GetFieldCell(uint id, float3 numcells)
        {
            uint3 cell;
            uint layersize = numcells.x * numcells.y;
            cell.z = floor(id / layersize);
            uint layeroffset = id % layersize;
            cell.y = floor(layeroffset / numcells.x);
            cell.x = layeroffset % numcells.x;
            return cell;
        }

    And see the following visual artefacts:

    [attachment: modulo_1.PNG]

    I discover that this is due to the modulo operator. If I replace it with my own modulo operation:

        uint3 GetFieldCell(uint id, float3 numcells)
        {
            uint3 cell;
            uint layersize = numcells.x * numcells.y;
            cell.z = floor(id / layersize);
            uint layeroffset = id - (cell.z * layersize);
            cell.y = floor(layeroffset / numcells.x);
            cell.x = layeroffset - (cell.y * numcells.x);
            return cell;
        }

    The artefact disappears:

    [attachment: modulo_3.PNG]

    I debug one of the errant vertices in the previous shader with RenderDoc, and find that the modulo is implemented using frc, rather than a true integer modulo op, leaving small components that work their way into the coordinate calculations:

    [attachment: modulo_2.PNG]

    So I try again:

        uint3 GetFieldCell(uint id, float3 numcells)
        {
            uint3 cell;
            uint layersize = numcells.x * numcells.y;
            cell.z = floor(id / layersize);
            uint layeroffset = floor(id % layersize);
            cell.y = floor(layeroffset / numcells.x);
            cell.x = floor(layeroffset % numcells.x);
            return cell;
        }

    And it wor...! Oh... ...That's unexpected:

    [attachment: modulo_4.PNG]

    Can anyone explain this behaviour? Is it small remainders of the multiplication of the frc result with the 'integer', as I suspect? If not, what else? If so, why does surrounding the result with floor() not work? (It's not optimised away; I've checked it in the debugger...)

    Sj
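    (Edit: for reference, an integer-only variant I would expect to sidestep the issue: passing numcells as uint3 keeps the division and modulo as integer instructions, so the frc path never appears. The uint3 parameter is a change from the signatures above, not the original code.)

        uint3 GetFieldCellInt(uint id, uint3 numcells)
        {
            uint layersize = numcells.x * numcells.y;

            uint3 cell;
            cell.z = id / layersize;                // integer division, no floor needed
            uint layeroffset = id % layersize;      // true integer modulo
            cell.y = layeroffset / numcells.x;
            cell.x = layeroffset % numcells.x;
            return cell;
        }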
  9. Thanks MJP!   Do you know what MSDN meant by that line in my original post? It says 'resource' specifically, rather than view - but then the whole thing is pretty ambiguous.   Sj
  10. Hi,

    I have a compute shader which populates an append buffer, and another shader that reads from it as a consume buffer. Between these invocations, I would like to read every element in the resource in order to populate a second buffer.

    I can think of a couple of ways to do it, such as using Consume() in my intermediate shader and re-setting the count of the buffer afterwards, or binding the resource as a regular buffer and reading the whole thing. There doesn't seem to be a way to set the count entirely on the GPU, and it's not clear if the second method is supported (e.g. "Use these resources through their methods, these resources do not use resource variables.").

    Is there any supported way to read an AppendStructuredBuffer without decreasing its count?

    Thanks!

    (PS. Cross-post at SO: http://stackoverflow.com/questions/41416272/set-counter-of-append-consume-buffer-on-gpu)
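    To illustrate the second option, this is roughly what I have in mind for the intermediate pass: the same resource bound read-only as a StructuredBuffer, with the element count supplied separately (e.g. copied from the hidden counter with CopyStructureCount / ComputeBuffer.CopyCount into a small uint buffer). The Particle struct, buffer names and registers here are placeholders, not my actual code:

        struct Particle
        {
            float3 position;
            float3 velocity;
        };

        StructuredBuffer<Particle> particles     : register(t0);   // same resource the append buffer wrote to
        StructuredBuffer<uint>     particleCount : register(t1);   // count copied from the hidden counter
        RWStructuredBuffer<float3> positionsOut  : register(u0);   // the second buffer being populated

        [numthreads(64, 1, 1)]
        void ReadAll(uint3 id : SV_DispatchThreadID)
        {
            if (id.x >= particleCount[0])
                return;

            // plain indexed read; the append/consume counter is untouched
            positionsOut[id.x] = particles[id.x].position;
        }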
  11. What is a rig?

    Thanks for the replies! They are much appreciated and are all very helpful! I have much to learn about the animator's workflow, but it's a lot better now that I can see the use cases of the tools I am looking at.
  12. What is a rig?

    Hello,

    I am looking at animating with Maya as I would like to understand the animator's workflow, but I am confused by what I am finding online. I always thought a rig was a basic skeleton plus additional information such as joint constraints, IK solver parameters, etc. When I search for rigs for Maya, though, I find what look like complete characters - they even come with hair and multiple outfits. I am similarly confused by half the goals of this Kickstarter: https://www.kickstarter.com/projects/cgmonks/morpheus-rig-v20 (i.e. why would such a tool need to come with its own props?).

    What is the purpose of these 'complete' rigs that are more like characters than rigs? Are artists meant to use them as an asset in their game or render? Or are they just to be used by the animator, and then the modeller will take the skeleton and skin the actual character mesh? What is the term for what I thought was a rig?

    Sj
  13. Hi Graham,

    First, sorry for the late reply, I am starting to wonder if I am completely misunderstanding the "Follow This Topic" button!

    To clarify, the first image is the 'detailed mesh', the second is the 'physical mesh'. The 'physical mesh' is literally the detailed mesh with overlapping polygons removed (and in this example, it was manual). This may require some explanation:

    In my project, I am working on automatic mesh deformation whereby my algorithm fits one mesh over another. To do this, I reduce the target mesh to a simplified 'physical mesh' and check for collisions with a 'face cloud'. The 'face cloud' consists of the baked faces of every mesh making up the model(s) that the target mesh should deform to fit. (The target mesh, when done, will completely encompass the face cloud.)

    For each point in the 'physical mesh', I project a ray and test for intersections with the face cloud, find the furthest one away, then transform that control point to this position.

    Before this is done, I 'skin' my detailed mesh to the 'physical mesh': for each point in the detailed mesh (regardless of position/normal etc.) I find the closest four points in the 'physical mesh', then weight the point to each of them (where the weight is the proportion of each point's distance to the sum of the distances). The result is that when the 'physical mesh' is deformed, each point in the 'detailed mesh' is deformed linearly with it.

    The purpose of this is to preserve features such as overlapping edges, buttons, etc., because with these the normals of each point cannot be relied upon to determine which side of the surface the point exists on, hence the need for a control mesh. What I am attempting to create in the 'physical mesh' is simply a single surface where all the points' normals accurately describe that surface.

    So far, I do this by using the skinning data to calculate a 'roaming' centre of mass for each point, which is the average position of the point plus all others that share the same bones. Any point whose normal is contrary to (point position - centre of mass for that point) is culled (but is still deformed correctly, because it is skinned to the surrounding points which are not culled). A small sketch of this test follows below.

    This whole setup is designed for user-generated content, hence why I can't do what normal sensible people do and just have artists build a collision mesh in Max; it is also why I cannot make any assumptions about the target mesh*.

    *Well, I can make some assumptions: for one, I can assume it is skinned, and that the mesh it is deforming to fit is also skinned. Since I started using the skinning data the performance (quality of results) has increased dramatically.

    For more complex meshes though I still need a better solution, as it won't cull two points that sit very close, one outside the collision mesh, one inside (and hence when deformed the features are crushed, as only one pulls its skinned verts out).

    Your idea of ray tracing to find overlapping polys sounds very promising, I will look into this. Thanks!

    Seb
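    (The culling test itself is just a sign check; roughly, and with illustrative names:)

        // Cull a physical-mesh point whose normal points back towards its
        // 'roaming' centre of mass, i.e. a point on the inner/overlapped surface.
        bool ShouldCullPoint(float3 position, float3 normal, float3 roamingCentre)
        {
            float3 outward = normalize(position - roamingCentre);
            return dot(normalize(normal), outward) < 0;
        }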
  14. In my project I am working on a 'subset' of cloth simulation in which I attempt to fit one mesh over another. My solution involves deforming a 'physical mesh' based on depth fields and using it to control the deformation of the complex, detailed mesh. I have seen impressive mesh optimization methods, but I don't want to optimize the mesh so much as extract part of it.

    What I want is a way to approximate the 'inside surface' of a mesh, since in the 'real world' this is what would physically interact with the mesh it is being deformed to fit. Take the images below; the second mesh contains no overlapping polygons - the lapels, shoulder straps and buttons are gone - it is a single surface consisting of the points closest to the character.

    [attachment: jck.jpg]

    (Checking for and removing overlapping polygons would be one way, I suppose, but how to decide which are the 'outer' and which are the 'inner', bearing in mind the normals of the semantically inside polys won't necessarily emanate from the geometric centre of the mesh?)

    Does anyone know of an existing implementation that does something like this?
  15. Hi TheUnbeliever, Thank you! I don't know how I read that as d1, d2 and d3 the first time round. (I still think they are very obscurely named variables!) It is somewhat clearer what is happening now. As I see it, when the sum of the distances is calculated, each distance is actually weighted by the angle of that point to the 'main point'. This would be so that when a vertex lies close to the vector between two control points, the third point's influence is reduced, as the technical distance may be close but the practical deformation is controlled by the control points on either side, right?