mrheisenberg

DX11 What is the point of using Catmull-Clark subdivision shaders?


I've been checking out demos of Catmull-Clark subdivision implemented with DX11 tessellation, but I don't understand what exactly the benefit of this technique is. The visual results look identical to the simpler, basic dynamic-LOD tessellation shaders in the samples, yet the Catmull-Clark samples are a LOT heavier on performance. What am I missing?


I'm not that familiar with the samples, but they're probably just implementing "linear" tessellation, where more triangles are added but they don't curve at all to better match the curved surface that's roughly defined by their 'source' triangles. This is useful when you need extra vertices for something like displacement mapping, but not for smoothing out edges.
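
To make "linear" concrete, here is a minimal Python sketch (my own illustration, not code from the DX11 samples; `linear_tessellate` and `plane_distance` are made-up names). It splits a triangle into four at the edge midpoints and lets you check that every new vertex still lies in the plane of the source triangle, i.e. you get more triangles but no extra curvature.

```python
def midpoint(a, b):
    """Midpoint of two 3D points."""
    return tuple((x + y) / 2.0 for x, y in zip(a, b))

def linear_tessellate(tri):
    """Split one triangle into four by inserting edge midpoints."""
    a, b, c = tri
    ab, bc, ca = midpoint(a, b), midpoint(b, c), midpoint(c, a)
    return [(a, ab, ca), (ab, b, bc), (ca, bc, c), (ab, bc, ca)]

def plane_distance(tri, p):
    """Distance of p from the plane of tri, scaled by the normal length."""
    a, b, c = tri
    u = [b[i] - a[i] for i in range(3)]
    v = [c[i] - a[i] for i in range(3)]
    n = (u[1] * v[2] - u[2] * v[1],
         u[2] * v[0] - u[0] * v[2],
         u[0] * v[1] - u[1] * v[0])
    return sum(n[i] * (p[i] - a[i]) for i in range(3))

source = ((0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (0.0, 2.0, 1.0))
children = linear_tessellate(source)
```

Every vertex in `children` has a plane distance of zero, which is why this kind of tessellation only pays off once something like a displacement map moves the new vertices.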


Catmull-Clark subD surfaces add curvature to the generated "sub triangles", e.g. on the Wikipedia page, you can see a cube bulge out into a sphere. The artist has control over how/where this "bulging" will occur.

Also, these surfaces and their behaviours are programmed into many 3D modelling packages, so if you implement them in the exact same way, then an artist working with Max/Maya/Blender/Softimage/etc can tweak their "bulge"/"smooth" parameters to get the kind of shape that they want, and then know it's actually going to appear that way in the engine too.

Edited by Hodgman


Actually, the artist has barely any control over where the bulging happens. If you look around the net, you'll see a lot of beginner artists wondering how they can control it. E.g. if you have a cylinder and you tessellate it with Catmull-Clark to make it rounder, you end up with a capsule shape. Some editing packages add extensions that let the artist define hard borders, but most workarounds for the original algorithm add two borders along the edges you want to preserve to some degree (beveling in 3ds Max), and you still get some smoothing there.

But that's actually what makes Catmull-Clark so nice, and why artists who have worked with the pure version don't like the tools that extend it. If you have NURBS surfaces or Bezier patches or similar, the artist has to tweak them, and if you have an animated mesh, you have to tweak those control points in every keyframe, which is quite a lot of work. Catmull-Clark meshes just work: they deliver mostly the expected result, and they have no extra control points to skin with the mesh or to adjust. You tessellate an object, it looks nice, you apply a displacement texture and that's it. And while other algorithms usually get into trouble when the valence of your polys varies, Catmull-Clark also works nicely in those special cases.

 

I also think you haven't seen a true DX11 tessellation implementation of Catmull-Clark. The DX11 tessellator hardware cannot really be used for Catmull-Clark, because Catmull-Clark is a recursive approach. There are ways to make it non-recursive, but the higher the tessellation factor, the more of the mesh you have to evaluate, so it's not doable beyond some simple shapes. You've probably seen an approximation of Catmull-Clark using e.g. Bezier patches. Those are quite complex and error-prone to implement, and you need to run them on every animation step of a mesh to re-create the approximation (at least that's what I read in the papers when I was implementing it).

 

However, it's quite straightforward to implement Catmull-Clark via compute. It's actually really nice for GPUs, since you work on every vertex independently, etc.

http://twitpic.com/3ud6cx

:)




Actually, the artist has barely any control over where the bulging happens.

I've never modelled anything with Catmull-Clark surfaces -- is the tessellation shape dependent only on the vertex positions and normals, like Phong tessellation?
Normals are ignored. The new points are built by averaging neighbouring polygon centers, edge centers and vertices. The different rules for subdivided corner points, edge centers and poly centers are simple, but because the process is recursive, it's difficult to accelerate.
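
The averaging rules can be sketched in a few lines of Python. This is only a minimal sketch under assumptions: a closed manifold mesh, no creases or boundary rules, and the standard vertex rule (F + 2R + (n - 3)P) / n, where n is the vertex valence. `catmull_clark_step` and the helpers are names I made up, not the API of any modelling package.

```python
from collections import defaultdict

def vadd(a, b):
    return tuple(x + y for x, y in zip(a, b))

def vscale(a, s):
    return tuple(x * s for x in a)

def average(points):
    total = (0.0, 0.0, 0.0)
    for p in points:
        total = vadd(total, p)
    return vscale(total, 1.0 / len(points))

def catmull_clark_step(verts, faces):
    """One step on a closed manifold mesh (assumption: no creases, no borders)."""
    face_pts = [average([verts[i] for i in f]) for f in faces]
    edge_faces = defaultdict(list)  # edge -> adjacent face indices
    vert_faces = defaultdict(list)  # vertex -> adjacent face indices
    vert_edges = defaultdict(set)   # vertex -> incident edges
    for fi, f in enumerate(faces):
        for k, a in enumerate(f):
            b = f[(k + 1) % len(f)]
            e = (min(a, b), max(a, b))
            edge_faces[e].append(fi)
            vert_faces[a].append(fi)
            vert_edges[a].add(e)
            vert_edges[b].add(e)

    new_verts = []
    # Moved original vertices keep their indices: (F + 2R + (n - 3)P) / n.
    for vi, p in enumerate(verts):
        n = len(vert_edges[vi])
        F = average([face_pts[fi] for fi in vert_faces[vi]])
        R = average([average([verts[a], verts[b]]) for a, b in vert_edges[vi]])
        s = vadd(vadd(F, vscale(R, 2.0)), vscale(p, n - 3.0))
        new_verts.append(vscale(s, 1.0 / n))
    # Face points: the plain average of each face's vertices.
    face_index = {}
    for fi, fp in enumerate(face_pts):
        face_index[fi] = len(new_verts)
        new_verts.append(fp)
    # Edge points: average of both endpoints and both adjacent face points.
    edge_index = {}
    for (a, b), adj in edge_faces.items():
        edge_index[(a, b)] = len(new_verts)
        new_verts.append(average([verts[a], verts[b]] + [face_pts[fi] for fi in adj]))

    # Every n-gon turns into n quads, one per corner.
    new_faces = []
    for fi, f in enumerate(faces):
        for k in range(len(f)):
            prev, cur, nxt = f[k - 1], f[k], f[(k + 1) % len(f)]
            e_prev = (min(prev, cur), max(prev, cur))
            e_next = (min(cur, nxt), max(cur, nxt))
            new_faces.append((cur, edge_index[e_next], face_index[fi], edge_index[e_prev]))
    return new_verts, new_faces

cube_verts = [(-1.0, -1.0, -1.0), (1.0, -1.0, -1.0), (1.0, 1.0, -1.0), (-1.0, 1.0, -1.0),
              (-1.0, -1.0, 1.0), (1.0, -1.0, 1.0), (1.0, 1.0, 1.0), (-1.0, 1.0, 1.0)]
cube_faces = [(0, 3, 2, 1), (4, 5, 6, 7), (0, 1, 5, 4),
              (2, 3, 7, 6), (1, 2, 6, 5), (0, 4, 7, 3)]
v1, f1 = catmull_clark_step(cube_verts, cube_faces)
```

On the cube this turns 8 vertices and 6 quads into 26 vertices and 24 quads, and the corner at (1, 1, 1) is pulled in to roughly (0.556, 0.556, 0.556). Note that only positions and connectivity go in; normals never appear.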

I've done a lot of modeling with Catmull-Clark and also made my own editor, because I was not happy with the crease options in commercial apps.
For modeling organic shapes, Catmull-Clark is the best option. With proper creases it's also a very good alternative to NURBS for things like cars, while still being easier to understand.
The cons: you need to avoid triangles and use regular quad grids whenever possible. A good model will end up with mostly quads, some 5-sided and a few 6-sided polygons.
Subdividing a typical triangulated mesh makes no sense; you need the original quad-based model to get good results.

The first subdivision step is special: it does the most important work and ends up with a mesh containing quads only.
For good HW acceleration it makes sense to do it with its own algorithm, maybe on the CPU.
For the following steps it could make sense to switch to a more hardware-friendly method, like Bezier patches.
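
As a quick illustration of why the first step is the odd one out, here is a small counting sketch in Python (the function name and the example face list are hypothetical). It only relies on two facts of the scheme: an n-sided polygon becomes n quads in the first step, and every quad becomes 4 quads in each later step.

```python
def quad_count(face_sizes, levels):
    """Number of quads after `levels` Catmull-Clark steps.

    face_sizes: number of sides of each polygon in the control mesh.
    """
    if levels < 1:
        raise ValueError("need at least one subdivision step")
    first = sum(face_sizes)            # step 1: each n-gon becomes n quads
    return first * 4 ** (levels - 1)   # later steps: each quad becomes 4 quads

# Hypothetical control mesh: one triangle, two quads, one pentagon.
control = [3, 4, 4, 5]
after_one = quad_count(control, 1)
after_two = quad_count(control, 2)
```

Here `after_one` is 16 and `after_two` is 64. After the first step the input is perfectly regular in face type, which is what makes a different, more hardware-friendly method plausible for the remaining steps.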

If anyone has experience with practical HW acceleration, I would like to hear about it too...
Note that this can be a very good thing, because if you skin the low-res control mesh, you get MUCH better final high-res skinning! This also saves some work, as you don't need to skin the subdivided stuff.
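
Skinning only the control mesh could look roughly like the Python sketch below. It assumes plain linear blend skinning with 3x4 bone matrices; the bone names, weights and the tiny three-vertex "cage" are made up for illustration. The subdivision step would then run on the skinned control vertices, so the dense mesh never needs bone weights of its own.

```python
def transform(m, p):
    """Apply a 3x4 affine matrix (three rows of [r0, r1, r2, t]) to point p."""
    return tuple(m[r][0] * p[0] + m[r][1] * p[1] + m[r][2] * p[2] + m[r][3]
                 for r in range(3))

def skin_control_mesh(verts, weights, bones):
    """Linear blend skinning: weighted sum of the bone-transformed positions."""
    skinned = []
    for p, w in zip(verts, weights):
        x = y = z = 0.0
        for bone, wt in w.items():
            q = transform(bones[bone], p)
            x += wt * q[0]
            y += wt * q[1]
            z += wt * q[2]
        skinned.append((x, y, z))
    return skinned

# Hypothetical bones: one stays put, one is translated by +2 in y.
bones = {
    "upper": ((1.0, 0.0, 0.0, 0.0), (0.0, 1.0, 0.0, 0.0), (0.0, 0.0, 1.0, 0.0)),
    "lower": ((1.0, 0.0, 0.0, 0.0), (0.0, 1.0, 0.0, 2.0), (0.0, 0.0, 1.0, 0.0)),
}
cage = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
weights = [{"upper": 1.0}, {"upper": 0.5, "lower": 0.5}, {"lower": 1.0}]
posed = skin_control_mesh(cage, weights, bones)
```

The middle vertex ends up halfway between both bones at (1, 1, 0). Only the few control vertices carry weights; whatever subdivision you run afterwards smooths the posed cage.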

Skinning is where the difference to other tessellation methods shows up most noticeably, because the corner vertices get smoothed too, not just the surface around them. Maybe it's hard for a programmer to see why they are so good compared to other methods, but with skinning the difference in visual quality is really huge. Trust me :)
Edited by JoeJ


Hodgman

JoeJ pretty much hits the spot :)

Just to emphasize it: while only positions are used, and it sounds like you lose a lot of information (e.g. curvature that normals might express), that's actually the really good point of the algorithm. It is very, very simple, you know what to expect, and every implementation will lead to the same result. (If you try to get data from one modeling package to another, tessellated stuff can be a horror, while a Catmull-Clark mesh basically is just an OBJ mesh, with no extra features/data.)

 

If anyone has experience with practical HW acceleration, I would like to hear about it too...
Note that this can be a very good thing, because if you skin the low-res control mesh, you get MUCH better final high-res skinning! This also saves some work, as you don't need to skin the subdivided stuff.

You mean the tessellator on the GPU? I've used it to implement the approximation described in this paper: http://faculty.cs.tamu.edu/schaefer/research/acc.pdf

 

As I said in my first post here, the sad part comes with animation: I had to evaluate the skinned mesh every time to generate those patches, and making it leak-free (no cracks between patches) is quite an effort. It's nothing compared to the simplicity and beauty of Catmull-Clark tessellation.

 

 

Skinning is where the difference to other tessellation methods shows up most noticeably, because the corner vertices get smoothed too, not just the surface around them. Maybe it's hard for a programmer to see why they are so good compared to other methods, but with skinning the difference in visual quality is really huge. Trust me :)

I totally agree, that's why I've made the GPGPU version of it. It works flawlessly with skinned characters, it's fast even in the CPU version (vectorized), and you can go crazy up to 1 million vertices, then displace them (also with GPGPU), and it just works. :)

 

Hardware tessellation units are way faster, of course, but even without HW you can get to a point where the polycount exceeds the pixelcount by far (while you still have normal maps etc.) and it's still running smoothly on average GPUs.


Thx for summing up again, that makes a lot of sense to me now. I'm not really up to date with GPU stuff and missed the point that OGL/DX now have their own compute stuff, so we don't have to choose between CUDA and OpenCL :)


Thx for summing up again, that makes a lot of sense to me now. I'm not really up to date with GPU stuff and missed the point that OGL/DX now have their own compute stuff, so we don't have to choose between CUDA and OpenCL :)

I've actually implemented it in OpenCL.

I've also written a rasterizer in OpenCL (for this renderer) rather than doing inter-op with OGL/DX (though sadly I have no Catmull + software rasterizer screenshot, just http://twitpic.com/40e85b ), and abusing the massive compute power for rasterization actually works quite nicely. You set up 1024 triangles in local memory, then you can work on them in 8x8 pixel granularity. I think I got 10% to 20% of the theoretical peak hardware rasterization performance in a real-world scenario. It wasn't even fully optimized; I just stopped when it was fast enough (it was only 2 or 3 days of work to make the rasterizer).
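
For anyone curious what working in 8x8 pixel granularity can mean, here is a serial Python sketch of the idea. To be clear, this is not the OpenCL kernel from the post: it is a plain CPU loop with made-up names, using edge functions and pixel-center sampling, and it skips proper fill rules and the local-memory triangle batching entirely.

```python
import math

TILE = 8  # tile edge in pixels, matching the 8x8 granularity mentioned above

def edge(a, b, p):
    """Twice the signed area of triangle (a, b, p)."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])

def rasterize(tri, width, height):
    """Return the set of (x, y) pixels whose centers lie inside tri."""
    a, b, c = tri
    area = edge(a, b, c)
    if area == 0:
        return set()      # degenerate triangle covers nothing
    if area < 0:
        b, c = c, b       # force counter-clockwise winding
    # Clamp the bounding box to the screen, then walk the tiles it touches.
    min_x = max(math.floor(min(a[0], b[0], c[0])), 0)
    max_x = min(math.floor(max(a[0], b[0], c[0])), width - 1)
    min_y = max(math.floor(min(a[1], b[1], c[1])), 0)
    max_y = min(math.floor(max(a[1], b[1], c[1])), height - 1)
    covered = set()
    for ty in range(min_y // TILE, max_y // TILE + 1):
        for tx in range(min_x // TILE, max_x // TILE + 1):
            # On a GPU, one work-group could own this tile; here it is a loop.
            for y in range(ty * TILE, min((ty + 1) * TILE, height)):
                for x in range(tx * TILE, min((tx + 1) * TILE, width)):
                    p = (x + 0.5, y + 0.5)
                    if (edge(a, b, p) >= 0 and edge(b, c, p) >= 0
                            and edge(c, a, p) >= 0):
                        covered.add((x, y))
    return covered

pixels = rasterize(((0.0, 0.0), (16.0, 0.0), (0.0, 16.0)), 16, 16)
```

For this triangle on a 16x16 target, `pixels` contains 136 entries. The tile loop is where a compute version would parallelize: each tile only needs the triangles whose bounding boxes overlap it.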
