mrheisenberg

DX11 What is the point of using Catmull-Clark subdivision shaders?


I've been checking out demos of Catmull-Clark subdivision implemented with DX11 tessellation, but I don't understand what exactly the benefit of this technique is. The visual results look identical to the simpler, basic dynamic-LOD tessellation shaders in the samples, yet the Catmull-Clark samples are a LOT heavier on performance. What am I missing?


I'm not that familiar with the samples, but they're probably just implementing "linear" tessellation, where more triangles are added but they don't curve at all to better match the curved surface that's roughly defined by their 'source' triangles. This is useful when you need extra vertices for something like displacement mapping, but not for smoothing out edges.
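To make the "linear" case concrete, here is a minimal Python sketch. It is not taken from the DX11 samples, and the helper names `tessellate_flat` and `plane_distance` are made up for illustration. It splits a triangle at its edge midpoints and checks that every generated vertex still lies in the plane of the source triangle, i.e. nothing curves.

```python
def midpoint(a, b):
    """Midpoint of two 3D points given as tuples."""
    return tuple((x + y) / 2.0 for x, y in zip(a, b))


def tessellate_flat(tri):
    """Split one triangle into four by inserting edge midpoints.

    No smoothing rule is applied, so every new vertex lies exactly on an
    edge of the source triangle and the shape does not change.
    """
    a, b, c = tri
    ab, bc, ca = midpoint(a, b), midpoint(b, c), midpoint(c, a)
    return [(a, ab, ca), (ab, b, bc), (ca, bc, c), (ab, bc, ca)]


def plane_distance(tri, p):
    """Signed, unnormalised distance of p from the plane through tri."""
    a, b, c = tri
    u = tuple(b[i] - a[i] for i in range(3))
    v = tuple(c[i] - a[i] for i in range(3))
    # Cross product u x v gives the (unnormalised) plane normal.
    n = (u[1] * v[2] - u[2] * v[1],
         u[2] * v[0] - u[0] * v[2],
         u[0] * v[1] - u[1] * v[0])
    return sum(n[i] * (p[i] - a[i]) for i in range(3))


source = ((0.0, 0.0, 0.0), (2.0, 0.0, 1.0), (0.0, 2.0, 1.0))
children = tessellate_flat(source)
```

The extra vertices only become visible once something like a displacement map moves them; on their own they leave the silhouette exactly as faceted as before.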


Catmull-Clark subD surfaces add curvature to the generated "sub triangles", e.g. on the Wikipedia page, you can see a cube bulge out into a sphere. The artist has control over how/where this "bulging" will occur.

Also, these surfaces and their behaviours are programmed into many 3D modelling packages, so if you implement them in the exact same way, then an artist working with Max/Maya/Blender/Softimage/etc can tweak their "bulge"/"smooth" parameters to get the kind of shape that they want, and then know it's actually going to appear that way in the engine too.

Edited by Hodgman


Actually, the artist has barely any control over where the bulging etc. happens. If you look for it on the net, you'll see that a lot of beginner artists wonder how they can control it. E.g. if you have a cylinder and you tessellate it with Catmull-Clark to make it rounder, you will end up with a capsule shape. Some editing packages add extensions where artists can define hard borders, but most workarounds for the original algorithm are to add two borders on edges you want to preserve to some degree (beveling in 3ds Max), and you still get some smoothing at them.

But that's actually what makes Catmull-Clark so nice, and why artists who have worked with the pure version don't like the tools that extend it. If you have NURBS surfaces or Bezier patches or ..., artists have to tweak them, and if you have an animated mesh, you have to tweak those control points in every keyframe, which is quite a lot of work. Catmull-Clark meshes just work: they mostly deliver the expected result, and they have no control points to skin with the mesh or to adjust. You tessellate an object, it looks nice, you apply a displacement texture and that's it. And while other algorithms usually get into trouble when you vary the valence of your polys, Catmull-Clark also works nicely in those special cases.

 

I also think you haven't seen a DX11 tessellation implementation of Catmull-Clark. The DX11 tessellator hardware cannot really be used for Catmull-Clark, as Catmull-Clark is a recursive approach. There are ways to make it non-recursive, but the higher the tessellation factor, the more of the mesh you have to evaluate, so it's not doable beyond some simple shapes. You've probably seen an approximation of Catmull-Clark using e.g. Bezier patches. But those are quite complex and error-prone to implement, and you need to run them on every animation step of a mesh to re-create the approximation (at least that's what I've read in the papers when I was implementing it).

 

However, it's quite straightforward to implement Catmull-Clark via compute. It's actually really nice for GPUs, working on every vertex independently etc.
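A rough illustration of why this maps well to compute, written in Python rather than an actual compute shader: each output vertex can be produced by a "gather" that only reads the input buffers, so invocations do not depend on each other. This is a sketch under assumptions (closed mesh, no creases, hypothetical names `build_adjacency` and `vertex_kernel`), not the implementation from the screenshot. It uses the standard rule for moved vertices, (F + 2R + (n - 3)P) / n, and a thread pool stands in for the GPU dispatch.

```python
from concurrent.futures import ThreadPoolExecutor


def mean(points):
    """Component-wise mean of a non-empty list of 3D points."""
    count = float(len(points))
    return tuple(sum(p[i] for p in points) / count for i in range(3))


def build_adjacency(num_verts, faces):
    """Host-side preprocessing: incident faces and edge neighbours per vertex."""
    faces_of = [[] for _ in range(num_verts)]
    neighbours_of = [set() for _ in range(num_verts)]
    for fi, face in enumerate(faces):
        for k, v in enumerate(face):
            faces_of[v].append(fi)
            neighbours_of[v].add(face[k - 1])
            neighbours_of[v].add(face[(k + 1) % len(face)])
    return faces_of, [sorted(s) for s in neighbours_of]


def vertex_kernel(vi, verts, faces, faces_of, neighbours_of):
    """One 'thread': new position of original vertex vi.

    Only reads the input buffers, so invocations cannot interfere.
    """
    p = verts[vi]
    n = len(neighbours_of[vi])  # valence of the vertex
    # F: average of the centroids of the faces around the vertex.
    f_avg = mean([mean([verts[i] for i in faces[fi]]) for fi in faces_of[vi]])
    # R: average of the midpoints of the edges leaving the vertex.
    r_avg = mean([mean([p, verts[j]]) for j in neighbours_of[vi]])
    return tuple((f_avg[i] + 2.0 * r_avg[i] + (n - 3) * p[i]) / n
                 for i in range(3))


# Cube with corners at +-1; vertex 7 is the corner (1, 1, 1).
verts = [(x, y, z) for x in (-1.0, 1.0) for y in (-1.0, 1.0) for z in (-1.0, 1.0)]
faces = [[0, 1, 3, 2], [4, 6, 7, 5], [0, 4, 5, 1],
         [2, 3, 7, 6], [0, 2, 6, 4], [1, 5, 7, 3]]
faces_of, neighbours_of = build_adjacency(len(verts), faces)


def run(vi):
    return vertex_kernel(vi, verts, faces, faces_of, neighbours_of)


sequential = [run(vi) for vi in range(len(verts))]
with ThreadPoolExecutor(max_workers=4) as pool:
    parallel = list(pool.map(run, range(len(verts))))
```

Face points and edge points can be computed by the same kind of read-only gather; the recursion only shows up as one such pass per subdivision level.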

http://twitpic.com/3ud6cx

:)




actually, the artist have barely control over where bulging etc. happens

I've never modelled anything with Catmull-Clark surfaces -- is the tessellation shape dependent only on the vertex positions and normals, like Phong tessellation?
Normals are ignored. The new points are built by averaging neighbouring polygon centers, edge centers, vertices... The rules for subdivided corner points, edge centers and polygon centers are simple, but because the process is recursive, it's difficult to accelerate.
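For readers who want to see those averaging rules spelled out, here is a minimal Python sketch of one subdivision step. It assumes a closed mesh without creases or boundaries (every edge shared by exactly two faces), and the function name `catmull_clark_step` is just an illustrative choice. It follows the standard textbook rules: a face point is the face centroid, an edge point averages the two endpoints and the two adjacent face points, and an original vertex moves to (F + 2R + (n - 3)P) / n.

```python
from collections import defaultdict


def average(points):
    """Component-wise mean of a non-empty list of 3D points."""
    n = float(len(points))
    return tuple(sum(p[i] for p in points) / n for i in range(3))


def catmull_clark_step(verts, faces):
    """One step on a closed polygon mesh; returns (new_verts, new_faces)."""
    # 1. Face points: centroid of each polygon.
    face_pts = [average([verts[i] for i in f]) for f in faces]

    edge_faces = defaultdict(list)  # edge -> faces sharing it
    vert_faces = defaultdict(list)  # vertex -> faces touching it
    for fi, f in enumerate(faces):
        for k, a in enumerate(f):
            b = f[(k + 1) % len(f)]
            edge_faces[frozenset((a, b))].append(fi)
            vert_faces[a].append(fi)

    # 2. Edge points: mean of both endpoints and both adjacent face points.
    edge_pts = {}
    vert_edge_mids = defaultdict(list)  # vertex -> midpoints of its edges
    for e, fs in edge_faces.items():
        ends = [verts[i] for i in e]
        edge_pts[e] = average(ends + [face_pts[fi] for fi in fs])
        for i in e:
            vert_edge_mids[i].append(average(ends))

    # 3. Original vertices move to (F + 2R + (n - 3) P) / n.
    new_verts = []
    for vi, p in enumerate(verts):
        n = len(vert_edge_mids[vi])
        f_avg = average([face_pts[fi] for fi in vert_faces[vi]])
        r_avg = average(vert_edge_mids[vi])
        new_verts.append(tuple(
            (f_avg[i] + 2.0 * r_avg[i] + (n - 3) * p[i]) / n for i in range(3)))

    # Append face and edge points after the moved originals.
    face_index = {}
    for fi, fp in enumerate(face_pts):
        face_index[fi] = len(new_verts)
        new_verts.append(fp)
    edge_index = {}
    for e, ep in edge_pts.items():
        edge_index[e] = len(new_verts)
        new_verts.append(ep)

    # 4. Each corner of each old face becomes one quad.
    new_faces = []
    for fi, f in enumerate(faces):
        for k, v in enumerate(f):
            e_next = frozenset((v, f[(k + 1) % len(f)]))
            e_prev = frozenset((f[k - 1], v))
            new_faces.append([v, edge_index[e_next], face_index[fi], edge_index[e_prev]])
    return new_verts, new_faces


# Cube with corners at +-1; vertex 7 is the corner (1, 1, 1).
cube_verts = [(x, y, z) for x in (-1.0, 1.0) for y in (-1.0, 1.0) for z in (-1.0, 1.0)]
cube_faces = [[0, 1, 3, 2], [4, 6, 7, 5], [0, 4, 5, 1],
              [2, 3, 7, 6], [0, 2, 6, 4], [1, 5, 7, 3]]
smooth_verts, smooth_faces = catmull_clark_step(cube_verts, cube_faces)
```

On the cube, the corners are pulled inwards (the corner at (1, 1, 1) moves to (5/9, 5/9, 5/9)) while the face centers stay put, which is the start of the cube-to-sphere "bulge" mentioned earlier in the thread. Note that only positions are read; there is no normal input anywhere.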

I've done a lot of modeling with Catmull-Clark and also made my own editor because I was not happy with the crease options in commercial apps.
For modeling organic shapes Catmull-Clark is the best option. With proper creases it's also a very good alternative to NURBS for things like cars etc., while still being easier to understand.
Cons are: you need to avoid triangles and use regular quad grids whenever possible. A good model will end up with mostly quads, some 5-sided and a few 6-sided polygons.
Subdividing a typical triangulated mesh makes no sense - you need the original quad-based model to get good results.

The first subdivision step is special: it does the most important work and ends up with a mesh containing quads only.
For good HW acceleration it makes sense to do it with its own algorithm, maybe on the CPU.
For the following steps it could make sense to switch to a more hardware-friendly method, like Bezier patches.
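The "quads only" property of the first step can be seen from the topology alone. The Python sketch below (hypothetical helper `split_into_quads`, connectivity only, no positions) gives every face one face point and every edge one edge point, then emits one quad per face corner, so an n-sided polygon always turns into n quads.

```python
def split_into_quads(faces, num_verts):
    """Connectivity of one Catmull-Clark step.

    Original vertices keep their indices, followed by one new index per
    face (face point) and one per unique edge (edge point).
    Returns (quads, total_vertex_count).
    """
    face_point = {fi: num_verts + fi for fi in range(len(faces))}
    edge_point = {}
    next_index = num_verts + len(faces)
    quads = []
    for fi, face in enumerate(faces):
        n = len(face)
        for k in range(n):
            v = face[k]
            e_next = frozenset((v, face[(k + 1) % n]))
            e_prev = frozenset((face[k - 1], v))
            for e in (e_next, e_prev):
                if e not in edge_point:  # shared edges get a single point
                    edge_point[e] = next_index
                    next_index += 1
            quads.append((v, edge_point[e_next], face_point[fi], edge_point[e_prev]))
    return quads, next_index


# A triangle, a quad and a pentagon sharing some edges (8 vertices).
mixed_faces = [[0, 1, 2], [0, 2, 3, 4], [0, 4, 5, 6, 7]]
quads, total_verts = split_into_quads(mixed_faces, 8)
```

Here the triangle, quad and pentagon produce 3 + 4 + 5 = 12 quads. After this step every face is a quad, which is what makes the later levels more regular and more hardware-friendly.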

If anyone has experience with practical HW acceleration I would like to hear something about it too...
Note that this can be a very good thing, because if you do the skinning with the low res control mesh, you get MUCH better final high res skinning! This also saves some work, as you don't need to skin the subdivided stuff.

Skinning is where the difference to other tessellation methods shows up most noticeably, because the corner vertices get smoothed too, not just the surface around them. Maybe it's hard for a programmer to get the point of why they are so good compared to other methods - but with skinning the difference in visual quality is really huge. Trust me :)

Edited by JoeJ
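A small sketch of the "skin the low-res control mesh, then subdivide" order of operations. To keep it short this is a 2D curve analogue, not Catmull-Clark itself: it uses cubic B-spline curve subdivision (the curve counterpart of the surface scheme) and translation-only "bones". All names (`blend_skin`, `subdivide_closed`, the weights and pose values) are assumptions for illustration.

```python
def blend_skin(points, weights, bone_offsets):
    """Linear blend skinning with translation-only bones (a simplification)."""
    skinned = []
    for p, w in zip(points, weights):
        offset = [sum(wi * off[k] for wi, off in zip(w, bone_offsets))
                  for k in range(2)]
        skinned.append((p[0] + offset[0], p[1] + offset[1]))
    return skinned


def subdivide_closed(points):
    """One step of cubic B-spline subdivision on a closed control polygon.

    Each old point is smoothed with weights 1/8, 6/8, 1/8 and a new point
    is inserted at the middle of every edge.
    """
    n = len(points)
    out = []
    for i in range(n):
        prev_p, p, next_p = points[i - 1], points[i], points[(i + 1) % n]
        out.append(tuple((prev_p[k] + 6.0 * p[k] + next_p[k]) / 8.0
                         for k in range(2)))
        out.append(tuple((p[k] + next_p[k]) / 2.0 for k in range(2)))
    return out


control = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)]
# Bottom two points follow bone 0, top two follow bone 1.
weights = [(1.0, 0.0), (1.0, 0.0), (0.0, 1.0), (0.0, 1.0)]
pose = [(0.0, 0.0), (1.0, 0.5)]  # per-bone translation for this frame

skinned_control = blend_skin(control, weights, pose)           # only 4 points skinned
smooth = subdivide_closed(subdivide_closed(skinned_control))   # 16 points, no weights
```

Only the four control points carry skinning weights; the dense points are derived from the already-posed control polygon each frame, so the original corner points are smoothed along with everything else.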


Hodgman

JoeJ pretty much hits the spot :)

Just to emphasize it: while only positions are taken into account and it sounds like you lose a lot of information (e.g. curvature that normals might express), that's actually the really good point of the algorithm. It is very, very simple, you know what to expect, and every implementation will lead to the same result (if you try to get data from one modeling package to another, tessellated stuff can be a horror, while Catmull-Clark is basically just an OBJ mesh, no extra features/data).

 

If anyone has experience with practical HW acceleration I would like to hear something about it too...
Note that this can be a very good thing, because if you do the skinning with the low res control mesh, you get MUCH better final high res skinning! This also saves some work, as you don't need to skin the subdivided stuff.

You mean the tessellator on the GPU? I've used it to implement an approximation described in this paper: http://faculty.cs.tamu.edu/schaefer/research/acc.pdf

 

As I said in my first post here, the sad thing comes with animation: I had to evaluate the skinned mesh every time to generate those patches, and making it leak-free is quite an effort, nothing compared to the simplicity and beauty of Catmull-Clark tessellation.

 

 

Skinning is where the difference to other tessellation methods shows up most noticeably, because the corner vertices get smoothed too, not just the surface around them. Maybe it's hard for a programmer to get the point of why they are so good compared to other methods - but with skinning the difference in visual quality is really huge. Trust me :)

I totally agree, that's why I've made the GPGPU version of it. It works flawlessly with skinned characters, it's fast even in the CPU version (vectorized), you can go crazy up to 1 million vertices, then displace them (also with GPGPU) and it just works. :)

 

Hardware tessellation units are way faster, of course, but even without HW, you can get to a point where the polycount exceeds the pixelcount by far (while you still have normalmaps etc) and it's still running smoothly on average GPUs.


Thx for summing up again, that makes a lot of sense to me now. I'm not really up to date with GPU stuff and missed the point that OGL/DX now have their own compute stuff, so we no longer have to choose between CUDA and OpenCL :)


Thx for summing up again, that makes a lot of sense to me now. I'm not really up to date with GPU stuff and missed the point that OGL/DX now have their own compute stuff, so we no longer have to choose between CUDA and OpenCL :)

I've actually implemented it in OpenCL.

I've also written a rasterizer in OpenCL (for this renderer) rather than inter-op with OGL/DX (though I sadly have no Catmull-Clark + software rasterizer screenshot, just http://twitpic.com/40e85b ), and abusing the massive compute power for rasterization actually works quite nicely. You set up 1024 triangles in local memory, then you can work on them at 8x8 pixel granularity. I think I got 10% to 20% of the theoretical peak hardware rasterization performance in a real-world scenario. It wasn't even fully optimized, I just stopped when it was fast enough (it was just 2 or 3 days of work to make the rasterizer).
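For readers unfamiliar with tile-based software rasterization, the per-tile work can be sketched roughly like this in Python. This is not the OpenCL code described above; it is a generic edge-function coverage test over one 8x8 tile, with hypothetical names (`edge`, `rasterize_tile`), counter-clockwise winding assumed, and no fill rule for shared edges.

```python
def edge(a, b, p):
    """Twice the signed area of (a, b, p); >= 0 when p is left of a->b."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def rasterize_tile(tri, tile_x, tile_y, tile_size=8):
    """Pixels of one tile covered by a counter-clockwise 2D triangle.

    Coverage is sampled at pixel centres. On a GPU each pixel of the
    tile would typically be handled by its own work item.
    """
    a, b, c = tri
    covered = []
    for y in range(tile_y, tile_y + tile_size):
        for x in range(tile_x, tile_x + tile_size):
            p = (x + 0.5, y + 0.5)
            if edge(a, b, p) >= 0 and edge(b, c, p) >= 0 and edge(c, a, p) >= 0:
                covered.append((x, y))
    return covered


triangle = ((0.0, 0.0), (8.0, 0.0), (0.0, 8.0))
first_tile = rasterize_tile(triangle, 0, 0)   # tile containing the triangle
far_tile = rasterize_tile(triangle, 8, 8)     # tile the triangle never touches
```

The three edge tests are independent per pixel, which is why this kind of loop spreads well over many compute threads once a batch of triangles sits in fast local memory.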

