Mesh pre-processing


With D3D12 out, it seems feasible for me to try some screen-space tiling in my renderer, which leaves me with the problem of figuring out how to divide my meshes into smaller clusters. Does anyone have resources on mesh feature detection and other mesh pre-processing techniques? While I have a specific goal, it's about time I looked at mesh pre-processing in general, so all resources are welcome.

Thanks in advance.

-potential energy is easily made kinetic-


OpenMesh is a great library for mesh processing in general. For example, I used it for a custom decimation operation but it can do lots more.

But I don't get the rest of your question: what does this have to do with D3D12, and how do you intend to split meshes for rendering "screen space tiles"? Don't expect OpenMesh to be fast enough to re-process all your models every frame, if that is what you're after. (I doubt any general mesh pre-processing framework can do that.)


With D3D12 out, it seems feasible for me to try some screen-space tiling in my renderer, which leaves me with the problem of figuring out how to divide my meshes into smaller clusters. Does anyone have resources on mesh feature detection and other mesh pre-processing techniques?
What? No. In the renderer? Absolutely not.

Ok, I'm mildly interested. What are you really thinking about?

And again, what's the point of D3D12 in this context?

Previously "Krohm"


OpenMesh is a great library for mesh processing in general. For example, I used it for a custom decimation operation but it can do lots more.

Thanks I will take a look.


But I don't get the rest of your question: what does this have to do with D3D12, and how do you intend to split meshes for rendering "screen space tiles"? Don't expect OpenMesh to be fast enough to re-process all your models every frame, if that is what you're after. (I doubt any general mesh pre-processing framework can do that.)


What? No. In the renderer? Absolutely not.
Ok, I'm mildly interested. What are you really thinking about?
And again, what's the point of D3D12 in this context?

Basically what I'm going to do is render my scene sort of like a PowerVR GPU does, except manually, and using triangle clusters instead of binning individual triangles. The basic gist is this:

0. Pre-process meshes into clusters of triangles. Create an OBB per cluster.

1. Divide the screen into screen-space tiles small enough to fit in the GPU's on-chip caches. Keep a list of clusters to render per tile.

2. At render time, after frustum culling, use the OBBs to find out which clusters touch which tiles, and add them to the corresponding tile lists.

3. For each tile, render its associated clusters with a scissor rect to enforce the tile bounds.

Since all frame-buffer access will hit the cache, rendering should be faster, at the cost of rendering triangles multiple times, since a cluster that touches several tiles has to be re-rendered for each of them. DX12 is relevant because there will be a lot more draw calls (one per cluster, multiplied by the number of tiles each cluster overlaps), and the tiling breaks batching, which if I understand things correctly matters a lot for DX11. I know DX11 supports draw-indirect, but that will come later, once I've profiled and have a feel for the technique. A rough sketch of steps 2 and 3 is below.
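Something like this is what I have in mind for the binning and the per-tile draws. It's just a rough C++ sketch: the Cluster and Tile structs and the binning math are placeholders I'm making up for illustration (the binning uses the screen-space AABB of the projected OBB, which is conservative), and only RSSetScissorRects and DrawIndexedInstanced are actual D3D12 calls.

```cpp
#include <algorithm>
#include <cfloat>
#include <cstdint>
#include <vector>
#include <d3d12.h>
#include <DirectXMath.h>

// Hypothetical per-cluster record produced by the offline pre-process (step 0).
struct Cluster
{
    DirectX::XMFLOAT3 obbCorners[8]; // world-space OBB corners
    uint32_t firstIndex;             // range in the mesh index buffer
    uint32_t indexCount;
};

// One screen-space tile (step 1).
struct Tile
{
    D3D12_RECT rect;                 // scissor rect covering the tile
    std::vector<const Cluster*> bin; // clusters binned to this tile (step 2)
};

// Step 2: project each cluster's OBB, take the screen-space AABB of the
// projected corners, and append the cluster to every tile that AABB overlaps.
void BinClusters(const std::vector<Cluster>& clusters, std::vector<Tile>& tiles,
                 const DirectX::XMMATRIX& viewProj, float viewportW, float viewportH)
{
    using namespace DirectX;
    for (const Cluster& c : clusters)
    {
        float minX = FLT_MAX, minY = FLT_MAX, maxX = -FLT_MAX, maxY = -FLT_MAX;
        for (const XMFLOAT3& corner : c.obbCorners)
        {
            // TransformCoord does the perspective divide; near-plane clipping
            // is ignored here for brevity.
            XMVECTOR ndc = XMVector3TransformCoord(XMLoadFloat3(&corner), viewProj);
            float sx = (XMVectorGetX(ndc) * 0.5f + 0.5f) * viewportW;
            float sy = (1.0f - (XMVectorGetY(ndc) * 0.5f + 0.5f)) * viewportH;
            minX = std::min(minX, sx); maxX = std::max(maxX, sx);
            minY = std::min(minY, sy); maxY = std::max(maxY, sy);
        }
        for (Tile& t : tiles)
        {
            if (maxX >= t.rect.left && minX <= t.rect.right &&
                maxY >= t.rect.top  && minY <= t.rect.bottom)
            {
                t.bin.push_back(&c);
            }
        }
    }
}

// Step 3: one scissored draw per (tile, cluster) pair.
void DrawTiles(ID3D12GraphicsCommandList* cmd, const std::vector<Tile>& tiles)
{
    for (const Tile& t : tiles)
    {
        cmd->RSSetScissorRects(1, &t.rect);
        for (const Cluster* c : t.bin)
            cmd->DrawIndexedInstanced(c->indexCount, 1, c->firstIndex, 0, 0);
    }
}
```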

-potential energy is easily made kinetic-

Considering a good renderer (running on a relevant high-end platform) isn't much geometry-bound these days, I'm not sure I see the point. But my initial guess is that it's doable; it sounds a bit like a BVH.

I guess the original poster is describing standard tiled rendering, usually implemented as an optimization in deferred renderers or Forward+.

He mistakenly believes that meshes need to be subdivided for this approach to work, or that manually breaking up a mesh would be faster.

It is neither necessary nor faster to break apart your mesh, not to mention infeasible for any but the most basic cases where the meshes never move relative to the screen.

Research deferred-rendering tiling optimizations or Forward+. This is not specific to Direct3D 12.

L. Spiro

I restore Nintendo 64 video-game OST’s into HD! https://www.youtube.com/channel/UCCtX_wedtZ5BoyQBXEhnVZw/playlists?view=1&sort=lad&flow=grid

I think Infini is talking about tiled rendering in the mobile sense, where screen-space tiles are used for geometry processing rather than light culling, in order to fit into a cache. Though now that I look at it again, it does seem to be something about lighting, because a screen buffer is mentioned.

Since all frame buffer access will hit the cache, rendering should be faster at the cost of

Assuming that frame buffer latency is actually a bottleneck in the first place?

If you're doing any sort of modern/fancy shading, then your shading time per pixel will likely be higher than your frame-buffer write time, so pipelining will make the buffer writes 'free'...

Deferred/tiled (PowerVR-style) triangle binning works well on PowerVR because it means that, instead of performing frame-buffer writes to RAM, it can perform them to a tiny-but-super-fast local storage area (ESRAM/etc.) and then later bulk-flush that local storage to RAM.
Implementing the algorithm without having the hardware to suit it may not be the best idea...

Xbox360 has local/fast EDRAM, and XbOne and Intel GPUs have local/fast ESRAM, so if you're targeting them you may be able to find some benefit. These local storage areas are likely measured in the 10's of MB's though, so you could get away with using very large tiles.
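To put rough numbers on that (back-of-the-envelope, assuming 32-bit colour and 32-bit depth): the Xbox One's ESRAM is 32MB, and a full 1920x1080 target at 8 bytes per pixel is about 16.6MB, so on that kind of hardware you could use a handful of very large tiles, or possibly skip the tiling entirely for a thin enough target.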


One other benefit of PowerVR-style tiling is that it performs polygon sorting to eliminate overdraw and allow OIT. By sorting your mesh chunks (front to back for opaque and back to front for translucent) you'd gain the same benefits, but with slightly coarser accuracy.
You'd want to perform this sorting on the GPU though, so I'd make use of indirect draws rather than relying on thousands of individual CPU-driven draws.
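The CPU-side version of that ordering could be as simple as the sketch below, just to show what I mean (Cluster and obbCenter are placeholder names I'm making up here; the GPU/indirect version would do the equivalent sort in a compute shader instead):

```cpp
#include <algorithm>
#include <vector>
#include <DirectXMath.h>

// Placeholder cluster record; obbCenter is assumed to be the world-space
// centre of the cluster's bounding box.
struct Cluster
{
    DirectX::XMFLOAT3 obbCenter;
    // ... index-buffer range, material, etc.
};

// Sort clusters by view-space depth: ascending = true gives front-to-back
// (opaque), ascending = false gives back-to-front (translucent).
void SortClustersByDepth(std::vector<Cluster*>& clusters,
                         const DirectX::XMMATRIX& view, bool ascending)
{
    using namespace DirectX;
    auto viewDepth = [&](const Cluster* c)
    {
        XMVECTOR p = XMVector3TransformCoord(XMLoadFloat3(&c->obbCenter), view);
        return XMVectorGetZ(p); // distance along the camera's forward axis
    };
    std::sort(clusters.begin(), clusters.end(),
              [&](const Cluster* a, const Cluster* b)
              {
                  return ascending ? viewDepth(a) < viewDepth(b)
                                   : viewDepth(a) > viewDepth(b);
              });
}
```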


I guess the original poster is describing standard tiled rendering, usually implemented as an optimization in deferred renderers or Forward+.
He mistakenly believes that meshes need to be subdivided for this approach to work, or that manually breaking up a mesh would be faster.

Nope, I'm not talking directly about tiled deferred or Forward+.


It is neither necessary nor faster to break apart your mesh, not to mention infeasible for any but the most basic cases where the meshes never move relative to the screen.

Ubisoft and RedLynx seem to be breaking apart their meshes into clusters for culling purposes with success; see the GPU-driven rendering pipelines talk here: http://advances.realtimerendering.com/s2015/index.html

-potential energy is easily made kinetic-


Assuming that frame buffer latency is actually a bottleneck in the first place?

If you're doing any sort of modern/fancy shading, then your shading time per pixel will likely be higher than your frame-buffer write time, so pipelining will make the buffer writes 'free'...

Sort of. First off, I just wanted to play around and see if there were gains to be had. Second, I was going to pair it with a two-pass renderer where the first pass is a visibility pass with no ALU work at all... two variations: z-only, and z + ID.


Deferred/tiled (PowerVR-style) triangle binning works well on PowerVR because it means that, instead of performing frame-buffer writes to RAM, it can perform them to a tiny-but-super-fast local storage area (ESRAM/etc.) and then later bulk-flush that local storage to RAM.
Implementing the algorithm without having the hardware to suit it may not be the best idea...

I brought up GPU caches in a thread on another forum in relation to this type of rendering, and was informed that on GCN there are things called the CB and DB caches; I believe these are dedicated caches for the color and depth buffers. In that thread I was also referred to a different thread where somebody tried this, and IIRC they came up with a max tile size of 128x128 on GCN before there was a performance drop indicating that you were going off-chip.

edit - that figure was for a 7970 (128KB of ROP cache).
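For what it's worth, that number at least passes a back-of-the-envelope check: assuming 32-bit colour and 32-bit depth, a 128x128 tile works out to 128 x 128 x 8 = 131,072 bytes, i.e. 128KB total for colour plus depth, which is in the same ballpark as that cache size.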


One other benefit of PowerVR-style tiling is that it performs polygon sorting to eliminate overdraw and allow OIT. By sorting your mesh chunks (front to back for opaque and back to front for translucent) you'd gain the same benefits, but with slightly coarser accuracy.
You'd want to perform this sorting on the GPU though, so I'd make use of indirect draws rather than relying on thousands of individual CPU-driven draws.

Yes, I was going to sort front to back for opaques, but I was also going to try back to front to get a feel for the performance deltas. You're right about doing the tiling and sorting on the GPU with indirect draws, but I wanted to keep it simple at first... I have no experience with indirect draws, but like I said, I will use them in the future.

-potential energy is easily made kinetic-

