turanszkij

R&D Tile based particle rendering


Hi,

Tile based renderers are quite popular nowadays: tiled deferred, Forward+ and clustered renderers. There is a presentation about GPU based particle systems from AMD. What particularly interests me is the tile based rendering part. The basic idea is to leave the rasterization pipeline when rendering billboards and do it in a compute shader instead, much like Forward+. You determine tile frustums, cull particles, sort them front to back, then blend them for as long as the accumulated alpha value is below 1. The performance results at the end of the slides seem promising. Has anyone ever implemented this? Was it a success, and is it worth doing? The front to back rendering is the most interesting part in my opinion, because once a pixel is opaque the remaining particles can be skipped, which removes most of the overdraw cost of alpha blending.

The demo is sadly no longer available.
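To make the front to back idea concrete, here is a minimal CPU-side sketch for a single pixel. It is not taken from the AMD presentation; the function names, the `(depth, rgb, alpha)` tuple layout and the `alpha_cutoff` threshold are all assumptions made for illustration. It blends nearest-first with the "under" operator, stops once the pixel is effectively opaque, and can be compared against classic back to front "over" blending.

```python
def blend_front_to_back(particles, alpha_cutoff=0.999):
    """Composite (depth, rgb, alpha) particles for one pixel, nearest first.

    Returns the accumulated color, the accumulated alpha, and how many
    particles were actually shaded before the early out.
    alpha_cutoff is a hypothetical threshold for "effectively opaque".
    """
    color = [0.0, 0.0, 0.0]
    alpha = 0.0
    shaded = 0
    for _, rgb, a in sorted(particles, key=lambda p: p[0]):  # near to far
        if alpha >= alpha_cutoff:
            break  # pixel is effectively opaque; farther particles are hidden
        weight = (1.0 - alpha) * a  # "under" operator: only the uncovered part shows
        color = [c + weight * s for c, s in zip(color, rgb)]
        alpha += weight
        shaded += 1
    return color, alpha, shaded


def blend_back_to_front(particles):
    """Reference: classic "over" blending, farthest first, no early out."""
    color = [0.0, 0.0, 0.0]
    alpha = 0.0
    for _, rgb, a in sorted(particles, key=lambda p: p[0], reverse=True):
        color = [s * a + c * (1.0 - a) for c, s in zip(color, rgb)]
        alpha = a + alpha * (1.0 - a)
    return color, alpha
```

Both functions produce the same color when nothing is skipped; the front to back version simply stops shading as soon as the remaining particles can no longer contribute visibly.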

Edited by turanszkij


It is doable; we do all of this in our game, but we render back to front (no early out), and we also interleave the particles with sorted fragments from traditional geometry or unsupported particle types. The bandwidth savings plus a well-optimized shader make it a good gain (plus you get order-independent transparency :) ).

The challenge is DX11 on PC without bindless: you have to deal with a texture atlas, and drivers have a hard time optimizing such a complex shader from DXBC (compared to console, where we have a dedicated shader compiler). On console and DX12/Vulkan you can just provide an array of texture descriptors, so it is easier :) For practical reasons, and to bound the storage needed for culling, you may want to limit the number of particles to a few thousand. That was fine for us, but games built around heavy effects would suffer.

 

 

 

3 hours ago, Infinisearch said:

I have the demo if you want it, and it is available on Github here: https://github.com/GPUOpen-LibrariesAndSDKs/GPUParticles11/

edit - I attached the demo.

GPUParticles11_v1.0.zip

Thanks for that, I will check it out. I've just started implementing this myself, anyway. :)

1 hour ago, galop1n said:

For practical reasons, and to bound the storage needed for culling, you may want to limit the number of particles to a few thousand. That was fine for us, but games built around heavy effects would suffer.

Hm, that's a bit disappointing. I know that most games probably don't use more than a few thousand particles anyway, but I thought this would help with sheer particle counts as well, not just with overdraw.


I have managed to implement this technique on a console (PS4). I made a tech demo which renders particles with high overdraw and heavy shaders (per pixel lighting). It can also render particles spread out in the distance with little to no overdraw. I am using an additional coarse culling step for the tile based approach, like in the AMD demo. The coarse culling culls particles against large screen space tiles (240x135 pixels). The fine culling culls particles against 32x32 pixel tiles and renders them in the same shader.
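As a rough illustration of how the two tile sizes relate, the sketch below computes the coarse and fine grids and shows which coarse tiles a given fine tile overlaps. The 1920x1080 resolution is an assumption (the post does not state it, although 240x135 divides that resolution into an exact 8x8 grid), and the helper names are hypothetical.

```python
import math

# Assumption: a 1920x1080 render target; the post does not state the resolution.
WIDTH, HEIGHT = 1920, 1080
COARSE_W, COARSE_H = 240, 135   # coarse tile size from the post
FINE = 32                       # fine tile size from the post


def tile_grid(width, height, tile_w, tile_h):
    """Number of tile columns and rows needed to cover the screen."""
    return math.ceil(width / tile_w), math.ceil(height / tile_h)


def coarse_tiles_for_fine_tile(fx, fy):
    """Coarse tiles (col, row) whose pixels overlap fine tile (fx, fy)."""
    x0, y0 = fx * FINE, fy * FINE
    x1 = min(x0 + FINE, WIDTH) - 1    # last pixel column inside the screen
    y1 = min(y0 + FINE, HEIGHT) - 1   # last pixel row inside the screen
    cols = range(x0 // COARSE_W, x1 // COARSE_W + 1)
    rows = range(y0 // COARSE_H, y1 // COARSE_H + 1)
    return [(c, r) for r in rows for c in cols]


coarse_grid = tile_grid(WIDTH, HEIGHT, COARSE_W, COARSE_H)
fine_grid = tile_grid(WIDTH, HEIGHT, FINE, FINE)
```

Under that assumed resolution there are 8x8 coarse tiles and 60x34 fine tiles. Since 240 and 135 are not multiples of 32, some fine tiles straddle a coarse tile boundary and would have to read more than one coarse list; how the original demo handles that case is not described in the post.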

With 100,000 particles filling the screen and heavy overdraw, the tile based technique is a clear win: it stays under 30 ms, while the rasterization based technique takes about 70 ms.

With 100,000 small particles on screen and little overdraw, rasterization performs clearly better, easily going below 10 ms. The tile based approach takes around 15-20 ms in this case.

With 1,000,000 particles and heavy overdraw, the tile based approach cannot keep up, because it runs out of LDS to store the per tile particle lists, which results in flickering. It is slow, and while rasterization is much slower still, it at least renders without artifacts.
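A back-of-the-envelope sketch of why a per tile list in LDS overflows. The 32 KB budget and the 4-byte index size are assumptions, not figures from the post (32 KB is the groupshared limit of a D3D11 cs_5_0 threadgroup; the budget actually used on PS4 may differ), and the function names are hypothetical.

```python
# Assumptions, not figures from the post:
LDS_BUDGET_BYTES = 32 * 1024   # groupshared limit of a D3D11 cs_5_0 threadgroup
INDEX_BYTES = 4                # one 32-bit particle index per list entry


def max_particles_per_tile(budget=LDS_BUDGET_BYTES, index_bytes=INDEX_BYTES):
    """How many particle indices fit if the whole budget holds the list."""
    return budget // index_bytes


def build_tile_list(candidate_indices, capacity):
    """Append indices until the list is full; count the ones that are dropped."""
    tile_list = []
    dropped = 0
    for idx in candidate_indices:
        if len(tile_list) < capacity:
            tile_list.append(idx)
        else:
            dropped += 1  # no room left: this particle is never drawn in the tile
    return tile_list, dropped
```

With these assumed numbers a tile can hold at most 8192 indices, and less in practice if LDS is shared with other data. One plausible reading of the flickering is that on the GPU the append order depends on thread scheduling, so a different subset of particles may be dropped each frame.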

With 1,000,000 particles and little overdraw, the tile based approach is held back by culling performance, while rasterization easily does 60 FPS.

It seems to me that it is only suited to specific scenarios: relatively few particles with heavy overdraw. However, I imagine most games do not use millions of particles, so it might still be worth implementing.

3 minutes ago, turanszkij said:

The coarse culling culls particles for large screen space tiles (240x135 pixels). The fine culling culls particles for 32x32 pixel tiles and renders them in the same shader.

What's the point of coarse culling in this case? If you're going to tile, why not go straight to the fine tiles? Wouldn't there be fewer memory accesses that way? Also, since it seems you're using LDS for the tile particle lists, why not increase the fine tile size to 64x64, since the L2 should be big enough to keep the whole tile cached? I'm most likely missing something... I don't really remember the presentation that well.

14 minutes ago, turanszkij said:

With 100,000 particles filling the screen and heavy overdraw, the tile based technique is a clear win: it stays under 30 ms, while the rasterization based technique takes about 70 ms.

With 100,000 small particles on screen and little overdraw, rasterization performs clearly better, easily going below 10 ms. The tile based approach takes around 15-20 ms in this case.

With 1,000,000 particles and heavy overdraw, the tile based approach cannot keep up, because it runs out of LDS to store the per tile particle lists, which results in flickering. It is slow, and while rasterization is much slower still, it at least renders without artifacts.

With 1,000,000 particles and little overdraw, the tile based approach is held back by culling performance, while rasterization easily does 60 FPS.

Aren't smoke effects typically medium to high overdraw?  If so it would most likely be a win.

1 minute ago, Infinisearch said:

What's the point of coarse culling in this case? If you're going to tile, why not go straight to the fine tiles? Wouldn't there be fewer memory accesses that way? Also, since it seems you're using LDS for the tile particle lists, why not increase the fine tile size to 64x64, since the L2 should be big enough to keep the whole tile cached? I'm most likely missing something... I don't really remember the presentation that well.

Coarse culling does result in more memory accesses, but it can lighten the load on the fine culling step a lot: you no longer fine-cull a million particles per tile, but just 10,000 for instance, or however many ended up in the coarse tile. This generally improves speed a lot, at the cost of an additional indirection. The coarse culling also has a better thread distribution: you dispatch one thread per particle and each particle adds itself to the relevant tiles, whereas for fine culling you dispatch per tile and each tile iterates through the particles and adds them.
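The two thread distributions described above can be sketched on the CPU as a scatter (per particle) and a gather (per tile). This is an illustration only, not the code from the demo: particles are assumed to be `(x, y, radius)` circles in screen space, tiles are square, and every function name is hypothetical.

```python
def tiles_overlapped(x, y, radius, tile, cols, rows):
    """Column and row ranges of the tiles touched by a particle's bounding box."""
    c0 = max(int((x - radius) // tile), 0)
    c1 = min(int((x + radius) // tile), cols - 1)
    r0 = max(int((y - radius) // tile), 0)
    r1 = min(int((y + radius) // tile), rows - 1)
    return range(c0, c1 + 1), range(r0, r1 + 1)  # empty if fully off screen


def cull_scatter(particles, tile, cols, rows):
    """Coarse-style: one 'thread' per particle; it adds itself to its tiles."""
    bins = {(c, r): [] for r in range(rows) for c in range(cols)}
    for i, (x, y, radius) in enumerate(particles):
        cs, rs = tiles_overlapped(x, y, radius, tile, cols, rows)
        for r in rs:
            for c in cs:
                bins[(c, r)].append(i)
    return bins


def cull_gather(particles, tile, cols, rows):
    """Fine-style: one 'threadgroup' per tile; it walks the whole particle list."""
    bins = {}
    for r in range(rows):
        for c in range(cols):
            x0, y0 = c * tile, r * tile
            x1, y1 = x0 + tile, y0 + tile
            bins[(c, r)] = [
                i for i, (x, y, radius) in enumerate(particles)
                if x + radius >= x0 and x - radius < x1
                and y + radius >= y0 and y - radius < y1
            ]
    return bins
```

Both produce the same lists, but the scatter does work proportional to the number of particles times the tiles each one touches, while the gather does work proportional to the number of tiles times the number of particles. That is why it helps to shrink the particle list with a coarse pass before running the per tile gather.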

A 64x64 tile size would require even more LDS storage, not less, and I can't even dispatch a threadgroup that big. If I cut back to 16x16 tiles, the LDS can be better utilized, because fewer particles will be visible in each tile, but the culling becomes less parallel. With a 32x32 tile, each thread culls one particle at a time until all are culled, meaning 1024 particles are culled in parallel. With a 16x16 tile, only 256 particles are culled in parallel, which is slower.
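Taking that model at face value (one thread per pixel of the tile, each thread testing one particle per pass), the number of culling passes per tile can be estimated as follows. The function name is hypothetical, and the 10,000 figure is the example count mentioned earlier in the thread.

```python
import math


def culling_passes(num_particles, tile_size):
    """Passes needed when every thread of a tile_size x tile_size group
    tests one particle per pass."""
    threads = tile_size * tile_size
    return math.ceil(num_particles / threads)


passes_32 = culling_passes(10_000, 32)  # 1024 threads per group
passes_16 = culling_passes(10_000, 16)  # 256 threads per group
```

For 10,000 candidate particles this gives 10 passes with 32x32 tiles and 40 passes with 16x16 tiles. It only counts loop iterations per group; it says nothing about how many of those threads the hardware actually executes at once.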


By the way, I am also doing decal rendering in a Forward+ renderer, and decals benefit from the same trick: sort them top-to-bottom, blend so that the result matches bottom-to-top blending, and skip the lower ones once the alpha is already one. :)

Edited by turanszkij

17 hours ago, turanszkij said:

Coarse culling does result in more memory accesses, but it can lighten the load on the fine culling step a lot: you no longer fine-cull a million particles per tile, but just 10,000 for instance, or however many ended up in the coarse tile. This generally improves speed a lot, at the cost of an additional indirection. The coarse culling also has a better thread distribution: you dispatch one thread per particle and each particle adds itself to the relevant tiles, whereas for fine culling you dispatch per tile and each tile iterates through the particles and adds them.

I guess I'll have to read through the presentation again, it's been a while.

17 hours ago, turanszkij said:

A 64x64 tile size would require even more LDS storage, not less, and I can't even dispatch a threadgroup that big. If I cut back to 16x16 tiles, the LDS can be better utilized, because fewer particles will be visible in each tile, but the culling becomes less parallel. With a 32x32 tile, each thread culls one particle at a time until all are culled, meaning 1024 particles are culled in parallel. With a 16x16 tile, only 256 particles are culled in parallel, which is slower.

Oh, you use LDS to do the blending as well? I thought you were going through the L1 and L2, that's why I suggested a larger tile size. But thinking about it more, as you said, more particles might be visible with bigger tiles, and you're using LDS for the particle list. As for your 256 vs 1024 particles being culled in parallel, that doesn't seem right to me. You're using LDS, so wouldn't the compute shader's execution be limited to one CU (on AMD hardware)? If so, it would be limited to 64 threads per clock, and 256 threads in lock step (since each 16-wide SIMD executes with a cadence of 4 clocks). @MJP @Hodgman Could you clear this up for me: if a compute shader uses LDS, is its execution limited to one CU?
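For reference, the figures in this post can be reproduced from the usual description of an AMD GCN compute unit: four SIMDs of 16 lanes each, running 64-thread wavefronts. The sketch below is only that arithmetic, with hypothetical names; it does not answer whether using LDS restricts a shader to one CU.

```python
import math

# Commonly documented GCN compute unit layout:
SIMDS_PER_CU = 4       # four SIMD units per compute unit
LANES_PER_SIMD = 16    # each SIMD is 16 lanes wide
WAVEFRONT_SIZE = 64    # threads per wavefront


def lanes_per_clock():
    """Threads a single CU can advance in one clock."""
    return SIMDS_PER_CU * LANES_PER_SIMD


def clocks_per_wavefront_instruction():
    """Clocks a 16-wide SIMD needs to issue one instruction for 64 threads."""
    return WAVEFRONT_SIZE // LANES_PER_SIMD


def threads_in_lock_step():
    """Threads covered when each SIMD works through one wavefront."""
    return SIMDS_PER_CU * WAVEFRONT_SIZE


def wavefronts_per_group(tile_size):
    """Wavefronts in a tile_size x tile_size threadgroup."""
    return math.ceil(tile_size * tile_size / WAVEFRONT_SIZE)
```

This yields 64 lanes per clock, a 4 clock cadence and 256 threads in lock step, matching the numbers above. A 32x32 group is 16 wavefronts and a 16x16 group is 4, so on a single CU the 1024 threads of a 32x32 group would be time-sliced rather than all executing at once.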
