kalle_h

Members
  • Content count

    403

Community Reputation

2464 Excellent

About kalle_h

  • Rank
    Member
  1. I do not fully understand "looping". In my implementation I have several lights to render, say 10. I created an abstraction on the C++ side, MGL::InstanceBuffer: the same as a cbuffer, but it can be expanded (if there is not enough space, I delete the old buffer and create a new one with the required size within the abstraction). Then I fill it with light data and make only one draw call when I light the scene. Inside the pixel shader there is a loop that iterates over all lights from the cbuffer (the code is a little bit old, but it shows the idea):

         for (uint i = 0; i < 102400; ++i)
         {
             if (lights[i].type == LIGHT_NO)
                 break;

             DiffSpec curr = (DiffSpec)0;
             switch (lights[i].type)
             {
                 // Sorted by occurrence
                 case LIGHT_POINT:
                     curr = CalcPoint(lights[i], mat, posVS, toEyeVS);
                     break;
                 case LIGHT_SPOT:
                     curr = CalcSpot(lights[i], mat, posVS, toEyeVS);
                     break;
                 case LIGHT_DIRECTIONAL:
                     curr = CalcDirectional(lights[i], toEyeVS, mat);
                     break;
                 case LIGHT_AMBIENT:
                     curr = CalcAmbient(lights[i].color, kSsao);
                     break;
                 case LIGHT_SHADOW_PRIM:
                     curr = CalcDirectional(lights[i], toEyeVS, mat);
                     curr.diffuse *= kShadowVisibility;
                     break;
             }
             totalLight.diffuse += curr.diffuse;
             totalLight.specular += curr.specular;
         }

     So the more lights you have, the longer the shader will run, but there is only 1 draw() call :) There is no loop in the C++ code, only in the shader. Is that your concern?

     Why not sort by type and then run one branchless loop per type? The loops just need to know the start and end index of each light type.
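The sort-by-type suggestion could be prepared on the CPU roughly like this. It is only a sketch: the `Light` layout, the `LightRange` struct and the `sortLightsByType` helper are hypothetical names, and the enum only borrows a few type names from the quoted shader. The ranges would then be uploaded next to the light array so the shader can run one loop per type without a `switch`.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical light types, named after constants in the quoted shader.
enum LightType { LIGHT_POINT = 0, LIGHT_SPOT, LIGHT_DIRECTIONAL, LIGHT_AMBIENT, LIGHT_TYPE_COUNT };

// Hypothetical CPU-side light record; a real one would carry position, range, etc.
struct Light {
    LightType type;
    float color[3];
};

// Half-open index range [begin, end) of one light type in the sorted array.
struct LightRange {
    std::size_t begin = 0;
    std::size_t end = 0;
};

// Sorts the lights by type in place and returns one index range per type.
std::array<LightRange, LIGHT_TYPE_COUNT> sortLightsByType(std::vector<Light>& lights) {
    std::stable_sort(lights.begin(), lights.end(),
                     [](const Light& a, const Light& b) { return a.type < b.type; });

    std::array<LightRange, LIGHT_TYPE_COUNT> ranges{};
    std::size_t i = 0;
    for (int t = 0; t < LIGHT_TYPE_COUNT; ++t) {
        ranges[t].begin = i;
        while (i < lights.size() && lights[i].type == t) ++i;
        ranges[t].end = i;
    }
    assert(i == lights.size());  // every light must have a valid type
    return ranges;
}
```

In the shader, each type then gets its own loop from `begin` to `end`, calling only that type's lighting function.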
  2. instanced drawing and frustum cull

    Culling individual foliage pieces against a frustum that is expanded by the maximum foliage radius is quite cheap (it is just a point vs. frustum test). With some hierarchy, so that you only test the pieces that belong to chunks intersecting the frustum instead of every piece, it might be fast enough. GPU culling would be the best choice.   For CPU particles I have noticed that frustum culling can be beneficial even on a per-particle basis.
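A minimal sketch of the expanded-frustum point test described above. The `Vec3`, `Plane` and `Frustum` types and both function names are assumptions made for illustration; planes are assumed to have unit-length normals pointing into the frustum.

```cpp
#include <array>
#include <cassert>

struct Vec3 { float x, y, z; };

// Plane dot(n, p) + d = 0, with a unit-length normal pointing into the frustum.
struct Plane { Vec3 n; float d; };

using Frustum = std::array<Plane, 6>;

// Pushes every plane outward by maxRadius. A point test against the result is
// a conservative sphere test for any piece whose radius is <= maxRadius.
Frustum expandFrustum(Frustum f, float maxRadius) {
    assert(maxRadius >= 0.0f);
    for (Plane& p : f) p.d += maxRadius;
    return f;
}

// Returns true if the point is on the inner side of all six planes.
bool pointInFrustum(const Frustum& f, const Vec3& p) {
    for (const Plane& pl : f) {
        if (pl.n.x * p.x + pl.n.y * p.y + pl.n.z * p.z + pl.d < 0.0f) return false;
    }
    return true;
}
```

The frustum is expanded once per frame, so the per-piece cost is at most six dot products.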
  3. instanced drawing and frustum cull

    You can split the chunks into batches smaller than 90k. But start with a brute-force frustum check per plant and see how slow it is. Frustum culling will save not only GPU performance but everything related to rendering those plants: animation, matrix concatenations, bandwidth.   You can also use some sort of quadtree culling for fast and accurate frustum culling.
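One possible shape for the chunk-then-plant culling discussed here, as a sketch only. It uses a flat list of chunks with bounding spheres rather than a full quadtree, and every type and function name in it is hypothetical. Chunks fully outside are skipped, chunks fully inside are accepted without per-plant tests, and only chunks crossing the frustum boundary pay for per-plant checks.

```cpp
#include <array>
#include <cstddef>
#include <vector>

struct Vec3 { float x, y, z; };
struct Plane { Vec3 n; float d; };  // dot(n, p) + d >= 0 is inside; n is unit length
using Frustum = std::array<Plane, 6>;

struct Chunk {
    Vec3 center;               // bounding sphere enclosing every plant in the chunk
    float radius;
    std::vector<Vec3> plants;  // plant positions
};

enum class Overlap { Outside, Intersecting, Inside };

float signedDistance(const Plane& pl, const Vec3& p) {
    return pl.n.x * p.x + pl.n.y * p.y + pl.n.z * p.z + pl.d;
}

// Classifies a sphere against the frustum planes (conservative near corners).
Overlap classifySphere(const Frustum& f, const Vec3& c, float r) {
    Overlap result = Overlap::Inside;
    for (const Plane& pl : f) {
        const float dist = signedDistance(pl, c);
        if (dist < -r) return Overlap::Outside;
        if (dist < r) result = Overlap::Intersecting;
    }
    return result;
}

// Appends visible plants to `visible`; returns how many per-plant tests ran.
std::size_t cullChunks(const Frustum& f, const std::vector<Chunk>& chunks,
                       float plantRadius, std::vector<Vec3>& visible) {
    std::size_t perPlantTests = 0;
    for (const Chunk& chunk : chunks) {
        const Overlap o = classifySphere(f, chunk.center, chunk.radius);
        if (o == Overlap::Outside) continue;  // whole chunk rejected
        if (o == Overlap::Inside) {           // whole chunk accepted
            visible.insert(visible.end(), chunk.plants.begin(), chunk.plants.end());
            continue;
        }
        for (const Vec3& p : chunk.plants) {  // boundary chunk: test each plant
            ++perPlantTests;
            if (classifySphere(f, p, plantRadius) != Overlap::Outside) visible.push_back(p);
        }
    }
    return perPlantTests;
}
```

The returned test count is a cheap way to measure how much work the chunk level saves compared to the brute-force pass.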
  4. If I understand it right, UE4 should be free for your non-game usage. https://www.unrealengine.com/faq
  5. Downsizing normal maps

    Both will save graphics memory; compressing should be considered before scaling down.   You are wrong. A JPG has to be decompressed before it is uploaded to video memory. The result is lossy compression with no actual compression where it matters (at runtime).
  6. Downsizing normal maps

      Never do that. JPG is a lossy format. It is also designed for photos, not for normal maps.
  7.   But he doesn't go into detail and only states that it is "easy" to implement in a deferred context. Can anyone tell me how this works exactly?   Render each cubemap using either a fullscreen pass or a bounding volume. Output to an RGBA16F target with additive blending. Write the weighted reflection to the RGB channels and the weight to the A channel. When you use this reflection buffer, you simply normalize the reflections by dividing by A. Be careful not to divide by zero.
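The accumulate-then-normalize step can be sketched on the CPU as follows. This only mimics what additive blending into the RGBA16F target and the later resolve would do for a single pixel; the struct names, function names and the epsilon value are assumptions, not part of the original post.

```cpp
#include <vector>

struct Rgb { float r, g, b; };
struct Rgba { float r, g, b, a; };

// One cubemap's contribution at a pixel: reflection color and blend weight.
struct ProbeSample {
    Rgb color;
    float weight;
};

// Mimics additive blending: RGB accumulates color * weight, A accumulates weight.
Rgba accumulate(const std::vector<ProbeSample>& samples) {
    Rgba sum{0.0f, 0.0f, 0.0f, 0.0f};
    for (const ProbeSample& s : samples) {
        sum.r += s.color.r * s.weight;
        sum.g += s.color.g * s.weight;
        sum.b += s.color.b * s.weight;
        sum.a += s.weight;
    }
    return sum;
}

// Normalizes by the accumulated weight, guarding against division by zero.
// The epsilon is an arbitrary illustrative threshold.
Rgb resolve(const Rgba& sum, float epsilon = 1e-4f) {
    if (sum.a <= epsilon) return Rgb{0.0f, 0.0f, 0.0f};  // no cubemap covered this pixel
    const float inv = 1.0f / sum.a;
    return Rgb{sum.r * inv, sum.g * inv, sum.b * inv};
}
```

Pixels that no cubemap touches come out black here; a renderer would more likely fall back to some global reflection source in that branch.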
  8.   Intel recently released an article and sample code that builds upon Morgan McGuire's work for performance-scalable SSAO. I would recommend checking that out as well if you're interested in improving your performance.   My opinion is that SSAO is better with temporal smoothing than with spatial smoothing. All SSAO algorithms that rely on depth-aware blurs can be quite noisy with foliage. Those blurs also smooth out all normal map details, which makes indirect lighting a bit boring. Deinterleaved rendering isn't needed if you are using depth mips. A depth-aware blur is also usually quite expensive compared to SSAO with a few samples.
  9. That is definitely about cache misses. Texture caches work spatially, so if samples are scattered around the screen you get bad performance. This is the case when something is near the camera. Camera-space sampling means that a sample 1 meter away can be 1000 texels away when the current sample is near the camera. There is a really clever way to solve this by using depth mipmaps. Then performance is constant no matter how close or far objects are.  http://graphics.cs.williams.edu/papers/SAOHPG12/   I have implemented this technique and it works really well.
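The depth-mipmap idea can be illustrated with a small mip-selection function: the farther a sample lies from the current pixel in screen space, the coarser the depth mip it reads, so neighboring fetches stay close together in texel space. This is a sketch in the spirit of the linked paper rather than its exact code; the function name and both constants are assumptions chosen for illustration.

```cpp
#include <algorithm>
#include <cmath>

// Illustrative constants (assumptions, to be tuned per renderer):
// offsets below 2^kLogMaxOffset pixels read mip 0; kMaxMipLevel is the coarsest mip.
constexpr int kLogMaxOffset = 3;
constexpr int kMaxMipLevel = 5;

// Picks the depth mip level for a sample that is offsetPixels away
// from the current pixel in screen space.
int depthMipLevel(float offsetPixels) {
    if (offsetPixels < 1.0f) return 0;  // avoids log2 of tiny or zero offsets
    const int level = static_cast<int>(std::floor(std::log2(offsetPixels))) - kLogMaxOffset;
    return std::max(0, std::min(level, kMaxMipLevel));
}
```

With these constants, each doubling of the offset beyond 8 pixels moves one mip level up, until the coarsest level is reached.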
  10. Diffuse and ambient should be float3. There is no point in calculating a light value for the alpha channel. It's just wasteful.
  11. In my case I have a decent amount of vertices with a non-trivial VS. I guess having those vertices go through the VS twice will probably be slower than the method I mentioned? But you are right, I should benchmark it.   Thanks    How much overdraw do you expect?
  12. You should test rendering twice, once with "less" and once with "greater" depth testing.
  13. http://esotericsoftware.com/
  14. The GPU's concept of a vertex is "a tuple of attributes", such as position/normal/texture-coordinate -- the mathematical definition doesn't really apply :( A GPU vertex doesn't even need to include position! When drawing curved shapes, "GPU vertices" are actually "control points" and not vertices at all.   There's also no native way to supply per-primitive data to the GPU -- such as supplying positions per-vertex and normals per-face. Implementing per-face attributes is actually harder (and requires more computation time) than supplying attributes per vertex, because the native "input assembler" only has the concept of per-vertex and per-instance attributes.   You could treat each vertex as a triangle and expand it in the geometry shader. It is just a bit cumbersome, and performance would be awful.
  15. Computing Matrices on the GPU

    Just calculate it in the vertex shader to start with, then profile to see where your bottleneck is. If the vertex shader calculations turn out to be the problem, then start thinking about how to optimize them.