Community Reputation

252 Neutral

About ShaneYCG

  1. Fixed frametime

    Keep in mind that the graphics driver/API/OS will buffer frames as well. How your manual throttling interacts with this can be a huge source of issues. If you're attempting to minimize latency, the OS will do everything in its power to prevent that and try to buffer frames! There is a really great tool for checking this; I think it's GPUView (someone please correct me if I'm wrong!).
  2. Sounds reasonable. I have a few notes, but they are pretty minor suggestions and could mostly be left as 'later, if needed' changes.

    - What does 'current shader index' mean? If you intend to thread things later, be careful about global state. If this is just internal to the renderer (the current shader while recording its internal D3D11 cmd list), that's fine.

    - If you don't need rendering backends to be swappable at runtime, you can bypass virtuals. Your different shader backends can each implement a single non-virtual SetShaderPack, and only the one that a given build needs gets compiled in, either by using ifdef or by putting them in separate files and only compiling the one needed.

    - Whether shader stages need to be tightly coupled depends on the backend and on which stages are in use. It might be a good idea to do some kind of de-duplication (and avoid unnecessary platform BindShader calls) for shader packs that share some stages. For example, you might use the same vertex shader with many different pixel shaders.
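To make the last two notes a bit more concrete, here is a minimal sketch of a non-virtual SetShaderPack that skips redundant per-stage binds. Everything in it is an assumption for illustration: ShaderHandle, ShaderPack, Renderer and the counting stand-ins for the platform bind calls are hypothetical names, not the original poster's code.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical handle type; 0 means "nothing bound".
using ShaderHandle = std::uint32_t;

// Hypothetical shader pack: one handle per stage (only two stages here).
struct ShaderPack {
    ShaderHandle vertex;
    ShaderHandle pixel;
};

// One concrete backend, chosen at build time (by an #ifdef or by compiling
// only this file), so SetShaderPack needs no virtual dispatch.
class Renderer {
public:
    // Rebinds only the stages that differ from what is already bound.
    void SetShaderPack(const ShaderPack& pack) {
        if (pack.vertex != bound_.vertex) {
            BindVertexShader(pack.vertex);
            bound_.vertex = pack.vertex;
        }
        if (pack.pixel != bound_.pixel) {
            BindPixelShader(pack.pixel);
            bound_.pixel = pack.pixel;
        }
    }

    int PlatformBindCalls() const { return bindCalls_; }

private:
    // Stand-ins for the real platform calls; they only count invocations.
    void BindVertexShader(ShaderHandle) { ++bindCalls_; }
    void BindPixelShader(ShaderHandle) { ++bindCalls_; }

    ShaderPack bound_{0, 0};
    int bindCalls_ = 0;
};
```

Setting packs {1, 10} and then {1, 11} issues three platform binds rather than four, because the shared vertex shader is not rebound. Note that the cached state lives inside the renderer instance, which matters if you thread things later.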
  3. Yeah, it can be quite a lot of computation! Fortunately it's also a problem that fits data-oriented design and parallelism very well. Done right, your skinning data structures can be packed very tightly to reduce cache misses to a negligible amount. Depending on your physics needs, you can also probably calculate many of those characters in parallel.

    For example, in our skinning code I cache the prev/next keys in a tightly packed array so that in the common case it's a very fast linear traversal over the bone list (which is sorted by dependency, so the parent is already calculated by the time a child needs it). In the uncommon case that we need to advance to a new keyframe, it takes a (slightly) slower path to fetch the new key pair and put that into the tighter cache.
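A rough sketch of that key-pair cache idea, heavily simplified. The assumptions: each bone animates a single scalar offset instead of a full transform, every track has at least two keys, and time only moves forward. Key, KeyPair, Skeleton, InitCache and Evaluate are hypothetical names, not the actual skinning code described above.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Key { float time; float value; };
struct KeyPair { Key prev; Key next; };  // hot data, one entry per bone

struct Skeleton {
    std::vector<int> parent;               // -1 for a root; parents come before children
    std::vector<std::vector<Key>> tracks;  // cold data: all keys per bone, sorted by time
    std::vector<std::size_t> cursor;       // index of the cached 'prev' key per bone
    std::vector<KeyPair> cache;            // tightly packed prev/next pair per bone
    std::vector<float> world;              // output: accumulated value per bone
};

// Fills the cache with the first key pair of every track.
void InitCache(Skeleton& s) {
    const std::size_t n = s.tracks.size();
    s.cursor.assign(n, 0);
    s.cache.resize(n);
    s.world.assign(n, 0.0f);
    for (std::size_t i = 0; i < n; ++i) {
        s.cache[i].prev = s.tracks[i][0];
        s.cache[i].next = s.tracks[i][1];
    }
}

// Common case: one linear pass over the packed cache.
void Evaluate(Skeleton& s, float t) {
    for (std::size_t i = 0; i < s.cache.size(); ++i) {
        KeyPair& kp = s.cache[i];
        // Uncommon slow path: step to a later key pair and refresh the cache.
        while (t > kp.next.time && s.cursor[i] + 2 < s.tracks[i].size()) {
            ++s.cursor[i];
            kp.prev = s.tracks[i][s.cursor[i]];
            kp.next = s.tracks[i][s.cursor[i] + 1];
        }
        const float span = kp.next.time - kp.prev.time;
        float u = span > 0.0f ? (t - kp.prev.time) / span : 0.0f;
        u = std::min(std::max(u, 0.0f), 1.0f);
        // Interpolate in local space first, then combine with the parent.
        const float local = kp.prev.value + (kp.next.value - kp.prev.value) * u;
        const int p = s.parent[i];
        s.world[i] = (p >= 0 ? s.world[p] : 0.0f) + local;
    }
}
```

Because bones are sorted by dependency, world[parent] is always written before a child reads it, so a single forward pass is enough. In a real skeleton the scalar would be a translation/rotation/scale set, but the cache layout and the traversal order stay the same.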
  4. I'm pretty sure you need to do the quat/interpolation stuff before taking transforms through the hierarchy.
  5. Branching can be OK as long as it is mostly coherent within the warp/half-warp. In this case it would be mostly coherent, except at the edge of the light's range, where both sides of the if/else may need to be evaluated. For deferred rendering, framebuffer bandwidth can be a significant bottleneck, so the branch may be worthwhile.
  6. If you are calculating light intensity as:

        intensity = 1.0 - ( dist(pos, light.pos) / light.range )

    you'll start getting negative results once dist exceeds 'range'. Depending on how the rest of your math works out, this can cause artifacts. You should clamp/saturate the result:

        intensity = clamp( 1.0 - ( dist(pos, light.pos) / light.range ), 0.0, 1.0 );

    That said, the discard solution may be more efficient, as you avoid hitting the framebuffer entirely instead of blending in a zero-strength light.

    Depending on the precision of your render target, it may be best to do the discard last, and choose a minimum light strength that you know the render target will floor to zero. Write the threshold with float literals, since 1/255 is integer division and evaluates to 0, e.g.:

        if( intensity < 1.0/255.0 ) { discard; }

    You could also take this one step further and take the light color into account, since its RGB values are likely less than 1:

        vec3 outColor = lightColor.rgb * intensity;
        if( max( outColor.r, max( outColor.g, outColor.b ) ) < 1.0/255.0 ) { discard; }
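The same math can be checked on the CPU. This is only a sketch mirroring the shader snippets above: Vec3, Distance, Intensity and ShadeLight are hypothetical helper names, a false return stands in for the shader's discard, and the 1/255 threshold assumes an 8-bit render target.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };

float Distance(const Vec3& a, const Vec3& b) {
    const float dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
    return std::sqrt(dx * dx + dy * dy + dz * dz);
}

// Linear falloff, saturated so it never goes negative beyond 'range'.
float Intensity(const Vec3& pos, const Vec3& lightPos, float range) {
    const float raw = 1.0f - Distance(pos, lightPos) / range;
    return std::min(std::max(raw, 0.0f), 1.0f);
}

// Returns false where the shader would discard; otherwise writes 'out'.
bool ShadeLight(const Vec3& pos, const Vec3& lightPos, float range,
                const Vec3& lightColor, Vec3& out) {
    // Float division on purpose: the integer expression 1/255 would be 0.
    const float kMinVisible = 1.0f / 255.0f;
    const float i = Intensity(pos, lightPos, range);
    const Vec3 c = { lightColor.x * i, lightColor.y * i, lightColor.z * i };
    // Discard last, once the light color has been folded in.
    if (std::max(c.x, std::max(c.y, c.z)) < kMinVisible) {
        return false;
    }
    out = c;
    return true;
}
```

With a dim light color, a pixel can pass an intensity-only test and still be discarded by the color-aware one, which is the extra saving the last shader snippet is after.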
  7. What gbuffer format are you using? Have you tried verifying the gbuffer contents to ensure no precision/sRGB/etc. shenanigans are interfering?
  8. After tinkering with various render api + compute api combinations (dx+dx, dx+cuda, gl+cuda, gl+cl, gl+gl), my recommendation would be to ditch GL+CL interop. It's messy, relies on lots of extensions, and can be slow to context switch between. Since you're using GL4.3, you may want to look into GL compute instead. Unfortunately the docs/tutorials are pretty lacking, but if you have familiarity with compute and compute APIs you should be able to figure it out, and once set up it's pretty straightforward.