Matrix Calculation Efficiency


Right now I can measure time in NSight's "Events" window with nanosecond precision, and I can't see any performance difference between the shaders.
Is there a way to measure the difference more precisely?
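
One finer-grained option, if the NSight event timings aren't conclusive, is to bracket the draw calls with GPU timestamp queries and average the result over many frames. The sketch below assumes a D3D11 device and immediate context; MeasureDrawGpuTime and its parameters are illustrative names, not an existing API.

```cpp
// Minimal sketch: time a block of GPU work with D3D11 timestamp queries.
// MeasureDrawGpuTime and its parameters are illustrative, not an existing API.
#include <d3d11.h>
#include <functional>

double MeasureDrawGpuTime(ID3D11Device* device, ID3D11DeviceContext* ctx,
                          const std::function<void()>& issueDraws)
{
    D3D11_QUERY_DESC disjointDesc = { D3D11_QUERY_TIMESTAMP_DISJOINT, 0 };
    D3D11_QUERY_DESC tsDesc       = { D3D11_QUERY_TIMESTAMP, 0 };
    ID3D11Query *disjoint = nullptr, *tsBegin = nullptr, *tsEnd = nullptr;
    device->CreateQuery(&disjointDesc, &disjoint);
    device->CreateQuery(&tsDesc, &tsBegin);
    device->CreateQuery(&tsDesc, &tsEnd);

    ctx->Begin(disjoint);
    ctx->End(tsBegin);      // timestamp just before the work under test
    issueDraws();           // the draw/dispatch calls being measured
    ctx->End(tsEnd);        // timestamp just after
    ctx->End(disjoint);

    // Spin until the results are available (fine for offline profiling only).
    D3D11_QUERY_DATA_TIMESTAMP_DISJOINT dj = {};
    UINT64 t0 = 0, t1 = 0;
    while (ctx->GetData(disjoint, &dj, sizeof(dj), 0) != S_OK) {}
    while (ctx->GetData(tsBegin, &t0, sizeof(t0), 0) != S_OK) {}
    while (ctx->GetData(tsEnd,   &t1, sizeof(t1), 0) != S_OK) {}

    // Convert ticks to milliseconds; a disjoint interval should be discarded.
    double ms = dj.Disjoint ? -1.0 : double(t1 - t0) / double(dj.Frequency) * 1000.0;

    disjoint->Release(); tsBegin->Release(); tsEnd->Release();
    return ms;
}
```

Keeping everything else in the frame identical and averaging the result over a few hundred frames tends to expose differences that a single event capture can't resolve.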

Well, there are two explanations:
1) NSight can't measure the difference.
2) There is no performance difference...

It could be that when the driver translates from D3D bytecode to native asm, it's unrolling the loops, meaning you get the same shader in both cases.
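
If you want to check that first point, at least for the HLSL-to-D3D-bytecode stage, you can dump the compiled shader's disassembly and look for loop/endloop pairs; if the HLSL compiler already unrolled the loop, both variants end up as essentially the same bytecode. The sketch below uses made-up function and parameter names, and it says nothing about the driver's bytecode-to-native-ISA pass, which can unroll further and needs vendor tools to inspect.

```cpp
// Sketch: compile HLSL source and print the DXBC disassembly, to see whether
// the loop survived the HLSL compiler. The driver's bytecode -> native ISA
// pass can still unroll further; that needs vendor tools to inspect.
#include <d3dcompiler.h>
#include <cstdio>
#pragma comment(lib, "d3dcompiler.lib")

void DumpShaderDisassembly(const char* src, size_t srcSize,
                           const char* entry,     // e.g. "VSMain"
                           const char* target)    // e.g. "vs_5_0"
{
    ID3DBlob *code = nullptr, *errors = nullptr;
    HRESULT hr = D3DCompile(src, srcSize, nullptr, nullptr, nullptr, entry, target,
                            D3DCOMPILE_OPTIMIZATION_LEVEL3, 0, &code, &errors);
    if (FAILED(hr))
    {
        if (errors) printf("%s\n", (const char*)errors->GetBufferPointer());
    }
    else
    {
        ID3DBlob* disasm = nullptr;
        if (SUCCEEDED(D3DDisassemble(code->GetBufferPointer(), code->GetBufferSize(),
                                     0, nullptr, &disasm)))
        {
            // An unrolled loop shows up as straight-line code, with no loop/endloop pair.
            printf("%s\n", (const char*)disasm->GetBufferPointer());
            disasm->Release();
        }
        code->Release();
    }
    if (errors) errors->Release();
}
```
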
It could be that branching on a GPU these days is free, as long as (a) the branch isn't divergent and (b) it is surrounded by enough other operations that it can be scheduled into free space.

E.g. on that latter point: this branch won't be divergent because the path taken is a compile-time constant. I'm not up to date with NV's HW specifics (and they're secretive...), but on AMD HW, branch set-up is done using scalar (aka per-wavefront) instructions, which are dual-issued with vector (aka per-thread/pixel/vertex/etc.) instructions. That means branches are often free, as the scalar instruction stream is usually not saturated.

