Matrix Calculation Efficiency


Right now I can measure time in NSight's "Events" window with nanosecond precision, and I can't see any performance difference between the shaders.
Is there a way to measure the difference more precisely?
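
One finer-grained option, if the NSight event timings aren't conclusive, is to bracket the draw calls with GPU timestamp queries and average the result over many frames. The sketch below assumes a D3D11 device and immediate context; MeasureDrawGpuTime and its parameters are illustrative names, not an existing API.

```cpp
// Minimal sketch: time a block of GPU work with D3D11 timestamp queries.
// MeasureDrawGpuTime and its parameters are illustrative, not an existing API.
#include <d3d11.h>
#include <functional>

double MeasureDrawGpuTime(ID3D11Device* device, ID3D11DeviceContext* ctx,
                          const std::function<void()>& issueDraws)
{
    D3D11_QUERY_DESC disjointDesc = { D3D11_QUERY_TIMESTAMP_DISJOINT, 0 };
    D3D11_QUERY_DESC tsDesc       = { D3D11_QUERY_TIMESTAMP, 0 };
    ID3D11Query *disjoint = nullptr, *tsBegin = nullptr, *tsEnd = nullptr;
    device->CreateQuery(&disjointDesc, &disjoint);
    device->CreateQuery(&tsDesc, &tsBegin);
    device->CreateQuery(&tsDesc, &tsEnd);

    ctx->Begin(disjoint);
    ctx->End(tsBegin);      // timestamp just before the work under test
    issueDraws();           // the draw/dispatch calls being measured
    ctx->End(tsEnd);        // timestamp just after
    ctx->End(disjoint);

    // Spin until the results are available (fine for offline profiling only).
    D3D11_QUERY_DATA_TIMESTAMP_DISJOINT dj = {};
    UINT64 t0 = 0, t1 = 0;
    while (ctx->GetData(disjoint, &dj, sizeof(dj), 0) != S_OK) {}
    while (ctx->GetData(tsBegin, &t0, sizeof(t0), 0) != S_OK) {}
    while (ctx->GetData(tsEnd,   &t1, sizeof(t1), 0) != S_OK) {}

    // Convert ticks to milliseconds; a disjoint interval should be discarded.
    double ms = dj.Disjoint ? -1.0 : double(t1 - t0) / double(dj.Frequency) * 1000.0;

    disjoint->Release(); tsBegin->Release(); tsEnd->Release();
    return ms;
}
```

Keeping everything else in the frame identical and averaging the result over a few hundred frames tends to expose differences that a single event capture can't resolve.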

Well, there are two explanations:
1) NSight can't measure the difference.
2) There is no performance difference...

It could be that when the driver translates from D3D bytecode to native asm, it's unrolling the loops, meaning you get the same shader in both cases.
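
If you want to check that first point, at least for the HLSL-to-D3D-bytecode stage, you can dump the compiled shader's disassembly and look for loop/endloop pairs; if the HLSL compiler already unrolled the loop, both variants end up as essentially the same bytecode. The sketch below uses made-up function and parameter names, and it says nothing about the driver's bytecode-to-native-ISA pass, which can unroll further and needs vendor tools to inspect.

```cpp
// Sketch: compile HLSL source and print the DXBC disassembly, to see whether
// the loop survived the HLSL compiler. The driver's bytecode -> native ISA
// pass can still unroll further; that needs vendor tools to inspect.
#include <d3dcompiler.h>
#include <cstdio>
#pragma comment(lib, "d3dcompiler.lib")

void DumpShaderDisassembly(const char* src, size_t srcSize,
                           const char* entry,     // e.g. "VSMain"
                           const char* target)    // e.g. "vs_5_0"
{
    ID3DBlob *code = nullptr, *errors = nullptr;
    HRESULT hr = D3DCompile(src, srcSize, nullptr, nullptr, nullptr, entry, target,
                            D3DCOMPILE_OPTIMIZATION_LEVEL3, 0, &code, &errors);
    if (FAILED(hr))
    {
        if (errors) printf("%s\n", (const char*)errors->GetBufferPointer());
    }
    else
    {
        ID3DBlob* disasm = nullptr;
        if (SUCCEEDED(D3DDisassemble(code->GetBufferPointer(), code->GetBufferSize(),
                                     0, nullptr, &disasm)))
        {
            // An unrolled loop shows up as straight-line code, with no loop/endloop pair.
            printf("%s\n", (const char*)disasm->GetBufferPointer());
            disasm->Release();
        }
        code->Release();
    }
    if (errors) errors->Release();
}
```
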
It could be that branching on a GPU these days is free, as long as (a) the branch isn't divergent and (b) it is surrounded by enough other operations that it can be scheduled into free space.

E.g. on that latter point: this branch won't be divergent because the path taken is a compile-time constant. I'm not up to date with NV's HW specifics (and they're secretive...), but on AMD HW, branch set-up is done using scalar (aka per-wavefront) instructions, which are dual-issued with vector (aka per-thread/pixel/vertex/etc.) instructions. That means branches are often free, as the scalar instruction stream is usually not saturated.

