HLSL: number of operations in a bilinear interpolate

3 comments, last by MJP 13 years, 9 months ago
Does anyone know offhand what the number of operations is in a bilinear interpolate? Or is it completely obscured by the hardware?
I can't seem to find it anywhere.
It's done by dedicated hardware; that's one of the reasons it's so much faster on GPUs.
Be more precise with your question: are you talking about fetching a texture in a shader with bilinear filtering on, or do you want to implement it yourself?

In either case I will try to give you a precise and correct answer.

First of all, FX Composer or RenderMonkey will tell you the instruction count of your shader.

For bilinear interpolation the hardware takes 4 samples of your texture
and blends them according to this:

Color0----------------------------------Color1
|                                       |
|                                       |
|                                       |
Color2----------------------------------Color3

(1-dx)*(1-dy)*Color0 + dx*(1-dy)*Color1 + (1-dx)*dy*Color2 + dx*dy*Color3

dx = horizontal offset from Color0, in range [0..1]
dy = vertical offset from Color0, in range [0..1]

Supposing nothing is optimized (which hopefully is not true), the ALU instruction count for this is: 8 multiplications + 3 additions + 2 subtractions, which gives 13 ALU instructions (in case you do it in software, for yourself).

Written as three lerps, lerp(lerp(Color0, Color1, dx), lerp(Color2, Color3, dx), dy), with each lerp expanded to a + t*(b-a), the count drops to 3 SUBTRACT and 3 MULADD.

This gives 6 ALU instructions (in case you do it in software, for yourself), not counting the 4 texture fetches and the work needed to compute dx and dy.
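
For the "do it yourself" case, here is a rough sketch of what the software version could look like in an SM3-style shader, using the standard bilinear weighting written as nested lerps. The names PointSampler, LinearSampler and TextureSize are made up for illustration (they are not from this thread), and the sketch assumes PointSampler is bound with point filtering and that the application sets TextureSize itself.

```hlsl
// Hypothetical resources: these names are illustrative assumptions.
sampler2D PointSampler;   // assumed bound with point (nearest) filtering
sampler2D LinearSampler;  // assumed bound with bilinear filtering
float2 TextureSize;       // assumed: texture width and height in texels

// Software bilinear filter: 4 point fetches blended with 3 lerps.
float4 ManualBilinear(float2 uv)
{
    // Go to texel space; the -0.5 puts texel centers on integer coordinates.
    float2 texelPos = uv * TextureSize - 0.5;
    float2 base = floor(texelPos);
    float2 f = texelPos - base;          // f.x = dx, f.y = dy, in [0..1)

    float2 texel = 1.0 / TextureSize;    // size of one texel in UV space
    float2 uv00 = (base + 0.5) * texel;  // center of the top-left texel (Color0)

    // tex2Dlod with w = 0 reads mip 0 explicitly, so no derivatives are needed.
    float4 c0 = tex2Dlod(PointSampler, float4(uv00, 0, 0));
    float4 c1 = tex2Dlod(PointSampler, float4(uv00 + float2(texel.x, 0), 0, 0));
    float4 c2 = tex2Dlod(PointSampler, float4(uv00 + float2(0, texel.y), 0, 0));
    float4 c3 = tex2Dlod(PointSampler, float4(uv00 + texel, 0, 0));

    // Blend horizontally on both rows, then vertically between the rows.
    return lerp(lerp(c0, c1, f.x), lerp(c2, c3, f.x), f.y);
}

// Hardware path for comparison: one fetch, the texture unit does the blend.
float4 HardwareBilinear(float2 uv)
{
    return tex2Dlod(LinearSampler, float4(uv, 0, 0));
}
```

The two functions should return nearly the same color on mip 0, up to the precision the texture unit uses for its filter weights. Running both through a shader analyzer is an easy way to compare the instruction counts for your own target.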

Otherwise, the HLSL tex2Dxxx intrinsics have this logic hard-wired, so it takes only one GPU cycle; a vendor has to provide this to get a certified Dx8/9/10 GPU.

You pay at least one cache miss the first time the texels are fetched into registers (later pixel shader runs will find the colors already in cache memory). However, since we are talking about transfers between VRAM and GPU registers, this should be really fast.

You can consider that on today's GPUs bilinear filtering is free (1 cycle) for the intrinsic version.


Quote:Original post by nini
You can consider that on today's GPUs bilinear filtering is free (1 cycle) for the intrinsic version.


This is very true.
I spoke to a mobile GPU developer, and he said their GPUs do filtering with dedicated hardware circuits, and that this was the main reason it was fast (along with a smarter, texture-aware cache fetch). I imagine the same is true on desktop hardware: if they can fit it on a mobile chip, there must be enough space on a desktop chip. He even mentioned that if you can wrangle your compute shader into using a filtering op, it could get a free speed boost.
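
As a rough illustration of "using a filtering op" from a compute shader, here is a minimal Shader Model 5 sketch that resamples one texture into another and lets the texture unit do the bilinear blend. The resource names, register slots and the Params constant buffer are assumptions made up for this example, and LinearClamp is assumed to be created by the application with a linear filter. Compute shaders have no implicit derivatives, so the sketch uses SampleLevel with an explicit mip level instead of Sample.

```hlsl
// Hypothetical bindings: names and register slots are illustrative assumptions.
Texture2D<float4>   Source      : register(t0);
SamplerState        LinearClamp : register(s0);  // assumed: linear filter, clamp addressing
RWTexture2D<float4> Dest        : register(u0);

cbuffer Params : register(b0)
{
    uint2  DestSize;     // destination width and height in texels
    float2 InvDestSize;  // 1.0 / DestSize, precomputed by the application
};

[numthreads(8, 8, 1)]
void CSMain(uint3 id : SV_DispatchThreadID)
{
    // Skip threads that fall outside the destination texture.
    if (any(id.xy >= DestSize))
        return;

    // UV of the center of this destination texel.
    float2 uv = (float2(id.xy) + 0.5) * InvDestSize;

    // One filtered fetch of mip 0: the 4-texel blend happens in the texture unit.
    Dest[id.xy] = Source.SampleLevel(LinearClamp, uv, 0);
}
```

The alternative would be four Load calls plus the lerps in ALU code. Whether the filtered fetch actually comes out faster depends on the hardware and the texture format, so it is worth profiling.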
Pretty much all GPUs do the actual filtering in the texture unit. The number of cycles required by the unit may depend on the format of the texture, though. NVShaderPerf or GPU ShaderAnalyzer will tell you for sure.
