ucfchuck

hlsl #of operations in Bilinear interpolate


Does anyone know offhand how many operations a bilinear interpolation takes? Or is it completely hidden by the hardware?
I can't seem to find it anywhere.

It's done by dedicated hardware, that's one of the reasons it's so much faster on GPUs.

Be more precise with your question: are you talking about fetching a texture in a shader with bilinear filtering on, or do you want to implement it yourself?

In either case, I will try to give you a precise and correct answer.

First of all, FX Composer or RenderMonkey will tell you the instruction count of your shader.

For bilinear interpolation, the hardware takes 4 samples of your texture
and blends them according to this layout:

Color0 ------------------------------ Color1
  |                                      |
  |                                      |
  |                                      |
  |                                      |
Color2 ------------------------------ Color3


(1-dx)*(1-dy)*Color0 + dx*(1-dy)*Color1 + (1-dx)*dy*Color2 + dx*dy*Color3

dx = horizontal offset of the sample point between Color0 and Color1, in range [0..1]
dy = vertical offset of the sample point between Color0 and Color2, in range [0..1]

Supposing the hardware is not optimized (which hopefully is not true), the ALU instruction count is: 8 multiplications + 3 additions + 2 subtractions, which gives 13 ALU instructions (in case you do it in software, for yourself).

The hardware and HLSL compiler are optimized, so the final instruction count should be closer to 3 SUBTRACT and 3 MULADD: the expression factors into three lerps, each of the form a + t*(b - a).

This gives 6 ALU instructions (in case you do it in software, for yourself).
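For anyone who wants to check the arithmetic, here is a minimal CPU-side sketch of the four-sample blend, written in Python rather than HLSL so it can be run anywhere. It uses the standard product-weighted form of bilinear interpolation (each corner weighted by the product of its offsets) and the equivalent three-lerp form. The function names are made up for illustration, and each color is a single float here; in a shader each would typically be a float4.

```python
def lerp(a, b, t):
    """Linear interpolation: one subtract plus one multiply-add."""
    return a + t * (b - a)


def bilinear_weighted(c0, c1, c2, c3, dx, dy):
    """Product-weighted form: 8 multiplies, 3 adds, 2 subtracts."""
    ix = 1.0 - dx  # weight toward the left column
    iy = 1.0 - dy  # weight toward the top row
    return ix * iy * c0 + dx * iy * c1 + ix * dy * c2 + dx * dy * c3


def bilinear_lerp(c0, c1, c2, c3, dx, dy):
    """Same blend as three lerps: top edge, bottom edge, then vertical."""
    top = lerp(c0, c1, dx)
    bottom = lerp(c2, c3, dx)
    return lerp(top, bottom, dy)
```

Both functions return the corner value exactly when dx and dy are 0 or 1, which is a quick sanity check for any hand-written version. The lerp form is the one a compiler is likely to prefer, since each lerp maps onto a subtract and a multiply-add.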

Otherwise, the HLSL tex2Dxxx intrinsics do this in hardwired logic, so it costs only one GPU cycle if the vendor wants to be certified as a DX8/9/10 GPU.

You have at least one cache-miss memory cost for fetching the pixels into registers the first time (further pixel shader runs will find the colors already in cache memory); however, as we are discussing transfers between VRAM and GPU registers, this should be really, really fast.

You can consider that on today's GPUs bilinear filtering is free (1 cycle) for the intrinsic version.
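To make it concrete what the intrinsic does on your behalf, here is a rough CPU-side emulation of a bilinear fetch in Python. It is only a sketch under stated assumptions that are not from this thread: texel centers sit at +0.5 in texel space, out-of-range coordinates are clamped to the edge, and the texture is a plain list of rows of floats. The name `sample_bilinear` is hypothetical, and real hardware works in fixed-point weights and handles many formats and addressing modes.

```python
import math


def sample_bilinear(texture, u, v):
    """Emulate a bilinear fetch on a 2D list of floats (rows of texels).

    Assumptions: texel centers at +0.5, clamp-to-edge addressing.
    """
    height = len(texture)
    width = len(texture[0])

    # Map normalized UV to texel space, shifted so centers land on integers.
    x = u * width - 0.5
    y = v * height - 0.5

    x0 = math.floor(x)
    y0 = math.floor(y)
    dx = x - x0  # fractional offsets in [0, 1)
    dy = y - y0

    def texel(ix, iy):
        # Clamp addressing: indices outside the texture reuse the edge texel.
        ix = min(max(ix, 0), width - 1)
        iy = min(max(iy, 0), height - 1)
        return texture[iy][ix]

    c0 = texel(x0, y0)          # top-left
    c1 = texel(x0 + 1, y0)      # top-right
    c2 = texel(x0, y0 + 1)      # bottom-left
    c3 = texel(x0 + 1, y0 + 1)  # bottom-right

    # Three lerps: along the top edge, the bottom edge, then between them.
    top = c0 + dx * (c1 - c0)
    bottom = c2 + dx * (c3 - c2)
    return top + dy * (bottom - top)
```

Sampling exactly at a texel center returns that texel unchanged, and sampling at the middle of a 2x2 texture returns the average of all four texels. Everything in this function (address math, four reads, the blend) is what the texture unit does for a single filtered fetch.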


Quote:
Original post by nini
You can consider that on today's GPUs bilinear filtering is free (1 cycle) for the intrinsic version.


This is very true.
I spoke to a mobile GPU developer and he said their GPUs do filtering with dedicated hardware circuits, and that was the main reason it was fast (along with a smarter cache fetch). I imagine the same is true on desktop hardware: if they can fit it on a mobile chip, there must be enough space on a desktop chip. He even mentioned that if you could wrangle your compute shader into using a filtering op, it could get a free speed boost.

Pretty much all GPUs do the actual filtering in the texture unit. The number of cycles required by the unit may depend on the format of the texture, though. NVShaderPerf/GPU ShaderAnalyzer will tell you for sure.
