yk_cadcg

[gpgpu, dx10] how about the idea of scanning a large cbuffer?


Hi, the constant buffers hold 16*4k float4s in total. What if I use them heavily: say, for each draw call I pack 64k float4s into the cbuffers and the pixel shader scans them? That would scan 640k float4s of data in 10 passes. Are there any better ways (it should be on DX)? Thanks!
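
The arithmetic in the question can be checked with a small CPU-side sketch. It is not D3D10 code: the capacity figures are taken from the post as stated rather than queried from the API, "scan" is assumed to mean "read every element once", and the function names (`passes_needed`, `chunked_scan`, `scan_sum`) are hypothetical. The sum is only a stand-in for whatever the pixel shader would actually compute.

```python
# Capacity figures as stated in the post (assumptions, not queried from the API).
CBUFFER_SLOTS = 16          # constant buffers bound at once
FLOAT4S_PER_CBUFFER = 4096  # "4k" float4s per constant buffer
FLOAT4S_PER_PASS = CBUFFER_SLOTS * FLOAT4S_PER_CBUFFER  # 65536, i.e. "64k"


def passes_needed(total_float4s: int) -> int:
    """Number of draw calls needed if each one scans one full cbuffer load."""
    return -(-total_float4s // FLOAT4S_PER_PASS)  # ceiling division


def chunked_scan(data, per_pass=FLOAT4S_PER_PASS):
    """Yield (pass_index, chunk) pairs, one chunk per simulated draw call."""
    for i, start in enumerate(range(0, len(data), per_pass)):
        yield i, data[start:start + per_pass]


def scan_sum(data, per_pass=FLOAT4S_PER_PASS):
    """Stand-in for the shader loop: sum the x components chunk by chunk."""
    total = 0.0
    for _, chunk in chunked_scan(data, per_pass):
        # On the GPU this inner loop would run per pixel over the bound cbuffers.
        total += sum(v[0] for v in chunk)
    return total
```

With these figures, `passes_needed(10 * FLOAT4S_PER_PASS)` gives the 10 passes mentioned above. The sketch says nothing about performance; it only shows how the data would be split across draw calls.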

Without knowing intimate details of the G80/R600 architecture (you can get a fair bit of info from CUDA and the various articles floating around online), I'd say you'd get pretty bad performance from this. It's more a hunch than anything, but you're using the CBs in a way they're not really designed for (they're meant more for supporting data than primary data), and that's likely to reduce the GPU's performance. I'm thinking of situations where it limits the number of fragments in flight, increases register pressure, etc.

I could be completely wrong though: the performance characteristics of D3D10 are still a big unknown to most developers. If you do implement it, it'd be interesting if you shared your results [smile]


Cheers,
Jack

Thanks very much! I've already hit an error (see my last post), perhaps because of the "register pressure" etc. you mentioned.
I will certainly report back later.

