wqking

x86 PC, is it worth optimizing for cpu cache?



[ Please keep this topic to CPU cache only. There is no need to discuss things like "optimize the algorithm first"; that belongs in another topic. ]

Hi CPU-optimizing ninjas,

What I need to do in one of my hobby C++ projects is process small pieces of data (hundreds or thousands of bytes) over a lot of iterations (maybe up to 10K or 100K) each time, such as changing some bits in some bytes. And that may itself happen many times, say 100K. 10K*100K is driving me to think about optimizing...

Question 1: if you have any similar experience optimizing for CPU cache misses on x86 PCs, can you share the results? Was the performance improvement significant?
I strongly suspect so, but some real-life experience would give me more confidence.

Question 2: can you recommend any very good guides on optimizing for the CPU cache in C++? That would help me get started quickly. I'm also googling and searching GDNet, but my CPU knowledge is still stuck in the 80386 era.
So far I've found this post, which is quite good to read:
http://www.gamedev.net/topic/542247-cache-optimisations-for-beginners/

1) IMHO, cache-usage optimisation is THE main low-level optimisation technique these days. CPU cycles are cheap, but memory is horribly slow in modern computers. I don't care about how many CPU cycles an algorithm takes, I only care about which parts of RAM it's accessing.

As an example:
At work I've been rewriting our renderer lately. The old renderer was not optimised for cache at all, but with the new one, I've been thinking about the cache constantly. Every time I write a structure or allocate some memory, I consider how, when and why that data will be used by the CPU.

We knew that the old renderer was slow, but there were no real 'bottlenecks' in it -- when you profiled it, there wasn't an obvious part that needed to be optimised. It was just slow everywhere due to constant cache misses.

The old renderer took over 8ms of time to process a sample level, whereas the new re-written renderer takes 0.6ms to process the same level. That's more than a 10x speed-up, mostly due to caring about memory!
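
A rough way to see the "which parts of RAM is it accessing" effect on your own machine is to sum the same array twice, once in index order and once in a shuffled order. The sketch below is only an illustration and has nothing to do with the renderer code described above; all names in it (`sum_in_order`, `sequential_order`, `shuffled_order`, `time_ms`) are made up for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <numeric>
#include <random>
#include <vector>

// Sums `data`, visiting its elements in the order listed in `order`.
// The arithmetic is identical for any order; only the memory access
// pattern on `data` changes.
std::uint64_t sum_in_order(const std::vector<std::uint32_t>& data,
                           const std::vector<std::uint32_t>& order) {
    std::uint64_t total = 0;
    for (std::uint32_t i : order) {
        total += data[i];
    }
    return total;
}

// Indices 0, 1, 2, ..., n-1: neighbouring elements share cache lines.
std::vector<std::uint32_t> sequential_order(std::size_t n) {
    assert(n <= std::numeric_limits<std::uint32_t>::max());
    std::vector<std::uint32_t> order(n);
    std::iota(order.begin(), order.end(), 0u);
    return order;
}

// The same indices in a pseudo-random order: once `data` is larger than
// the cache, most accesses are expected to miss.
std::vector<std::uint32_t> shuffled_order(std::size_t n, unsigned seed) {
    std::vector<std::uint32_t> order = sequential_order(n);
    std::mt19937 rng(seed);
    std::shuffle(order.begin(), order.end(), rng);
    return order;
}

// Runs `work` once and returns the elapsed wall-clock time in milliseconds.
template <typename F>
double time_ms(F&& work) {
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

To try it, build a `data` vector of, say, `1 << 24` elements (64 MB of `uint32_t`, which should exceed the caches on most desktop CPUs) and compare `time_ms([&] { total = sum_in_order(data, seq); })` against the same call with the shuffled order, with optimisations enabled. Both calls return the same sum, so any difference in time comes from the order in which `data` is touched. No timings are quoted here because they depend heavily on the CPU and the array size.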

2) It's hard to take existing code and optimise it for good cache usage. Usually you'll have to rewrite your data structures.
http://research.scee...ing_GCAP_09.pdf
http://gamesfromwith...oriented-design
http://bitsquid.blog...a-oriented.html
http://www.slideshar...oriented-design
http://www.slideshar...ata-orientation
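
As a small illustration of what "rewrite your data structures" can mean, here is a sketch of the same update loop over an array-of-structures layout and over a structure-of-arrays layout. The particle example and every name in it (`ParticleAoS`, `ParticlesSoA`, `integrate_aos`, `integrate_soa`) are hypothetical and not taken from the linked material.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Array-of-structures: hot fields (position, velocity) sit next to cold
// ones (name, flags), so the update loop pulls the cold bytes through the
// cache as well.
struct ParticleAoS {
    float x, y, z;
    float vx, vy, vz;
    char  name[32];  // cold: not needed by the update loop
    int   flags;     // cold: not needed by the update loop
};

void integrate_aos(std::vector<ParticleAoS>& particles, float dt) {
    for (ParticleAoS& p : particles) {
        p.x += p.vx * dt;
        p.y += p.vy * dt;
        p.z += p.vz * dt;
    }
}

// Structure-of-arrays: each field lives in its own contiguous array, so
// the update loop only streams through the arrays it actually uses.
struct ParticlesSoA {
    std::vector<float> x, y, z;
    std::vector<float> vx, vy, vz;
    std::vector<int>   flags;  // cold data kept out of the hot arrays

    explicit ParticlesSoA(std::size_t n)
        : x(n), y(n), z(n), vx(n), vy(n), vz(n), flags(n) {}

    std::size_t size() const { return x.size(); }
};

void integrate_soa(ParticlesSoA& particles, float dt) {
    const std::size_t n = particles.size();
    assert(particles.vx.size() == n && particles.vy.size() == n &&
           particles.vz.size() == n);
    for (std::size_t i = 0; i < n; ++i) {
        particles.x[i] += particles.vx[i] * dt;
        particles.y[i] += particles.vy[i] * dt;
        particles.z[i] += particles.vz[i] * dt;
    }
}
```

Both functions compute the same result. Whether the second layout is actually faster depends on how large the cold fields are and on what else reads the data, so it is worth profiling both before committing to a rewrite.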




8->0.6 is already a good reason for me to keep CPU cache optimization in mind.
I will try to tweak my OOP-only data structures and code to be a little more DOP (data-oriented).

Thanks for the timing data!
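
For the workload described in the first post (small records of bytes, with a few bits changed on each pass), one data-oriented arrangement is to keep all records back to back in a single allocation instead of one heap object per record. The following is a minimal sketch under that assumption; `RecordPool` and `toggle_bit_in_all` are hypothetical names, and fixed-size records are assumed purely to keep the example short.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical container: `count` records of `record_size` bytes each,
// stored contiguously in one std::vector so a pass over all records walks
// memory from front to back.
class RecordPool {
public:
    RecordPool(std::size_t count, std::size_t record_size)
        : record_size_(record_size), bytes_(count * record_size, 0) {}

    std::size_t record_size() const { return record_size_; }

    std::size_t count() const {
        return record_size_ == 0 ? 0 : bytes_.size() / record_size_;
    }

    // Pointer to the first byte of record `i`.
    std::uint8_t* record(std::size_t i) {
        assert(i < count());
        return bytes_.data() + i * record_size_;
    }

private:
    std::size_t record_size_;
    std::vector<std::uint8_t> bytes_;
};

// Flips bit `bit` of byte `byte_index` in every record.
void toggle_bit_in_all(RecordPool& pool, std::size_t byte_index, unsigned bit) {
    assert(byte_index < pool.record_size());
    assert(bit < 8);
    const std::uint8_t mask = static_cast<std::uint8_t>(1u << bit);
    const std::size_t n = pool.count();
    for (std::size_t i = 0; i < n; ++i) {
        pool.record(i)[byte_index] ^= mask;
    }
}
```

If the whole pool is only a few kilobytes it will probably stay resident in the L1 data cache between passes, in which case the layout matters less; the contiguous form mainly pays off once the total data no longer fits in cache.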





I also found this presentation, which is quite good:
http://www.research.scea.com/research/pdfs/GDC2003_Memory_Optimization_18Mar03.pdf
