x86 PC, is it worth optimizing for cpu cache?

[ Please keep the topic to CPU cache only; there's no need for suggestions like "optimize the algorithm first", that should be another topic. ]

Hi cpu optimizing ninjas,

What I need to do in one of my hobby C++ projects is to process small amounts of data (hundreds or thousands of bytes) in a lot of iterations (maybe up to 10K or 100K) each time, such as changing some bits in some bytes, etc. And that whole process may happen a lot of times too, maybe around 100K. 10K * 100K is driving me to think about optimizing...
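To give a rough idea, the inner work looks something like this (just a simplified sketch with made-up names and constants, not my actual code):

#include <cstddef>
#include <cstdint>
#include <vector>

// Simplified sketch of the workload: change some bits in some bytes of a small
// buffer, and repeat that a huge number of times.
void processBuffer(std::vector<std::uint8_t> & buffer)
{
    for(std::size_t i = 0; i < buffer.size(); ++i) {
        buffer[i] ^= 0x5a; // e.g. flipping some bits in each byte
    }
}

int main()
{
    std::vector<std::uint8_t> buffer(1024); // hundreds or thousands of bytes

    const int iterationCount = 10000; // "up to 10K or 100K" iterations each time
    const int passCount = 100;        // kept small here; the real count may be ~100K

    for(int pass = 0; pass < passCount; ++pass) {
        for(int i = 0; i < iterationCount; ++i) {
            processBuffer(buffer);
        }
    }

    return 0;
}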

Question 1: if you have any similar experience optimizing for CPU cache misses on x86 PCs, can you share the results? Was the performance improvement significant?
I strongly suspect so, but some real-life experience would give me more confidence.

Question 2: if you can recommend any good guides on optimizing for the CPU cache in C++, that would help me get started quickly. I'm googling and gdnet-ing as well, but my current CPU knowledge is still stuck in the 80386 era.
So far I've found this post quite good to read:
http://www.gamedev.net/topic/542247-cache-optimisations-for-beginners/

https://www.kbasm.com -- My personal website

https://github.com/wqking/eventpp  eventpp -- C++ library for event dispatcher and callback list

https://github.com/cpgf/cpgf  cpgf library -- free C++ open source library for reflection, serialization, script binding, callbacks, and metadata for OpenGL, Box2D, SFML and Irrlicht.

1) IMHO, cache-usage optimisation is THE main low-level optimisation technique these days. CPU cycles are cheap, but memory is horribly slow in modern computers. I don't care about how many CPU cycles an algorithm takes; I only care about which parts of RAM it's accessing.

As an example:
At work I've been rewriting our renderer lately. The old renderer was not optimised for cache at all, but with the new one, I've been thinking about the cache constantly. Every time I write a structure or allocate some memory, I consider how, when and why that data will be used by the CPU.

We knew that the old renderer was slow, but there were no real 'bottlenecks' in it -- when you profiled it, there wasn't an obvious part that needed to be optimised. It was just slow everywhere due to constant cache misses.

The old renderer took over 8ms of time to process a sample level, whereas the new re-written renderer takes 0.6ms to process the same level. That's more than a 10x speed-up, mostly due to caring about memory!

2) It's hard to take existing code and optimise it for good cache usage. Usually you'll have to rewrite your data structures (see the rough sketch after these links):
http://research.scee...ing_GCAP_09.pdf
http://gamesfromwith...oriented-design
http://bitsquid.blog...a-oriented.html
http://www.slideshar...oriented-design
http://www.slideshar...ata-orientation
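As a trivial illustration of what rewriting your data structures can look like (the struct names and fields below are made up for the example, not from any real codebase), compare an "array of structures" layout with a "structure of arrays" layout:

#include <cstddef>
#include <string>
#include <vector>

// Array of structures: updating positions drags the cold data (name, editorData)
// through the cache as well, because it sits next to the hot fields in memory.
struct ObjectAoS
{
    float x, y, z;
    float vx, vy, vz;
    std::string name;      // cold data, rarely touched in the hot loop
    char editorData[256];  // more cold data
};

void updateAoS(std::vector<ObjectAoS> & objects, float dt)
{
    for(std::size_t i = 0; i < objects.size(); ++i) {
        objects[i].x += objects[i].vx * dt;
        objects[i].y += objects[i].vy * dt;
        objects[i].z += objects[i].vz * dt;
    }
}

// Structure of arrays: the hot data is packed contiguously, so every cache line
// fetched by the update loop is full of exactly the floats it is about to use.
struct ObjectsSoA
{
    std::vector<float> x, y, z;
    std::vector<float> vx, vy, vz;
    std::vector<std::string> name; // cold data kept out of the hot arrays
};

void updateSoA(ObjectsSoA & objects, float dt)
{
    for(std::size_t i = 0; i < objects.x.size(); ++i) {
        objects.x[i] += objects.vx[i] * dt;
        objects.y[i] += objects.vy[i] * dt;
        objects.z[i] += objects.vz[i] * dt;
    }
}

int main()
{
    ObjectsSoA objects;
    objects.x.resize(1000); objects.y.resize(1000); objects.z.resize(1000);
    objects.vx.resize(1000, 1.0f); objects.vy.resize(1000, 1.0f); objects.vz.resize(1000, 1.0f);
    updateSoA(objects, 0.016f);
    return 0;
}

The loop itself is the same in both versions; the difference is purely in how the data is laid out in memory, which is where most of the cache behaviour comes from.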

The 8 ms -> 0.6 ms result is already a good enough reason for me to keep CPU cache optimization in mind.
I will try to tweak my OOP-only data structures and code to be a little more DOP (data-oriented).
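Something like this is what I have in mind (just a rough sketch; the Chunk structure, names and sizes are made up, not my actual code):

#include <cstddef>
#include <cstdint>
#include <vector>

// OOP-only style: each Chunk object is allocated separately on the heap, so
// iterating the vector chases pointers all over memory and misses the cache a lot.
struct Chunk
{
    std::vector<std::uint8_t> bytes;
    // ... other members ...
};

void processOop(std::vector<Chunk *> & chunks)
{
    for(std::size_t i = 0; i < chunks.size(); ++i) {
        for(std::size_t k = 0; k < chunks[i]->bytes.size(); ++k) {
            chunks[i]->bytes[k] ^= 0x5a; // change some bits
        }
    }
}

// More DOP style: all the bytes live in one contiguous buffer and are walked
// in order, so the hardware prefetcher can stream them through the cache.
void processDop(std::vector<std::uint8_t> & allBytes)
{
    for(std::size_t k = 0; k < allBytes.size(); ++k) {
        allBytes[k] ^= 0x5a; // same bit changes, contiguous memory
    }
}

int main()
{
    std::vector<std::uint8_t> allBytes(100 * 1024);
    processDop(allBytes);
    return 0;
}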

Thanks for the timing data!



I also found this presentation quite good:
http://www.research.scea.com/research/pdfs/GDC2003_Memory_Optimization_18Mar03.pdf


