SteveHatcher

How to actually measure stuff like cache hit, prefetch, ram traversal?


Hi Guys,

 

This question is mostly out of pure interest.

 

Reading a lot of topics on this forum about game engine design and code, I often come across posts that say something like "this way you are more likely to load the next bit of data into the memory that the cpu will need, increasing cache coherency...", usually in topics about data oriented design (DOD).

 

From my understanding, DOD has something to do with keeping data in contiguous memory that the CPU iterates over each game loop, and it seems easier to achieve this with DOD than with inheritance-based designs?

 

Secondly... how can you actually measure this stuff? I know in a very, very simple game engine it doesn't really matter, but I have a learning game written both with an ECS and with plain OOP that does essentially the same thing (a sprite moves around, shoots, and gets shot at by enemies). How can I compare which implementation is actually better, and which areas of each are working well?

 

I hope my question makes sense.

 

Thanks


In regard to your "how do you measure this stuff" comment:

 

If this is purely out of interest, you will find the answer is platform dependent.  A great starting point is Intel's Software Tuning, Performance Optimization, and Platform Monitoring forum, assuming you are on an Intel platform.  If you spend some time there, you'll find it pretty amazing how much work goes into measuring these variables.  It is a great place to ask both these kinds of questions and the "how do I fix it" kind.


VTune is potentially zero-cost for academics

Decided to try it because apparently it is free for students. Got a BSOD the moment I clicked "Start Analysis". I guess I'm not using it ever again.

Edited by Zaoshi Kaba


Mistrust any advice about DoD or performance that uses the term cache coherency in that context, ...

 

That's a good point. I'm sure I've made that mistake myself, and I know I've seen it made numerous times in articles and forum threads alike. It seems like a term that might mean what they meant, but it's not. I guess something like cache locality is more apt, or perhaps temporal locality more generally.

 

To me, data-oriented-design boils down to two things -- firstly, the understanding that memory accesses (specifically, cache-line loads and stores to main memory), not CPU cycles, are the true bottleneck in most heavy computations, and secondly, following from the first, that the shape and flow of performance-critical data should be the prime concern, even above algorithmic, micro-optimization, and dogmatic object-oriented design considerations. What DOD is NOT, is recasting every inconsequential corner of your application to be "cache friendly".
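To make that "shape and flow of data" point concrete, here is a hypothetical particle-update sketch (the struct names and fields are my own invention, not anything from the thread). An update that only touches positions drags velocities, colors, and IDs through the cache in the array-of-structures layout, but streams nothing except the hot fields in the structure-of-arrays layout:

```cpp
// Hypothetical illustration of the layout difference DOD cares about.
#include <cstddef>
#include <vector>

// Array-of-structures: hot fields interleaved with cold data, so every
// iteration of the update loop loads the whole struct's cache footprint.
struct ParticleAoS {
    float x, y;          // hot: updated every frame
    float vx, vy;        // hot
    float r, g, b, a;    // cold: only read when rendering
    int   id;            // cold
};

// Structure-of-arrays: each hot field is packed contiguously, so the
// update loop streams pure position/velocity data.
struct ParticlesSoA {
    std::vector<float> x, y, vx, vy;   // hot arrays
    std::vector<float> r, g, b, a;     // cold, never touched below
    std::vector<int>   id;
};

void update_aos(std::vector<ParticleAoS>& ps, float dt) {
    for (auto& p : ps) {   // loads ~40 bytes per particle for 16 bytes of work
        p.x += p.vx * dt;
        p.y += p.vy * dt;
    }
}

void update_soa(ParticlesSoA& ps, float dt) {
    for (std::size_t i = 0; i < ps.x.size(); ++i) {  // only hot arrays touched
        ps.x[i] += ps.vx[i] * dt;
        ps.y[i] += ps.vy[i] * dt;
    }
}
```

Both functions compute the same result; the difference is purely which bytes cross the memory bus per frame, which is exactly the concern DOD puts first.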

 

DoD is often cast in opposition to classical OOP techniques because OOP places a lot of emphasis on human-centered object modelling, and its true that you have to cut against the grain of much OOP programming advice, but OOP is still a valuable tool in implementing DoD architecture, and in the rest of your program. 

Edited by Ravyne



That's a good point. I'm sure I've made that mistake myself, and I know I've seen it made numerous times in articles and forum threads alike. It seems like a term that might mean what they meant, but it's not. I guess something like cache locality is more apt, or perhaps temporal locality more generally.

I think I've heard people use the term "coherent accesses"; maybe it stems from that?


 


That's a good point. I'm sure I've made that mistake myself, and I know I've seen it made numerous times in articles and forum threads alike. It seems like a term that might mean what they meant, but it's not. I guess something like cache locality is more apt, or perhaps temporal locality more generally.

I think I've heard people use the term "coherent accesses"; maybe it stems from that?

 

 

That is a related topic, yes.

 

 

 

Understanding exactly how memory works in modern computers is a fairly big topic.

 

Decades ago it was easy enough: you had main memory and you had a processor.  There was no cache. If you needed something, it was fetched from memory.

 

Then they added a cache. Early caches were a few bytes, then 16 bytes, then 32 bytes, then bigger and bigger.

 

Then there were more levels of caches. You could buy dedicated external cache that logically sat between your real main memory and your CPU.

 

Then it became popular to add a second CPU, and more cache chips. Caches existed both integrated into the CPU and as external chips.

 

Then each CPU gained additional levels of cache that needed to be kept coherent.

 

 

 

These days you have multiple levels of cache feeding into potentially multiple physical processors that all have their own caches, feeding into potentially multiple virtual processors that may also have their own caches.   Then, inside the processor, all the instructions are broken down and reordered anyway, and the CPU will predict where your memory accesses will land so it can prefetch them before the instructions are fully decoded. Modern hardware does lots of amazing things.

 

Any time you modify something, all the other caches that hold that value need to be updated to the right value. If you update processor 3's data cache for a memory address, the change eventually needs to go out to any other processors and caches that also hold that address.
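That invalidation traffic is easy to trigger by accident. A classic case is two threads writing to counters that happen to share a cache line ("false sharing"): every write by one core invalidates the line in the other core's cache. The sketch below (hypothetical code, not from the thread) shows the usual fix of padding each counter to its own line; the 64-byte figure is a common x86 line size, an assumption you should verify on your own platform.

```cpp
// Hypothetical sketch of coherence traffic and its mitigation.
#include <atomic>
#include <thread>

struct PaddedCounter {
    // alignas(64) gives each counter its own cache line, so writes by one
    // thread do not invalidate the line the other thread is writing.
    // 64 bytes is an assumed (common x86) line size, not a guarantee.
    alignas(64) std::atomic<long> value{0};
};

// Two threads each hammer their own counter. With the padding above, the
// lines never ping-pong between cores; remove alignas(64) and the same
// code becomes a false-sharing demonstration.
long run(PaddedCounter* counters, long iters) {
    std::thread a([=] {
        for (long i = 0; i < iters; ++i)
            counters[0].value.fetch_add(1, std::memory_order_relaxed);
    });
    std::thread b([=] {
        for (long i = 0; i < iters; ++i)
            counters[1].value.fetch_add(1, std::memory_order_relaxed);
    });
    a.join();
    b.join();
    return counters[0].value.load() + counters[1].value.load();
}
```

The result is identical either way; only the amount of cache-line traffic between the cores changes, which is precisely the coherence cost being described above.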

 

 

 

Good data-oriented design means understanding where all the copies of the object are, trying to minimize copies so ideally it is a single chain directly from main memory and not copies spread to every processor, and trying to ensure data is always available in the correct cache when you need it. In addition to solid software development background and understanding data structures and algorithms, that also means a good understanding of physical hardware and the hardware configurations that seem to change a little with every new hardware generation.


I've heard that getting data from a "foreign" cache can take as long as going to main memory.  I used to think the MOESI protocol would accelerate such accesses, but I guess not.



That's a good point. I'm sure I've made that mistake myself, and I know I've seen it made numerous times in articles and forum threads alike. It seems like a term that might mean what they meant, but it's not. I guess something like cache locality is more apt, or perhaps temporal locality more generally.

 

Spatial locality, actually.  Which I think goes a long way toward proving how complicated the whole subject really is for most programmers. :)
