How to actually measure stuff like cache hit, prefetch, ram traversal?

Started by
12 comments, last by WoopsASword 8 years, 3 months ago
I'm probably guilty of misspeaking and using "coherent" instead of "contiguous" before... And I have actually written a lot of code for platforms where HW-managed memory coherence is either missing or optional - a situation where you do have to understand coherency problems and perform tasks like manual cache invalidation, memory fences, and selecting between different buses :lol:


Spatial locality, actually. Which I think goes a long way towards proving how complicated the whole subject really is to most programmers.

Right you are here. Corrected again for my imprecise words. Must've been my low upbringing.

Spatial locality = nearby (preferably the same cache line)

Temporal locality = finishing up with the cache line(s) before something else causes them to be evicted.

And then there's access-patterns and the prefetcher which, used well, is as good as having an additional last-level cache.

throw table_exception("(? ???)? ? ???");

And then there is DeLorean locality, where once you read 88 bytes the cache is instantly filled with data from 1985.

Hi Guys,

This question is mostly out of pure interest.

Reading a lot of topics on this forum about game engine design and code, I often come across posts that say something like "this way you are more likely to load the next bit of data into the memory that the cpu will need, increasing cache coherency...", usually in topics about data oriented design (DOD).

From my understanding, DOD has something to do with keeping data in contiguous memory that the CPU cycles over each game loop, and it seems easier to achieve this with DOD than with inheritance?

Secondly... how can you actually measure this stuff? I know in a very, very simple game engine it doesn't really matter, but I have a learning game written using both ECS and plain OOP that essentially does the same thing (a sprite moves around, shoots, and gets shot at by enemies). How can I compare which implementation is actually better, and what areas of each one are working well?

I hope my question makes sense.

Thanks

1) DOD is not about "stuff being in contiguous memory". It's about designing your models around your data rather than around your "objects".

Take a look at this example:
http://knight666.com/blog/tutorial-a-practical-example-of-data-oriented-design/

2) To properly measure it you need knowledge of the relevant metrics and the right tools to collect them.

You can often spot cache misses indirectly: operations run slower than expected, CPU usage rises, or memory traffic increases.

However, there are dedicated performance counters for cache misses as well.

I advise you to search for Performance Monitor on Windows and learn what it can show you.

Linux has different tools for monitoring performance, such as perf.
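For comparing your two builds on Linux, `perf` gives you the hardware counters directly. A sketch of the workflow (the binary names `./game_oop` and `./game_ecs` are placeholders for your two versions; exact event names vary by CPU, and `perf list` shows what's available on yours):

```shell
# Count cache references/misses and instructions for each build.
perf stat -e cache-references,cache-misses,instructions ./game_oop
perf stat -e cache-references,cache-misses,instructions ./game_ecs

# Or sample where the misses actually occur and browse the hot spots:
perf record -e cache-misses ./game_ecs
perf report
```

Comparing the miss rate (cache-misses / cache-references) between the two builds is a far more direct answer to "which implementation is better for the cache" than wall-clock time alone.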

This topic is closed to new replies.
