Cache In A Multi-Core EnvironmentIn my previous article I discussed the use of cache and some practices that can provide increased performance while also teaching you what cache is. I also stated that cache in a multicore environment is a whole other topic so I've written this article to cover the different considerations that come along with Multicore Programming.
Why does it matter if we're using two cores?Cache comes in levels, typically 3 each with their own group of cores that can access it, L1 Cache is only visible to a single core with each core having it's own private cache and is the fastest of all caches. L2 Cache is usually visible to a group of cores, for instance the AMD 8150 shares L2 Cache between two cores and finally there's L3 Cache that is accessible to all cores and is the slowest of caches, but still much faster in comparison to RAM. Now that we know that there are different banks of cache for each core, what happens when two cores are accessing the same memory? If there was no system in place then both cores would cache the memory, then lets say one core wrote to that memory, This would be visible in memory, although the other core would still have it's cache of the old value. To solve this when a core writes to it's cached memory; Any other core that stores that cache line will be removed or updated, which is where our problem comes into play. Let's say you have 2 Integers on a single cache line and each core was writing to an integer each that are next to each other in an array, although they're not the same variables and it won't cause any unexpected results, because they're on the same cache line. Every time one core writes to that memory, the other core loses its cache. This is referred to as False Sharing and there's a simple solution to this, the hardest part is determining if you're having this problem.
False SharingFalse Sharing can hinder the performance for any program. For this example I'll go through the optimisations I did on a single producer single consumer queue and provide a few steps to solving most of your False Sharing problems. To test the queue I have two threads, one writing integers from 0 to 1 million and another reading them and checking if they're all in order. The queue doesn't undergo any resizing and is allocated with enough capacity for 1 million objects. [code] template
- Keep classes with multicore access segmented by cache lines to eliminate False Sharing
- Local variables are preferred over sharing data outside of the thread