Cache Coherence and Memory Barriers

10 comments, last by Zweistein2 13 years, 3 months ago
http://en.wikipedia.org/wiki/Write_barrier
http://en.wikipedia.org/wiki/Cache_coherence

I understand it this way: "cache coherence" is automatic from a programmer's point of view and I don't have to do anything to get it. Cache coherence is managed by the hardware.

"Memory barriers" manage, let's say, "main-memory coherence"; they only matter for main memory and are not used to achieve cache coherence.

This would also mean that I don't need memory barriers when my CPU is a multicore CPU with a common cache for all cores, as long as my data elements don't leave the cache?

Is anything wrong here?

Best regards,
Matthias
I think it depends on the hardware/compiler.

I believe that all processors used in desktop computers have a fairly strong memory model, although that doesn't seem to be the case for processors like the Intel Itanium, which has a weak memory model.

It should be safe to say that cache coherency is not a concern from a developer's point of view. That doesn't mean a developer shouldn't be aware of the performance hit from cache contention.

Memory barriers are useful to prevent instruction reordering and out-of-order execution, especially in the context of lock-free programming.
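As a small sketch of what reordering can break (my own example, not from this thread): a producer writes a payload and then sets a ready flag. Without barriers, the compiler or CPU may reorder the two stores, so a consumer could see the flag set before the payload exists. C++11 release/acquire operations insert exactly the barriers needed here.

```cpp
#include <atomic>
#include <cassert>

int data = 0;                      // payload, a plain non-atomic variable
std::atomic<bool> ready{false};    // publication flag

void producer() {
    data = 42;                                     // 1. write the payload
    ready.store(true, std::memory_order_release);  // 2. publish; release barrier
                                                   //    keeps the store to 'data' before it
}

int consumer() {
    // acquire barrier pairs with the release above: once 'ready' reads true,
    // the write to 'data' is guaranteed to be visible
    while (!ready.load(std::memory_order_acquire)) { /* spin */ }
    return data;
}
```

With relaxed ordering (or plain non-atomic variables) the same code would be a data race, and the consumer could legally observe stale data.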
Yeah, I agree.
Quote:Original post by Eddycharly
It should be safe to say that cache coherency is not a concern from a developer's point of view. That doesn't mean a developer shouldn't be aware of the performance hit from cache contention.


That applies to business IT programmers. Games programmers should be more than aware of the effects of cache coherency; cache misses typically cost more in performance than anything else (except load-hit-store stalls when moving between vector and float registers).
It also has a profound effect on the Cell architecture of the PS3, where the SPUs have a very limited amount of local memory, and so the size of structures is important.
Quote:
This would also mean, that i dont need to have memory barriers, when my CPU is a multicore CPU that has a common cache for all cores as long as my data elements dont get out of cache?


No, memory barriers are useful to guarantee ordering and visibility, and usually you'll have to ensure both (and atomicity) in multithreaded programs -- shared cache or not. That's the reason a mutex in general has a built-in memory barrier. Simple example:
thread 1: read x into register
thread 2: read x, change it, write it back
thread 1: oops, still has old x in register and doesn't know it should reload it


But (from my experience) you don't need to worry about this in general, only if you're using low-level synchronization.
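To illustrate that register problem in code (a sketch of my own; the names are illustrative): with a plain `bool`, the compiler is free to keep the flag in a register, and a spin loop like the one below may never see the other thread's write. Making the flag a `std::atomic` forces a fresh load on every iteration and adds the visibility guarantee.

```cpp
#include <atomic>
#include <thread>

// With a plain 'bool stop' this loop could legally be compiled into an
// infinite loop (the value cached in a register, never reloaded).
// std::atomic makes every load a real memory access.
std::atomic<bool> stop{false};

void worker() {
    while (!stop.load()) {        // reloads 'stop' from memory each iteration
        std::this_thread::yield();
    }
}

void run() {
    std::thread t(worker);
    stop.store(true);             // the store is guaranteed to become visible
    t.join();                     // to the worker, so the loop terminates
}
```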

EDIT: OK, maybe I didn't understand the question. If you were only asking about hardware barriers, I guess you are right.
This might help.

Making Pointer-Based Data Structures Cache Conscious, Trishul M. Chilimbi, Mark D. Hill, and James R. Larus, IEEE Computer, December 2000. ftp://ftp.cs.wisc.edu/wwt/computer00_conscious.pdf
"I thought what I'd do was, I'd pretend I was one of those deaf-mutes." - the Laughing Man
Hey,

But isn't the code from Macnihilist a cache coherence problem? Thread 1 is using a cached value and does not know that the value should be reread. I thought the MESI protocol handles that: http://de.wikipedia.org/wiki/MESI

I ran into the same problem here:

void PetersonLock::lock(WorkerThread* w)
{
    int i = w->ID;
    int j = 1 - i;
    flag[i] = true;
    victim = i;
    while (flag[j] && victim == i) { MemoryBarrier(); }
    //std::cout << "in";
}

void PetersonLock::unlock(WorkerThread* w)
{
    int i = w->ID;
    flag[i] = false;
    //std::cout << "out";
}

When I don't put the MemoryBarrier into the loop, the loop spins forever, even after the other thread sets flag[j] back to false.
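For comparison, here is a minimal sketch of the same Peterson lock rewritten with C++11 `std::atomic` instead of an explicit `MemoryBarrier()` call (the rewrite is my own illustration, not the original poster's code; the thread-id scheme follows the posted version). Sequentially consistent atomic operations provide the ordering Peterson's algorithm needs: the store to `victim` must be globally visible before `flag[j]` and `victim` are read.

```cpp
#include <atomic>
#include <thread>   // only needed for the usage example below
#include <cassert>

class PetersonLock {
    std::atomic<bool> flag[2];   // flag[i]: thread i wants to enter
    std::atomic<int>  victim;    // on a tie, this thread waits
public:
    PetersonLock() : victim(0) { flag[0] = false; flag[1] = false; }

    void lock(int i) {           // i is the thread id, 0 or 1
        int j = 1 - i;
        flag[i].store(true);     // I want to enter
        victim.store(i);         // but you go first if we collide
        // default seq_cst ordering makes the stores above visible
        // before these loads -- no explicit barrier needed in the loop
        while (flag[j].load() && victim.load() == i) { /* spin */ }
    }

    void unlock(int i) {
        flag[i].store(false);
    }
};
```

With plain (non-atomic) `bool`/`int` members, the store/load reordering allowed by the compiler and CPU breaks the algorithm -- which is exactly the behavior described above.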
Quote:
But is the code from Macnihilist not a Problem of cache coherence? thread 1 is using a cached value and it does not know that the value should be reread. I thought the mesi protocol is handling that: http://de.wikipedia.org/wiki/MESI

No, thread 1 and thread 2 can use the same cache and the problem still occurs. MESI only ensures that caches are coherent, i.e. they do not return different values for the same address. Reloading data from memory is the responsibility of the CPU. The reasoning is that the code only sees the boundary CPU <-> memory and doesn't have to care about a cache hierarchy (except for performance reasons). The other way around, the cache doesn't have to care what the CPU has in its registers.

Regarding the code: I'm not sure I understand it correctly, but it seems to be the problem I described. One thread doesn't see the changes made by another because it has the data stored in a register. It has nothing to do with cache coherence.
Cache coherence would be an additional concern if the threads were running on different CPUs and didn't share a cache. Then the data in one cache would be outdated, and that cache would first have to pull the correct data in from its neighbor.

To summarize, there are two problems here:
1. Forcing the CPU to reload data from memory (or to store it to memory) -- the code has to do this.
2. Ensuring that the loaded data (possibly coming from cache) is the correct one if it has been modified by another CPU -- the hardware usually does this on SMPs, e.g. with MESI.

Disclaimer: My knowledge about this stuff is a bit rusty, but I think I got it right overall...
Yeah, OK, I think I'm starting to understand it, too.
Quote:Original post by LessBread
This might help.

Making Pointer-Based Data Structures Cache Conscious, Trishul M. Chilimbi, Mark D. Hill, and James R. Larus, IEEE Computer, December 2000. ftp://ftp.cs.wisc.edu/wwt/computer00_conscious.pdf


Looks like the link only works as http now.

