Quote: Original post by the_edd
Quote: Original post by frob
Quote: Original post by fpsgamer
(1) How do we ensure CPU cache coherency in the presence of context switching? For example, let's say user program A executes for 10us, then user program B executes for 10us. What methods are used to prevent B from trashing A's cached data?
The OS and the hardware take care of this for you. If you are writing operating system code at that level, there is a lot of processor documentation and graduate-level literature that you should read.
The OS will indeed take care of this for you, but often poorly. For example, if thread #1 and thread #2 are continuously updating the adjacent elements array[1] and array[2], the coherency hardware may have to bounce the shared cache line between cores to make sure each write is seen by all processors, even though the threads only ever touch their own elements. This is the classic "false sharing" problem.
Related video of a presentation by Herb Sutter with some examples. (skip to around 1:18 for a specific example). And here's a DDJ article.
It's sad, but you really do need to know a little more about your target system to write efficient threaded code.
Pretty cool video, but he takes way too long to get across his point that you can always get more bandwidth but are limited by latency.
Just from having built my own computers for the last 10 years, I could've told you hard drive and memory speeds suck compared to the CPU! And how they've tried to hide it with larger L1 and L2 caches, and now, with AMD's newest CPUs, L3 cache!
Anyway, I only got halfway through, since I don't have the two hours (still faster than reading the Hennessy/Patterson or Intel books, though) to watch the whole thing, and he never even got around to doing any actual programming and didn't cover multicore at all. I'm guessing he ended with suggestions for keeping the pipeline full using threads or something similar?