Number of arrays CPU can prefetch on


I'm thinking of converting an array-of-structures to a structure-of-arrays as an optimization in some SSE code. It's usually a good idea. I'm concerned, though, because the structure would be converted into roughly 22 different arrays. Is there a limit to the number of arrays that a CPU prefetcher will work on? (i.e., is there a limit to the number of memory access patterns the prefetcher can track? Note that I'm *not* asking about the number of prefetches in flight.)

Obviously the thing to do is try it and measure the performance difference. The problem is that I figure it will take about 3-5 days of work to change the code around, and I'm wondering whether some hard limit on the number of prefetch prediction patterns means I'd be following a dead path. I thought I read something about such a limit once, but I can't find any info on it now.

Concretely, as an example, say I'm trying to parallelize 5x5 matrix inversion. Suppose I have an array of one million Matrix5x5 that I want to invert. In this case, would I prefer the matrices handed to me as a single array-of-structures, or as a structure of 25 arrays for SSE processing? The latter might, for example, let me eliminate a lot of SSE shuffles by processing four matrices at once, as in the sketch below.
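
To make the question concrete, here's a minimal sketch of the two layouts (all names are hypothetical, just for illustration). With the SoA layout, element (d,d) of four consecutive matrices sits contiguously in memory, so one SSE load fills all four lanes with no shuffles:

[code]
#include <xmmintrin.h>  // SSE intrinsics
#include <cstddef>

// Array-of-structures: the layout I have today.
struct Matrix5x5 {
    float m[5][5];
};

// Structure-of-arrays: 25 parallel arrays, one per matrix element.
struct Matrix5x5SoA {
    float* e[5][5];  // e[r][c] points to n floats: element (r,c) of every matrix
};

// Scale the diagonal of n matrices, four at a time. Each __m128 holds
// the same element (d,d) from four consecutive matrices, so no shuffles
// are needed to bring the lanes together.
void scale_diagonals(const Matrix5x5SoA& in, const Matrix5x5SoA& out,
                     std::size_t n, float s) {
    const __m128 vs = _mm_set1_ps(s);
    for (std::size_t i = 0; i + 4 <= n; i += 4) {
        for (int d = 0; d < 5; ++d) {
            __m128 v = _mm_loadu_ps(in.e[d][d] + i);        // 4 matrices in 1 load
            _mm_storeu_ps(out.e[d][d] + i, _mm_mul_ps(v, vs));
        }
    }
}
[/code]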

I would download the Intel Optimization Manuals from Intel's website. There is a lot of information, but Chapter 7 (Optimizing Cache Usage) should have most of your answers.

[url="http://www.intel.com/content/www/us/en/processors/architectures-software-developer-manuals.html"]Intel Architectures Software Developer Manuals[/url]

x64/x86 CPUs have extremely sophisticated hardware predictive prefetching, so generally you shouldn't need to explicitly prefetch data in your code. The first iteration over something can be an exception, since code can frequently 'surprise' the hardware prefetcher; in that case you need to prefetch much farther in advance in the code yourself. This is frequently not very practical, and you may just have to eat the first L3 miss.
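
For the rare case described above where you do prefetch by hand, here's a hedged sketch. The prefetch distance is a made-up starting point that has to be tuned per microarchitecture, and _mm_prefetch is only a hint the hardware is free to ignore:

[code]
#include <xmmintrin.h>  // _mm_prefetch, SSE
#include <cstddef>

// Sum an array with an explicit software prefetch running a fixed
// distance ahead of the loads.
float sum(const float* a, std::size_t n) {
    const std::size_t PREFETCH_AHEAD_BYTES = 512;  // hypothetical; tune per CPU
    __m128 acc = _mm_setzero_ps();
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        // Hint: bring a cache line into all cache levels well before we need it.
        // Prefetching past the end of the array is harmless; hints never fault.
        _mm_prefetch(reinterpret_cast<const char*>(a + i) + PREFETCH_AHEAD_BYTES,
                     _MM_HINT_T0);
        acc = _mm_add_ps(acc, _mm_loadu_ps(a + i));
    }
    float lanes[4];
    _mm_storeu_ps(lanes, acc);
    float total = lanes[0] + lanes[1] + lanes[2] + lanes[3];
    for (; i < n; ++i) total += a[i];  // scalar tail
    return total;
}
[/code]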

I had a look through the Intel optimization manual a few days ago but couldn't find anything specific. You're right, though: maybe I should try the architecture manual instead; it might discuss such things more explicitly. And then there are the AMD manuals, which might have something. The fact that the Intel manual doesn't mention anything about the number of arrays/streams (that I could find) makes me think it's a non-issue (or a trade secret).

[url="http://www.intel.com/content/dam/doc/manual/64-ia-32-architectures-optimization-manual.pdf"]From here[/url], section 2.1.5.4 (PDF).

Up to 32 streams, but only one forward or backward stream per memory page, and it depends on the number of requests and many other factors. So if all your data sits close together and is iterated in the same direction, you get one stream.

But it's not something you should optimize for: it's painfully model- and microcode-specific, and a revision of the CPU may change it.

[quote]structure of 25 arrays for SSE processing?[/quote]

Loads and stores are by far the most expensive part, so minimize those.

My guess is that two arrays, one for input and one for output, would work best.
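
For what it's worth, a standard compromise (not something anyone proposed above; the usual name is AoSoA, "array of structures of arrays") gets both the shuffle-free SSE loads and a small stream count: interleave the 25 element arrays in blocks of four floats, so the input and the output are each one contiguous forward stream. A hypothetical sketch:

[code]
#include <xmmintrin.h>
#include <cstddef>

// Four matrices interleaved element by element. Walking an array of
// these blocks front to back is a single forward stream for the
// hardware prefetcher, yet each e[r][c] is already a contiguous group
// of four lanes ready for one SSE load.
struct Matrix5x5Block4 {
    float e[5][5][4];  // e[r][c][lane], lane = which of the 4 matrices
};

// Copy element (0,0) of every matrix, block by block. 'in' and 'out'
// are each one contiguous allocation, so this touches two streams total.
void demo(const Matrix5x5Block4* in, Matrix5x5Block4* out, std::size_t blocks) {
    for (std::size_t b = 0; b < blocks; ++b) {
        __m128 v = _mm_loadu_ps(in[b].e[0][0]);  // 4 matrices, 1 load
        _mm_storeu_ps(out[b].e[0][0], v);
    }
}
[/code]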

