Non heap memory slow???!!!

Started by
10 comments, last by Excors 17 years, 1 month ago
Quote:Original post by Excors
As for the difference in code generation between the stack and heap versions, that seems to be due to aliasing - the compiler can tell that two stack variables are not overlapping, but it can't tell that the two pointers returned by new are not overlapping, and so it generates different code (rep movsd vs a read-write loop). If you add __restrict to the heap-allocated pointers, then it generates the same code as for the stack-allocated version.
You know, my post was initially going to mention the aliasing issue, as I had noticed exactly what you state above. The only reason I didn't is that the version where the compiler can't assume no aliasing is the one that appeared to be faster! (go figure)
It's really interesting to see what a difference __restrict makes.
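
For anyone following along, here is a minimal sketch of the two situations being compared. The original test program isn't shown in this part of the thread, so the function names and the int element type are made up for illustration. The only real difference between the two functions is the __restrict qualifier (a compiler extension accepted by MSVC, GCC and Clang), which promises the compiler that the two pointers never overlap.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the thread's copy routine (the original is not shown).
// Without __restrict the compiler has to assume dst and src might overlap,
// so it must keep the element-by-element read/write order.
void copy_may_alias(int* dst, const int* src, std::size_t count) {
    assert(dst != nullptr && src != nullptr);
    for (std::size_t i = 0; i < count; ++i) {
        dst[i] = src[i];
    }
}

// Same loop, but __restrict promises that dst and src do not overlap.
// The compiler is then free to turn the loop into a block copy.
// Calling this with overlapping buffers is undefined behaviour.
void copy_no_alias(int* __restrict dst, const int* __restrict src, std::size_t count) {
    assert(dst != nullptr && src != nullptr);
    for (std::size_t i = 0; i < count; ++i) {
        dst[i] = src[i];
    }
}
```

Both functions give the same result for non-overlapping buffers. Whether the generated code actually differs depends on the compiler and its flags, so check the assembly listing rather than trusting this sketch.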
"In order to understand recursion, you must first understand recursion."
My website dedicated to sorting algorithms
That seems to be just because this is a pathological case - I guess the CPU assumes you're going to be copying a lot of data, so it goes to the trouble of setting up all the stuff for rep movsd, and then it only copies four bytes because of the input parameters. Presumably the straight 'read, write, decrement, loop until zero' results in much happier pipelining or prediction or whatever it is that's affecting the timings. If you run the program with a larger second parameter (number of bytes to copy), then the alias-free version (stack/__restrict, using rep movsd) becomes faster instead, which makes more sense.
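
A rough sketch of how the copy size could be varied when timing, in case anyone wants to try it. This is not the original test program: time_copies, copy_loop and copy_memcpy are names invented here, and the volatile read is only an attempt to stop the optimiser from discarding the copy, not a guarantee.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstring>
#include <vector>

// Signature shared by the copy routines under test (hypothetical).
using CopyFn = void (*)(unsigned char*, const unsigned char*, std::size_t);

// Plain read/write loop, one byte at a time.
void copy_loop(unsigned char* dst, const unsigned char* src, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        dst[i] = src[i];
    }
}

// Library block copy, for comparison.
void copy_memcpy(unsigned char* dst, const unsigned char* src, std::size_t n) {
    std::memcpy(dst, src, n);
}

// Runs `repeats` copies of `bytes` bytes and returns the elapsed time in nanoseconds.
long long time_copies(CopyFn copy, std::size_t bytes, int repeats) {
    assert(bytes > 0 && repeats > 0);
    std::vector<unsigned char> src(bytes, 0xAB);
    std::vector<unsigned char> dst(bytes, 0);
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < repeats; ++i) {
        copy(dst.data(), src.data(), bytes);
        // Read the result through a volatile to discourage the optimiser
        // from dropping the copy altogether.
        volatile unsigned char sink = dst[bytes - 1];
        (void)sink;
    }
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start).count();
}
```

Calling time_copies(copy_loop, 4, n) and time_copies(copy_loop, 1 << 20, n) and then doing the same with copy_memcpy should show whether the ranking flips as the size grows. The numbers will vary by machine and compiler, so treat them as a hint and not a measurement.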

I don't actually see why the compiler cares about aliasing in this case, since rep movsd ought to operate exactly as expected even when you've got overlapping input (unless I'm missing some difference). I'd assume it's part of the mysterious internals of the optimiser, which just happens to have a special case for copying unaliased memory and not for aliased memory, which it uses when it guesses it's going to be copying quite a lot of memory (more than four bytes). But I really don't know, so I'm just guessing.
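
One possible difference, offered as a guess: if the original loop copied one byte at a time (the thread doesn't show it), then a forward byte loop and a forward 4-byte copy give different results when the buffers overlap by less than four bytes. The sketch below shows only that general effect. The function names are invented, and the 4-byte version uses memcpy through a temporary to imitate a dword move without breaking aliasing rules.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Forward copy, one byte at a time. With dst == src + 1 each byte written
// is read again on the next iteration, so the first byte gets smeared.
void copy_bytes_forward(unsigned char* dst, const unsigned char* src, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        dst[i] = src[i];
    }
}

// Forward copy, four bytes at a time: each group is read completely
// before any of it is written, roughly like a dword move.
void copy_dwords_forward(unsigned char* dst, const unsigned char* src, std::size_t n) {
    assert(n % 4 == 0);
    for (std::size_t i = 0; i < n; i += 4) {
        std::uint32_t word;
        std::memcpy(&word, src + i, 4);  // read the whole group first
        std::memcpy(dst + i, &word, 4);  // then write it
    }
}
```

Starting from the bytes 1 2 3 4 5 6 7 8 and copying four bytes from offset 0 to offset 1, the byte loop leaves 1 1 1 1 1 6 7 8, while the 4-byte version leaves 1 1 2 3 4 6 7 8. If the original loop already moved 4-byte units in the forward direction, this difference wouldn't apply and the puzzle stands.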

Anyway, this kind of benchmark doesn't give useful timing results, but it does give useful information: the optimiser isn't perfect, and sometimes it will generate worse code when you don't think it should (e.g. when you add __restrict, or when you use stack allocation). That tells you what to look out for when you're profiling a real application: if it's significantly slowed by memory copying, you can test whether the compiler happens to be generating significantly sub-optimal code in that case. (It also shows that you really do need to profile real code, because it's impossible to anticipate what the compiler's optimiser will do.)

Going back to the original question, "Non heap memory slow?", the answer is definitely no - the code does demonstrate real differences between the two programs, but you need to be very careful when analysing the results, because heap vs stack isn't the relevant issue here.

This topic is closed to new replies.
