More Newbie SSE Questions (Sorry)

2 comments, last by fathom88 18 years, 1 month ago
I was playing around with my code. Part of it uses an algorithm which uses a large array table. At any rate, I found a dramatic difference in performance between the debug version and the release version. I was surprised, to say the least, because in the past it never seemed to make much of a difference. However, I did not notice a difference between my SSE version and the pure version with VC++. Later, I noticed a difference when I increased the load (my algorithm can be adjusted). I had to triple my load before I saw a difference.

Here is my question: how can these two be the same? I cut and pasted from an example.

int i;
float* pSource1 = pArray1;
float* pSource2 = pArray2;
float* pDest = pResult;
for ( i = 0; i < nSize; i++ )
{
    //do some calcs
    pSource1++;
    pSource2++;
    pDest++;
}

//// SSE version
int nLoop = nSize / 4;
__m128 m1, m2, m3, m4;
__m128* pSrc1 = (__m128*) pArray1;
__m128* pSrc2 = (__m128*) pArray2;
__m128* pDest = (__m128*) pResult;
for ( int i = 0; i < nLoop; i++ )
{
    //do some calcs
    pSrc1++;
    pSrc2++;
    pDest++;
}

Aren't you missing some values when you do N/4 instead of N? Does incrementing the pointer with ++ mean you go to the next element in the array or to the 4th? Sorry if it's a stupid question. I'm a total newbie. Thanks for any replies.
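The SSE version processes 4 items in each iteration of the loop, so you only need n/4 iterations. If n % 4 != 0, you have a problem: integer division rounds down, so the last n % 4 items are never touched by the SSE loop and need separate handling (for example, a plain scalar loop that runs after it).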
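To make the n/4 point concrete, here is a minimal sketch of one way to cover every element even when the size is not a multiple of 4. The original post only says "do some calcs", so the element-wise addition is an assumption, and add_arrays_sse is a hypothetical name. It uses the unaligned load/store intrinsics so the arrays do not have to be 16-byte aligned.

```cpp
#include <xmmintrin.h>  // SSE: __m128, _mm_loadu_ps, _mm_add_ps, _mm_storeu_ps
#include <cstddef>

// Hypothetical helper: out[i] = a[i] + b[i] for all n elements.
// The addition stands in for the unspecified "do some calcs".
void add_arrays_sse(const float* a, const float* b, float* out, std::size_t n)
{
    // Largest multiple of 4 that does not exceed n.
    const std::size_t nVec = (n / 4) * 4;

    std::size_t i = 0;
    // SSE part: 4 floats per iteration, so n / 4 iterations in total.
    for (; i < nVec; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);   // unaligned load of 4 floats
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(out + i, _mm_add_ps(va, vb));
    }

    // Scalar tail: the remaining n % 4 elements (0 to 3 of them).
    for (; i < n; ++i) {
        out[i] = a[i] + b[i];
    }
}
```

With n = 7, for example, the SSE loop runs once (elements 0 to 3) and the scalar loop handles elements 4 to 6.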

++pointer advances the pointer to the first position after the thing currently pointed at. Since a __m128 contains 4 floats, ++pointer on a __m128* advances the pointer by 4 floats.
In other words, ++pointer advances the pointer sizeof(*pointer) bytes. Now, if we have an array of floats, ++pointer simply advances it to point to the next float. But here:
__m128* pSrc1 = (__m128*) pArray1;
__m128* pSrc2 = (__m128*) pArray2;
__m128* pDest = (__m128*) pResult;

we're effectively telling the compiler to think that the array is not made of floats, but of __m128's. This works because a __m128 internally consists of four floats - thus, sizeof(__m128) == 4 * sizeof(float) and incrementing the pointer once jumps four floats (but only one __m128) forward.
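A small sketch that checks the stride claim directly. The function name floats_per_m128_step is hypothetical, and the buffer is declared with alignas(16) because dereferencing a __m128* generally expects 16-byte-aligned memory (this sketch only does pointer arithmetic, but the alignment keeps the cast honest).

```cpp
#include <xmmintrin.h>  // __m128
#include <cstddef>

// One __m128 occupies the same space as four floats.
static_assert(sizeof(__m128) == 4 * sizeof(float),
              "__m128 is expected to hold exactly four floats");

// Hypothetical helper: how many floats does a single step on a __m128* skip?
std::ptrdiff_t floats_per_m128_step()
{
    alignas(16) float data[8] = {0, 1, 2, 3, 4, 5, 6, 7};

    // View the float array as an array of __m128, as in the post.
    __m128* p = reinterpret_cast<__m128*>(data);
    __m128* q = p + 1;  // same effect as ++p: moves sizeof(__m128) bytes

    // Measure the distance in units of float.
    return reinterpret_cast<float*>(q) - data;
}
```

Calling floats_per_m128_step() returns 4: one step on the __m128* lands on data[4], not data[1].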
Thanks for the replies. It makes sense now.
