More Newbie SSE Questions (Sorry)
I was playing around with my code. Part of it uses an algorithm which uses a large array table. At any rate, I found a dramatic difference in performance between the debug build and the release build. I was surprised, to say the least, because in the past it never seemed to make much of a difference. However, I did not notice a difference between my SSE version and the plain version with VC++. I only noticed one later, when I increased the load (my algorithm can be adjusted); I had to triple the load before I saw a difference. Here is my question: how can these two loops be equivalent? I cut and pasted them from an example.
int i;
float* pSource1 = pArray1;
float* pSource2 = pArray2;
float* pDest = pResult;
for ( i = 0; i < nSize; i++ )
{
    //do some calcs
    pSource1++;
    pSource2++;
    pDest++;
}
//// SSE version
int nLoop = nSize / 4;
__m128 m1, m2, m3, m4;
__m128* pSrc1 = (__m128*) pArray1;
__m128* pSrc2 = (__m128*) pArray2;
__m128* pDest = (__m128*) pResult;
for ( int i = 0; i < nLoop; i++ )
{
    //do some calcs
    pSrc1++;
    pSrc2++;
    pDest++;
}
Aren't you missing some values when you do N/4 instead of N? Does incrementing the pointer with ++ mean you go to the next element in the array, or to the 4th? Sorry if it's a stupid question. I'm a total newbie. Thanks for any replies.
The SSE version processes 4 items in each iteration of the loop, so you only need n/4 iterations. If n%4 != 0 you have a problem: the last n%4 items are never touched by the loop, so they have to be handled separately.
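One common way to deal with the leftover items is a scalar tail loop after the SSE loop. Here is a minimal sketch, not the code from the example you pasted. It assumes the "calcs" are a simple element-wise add (the original post doesn't say what they are), and add_arrays_sse is a made-up name. It uses the unaligned load/store intrinsics so the arrays don't have to be 16-byte aligned.

```cpp
#include <xmmintrin.h>  // SSE intrinsics: __m128, _mm_loadu_ps, _mm_add_ps, _mm_storeu_ps

// Hypothetical helper: out[i] = a[i] + b[i] for i in [0, n).
// The addition is only a stand-in for the unknown "calcs" in the post.
void add_arrays_sse(const float* a, const float* b, float* out, int n)
{
    int i = 0;

    // Main loop: four floats per iteration, as long as a full group of 4 remains.
    for (; i + 4 <= n; i += 4)
    {
        __m128 va = _mm_loadu_ps(a + i);   // unaligned load of 4 floats
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(out + i, _mm_add_ps(va, vb));
    }

    // Tail loop: the remaining n % 4 floats, one at a time.
    for (; i < n; ++i)
    {
        out[i] = a[i] + b[i];
    }
}
```

With n = 7, for example, the SSE loop runs once (items 0 to 3) and the tail loop handles items 4 to 6.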
++pointer advances the pointer to the first position after the thing currently pointed at. As a __m128 contains 4 floats, ++pointer advances the pointer by 4 floats.
In other words, ++pointer advances the pointer by sizeof(*pointer) bytes. Now, if we have an array of floats, ++pointer simply advances it to point to the next float. But here:
__m128* pSrc1 = (__m128*) pArray1;
__m128* pSrc2 = (__m128*) pArray2;
__m128* pDest = (__m128*) pResult;
we're effectively telling the compiler to think that the array is not made of floats, but of __m128's. This works because a __m128 internally consists of four floats - thus, sizeof(__m128) == 4 * sizeof(float) and incrementing the pointer once jumps four floats (but only one __m128) forward.
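If you want to convince yourself of the pointer stride, here is a small sketch that measures it. The function name floats_per_step is made up for illustration. The array is declared with alignas(16) because actually dereferencing a __m128* needs a 16-byte-aligned address; this sketch only does pointer arithmetic and never dereferences the pointer.

```cpp
#include <xmmintrin.h>  // __m128
#include <cstddef>      // std::ptrdiff_t

// sizeof(__m128) is 16 bytes, i.e. four 4-byte floats.
static_assert(sizeof(__m128) == 4 * sizeof(float), "__m128 holds four floats");

// Hypothetical helper: how many floats does one step of a __m128* skip?
std::ptrdiff_t floats_per_step()
{
    alignas(16) float data[8] = {0};

    // View the float array as an array of __m128, as in the cast above.
    __m128* p = reinterpret_cast<__m128*>(data);
    __m128* q = p + 1;  // same effect as one ++ on the __m128 pointer

    // Convert back to float* and count how many floats were skipped.
    return reinterpret_cast<float*>(q) - data;
}
```

Calling floats_per_step() gives 4: one step of the __m128 pointer moves four floats forward, which is why the SSE loop only needs nSize / 4 iterations.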