#ActualVortez

Posted 20 December 2012 - 06:03 PM

Hi, I've been experimenting with some SIMD lately, like MMX and SSE/SSE2, and I'm a bit disappointed by the results. I'm doing simple stuff like filling two arrays with random numbers, then adding them into a third array, using C++, MMX and SSE (I'm using inline assembly in the last two functions, not the intrinsic functions).

ex:
const int NumElements = 10000;
const int NumLoops = 1000;

// Sized with NumElements so the inner loop stays within bounds.
int a[NumElements];
int b[NumElements];
int c[NumElements];

void CPPTest(){
	for(int i = 0; i < NumLoops; i++){
		for(int j = 0; j < NumElements; j++){
			c[j] = a[j] + b[j];
		}
	}
}


I don't have the code with me atm, but that's basically what I do; then I do the same for the two other functions in MMX or SSE, replacing the inner loop with assembly code.
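For readers following along, here is a rough sketch of what an SSE2 version of that inner loop might look like. It uses the intrinsic functions rather than the inline assembly described above, and the function name AddArraysSSE2 and its parameters are made up for illustration; it assumes an x86 target with SSE2 and 32-bit int.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics: __m128i, _mm_add_epi32, ...
#include <cstddef>

// Hypothetical helper (not the original poster's code): c[j] = a[j] + b[j]
// for n ints, processing four 32-bit ints per iteration.
void AddArraysSSE2(const int* a, const int* b, int* c, std::size_t n) {
    std::size_t j = 0;
    // Main loop: one 128-bit register holds four ints.
    for (; j + 4 <= n; j += 4) {
        // Unaligned loads and stores, so the arrays need no special alignment.
        __m128i va = _mm_loadu_si128(reinterpret_cast<const __m128i*>(a + j));
        __m128i vb = _mm_loadu_si128(reinterpret_cast<const __m128i*>(b + j));
        _mm_storeu_si128(reinterpret_cast<__m128i*>(c + j), _mm_add_epi32(va, vb));
    }
    // Scalar tail for any elements left over when n is not a multiple of 4.
    for (; j < n; ++j) {
        c[j] = a[j] + b[j];
    }
}
```

Calling AddArraysSSE2(a, b, c, NumElements) in place of the inner loop of CPPTest would be the intended use.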

Sure, the debug version with no optimization is about 10-12 times faster with SSE, and MMX shows some improvement as well, but in release mode the MMX version is about 10% slower and the SSE version only slightly better, maybe 5%. I have to say I was expecting better results. I also noticed that if I use smaller buffers I get better results, and if I use bigger ones the results even out. I suspect the cache is doing this.
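One way to check the cache suspicion would be to time the same loop over several buffer sizes and compare the cost per element. The sketch below is only an illustration: the name NanosecondsPerElement and its parameters are hypothetical, it times the plain C++ loop, and an optimizing compiler may still simplify the repeated passes, so the numbers should be read with caution.

```cpp
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical helper: average nanoseconds per element for `loops` passes
// of c[j] = a[j] + b[j] over n ints. Assumes n > 0 and loops > 0.
double NanosecondsPerElement(std::size_t n, int loops) {
    std::vector<int> a(n, 1), b(n, 2), c(n, 0);
    volatile int sink = 0;  // volatile write so the results are observably used

    auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < loops; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            c[j] = a[j] + b[j];
        }
        sink = c[n / 2];  // read one result back after every pass
    }
    auto stop = std::chrono::steady_clock::now();

    std::chrono::duration<double, std::nano> elapsed = stop - start;
    return elapsed.count() / (static_cast<double>(n) * loops);
}
```

Running it with, say, a few thousand elements and then a few million would show whether the per-element time rises once the three arrays no longer fit in cache.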

So that's why I'm asking: is it still worth using those instructions with a compiler so good at optimizing the code?
