Using inline ASM & SSE

7 comments, last by sakky 19 years, 6 months ago
I’m having a lot of second thoughts about using inline ASM. I have this code that enumerates support for SSE.

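// ----------------------------------------------------------------------------
// Name		: EnumCPUInfo( )
// Desc		: Enumerates the CPU for MMX / SSE / SSE2 / 3DNow! support.
//			  Returns FALSE if SSE is unusable or CPUID faults.
//			  (MSVC inline ASM, so 32-bit x86 builds only.)
// ----------------------------------------------------------------------------
BOOL EnumCPUInfo( CPUINFO* pInfo )
{
	// Filled locally so the inline ASM can address members by name,
	// then copied out; the original by-value parameter never reached the caller.
	CPUINFO				Info;
	CHAR*				pStr = Info.szVendor;	// needs room for 12 chars + '\0'
	int					n	 = 1;
	int*				pn	 = &n;

	if ( pInfo == NULL )
		return ( FALSE );

	memset( &Info, 0, sizeof( Info ) );
	*pInfo = Info;								// caller sees zeros on failure

	// Probe one SSE instruction: it faults with STATUS_ILLEGAL_INSTRUCTION
	// when the CPU or the OS does not support SSE.
	__try
	{
		__asm xorps xmm0, xmm0
	}
	__except ( EXCEPTION_EXECUTE_HANDLER )
	{
		if ( _exception_code( ) == STATUS_ILLEGAL_INSTRUCTION )
		{
			return ( FALSE );
		}
	}

	__try
	{
		__asm
		{
			mov	eax, 0						// leaf 0: vendor string in EBX:EDX:ECX
			CPUID

			mov esi, pStr
			mov [esi], ebx
			mov [esi+4], edx
			mov [esi+8], ecx

			mov eax, 1						// leaf 1: feature flags in EDX
			CPUID

			test edx, 04000000h				// bit 26: SSE2
			jz	 __NOSSE2
			mov	[Info.bSSE2], 1				// was [Info.szVendor]; assumes CPUINFO has a bSSE2 member

__NOSSE2:	test edx, 02000000h				// bit 25: SSE
			jz	 __NOSSE
			mov [Info.bSSE], 1

__NOSSE:	test edx, 00800000h				// bit 23: MMX
			jz	 __EXIT1
			mov [Info.bMMX], 1

__EXIT1:	
		}
	}
	__except( EXCEPTION_EXECUTE_HANDLER )
	{
		// CPUID itself is not available (or something else went wrong)
		return ( FALSE );
	}

	__asm
	{
			mov eax, 80000000h				// highest extended leaf comes back in EAX
			CPUID

			cmp	eax, 80000000h
			jbe	__EXIT2						// nothing above 80000000h: no extended leaves
			mov [Info.bEXT], 1

__EXIT2:
	}

	if ( ( strncmp( Info.szVendor, "GenuineIntel", 12 ) == 0 ) && Info.bEXT )
	{
		__asm
		{
			mov eax, 1
			CPUID
			mov	esi, pn
			mov [esi], ebx
		}

		// Keep only the low byte of EBX (the brand index)
		int m = 0;

		memcpy( &m, pn, sizeof( char ) );

		n = m;
	}
	else if ( ( strncmp( Info.szVendor, "AuthenticAMD", 12 ) == 0 ) && Info.bEXT )
	{
		__asm
		{
			mov	eax, 1
			CPUID
			mov esi, pn
			mov [esi], eax					// processor signature

			mov eax, 0x80000001				// AMD extended feature flags in EDX
			CPUID

			test edx, 0x80000000			// bit 31: 3DNow! (bit 30 is 3DNow! extensions)
			jz	 __AMD1
			mov  [Info.b3DNow], 1 

__AMD1:		test edx, 0x00400000			// bit 22: AMD MMX extensions
			jz   __AMD2
			mov	 [Info.bMMXEX ], 1

__AMD2:		
		}
	}
	// Other vendors: nothing vendor-specific to query yet.
	// Note that n is computed above but not stored anywhere yet.

	Info.szVendor[ 12 ] = '\0';				// vendor string is exactly 12 characters

	*pInfo = Info;

	return ( TRUE );
}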

The code checks to see if the OS and CPU support SSE and whatnot. I have to write different code for each type of SSE. What a load of work. Is it worth it? Do I really get that much speed out of using these extensions? I need to make the engine as fast as possible. Also, the engine uses its own types, not Direct3D's, because a lot of the sprites and meshes work with the engine's types. The only conversion they need is to be used with the VB. I can't decide if I want to use ASM for a lot of the math. I need most of the math to be lightning fast because it will be used extensively by the engine. Most of the math is done with 3D or 4D vector types. So what do you guys think?
Take back the internet with the most awesome browser around, Firefox
That you should use a profiler. You can potentially get a 4x speedup using SSE, but there's a fairly large chance you can tell your compiler to use it already. VC++ .NET has intrinsics for SSE and can also make use of SSE if available; you just have to tell the optimizer to.
HardDrop - hard link shell extension. "Tread softly because you tread on my dreams" - Yeats
Personally I would write a single C version and use VectorC to produce separate versions of the library (e.g. a set of DLLs) for each instruction set. It's still a lot of work though, and an expensive compiler to buy.
Maybe you could write a single SSE 1 version and hope to get most of the performance benefits anyway?
Check out xmmintrin.h if you would rather not write all the asm yourself. Intel provides intrinsic data types and functions that can make it a little easier to deal with. Also, keep in mind that for the best results you must have 16-byte aligned memory, which is a huge pain when you're dealing with aligned member variables, especially in cases of inheritance. Basically, no matter how hard you try, sometimes you can't force alignment (at least not on the stack; heap memory is easier to align). The unaligned operations are not much faster than compiler-optimized FPU operations.

Long story short, SSE can speed your math up, but it can definitely slow your development down.


-Joe
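As a rough illustration of the xmmintrin.h route mentioned above, here is a minimal sketch for x86 compilers. The `Vec4` type and the `Add`, `Dot`, `AllocVec4Array` and `FreeVec4Array` helpers are hypothetical names, not anything from the engine in this thread, and `alignas(16)` is a modern C++11 stand-in for the compiler-specific alignment attributes of the time. The intrinsics themselves (`_mm_load_ps`, `_mm_add_ps`, `_mm_mul_ps`, `_mm_shuffle_ps`, `_mm_store_ps`, `_mm_cvtss_f32`, `_mm_malloc`, `_mm_free`) are the standard SSE1 ones.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <xmmintrin.h>   // SSE1 intrinsics and _mm_malloc / _mm_free

// Hypothetical 4D vector type; alignas(16) makes stack and static
// instances safe for the aligned load/store intrinsics below.
struct alignas(16) Vec4 { float v[4]; };

// Component-wise add: one packed instruction instead of four scalar adds.
inline Vec4 Add(const Vec4& a, const Vec4& b) {
    Vec4 r;
    _mm_store_ps(r.v, _mm_add_ps(_mm_load_ps(a.v), _mm_load_ps(b.v)));
    return r;
}

// 4D dot product: packed multiply, then a horizontal sum using shuffles.
inline float Dot(const Vec4& a, const Vec4& b) {
    __m128 m = _mm_mul_ps(_mm_load_ps(a.v), _mm_load_ps(b.v));
    // Swap neighbours and add: lanes become (m0+m1, m0+m1, m2+m3, m2+m3)
    __m128 s = _mm_add_ps(m, _mm_shuffle_ps(m, m, _MM_SHUFFLE(2, 3, 0, 1)));
    // Swap halves and add: every lane now holds m0+m1+m2+m3
    s = _mm_add_ps(s, _mm_shuffle_ps(s, s, _MM_SHUFFLE(1, 0, 3, 2)));
    return _mm_cvtss_f32(s);
}

// Heap arrays: plain new/malloc may not give 16-byte alignment,
// so ask for it explicitly.
inline Vec4* AllocVec4Array(std::size_t count) {
    return static_cast<Vec4*>(_mm_malloc(count * sizeof(Vec4), 16));
}
inline void FreeVec4Array(Vec4* p) { _mm_free(p); }
```

If alignment cannot be guaranteed, `_mm_loadu_ps` and `_mm_storeu_ps` can replace the aligned versions, at the cost described in the post above.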
I have so many different types of games and engines I want to create. So I figured if I just make one engine that is reasonably compatible with all of what I want to do, then I could just worry about the algorithms and the other stuff it uses. I wouldn't have to worry about an underlying framework any more either. So if I want to build an engine, I need it to be as fast as I can get it. I don't really want to write huge amounts of code just for a little performance increase.

I think if I just write the code that supports each SSE level for what I want to accomplish, then I can just use the framework without worrying about "Should I use SSE here for this?" kind of thing, because the inner math will use SSE whenever it is there.
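The "use SSE if it is there" idea can be sketched as a one-time function-pointer selection. This is only an illustration under assumptions: `Vec4`, `AddScalar`, `AddSSE`, `CpuHasSSE`, `SelectAdd` and the `SKETCH_X86` macro are made-up names, and the check reads only the CPUID feature bit (EDX bit 25 of leaf 1). It does not probe OS support the way the `xorps` test in the original post does. `__cpuid` (MSVC, intrin.h) and `__get_cpuid` (GCC/Clang, cpuid.h) are the real compiler-provided wrappers.

```cpp
#include <cassert>

#if defined(_MSC_VER) && (defined(_M_IX86) || defined(_M_X64))
  #include <intrin.h>      // __cpuid
  #include <xmmintrin.h>
  #define SKETCH_X86 1
#elif defined(__GNUC__) && defined(__SSE__)
  #include <cpuid.h>       // __get_cpuid
  #include <xmmintrin.h>
  #define SKETCH_X86 1
#endif

struct Vec4 { float v[4]; };   // hypothetical engine type

typedef void (*AddFn)(const Vec4&, const Vec4&, Vec4&);

// Portable fallback, always available.
static void AddScalar(const Vec4& a, const Vec4& b, Vec4& out) {
    for (int i = 0; i < 4; ++i) out.v[i] = a.v[i] + b.v[i];
}

#ifdef SKETCH_X86
// SSE path; unaligned loads so Vec4 needs no special alignment.
static void AddSSE(const Vec4& a, const Vec4& b, Vec4& out) {
    _mm_storeu_ps(out.v, _mm_add_ps(_mm_loadu_ps(a.v), _mm_loadu_ps(b.v)));
}

// CPUID leaf 1, EDX bit 25 = SSE.
static bool CpuHasSSE() {
#if defined(_MSC_VER)
    int regs[4];
    __cpuid(regs, 1);
    return (regs[3] & (1 << 25)) != 0;
#else
    unsigned eax = 0, ebx = 0, ecx = 0, edx = 0;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx)) return false;
    return (edx & (1u << 25)) != 0;
#endif
}
#endif

// Called once at start-up; the rest of the engine only sees the pointer.
static AddFn SelectAdd() {
#ifdef SKETCH_X86
    if (CpuHasSSE()) return AddSSE;
#endif
    return AddScalar;
}
```

Calling through a pointer for a single vector add would likely cost more than it saves, so a real engine would probably dispatch at a coarser level, for example a routine that transforms a whole array of vertices.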
Quote:Original post by sakky
I have so many different types of games and engines I want to create. So I figured if I just make one engine that is reasonably compatible with all of what I want to do, then I could just worry about the algorithms and the other stuff it uses. I wouldn't have to worry about an underlying framework any more either. So if I want to build an engine, I need it to be as fast as I can get it. I don't really want to write huge amounts of code just for a little performance increase.

I think if I just write the code that supports each SSE level for what I want to accomplish, then I can just use the framework without worrying about "Should I use SSE here for this?" kind of thing, because the inner math will use SSE whenever it is there.


The simplest, and most correct answer, is profile first.

While SSE and other instruction sets can provide boosts in performance, you must always remember the 80/20 rule: 80% of the program's time will be spent in 20% of the code. Unless you profile your application, you will never know where that 20% is, and you will be optimizing blindly, which is a waste of time.

Secondly: Most decent compilers on the market can generate code for MMX, SSE, and SSE2 if you set the flags.

In time the project grows, the ignorance of its devs it shows, with many a convoluted function, it plunges into deep compunction, the price of failure is high, Washu's mirth is nigh.

How would one do this with Visual C++ .NET?
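One possible answer to the question above, as a sketch: keep the math in plain C++ and let the compiler's code-generation switches do the work. The `Vec4` type and `ScaleAll` function are hypothetical. The switches in the trailing comment are the documented ones for 32-bit x86 builds: `/arch:SSE` and `/arch:SSE2` in Visual C++ (also exposed in the project settings under C/C++, Code Generation, "Enable Enhanced Instruction Set"), and `-msse2 -mfpmath=sse` in GCC. How much the compiler gains from them depends on its version; older ones may emit mostly scalar SSE instructions rather than 4-wide packed ones.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical engine vector type, written as plain C++
// (no intrinsics, no inline ASM).
struct Vec4 { float x, y, z, w; };

// Scales every vector in place. With SSE code generation enabled,
// the compiler is free to use SSE registers for this arithmetic.
void ScaleAll(Vec4* vecs, std::size_t count, float s) {
    for (std::size_t i = 0; i < count; ++i) {
        vecs[i].x *= s;
        vecs[i].y *= s;
        vecs[i].z *= s;
        vecs[i].w *= s;
    }
}

// Example build lines (32-bit x86 targets):
//   cl  /O2 /arch:SSE  math.cpp
//   cl  /O2 /arch:SSE2 math.cpp
//   g++ -O2 -msse2 -mfpmath=sse math.cpp
```

A binary built this way will fault on a CPU without the selected instruction set, so it does not replace a run-time check if old hardware has to be supported.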
Quote:Original post by Washu
80/20 rule

80/20 rule? I've heard of the 90/10 rule...

OP: If you have to ask if it's worth it, the most likely answer is: no. As Washu said, profile first - THEN worry about optimizations. You do NOT need to make the engine as fast as possible FIRST - you need to get it working, maintainable, and bug free(ish) first. Then you need to find out where you need to work in order to make it as fast as possible - aka, you need to profile. That way, you speed it up as quickly as possible, thus complying with the phrase "as fast as possible". Otherwise, you will probably be optimizing a section that doesn't need it - and it won't be "as fast as possible" because you could have worked on a spot that needed it more.
So that's what you meant. I thought you meant "profile" in terms of project options in Visual Studio. I guess it makes sense not to optimize something when you don't know whether it needs it or not. I guess I'm just a worrywart sometimes.
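A real profiler is the right tool for finding the hot spots, but even a crude timing harness shows the idea of measuring before optimizing. The sketch below is an assumption-laden illustration: `Vec4`, `SumOfDots` and `TimeMs` are made-up names, and `std::chrono::steady_clock` is standard C++11, newer than the compilers discussed in this thread.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

struct Vec4 { float v[4]; };   // hypothetical engine type

// A candidate "hot spot": sum of 4D dot products over two arrays.
static float SumOfDots(const std::vector<Vec4>& a, const std::vector<Vec4>& b) {
    float sum = 0.0f;
    for (std::size_t i = 0; i < a.size() && i < b.size(); ++i)
        for (int k = 0; k < 4; ++k)
            sum += a[i].v[k] * b[i].v[k];
    return sum;
}

// Runs fn() 'repeats' times and returns the average wall-clock
// time per call in milliseconds.
template <class Fn>
double TimeMs(Fn fn, int repeats) {
    typedef std::chrono::steady_clock Clock;
    Clock::time_point start = Clock::now();
    for (int i = 0; i < repeats; ++i) fn();
    Clock::time_point end = Clock::now();
    return std::chrono::duration<double, std::milli>(end - start).count() / repeats;
}
```

Timing the scalar version and a candidate SSE version of the same routine on realistic data gives a first hint of whether the rewrite pays for itself; results from a release build are the ones that matter.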
