# using inline ASM & SSE

I’m having a lot of second thoughts about using inline ASM. I have this code that enumerates support for SSE.
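```cpp
// ----------------------------------------------------------------------------
// Name		: EnumCPUInfo( )
// Desc		: Enumerates the CPU for MMX / SSE / SSE2 / 3DNow! support.
//			  Returns FALSE if SSE faults or CPUID is unavailable.
// ----------------------------------------------------------------------------
BOOL EnumCPUInfo( CPUINFO* pInfo )
{
    // Work on a local copy so the inline assembly can address the members
    // directly; it is copied to *pInfo at the end. (Taking CPUINFO by value,
    // as before, meant the caller never saw the results.)
    CPUINFO Info;
    CHAR*   pStr = Info.szVendor;
    int     n    = 1;
    int*    pn   = &n;

    memset( &Info, 0, sizeof( Info ) );

    // Execute one SSE instruction; it faults if the CPU or OS lacks SSE support.
    __try
    {
        _asm xorps xmm0, xmm0
    }
    __except ( EXCEPTION_EXECUTE_HANDLER )
    {
        if ( _exception_code( ) == STATUS_ILLEGAL_INSTRUCTION )
        {
            return ( FALSE );
        }
    }

    __try
    {
        __asm
        {
            mov eax, 0              ; leaf 0: vendor string in EBX, EDX, ECX
            cpuid

            mov esi, pStr
            mov [esi], ebx
            mov [esi+4], edx
            mov [esi+8], ecx

            mov eax, 1              ; leaf 1: feature flags in EDX
            cpuid

            test edx, 04000000h     ; bit 26: SSE2
            jz   __NOSSE2
            mov  [Info.bSSE2], 1    ; bSSE2 is an assumed field (was szVendor, a bug)

__NOSSE2:   test edx, 02000000h     ; bit 25: SSE
            jz   __NOSSE
            mov  [Info.bSSE], 1

__NOSSE:    test edx, 00800000h     ; bit 23: MMX
            jz   __EXIT1
            mov  [Info.bMMX], 1

__EXIT1:
        }
    }
    __except ( EXCEPTION_EXECUTE_HANDLER )
    {
        // CPUID itself faulted, so nothing can be queried.
        return ( FALSE );
    }

    __asm
    {
        mov eax, 80000000h          ; highest supported extended leaf
        cpuid

        cmp eax, 80000000h
        jz  __EXIT2
        mov [Info.bEXT], 1

__EXIT2:
    }

    if ( ( strncmp( Info.szVendor, "GenuineIntel", 12 ) == 0 ) && Info.bEXT )
    {
        __asm
        {
            mov eax, 1
            cpuid
            mov esi, pn
            mov [esi], ebx
        }

        // Keep only the low byte of EBX (the brand index).
        int m = 0;

        memcpy( &m, pn, sizeof( char ) );

        n = m;
    }
    else if ( ( strncmp( Info.szVendor, "AuthenticAMD", 12 ) == 0 ) && Info.bEXT )
    {
        __asm
        {
            mov eax, 1
            cpuid
            mov esi, pn
            mov [esi], eax          ; processor signature

            mov eax, 0x80000001     ; AMD extended feature flags in EDX
            cpuid

            test edx, 0x80000000    ; bit 31: 3DNow! (bit 30 is extended 3DNow!)
            jz   __AMD1
            mov  [Info.b3DNow], 1

__AMD1:     test edx, 0x00400000    ; bit 22: AMD extensions to MMX
            jz   __AMD2
            mov  [Info.bMMXEX], 1

__AMD2:
        }
    }
    // Any other vendor: nothing further to query.

    // The vendor string is 12 characters, so the terminator goes at index 12.
    Info.szVendor[ 12 ] = '\0';

    *pInfo = Info;

    return ( TRUE );
}
```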


The code checks whether the OS and CPU support SSE and so on. I would have to write different code for each type of SSE. What a load of work. Is it worth it? Do I really get that much speed out of using these extensions? I need to make the engine as fast as possible. Also, the engine uses its own types, not Direct3D's, because a lot of the sprites and meshes work with the engine's types. The only conversion they need is to be used with the VB (vertex buffer). I can’t decide if I want to use ASM for a lot of the math. I need most of the math to be lightning fast because it will be used extensively by the engine. Most of the math is done with 3D or 4D vector types. So what do you guys think?

##### Share on other sites
I think that you should use a profiler. You can potentially get a 4x speedup using SSE, but there's a fairly good chance you can tell your compiler to use it already. VC++ .NET has intrinsics for SSE, and it can also generate SSE code on its own where available if you tell the optimizer to.

##### Share on other sites
Personally, I would write a single C version and use VectorC to produce separate versions of the library (e.g. a set of DLLs) for each instruction set. It's still a lot of work though, and an expensive compiler to buy.
Maybe you could write a single SSE 1 version and hope to get most of the performance benefits anyway?

##### Share on other sites
Check out xmmintrin.h if you would rather not write all the asm yourself. Intel provides intrinsic data types and functions that can make it a little easier to deal with. Also, keep in mind that for the best results you must have 16-byte aligned memory, which is a huge pain when you're dealing with aligned member variables, especially in cases of inheritance. Basically, no matter how hard you try, sometimes you can't force alignment (at least not on the stack; heap memory is easier to align). The unaligned operations are not much faster than compiler-optimized FPU operations.
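To make the aligned versus unaligned point concrete, here is a minimal sketch using the `xmmintrin.h` intrinsics. `Vec4`, `Add` and `AddUnaligned` are hypothetical names, not anything from the engine in question, and `alignas(16)` assumes a C++11 compiler (older Visual C++ versions would use `__declspec(align(16))` instead). A scalar fallback is included so the sketch still builds where SSE is not enabled.

```cpp
#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>   // SSE intrinsics: __m128, _mm_load_ps, _mm_add_ps, ...
#define VEC4_HAS_SSE 1
#else
#define VEC4_HAS_SSE 0
#endif

// Hypothetical engine vector type. alignas(16) asks the compiler to place
// every Vec4 on a 16-byte boundary so the aligned load/store forms are legal.
struct alignas(16) Vec4 {
    float x, y, z, w;
};

// Adds two aligned vectors with one SSE instruction per step.
inline Vec4 Add(const Vec4& a, const Vec4& b) {
    Vec4 r;
#if VEC4_HAS_SSE
    __m128 va = _mm_load_ps(&a.x);            // aligned load (faults if misaligned)
    __m128 vb = _mm_load_ps(&b.x);
    _mm_store_ps(&r.x, _mm_add_ps(va, vb));   // aligned store
#else
    r.x = a.x + b.x; r.y = a.y + b.y; r.z = a.z + b.z; r.w = a.w + b.w;
#endif
    return r;
}

// Same operation on raw float arrays whose alignment is unknown.
inline void AddUnaligned(const float* a, const float* b, float* out) {
#if VEC4_HAS_SSE
    _mm_storeu_ps(out, _mm_add_ps(_mm_loadu_ps(a), _mm_loadu_ps(b)));
#else
    for (int i = 0; i < 4; ++i) out[i] = a[i] + b[i];
#endif
}
```

The aligned form only works when every `Vec4` really does sit on a 16-byte boundary, which is where the trouble with member variables and inheritance comes from; the unaligned form is always safe to call.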

Long story short, SSE can speed your math up, but it can definitely slow your development down.

-Joe

##### Share on other sites
I have so many different types of games and engines I want to create. So I figured if I just make one engine that is reasonably compatible with all of what I want to do, then I could just worry about the algorithms and the other stuff it uses. I wouldn’t have to worry about an underlying framework any more either. So if I want to build an engine, I need it to be as fast as I can get it. I don’t really want to write huge amounts of code just for a little performance increase.

I think if I just write the code that supports each SSE version for what I want to accomplish, then I can just use the framework without worrying about "Should I use SSE here for this?" kind of things, because the inner math will use SSE when it is there and fall back when it is not.
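The "use SSE if it is there" idea can be sketched as a function pointer chosen once at startup. This is only an illustration under assumptions: `CpuFeatures`, `QueryCpuFeatures`, `Dot4Scalar`, `Dot4SSE` and `SelectDot4` are made-up names, and the query relies on the compiler's CPUID helpers (`__cpuid` from `<intrin.h>` on MSVC, `__get_cpuid` from `<cpuid.h>` on GCC/Clang) rather than inline asm. It reports CPU support only; the OS check done by the `xorps` test above is not repeated here.

```cpp
#if defined(_MSC_VER) && (defined(_M_IX86) || defined(_M_X64))
#include <intrin.h>      // __cpuid (MSVC)
#define HAVE_CPUID_MSVC 1
#elif defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
#include <cpuid.h>       // __get_cpuid (GCC / Clang)
#define HAVE_CPUID_GNU 1
#endif

#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>
#define HAVE_SSE_BUILD 1
#endif

struct CpuFeatures { bool mmx, sse, sse2; };   // hypothetical result type

// Reads the CPUID leaf 1 feature bits (same bits the asm version tests).
inline CpuFeatures QueryCpuFeatures() {
    unsigned int edx = 0;
#if defined(HAVE_CPUID_MSVC)
    int regs[4] = {0, 0, 0, 0};
    __cpuid(regs, 1);
    edx = static_cast<unsigned int>(regs[3]);
#elif defined(HAVE_CPUID_GNU)
    unsigned int eax = 0, ebx = 0, ecx = 0;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx)) edx = 0;
#endif
    CpuFeatures f;
    f.mmx  = (edx & 0x00800000u) != 0;   // bit 23
    f.sse  = (edx & 0x02000000u) != 0;   // bit 25
    f.sse2 = (edx & 0x04000000u) != 0;   // bit 26
    return f;
}

typedef float (*Dot4Fn)(const float*, const float*);

inline float Dot4Scalar(const float* a, const float* b) {
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2] + a[3] * b[3];
}

#if defined(HAVE_SSE_BUILD)
inline float Dot4SSE(const float* a, const float* b) {
    __m128 p    = _mm_mul_ps(_mm_loadu_ps(a), _mm_loadu_ps(b));
    __m128 shuf = _mm_shuffle_ps(p, p, _MM_SHUFFLE(2, 3, 0, 1)); // swap pairs
    __m128 sums = _mm_add_ps(p, shuf);        // (p0+p1, p0+p1, p2+p3, p2+p3)
    shuf = _mm_movehl_ps(shuf, sums);         // bring p2+p3 into the low lane
    return _mm_cvtss_f32(_mm_add_ss(sums, shuf));
}
#endif

// Picks the implementation once; callers just use the returned pointer.
inline Dot4Fn SelectDot4(const CpuFeatures& f) {
#if defined(HAVE_SSE_BUILD)
    if (f.sse) return &Dot4SSE;
#else
    (void)f;
#endif
    return &Dot4Scalar;
}
```

Calling `SelectDot4(QueryCpuFeatures())` once at startup and storing the result keeps the per-call cost to one indirect call. For a function as small as a dot product that indirection can eat most of the gain, so dispatching at the level of whole loops or batches is usually the better granularity.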

##### Share on other sites
Quote:
 Original post by sakky
 I have so many different types of games and engines I want to create. So I figured if I just make one engine that is reasonably compatible with all of what I want to do, then I could just worry about the algorithms and the other stuff it uses. I wouldn’t have to worry about an underlying framework any more either. So if I want to build an engine, I need it to be as fast as I can get it. I don’t really want to write huge amounts of code just for a little performance increase. I think if I just write the code that supports each SSE version for what I want to accomplish, then I can just use the framework without worrying about "Should I use SSE here for this?" kind of things, because the inner math will use SSE when it is there and fall back when it is not.

The simplest and most correct answer is: profile first.

While SSE and other instruction sets can provide boosts in performance, you must always remember the 80/20 rule: 80% of the program's time will be spent in 20% of the code. Unless you profile your application, you will never know where that 20% is, and you will be optimizing blindly, which is a waste of time.
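As a crude stand-in for a real profiler, a timing helper along these lines can at least tell you whether a routine is anywhere near that hot 20%. It is a sketch only: `MeasureSeconds` is a hypothetical name, it assumes a C++11 compiler with `<chrono>`, and wall-clock timing of a tight loop is far less informative than a sampling profiler run on the whole application.

```cpp
#include <chrono>

// Runs fn() `iterations` times and returns the elapsed wall-clock seconds.
// steady_clock is used because it never jumps backwards.
template <typename Fn>
double MeasureSeconds(Fn fn, int iterations) {
    typedef std::chrono::steady_clock Clock;
    const Clock::time_point start = Clock::now();
    for (int i = 0; i < iterations; ++i) {
        fn();
    }
    const std::chrono::duration<double> elapsed = Clock::now() - start;
    return elapsed.count();
}
```

Timing the scalar and SSE versions of the same routine on realistic data, in an optimized build, gives a first idea of whether the extra code paths pay for themselves.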

Secondly: Most decent compilers on the market can generate code for MMX, SSE, and SSE2 if you set the flags.

##### Share on other sites
How would one do this with Visual C++ .NET?

##### Share on other sites
Quote:
 Original post by Washu
 80/20 rule

80/20 rule? I've heard of the 90/10 rule...

OP: If you have to ask if it's worth it, the most likely answer is: no. As Washu said, profile first, THEN worry about optimizations. You do NOT need to make the engine as fast as possible FIRST. You need to get it working, maintainable, and bug free(ish) first. Then you need to find out where you need to work in order to make it as fast as possible, i.e. you need to profile. That way, you speed it up as quickly as possible, thus complying with the phrase "as fast as possible". Otherwise, you will probably be optimizing a section that doesn't need it, and it won't be "as fast as possible" because you could have worked on a spot that needed it more.

##### Share on other sites
So that’s what you meant. I thought you meant “profile” in terms of project options in Visual Studio. I guess it makes sense not to optimize something when you don’t know whether it needs it or not. I guess I’m just a worrywart sometimes.
