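#include <windows.h>   // BOOL, CHAR, STATUS_ILLEGAL_INSTRUCTION
#include <string.h>    // memset, memcpy, strncmp

// ----------------------------------------------------------------------------
// Name : EnumCPUInfo( )
// Desc : Enumerates the CPU for special MMX / SSE / 3DNow! technologies.
//        Returns FALSE if SSE is unusable or CPUID faults; *pInfo then holds
//        whatever was detected up to that point.
// Note : The posted code took CPUINFO by value and mixed BOOL and CPUINFO
//        returns. The pointer parameter and the bSSE2 field name used here
//        are assumptions made to get a consistent, compilable function.
// ----------------------------------------------------------------------------
BOOL EnumCPUInfo( CPUINFO* pInfo )
{
    CPUINFO Info;
    CHAR* pStr = Info.szVendor;
    int n = 1;
    int* pn = &n;
    memset( &Info, 0, sizeof( Info ) );
    *pInfo = Info;

    // Executing an SSE instruction faults if the CPU or the OS lacks support.
    __try
    {
        __asm xorps xmm0, xmm0
    }
    __except ( EXCEPTION_EXECUTE_HANDLER )
    {
        if ( _exception_code( ) == STATUS_ILLEGAL_INSTRUCTION )
        {
            return ( FALSE );
        }
    }

    __try
    {
        __asm
        {
            mov eax, 0
            CPUID
            mov esi, pStr
            mov [esi], ebx           // vendor string is EBX, EDX, ECX
            mov [esi+4], edx
            mov [esi+8], ecx
            mov eax, 1
            CPUID
            test edx, 04000000h      // bit 26 = SSE2
            jz __NOSSE2
            mov [Info.bSSE2], 1      // was szVendor, which clobbered the string
        __NOSSE2:
            test edx, 02000000h      // bit 25 = SSE
            jz __NOSSE
            mov [Info.bSSE], 1
        __NOSSE:
            test edx, 00800000h      // bit 23 = MMX
            jz __EXIT1
            mov [Info.bMMX], 1
        __EXIT1:
        }
    }
    __except ( EXCEPTION_EXECUTE_HANDLER )
    {
        // CPUID itself faulted; hand back what we have and report failure.
        *pInfo = Info;
        return ( FALSE );
    }

    __asm
    {
        mov eax, 80000000h
        CPUID
        cmp eax, 80000000h
        jbe __EXIT2                  // no extended leaves above 80000000h
        mov [Info.bEXT], 1
    __EXIT2:
    }

    if ( ( strncmp( Info.szVendor, "GenuineIntel", 12 ) == 0 ) && Info.bEXT )
    {
        __asm
        {
            mov eax, 1
            CPUID
            mov esi, pn
            mov [esi], ebx
        }
        // Keep only the low byte of EBX (the brand index).
        int m = 0;
        memcpy( &m, pn, sizeof( char ) );
        n = m;
    }
    else if ( ( strncmp( Info.szVendor, "AuthenticAMD", 12 ) == 0 ) && Info.bEXT )
    {
        __asm
        {
            mov eax, 1
            CPUID
            mov esi, pn
            mov [esi], eax           // processor signature
            mov eax, 80000001h
            CPUID
            test edx, 80000000h      // bit 31 = 3DNow! (bit 30 is extended 3DNow!)
            jz __AMD1
            mov [Info.b3DNow], 1
        __AMD1:
            test edx, 00400000h      // bit 22 = AMD MMX extensions
            jz __AMD2
            mov [Info.bMMXEX], 1
        __AMD2:
        }
    }

    Info.szVendor[ 12 ] = '\0';      // the vendor string is exactly 12 characters
    *pInfo = Info;
    return ( TRUE );
}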
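For comparison, the same CPUID feature bits can be read without inline assembly through compiler intrinsics. The following is only a sketch: `CpuFeatures`, `CpuidLeaf` and `QueryCpuFeatures` are hypothetical names that are not part of the posted code, and it does not repeat the `xorps` test for OS support.

```cpp
#include <cstring>
#if defined(_MSC_VER)
#include <intrin.h>      // __cpuid
#else
#include <cpuid.h>       // __get_cpuid (GCC / Clang)
#endif

// Hypothetical result type for this sketch; not the CPUINFO from the post.
struct CpuFeatures
{
    char vendor[13];     // 12 vendor characters plus terminator
    bool mmx, sse, sse2;
};

// Runs CPUID for one leaf and stores EAX, EBX, ECX, EDX in regs[0..3].
static void CpuidLeaf(unsigned leaf, unsigned regs[4])
{
#if defined(_MSC_VER)
    int r[4];
    __cpuid(r, static_cast<int>(leaf));
    for (int i = 0; i < 4; ++i)
        regs[i] = static_cast<unsigned>(r[i]);
#else
    // __get_cpuid returns 0 when the leaf is not supported.
    if (!__get_cpuid(leaf, &regs[0], &regs[1], &regs[2], &regs[3]))
        regs[0] = regs[1] = regs[2] = regs[3] = 0;
#endif
}

CpuFeatures QueryCpuFeatures()
{
    CpuFeatures f;
    std::memset(&f, 0, sizeof(f));

    unsigned r[4];
    CpuidLeaf(0, r);
    std::memcpy(f.vendor + 0, &r[1], 4);   // EBX
    std::memcpy(f.vendor + 4, &r[3], 4);   // EDX
    std::memcpy(f.vendor + 8, &r[2], 4);   // ECX

    CpuidLeaf(1, r);
    f.mmx  = (r[3] & (1u << 23)) != 0;     // EDX bit 23
    f.sse  = (r[3] & (1u << 25)) != 0;     // EDX bit 25
    f.sse2 = (r[3] & (1u << 26)) != 0;     // EDX bit 26
    return f;
}
```

The bit positions are the same ones the inline assembly tests with `04000000h`, `02000000h` and `00800000h`.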
using inline ASM & SSE
I’m having a lot of second thoughts about using inline ASM. I have this code that enumerates support for SSE.
The code checks whether the OS and CPU support SSE and whatnot. I have to write different code for each type of SSE. What a load of work. Is it worth it? Do I really get that much speed out of using these extensions?
I need to make the engine as fast as possible. Also, the engine uses its own types, not Direct3D's, because a lot of the sprites and meshes work with the engine's types. The only conversion they need is to be used with the VB.
I can’t decide if I want to use ASM for a lot of the math. I need most of the math to be lightning fast because it will be used extensively by the engine. Most of the math is done with 3D or 4D vector types.
So what do you guys think?
That you should use a profiler. You can potentially get a 4x speedup using SSE, but there's a fairly large chance you can tell your compiler to use it already. VC++ .NET has intrinsics for SSE and can make use of SSE if available; you just tell the optimizer to.
Personally I would write a single C version and use VectorC to produce separate versions of the library (e.g. a set of DLLs) for each instruction set. It's still a lot of work though, and an expensive compiler to buy.
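As a rough illustration of letting the compiler do the work, a plain loop like the one below contains no SSE-specific code at all. Built with `/arch:SSE` on VC++ or `-msse` on GCC, the compiler is allowed to emit SSE instructions for it; whether it actually vectorizes the loop depends on the compiler and version, so treat this as a sketch and check the generated code. `AddArrays` is a made-up name.

```cpp
#include <cstddef>

// Adds b into a, element by element. Nothing here is SSE-specific; the
// instruction set used is decided entirely by the compiler flags.
void AddArrays(float* a, const float* b, std::size_t count)
{
    for (std::size_t i = 0; i < count; ++i)
        a[i] += b[i];
}
```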
Maybe you could write a single SSE 1 version and hope to get most of the performance benefits anyway?
Check out xmmintrin.h if you would rather not write all the asm yourself. Intel provides intrinsic data types and functions that can make it a little easier to deal with. Also, keep in mind that for the best results you must have 16-byte aligned memory, which is a huge pain when you're dealing with aligned member variables, especially in cases of inheritance. Basically, no matter how hard you try, sometimes you can't force alignment (at least not on the stack; heap memory is easier to align). The unaligned operations are not much faster than compiler-optimized FPU operations.
Long story short, SSE can speed your math up, but it can definitely slow your development down.
-Joe
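A minimal sketch of what the xmmintrin.h route looks like, assuming a hypothetical `Vec4` type (not the engine's own). `alignas(16)` is the C++11 spelling; compilers from the era of this thread would use `__declspec(align(16))` instead.

```cpp
#include <xmmintrin.h>   // SSE intrinsics: __m128, _mm_load_ps, _mm_add_ps, ...

// Hypothetical 4D vector type. The 16-byte alignment is what allows the
// aligned load/store intrinsics below to be used on it.
struct alignas(16) Vec4
{
    float v[4];
};

// Aligned path: _mm_load_ps and _mm_store_ps require 16-byte aligned addresses.
inline Vec4 Add(const Vec4& a, const Vec4& b)
{
    Vec4 r;
    _mm_store_ps(r.v, _mm_add_ps(_mm_load_ps(a.v), _mm_load_ps(b.v)));
    return r;
}

// Unaligned path: works on any float pointer, at the cost mentioned above.
inline void AddUnaligned(const float* a, const float* b, float* out)
{
    _mm_storeu_ps(out, _mm_add_ps(_mm_loadu_ps(a), _mm_loadu_ps(b)));
}
```

For heap memory, `_mm_malloc( size, 16 )` paired with `_mm_free` is one way to get suitably aligned blocks.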
I have so many different types of games and engines I want to create. So I figured if I just make one engine that is reasonably compatible with all of what I want to do, then I could just worry about the algorithms and the other stuff it uses. I wouldn’t have to worry about an underlying framework any more either. So if I want to build an engine, I need it to be as fast as I can get it. I don’t really want to write huge amounts of code just for a little performance increase.
I think if I just write the code that supports each SSE for what I want to accomplish, then I can just use the framework without worrying about the “Should I use SSE here for this?” kind of thing, because the inner math will use SSE whenever it is there.
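One way to sketch that idea is to pick a function pointer once at startup, so the rest of the framework never asks whether SSE is present. The names `Dot4Fn`, `Dot4Scalar`, `Dot4SSE` and `SelectDot4` are made up for this example, and `hasSSE` is assumed to come from something like the `bSSE` flag filled in by `EnumCPUInfo`.

```cpp
#include <xmmintrin.h>   // SSE intrinsics

typedef float (*Dot4Fn)(const float* a, const float* b);

// Plain C++ fallback for CPUs without SSE.
static float Dot4Scalar(const float* a, const float* b)
{
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2] + a[3] * b[3];
}

// SSE version: one multiply, then a horizontal sum using SSE1 shuffles only.
static float Dot4SSE(const float* a, const float* b)
{
    __m128 p    = _mm_mul_ps(_mm_loadu_ps(a), _mm_loadu_ps(b));
    __m128 shuf = _mm_shuffle_ps(p, p, _MM_SHUFFLE(2, 3, 0, 1)); // swap within pairs
    __m128 sums = _mm_add_ps(p, shuf);       // (p0+p1, p0+p1, p2+p3, p2+p3)
    shuf = _mm_movehl_ps(shuf, sums);        // move p2+p3 into the low lane
    sums = _mm_add_ss(sums, shuf);           // low lane = p0+p1+p2+p3
    return _mm_cvtss_f32(sums);
}

// Chooses the implementation once; callers just use the returned pointer.
Dot4Fn SelectDot4(bool hasSSE)
{
    return hasSSE ? Dot4SSE : Dot4Scalar;
}
```

The indirect call has a cost of its own, so for something as small as a dot product it may pay to dispatch at a coarser level, for example per batch of vectors.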
Quote:Original post by sakky
I have so many different types of games and engines I want to create. So I figured if I just make one engine that is reasonably compatible with all of what I want to do, then I could just worry about the algorithms and the other stuff it uses. I wouldn’t have to worry about an underlying framework any more either. So if I want to build an engine, I need it to be as fast as I can get it. I don’t really want to write huge amounts of code just for a little performance increase.
I think if I just write the code that supports each SSE for what I want to accomplish, then I can just use the framework without worrying about the “Should I use SSE here for this?” kind of thing, because the inner math will use SSE whenever it is there.
The simplest, and most correct answer, is profile first.
While SSE and other instruction sets can provide boosts in performance, you must always remember the 80/20 rule: 80% of the program's time will be spent in 20% of the code. Unless you profile your application, you will never know where that 20% is, and you will be optimizing blindly, which is a waste of time.
Secondly, most decent compilers on the market can generate code for MMX, SSE, and SSE2 if you set the flags.
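A real profiler is the right tool for finding that 20%, but a small timing helper can at least confirm whether a rewritten routine is faster than the one it replaces. This is a sketch using C++11 `std::chrono`; `AverageMicroseconds` is a made-up helper, not part of any library.

```cpp
#include <chrono>

// Returns the average wall-clock time of one call to fn, in microseconds,
// measured over `runs` consecutive calls.
template <typename Fn>
double AverageMicroseconds(Fn fn, int runs)
{
    typedef std::chrono::steady_clock Clock;
    const Clock::time_point start = Clock::now();
    for (int i = 0; i < runs; ++i)
        fn();
    const std::chrono::duration<double, std::micro> elapsed = Clock::now() - start;
    return elapsed.count() / runs;
}
```

Timing both the scalar and the SSE version of a routine on realistic data, in an optimized build, gives a number to argue about instead of a guess.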
Quote:Original post by Washu
80/20 rule
80/20 rule? I've heard of the 90/10 rule...
OP: If you have to ask if it's worth it, the most likely answer is: no. As Washu said, profile first - THEN worry about optimizations. You do NOT need to make the engine as fast as possible FIRST - you need to get it working, maintainable, and bug free(ish) first. Then you need to find out where you need to work in order to make it as fast as possible - aka, you need to profile. That way, you speed it up as quickly as possible, thus complying with the phrase "as fast as possible". Otherwise, you will probably be optimizing a section that doesn't need it - and it won't be "as fast as possible" because you could have worked on a spot that needed it more.
This topic is closed to new replies.