DvDmanDT

MMX, 3DNow! and SSE?

This topic is 4832 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.


Hello, everyone.. I'm making a game (as most of you are), and I'm considering some optimizations.. Now, I already know that I should optimize at an algorithmic level first, but let's pretend that I already have the fastest possible algorithm for what I'm doing. My target is an AMD Duron at 700 MHz, which is pretty slow considering everything I have to do each frame (besides, I'm considering this mostly for fun/learning)..

So I downloaded EVEREST Home Edition and checked which instruction sets the Duron supports: MMX, 3DNow! and Enhanced 3DNow!.. So my first thought was to use that enhanced stuff.. But then I checked my P4 and noticed that it doesn't support 3DNow! at all (no surprise here), but it does support MMX and SSE.. I can add that there'll be next to no floating-point operations in this part of the game (it's a simulation that must run identically on all computers in a game, so I'm trying to avoid floats).. It will do lots of additions and multiplications though..

So, should I use MMX in my game? Should I check on each loop whether to use 3DNow! or SSE, or should I use function pointers? Or do you insist that I should forget about these (most likely silly) optimizations altogether?
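For readers wondering what "avoiding floats" can look like in practice: one common approach is integer fixed-point math, which gives the same results on every machine. This is only a rough sketch, assuming a 16.16 format; the names fix16, fix16_from_int, fix16_add and fix16_mul are made up for illustration and do not come from the original post.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 16.16 fixed-point type: 16 integer bits, 16 fraction bits. */
typedef int32_t fix16;

#define FIX16_ONE ((fix16)65536)

/* Convert a small integer to 16.16; the value must fit in 16 signed bits. */
static fix16 fix16_from_int(int n)
{
    assert(n >= -32768 && n <= 32767);
    return (fix16)(n * 65536);
}

/* Addition is plain integer addition (the caller must avoid overflow). */
static fix16 fix16_add(fix16 a, fix16 b)
{
    return a + b;
}

/* Multiply in 64 bits, then drop the extra 16 fraction bits.
   Integer division truncates toward zero on every conforming compiler. */
static fix16 fix16_mul(fix16 a, fix16 b)
{
    return (fix16)(((int64_t)a * (int64_t)b) / 65536);
}
```

Since only integer adds and multiplies are involved, this kind of code is also the sort of thing the integer SIMD instructions discussed in this thread are meant for.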

Don't bother with MMX on P4, it performs very badly. P4 is optimal for SSE.

MMX is going to go bye-bye in the future for good. Same with 3DNow!. They are good for legacy machines (pre-Athlon XP, pre-P3).
(edit: not entirely true, it remains for a few specific instructions for integer conversion in SSE...)

You also will not get a whole lot out of them without a thorough understanding of how to apply them. Mainly, 3DNow! has the advantage of its reciprocal and reciprocal-square-root instructions (PFRCP, PFRSQRT), and it performs well on the Athlon. SSE requires 16-byte alignment for its aligned loads and memory operands, and you need to really read the Intel manual a few times in order to realize that the instruction set is real crap without experience in how to use it.

Generally you won't see a huge advantage from them most of the time. They work very well in specific situations. Otherwise they are not worth the effort at all.
When it works though it is worth it.

SSE could give you a worthwhile improvement. For Durons, you can use 3DNow! with FEMMS. But for MMX, don't. Please.

Quote:
Original post by Name_Unknown
Don't bother with MMX on P4, it performs very badly. P4 is optimal for SSE.

It does? I've used it quite a bit on my P4 machine and never noticed much of a difference (compared to equivalent AMD machines and earlier Pentium generations).
Admittedly those new 128-bit instructions seem to run about as fast as two 64-bit instructions, but with all the other P4 extensions to the instruction set (support for unsigned multiplications, finally!) it's hard to believe that MMX would've been abandoned.

And besides, SSE can never supersede MMX since they simply don't perform the same tasks; many high-performance applications still require integer calculations.

A much easier way to take advantage of these is to use a compiler with support for the instruction set, such as VectorC. The best way of gaining performance out of multiple instruction sets (at least if you intend to use them outside of selected inner loops) is to compile multiple executables, perhaps by writing a bootstrap EXE that detects the CPU's features and links in the game itself dynamically from a set of optimized DLLs. Code size is rarely a problem anyway..
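A lighter-weight variant of the same idea, for code that only needs a few specialized inner loops, is to detect the CPU once at startup and store a function pointer. The sketch below is an illustration only: add_arrays_fn, add_arrays_scalar, add_arrays_sse2, cpu_has_sse2 and select_add_arrays are hypothetical names, the "SSE2" variant is just a placeholder that forwards to the scalar loop, and the detection relies on the GCC/Clang builtins __builtin_cpu_init and __builtin_cpu_supports (other compilers fall back to the scalar path).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Signature shared by every implementation of the (hypothetical) hot loop. */
typedef void (*add_arrays_fn)(int32_t *dst, const int32_t *a,
                              const int32_t *b, size_t n);

/* Portable fallback that works on any CPU. */
static void add_arrays_scalar(int32_t *dst, const int32_t *a,
                              const int32_t *b, size_t n)
{
    assert(dst != NULL && a != NULL && b != NULL);
    for (size_t i = 0; i < n; ++i)
        dst[i] = a[i] + b[i];
}

/* Placeholder for a hand-written SSE2 version; here it only forwards
   to the scalar loop so the sketch stays portable. */
static void add_arrays_sse2(int32_t *dst, const int32_t *a,
                            const int32_t *b, size_t n)
{
    add_arrays_scalar(dst, a, b, n);
}

/* Returns 1 if the CPU reports SSE2, 0 otherwise or if detection
   is not available with this compiler/target. */
static int cpu_has_sse2(void)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    __builtin_cpu_init();
    return __builtin_cpu_supports("sse2") ? 1 : 0;
#else
    return 0;
#endif
}

/* Choose the implementation once, outside the loop. */
static add_arrays_fn select_add_arrays(void)
{
    return cpu_has_sse2() ? add_arrays_sse2 : add_arrays_scalar;
}
```

Calling select_add_arrays() once and keeping the result avoids re-checking CPU features on every iteration; the cost per call is one indirect call.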

SSE through SSE3 is where it's at. AltiVec for the PPC970 series.

I do not suggest going for inline ASM until most of your engine is done. Trust me as an expert in premature optimizations :)

Quote:
Original post by Name_Unknown
Don't bother with MMX on P4, it performs very badly. P4 is optimal for SSE.

Don't bother posting if you don't know what you are talking about. For integer operations that can be done in parallel, nothing can beat MMX to this day. P4 CPUs are notoriously slow at shifting, not much else.

If you use a lot of integer or even fixed-point calculations that can easily be done in parallel, use MMX. If you use floating point, use SIMD instructions. If you need to shift values, use an Athlon.


Quote:
Original post by Name_Unknown
MMX is going to go bye-bye in the future for good. Same with 3DNow!. They are good for legacy machines (pre-Athlon XP, Pre-P3).
(edit: not entirely true, it remains for a few specific instruction from integral conversion in SSE...)

Oh really now. Please show me an official statement from Intel and AMD where they say this.

Guest Anonymous Poster
Quote:
Original post by Bad Maniac
Quote:
Original post by Name_Unknown
Don't bother with MMX on P4, it performs very badly. P4 is optimal for SSE.

Don't bother posting if you don't know what you are talking about. For integer operations that can be done in parallel, nothing can beat MMX to this day. P4 CPUs are notoriously slow at shifting, not much else.

If you use a lot of integer or even fixed-point calculations that can easily be done in parallel, use MMX. If you use floating point, use SIMD instructions. If you need to shift values, use an Athlon.


A curious assertion, since SSE2 extends the old MMX byte/word/dword/qword instructions to 128-bit vectors, in addition to beefing up double-precision FP vectors. At least the vast majority of them; I'm not sure whether it covers 100% of MMX's functionality or not.
Understandable if you were thinking of SSE1 rather than SSE as a whole. SSE1 is obviously not a replacement for MMX, but SSE as a whole now, SSE2 in particular, is a far faster fixed-point SIMD instruction group than MMX is.
Considering the P4's vector unit has a two-cycle latency for most basic integer ops regardless of their length (64- or 128-bit vectors), I'm not seeing MMX's advantage, other than legacy support.

When implemented properly, like it is on the P4, SSE* has a significant performance advantage over any other SIMD or SISD instruction group in the x86/x87 libraries, for the simple reason that it utilizes longer vectors and has approximately the same latency and throughput as anything else.
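To make the "SSE2 extends the MMX integer instructions to 128 bits" point concrete, here is a small sketch using the SSE2 intrinsics from emmintrin.h (_mm_loadu_si128, _mm_mullo_epi16, _mm_add_epi16, _mm_storeu_si128), which process eight 16-bit integers per instruction. The function name mul_add_i16 is made up for this example, and the scalar branch is only there so the code still builds where SSE2 is not enabled; treat it as an illustration rather than tuned code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__SSE2__)
#include <emmintrin.h>  /* SSE2 intrinsics */
#endif

/* dst[i] = a[i] * b[i] + c[i] on 16-bit integers, keeping the low 16 bits.
   n must be a multiple of 8 to keep the sketch short. */
static void mul_add_i16(int16_t *dst, const int16_t *a, const int16_t *b,
                        const int16_t *c, size_t n)
{
    assert(n % 8 == 0);
#if defined(__SSE2__)
    for (size_t i = 0; i < n; i += 8) {
        /* Unaligned loads, so the arrays need no special alignment. */
        __m128i va = _mm_loadu_si128((const __m128i *)(a + i));
        __m128i vb = _mm_loadu_si128((const __m128i *)(b + i));
        __m128i vc = _mm_loadu_si128((const __m128i *)(c + i));
        __m128i prod = _mm_mullo_epi16(va, vb);  /* low 16 bits of each product */
        __m128i sum = _mm_add_epi16(prod, vc);   /* wrapping 16-bit add */
        _mm_storeu_si128((__m128i *)(dst + i), sum);
    }
#else
    /* Scalar fallback with the same wrap-around behaviour. */
    for (size_t i = 0; i < n; ++i)
        dst[i] = (int16_t)(uint16_t)((unsigned)a[i] * (unsigned)b[i]
                                     + (unsigned)c[i]);
#endif
}
```

The MMX equivalents of these operations work on four 16-bit values at a time in 64-bit registers, which is where the "longer vectors" argument comes from.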


Quote:

Quote:
Original post by Name_Unknown
MMX is going to go bye-bye in the future for good. Same with 3DNow!. They are good for legacy machines (pre-Athlon XP, Pre-P3).
(edit: not entirely true, it remains for a few specific instruction from integral conversion in SSE...)

Oh really now. Please show me an official statement from Intel and AMD where they say this.


Quote:
In general, 64-bit operating systems support the x87 and 3DNow!
instructions in 32-bit threads; however, 64-bit operating systems may not support x87 and 3DNow!
instructions in 64-bit threads. To make it easier to later migrate from 32-bit to 64-bit code, you may
want to avoid x87 and 3DNow! instructions altogether and use only SSE and SSE2 instructions when
writing new 32-bit code.


From the AMD software optimization manual. Page 245.
http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/25112.PDF

I don't have an equivalent quote from Intel handy, but they've been pushing SSE* longer and harder than AMD, and they do support this move. The P4 really was designed with SSE* first, MMX second and x87 a distant third in mind from the start.

XP64 for AMD64 already does not preserve the x87 FP stack on a context switch.
x87/MMX is being phased out as the market moves to 64 bit OSes / code.
(This is why there are 8 new 128 bit XMM registers and no new x87 registers in 64 bit mode).

Going back to the original poster:

Optimization is great, but don't start early. You can afford to wait until your project's practically done, then run your algorithms through the wringer. I usually find waste in loops, redeclarations, and copying where I should be passing by reference.

That being said, the compiler you use is probably pretty good at instruction-set-level optimization if you tell it to do so (GCC uses the -O# convention; you might also try -funroll-loops, which often helps on tightly nested loops). If you want to speed up code, I'd suggest checking your byte alignment in memory, especially with structs; it helps a great deal with 3DNow! and Hyper-Threading in particular. There's lots of good documentation on Intel's and AMD's websites on that. Coming from embedded platforms, I'd wager that's probably all you need to do, but if you absolutely want to have the fastest code on the block...
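As a rough illustration of the struct-alignment point, the sketch below shows how member order affects padding and how to check whether a pointer is aligned. The struct names unit_loose and unit_packed and the helper is_aligned are invented for this example; exact sizes depend on the compiler and target, so check them with sizeof rather than assuming.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Members in a careless order: compilers typically insert padding
   after each 1-byte member so the following int32_t stays aligned. */
struct unit_loose {
    uint8_t flag;
    int32_t x;
    uint8_t team;
    int32_t y;
};

/* Same members, largest first: the small members share the tail,
   so the struct is usually smaller. */
struct unit_packed {
    int32_t x;
    int32_t y;
    uint8_t flag;
    uint8_t team;
};

/* Returns 1 if p is aligned to 'alignment' bytes (a power of two). */
static int is_aligned(const void *p, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return ((uintptr_t)p & (alignment - 1)) == 0;
}
```

A check like is_aligned(buffer, 16) is handy before using aligned SSE loads, which fault on addresses that are not 16-byte aligned.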

I suggest rewriting your matrix and vector routines using straight SSE code. Everyone above says "Oh noes, abandon MMX!!one1". It's not really that MMX is terrible; it's just old, and its registers aren't wide enough to do as many operations per clock (you can pack eight 16-bit integers into a single 128-bit register with SSE2, whereas only four fit into a 64-bit MMX register). In some of my logic design classes it was "strongly suggested" to me that the newer on-chip implementations of MMX are simply redirects to equivalent SSE microcode. I have no evidence other than my professor's winking, though.
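The register-width comparison comes down to simple division: lanes = register bits / element bits. A tiny sketch, with simd_lanes as a made-up helper name:

```c
#include <assert.h>

/* Number of integer lanes of 'element_bits' each that fit in a SIMD
   register of 'register_bits' (e.g. 64 for MMX, 128 for SSE2). */
static unsigned simd_lanes(unsigned register_bits, unsigned element_bits)
{
    assert(element_bits != 0 && register_bits % element_bits == 0);
    return register_bits / element_bits;
}
```

So simd_lanes(128, 16) gives 8 and simd_lanes(64, 16) gives 4, while for 32-bit integers the counts are 4 (128-bit) and 2 (64-bit).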

As for 3DNow!, I really hate to say it, but there really is no reason to use it. Sure, it can afford some extra speedups, but it only works on AMD systems, whereas equivalent SSE code will run on both platforms.

Oh, and if you're lucky/rich enough to own a G4/G5, bow to your master. AltiVec is amazingly better than anything the x86 world has to offer, IMHO, and I will miss it greatly as Apple moves away from IBM.

Quote:
Original post by Bad Maniac
Don't bother posting if you don't know what you are talking about. For integer operations that can be done in parallel, nothing can beat MMX to this day. P4 CPUs are notoriously slow at shifting, not much else.


No, *you* don't bother posting about what you don't know about. Search the internet for the many reasons why MMX on P4 is crap. I am not going to do your homework for you. SSE also has integer instructions. SSE2 is designed to replace MMX.

SSE2

Quote:

Oh really now. Please show me an official statement from Intel and AMD where they say this.


There are no official statements other than that the 64-bit architectures are going for SSE. Do your research and Google it, please.

[Edited by - Name_Unknown on August 22, 2005 2:58:45 PM]
