 SSE2 for Dummies (who know C/C++)
quote:
Up until now, we have been using movapd to move data to and from our registers. This is much slower than the instruction movapd....



I believe you meant to use "movupd" there...
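For anyone following along with intrinsics instead of inline asm, here is a rough sketch of the aligned/unaligned distinction behind movapd and movupd. It is not the article's code: the function names (`is_aligned16`, `add_pairs_aligned`, `add_pairs_unaligned`) are made up for illustration, and it assumes a compiler that ships `<emmintrin.h>`. `_mm_load_pd`/`_mm_store_pd` typically map to movapd and require 16-byte aligned addresses, while `_mm_loadu_pd`/`_mm_storeu_pd` typically map to movupd and accept any address.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <emmintrin.h>  // SSE2 intrinsics

// Hypothetical helper: true if p sits on a 16-byte boundary.
inline bool is_aligned16(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % 16 == 0;
}

// Adds doubles two at a time using aligned loads/stores (movapd-style).
// All three pointers must be 16-byte aligned; a trailing odd element is ignored.
void add_pairs_aligned(const double* a, const double* b, double* out, std::size_t n) {
    assert(is_aligned16(a) && is_aligned16(b) && is_aligned16(out));
    for (std::size_t i = 0; i + 1 < n; i += 2) {
        __m128d va = _mm_load_pd(a + i);
        __m128d vb = _mm_load_pd(b + i);
        _mm_store_pd(out + i, _mm_add_pd(va, vb));
    }
}

// Same operation with unaligned loads/stores (movupd-style).
// Works for any address, usually at some cost in speed on older CPUs.
void add_pairs_unaligned(const double* a, const double* b, double* out, std::size_t n) {
    for (std::size_t i = 0; i + 1 < n; i += 2) {
        __m128d va = _mm_loadu_pd(a + i);
        __m128d vb = _mm_loadu_pd(b + i);
        _mm_storeu_pd(out + i, _mm_add_pd(va, vb));
    }
}
```

Calling `add_pairs_aligned` on buffers declared with `alignas(16)` is fine; passing `a + 1` (an address 8 bytes off the boundary) is only safe with the unaligned variant.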


Are you sure about those figures for switching into SSE2 state on the processor? I was sure that the overhead wasn't as high as 1000-2000+ cycles... And even if it is that high - as long as you're not using it in a loop intermixed with FPU code, then who cares about 1000 cycles when we have 3 giga cycles to play with on today's high end hardware?

Note: I said "as long as you're not using it in a loop intermixed with FPU code". I do not advocate sloppy code just because we have faster processors. I'm just saying that if the switch only has to occur every once in a while, then it ain't so bad....

But it's good to see an article looking at the somewhat lost art of assembly - keep it up!

Jx



[edited by - jx on September 1, 2003 3:16:54 AM]

 User Rating: 1015

Thanks for the article. I have some problems though and perhaps someone could help. Please be warned that I am a newbie in this subject.

I tried to use the code provided in MS VC++6.
Firstly the asm keyword was not recognised so I changed it to __asm after searching MSDN.

Then VC6 started moaning that it did not recognise the opcodes.
How can I get it to recognise the SSE2 opcodes?
I have a P4 2.4 so I know that my CPU supports these instructions.

Thanks if anyone can help me out there.

Burnt neurons on toast.
Yum.

 User Rating: 1015

I figured it out.
For the record: I needed to install the processor pack from MS.

 User Rating: 1015

There are 2 errors in your article.

1. Intermixing standard integer code with SSE2 code does not have a penalty at all.
Intermixing SSE/SSE2 and FPU code may have an impact on performance, because both use similar units in the CPU, but they do not share registers. In MMX code you had to use emms to clear your MMX registers for the FPU, and the first MMX instruction encountered by the CPU after the last FPU use may impose a great performance degradation.

2. Alignment may be an issue with Intel compilers, when you do not watch your step.
e.g.

struct bla
{
BYTE x;
double y;
};

may get you into trouble, because y is not padded to a 16 byte boundary. (You would lose 15 bytes and compatibility with all libraries if it were.)

For alignment you use the following (compiler specific) keywords/functions :

_aligned_malloc( size, <alignment> ), _aligned_free( ... ) to create aligned memory.

__declspec( align( <alignment> ) ) to specify a structure/declaration/ ... as aligned to the value passed

These are specific for MSVC 7.1 (but should also be available in other compilers).
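To make the alignment point concrete, here is a rough portable sketch. It swaps in standard C++11 `alignas(16)` for `__declspec( align(16) )` and `_mm_malloc`/`_mm_free` for `_aligned_malloc`/`_aligned_free`; that substitution is mine, not the poster's. The names `bla16`, `offset_of_y` and `sum_1_to_n_aligned` are hypothetical, and `BYTE` is a stand-in typedef for the Windows one. It assumes GCC or Clang, where `<emmintrin.h>` also pulls in `_mm_malloc`.

```cpp
#include <cstddef>
#include <emmintrin.h>  // SSE2 intrinsics; on GCC/Clang this also declares _mm_malloc/_mm_free

typedef unsigned char BYTE;  // stand-in for the Windows typedef used in the post

struct bla {
    BYTE x;
    double y;  // placed at alignof(double), not on a 16-byte boundary
};

// Assumption: alignas(16) plays the role of __declspec(align(16)) here.
struct alignas(16) bla16 {
    double y[2];  // first member, so it starts on the struct's 16-byte boundary
    BYTE x;
};

// Offset of y inside the unaligned struct (8 on typical x86-64 ABIs).
std::size_t offset_of_y() { return offsetof(bla, y); }

// Allocates n doubles on a 16-byte boundary, fills them with 1..n and sums
// them two at a time with aligned loads. n is assumed to be even.
// Returns -1.0 if the allocation fails.
double sum_1_to_n_aligned(std::size_t n) {
    double* buf = static_cast<double*>(_mm_malloc(n * sizeof(double), 16));
    if (!buf) return -1.0;
    for (std::size_t i = 0; i < n; ++i) buf[i] = static_cast<double>(i + 1);

    __m128d acc = _mm_setzero_pd();
    for (std::size_t i = 0; i + 1 < n; i += 2)
        acc = _mm_add_pd(acc, _mm_load_pd(buf + i));  // aligned load is safe here

    alignas(16) double lanes[2];
    _mm_store_pd(lanes, acc);
    _mm_free(buf);  // memory from _mm_malloc must go back through _mm_free
    return lanes[0] + lanes[1];
}
```

Putting the 16-byte member first in `bla16` keeps it on the boundary without extra padding in front of it; the compiler still pads the struct's total size up to a multiple of 16.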

That's it for the errors.

If you are interested in the SSE/SSE2 optimisations the currently best source of hints is the "Intel Architecture Optimization Reference Manual".

here is the link :
http://www.intel.com/design/Pentium4/manuals/index.htm

PS: For MSVC6.0 you need the processor pack to use SSE and SSE2, but this may lead to various incompatibilities with other code. (I would suggest updating to the newer version.)


 User Rating: 1015

Thank you for the comments.

The reason why I have things implemented the way they were was because I was using gcc and linux. I had a very hard time figuring out how to align the data. I had no problem in VC++ 6.0, but forgot to mention how it is done in the article. Sorry.

I'm sorry about the inaccuracy with the switching times. I first found through testing that this overhead is only encountered the first time the instructions are used. This might have been due to my timing method. What have you guys found out?

I will try to make some changes and update the article. I am not sure how that is done here at gamedev as this is my first article.

Thanks a lot for the corrections.

Dwiel

 User Rating: 1176

Good article but I can't compile the samples.

I have installed Visual C++ 6.0 Professional, Service Pack 5 and the Processor Pack.
Each time I compile, I get this error:
error C2400: inline assembler syntax error in 'opcode'; found 'xmm0'
 

where the line is:
movaps xmm0, [edx]
 


I tried putting the inline asm in a separate asm file, and after I added the .XMM directive, it compiles.

I cannot use the .XMM directive in inline asm; the compiler complains about it.

So I think I am missing something. I have never seen this problem before, and I have installed the complete package several times in order to compile AMD or MMX instructions.

Please help.


 User Rating: 1015

That is odd... are you sure your CPU supports SSE2? It should compile even if it does not work on your processor, I just know that with gcc, I had to compile on a machine with SSE2 for the sse2 stuff to work. I would get nearly the exact same message if I tried to do it on a non-SSE2 computer.
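If you want to rule out the CPU itself, a quick runtime check is possible. This is only a sketch under assumptions: it relies on the GCC/Clang header `<cpuid.h>` (MSVC has a different intrinsic, not shown here), and the function name `cpu_has_sse2` is made up. SSE2 support is reported in bit 26 of EDX for CPUID leaf 1.

```cpp
#include <cpuid.h>  // GCC/Clang only: provides __get_cpuid

// Hypothetical helper: returns true if the CPU reports SSE2 support.
bool cpu_has_sse2() {
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
    // __get_cpuid returns 0 when the requested leaf is not available.
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return false;
    return ((edx >> 26) & 1u) != 0;  // CPUID.01H:EDX bit 26 = SSE2
}
```

Note this only tells you what the machine running the program supports; whether the compiler accepts the opcodes is a separate question.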

From the info that you are giving me, it should be working. I am sorry I can't tell you what is going wrong. If you want to send me the files, I can try compiling for kix....

Dwiel

 User Rating: 1176

I found it. It was an incompatibility between the VC6 and SP5 languages.
I had installed VC6 in French and SP5 in French on an English Windows -> it did not work.
I reinstalled VC6 in French and SP5 in English, and it works. I cannot understand why, but that's it.
Thanks.

 User Rating: 1015

I have emailed Dave with an update on this article in which I have made a correction in the last paragraph. I had gotten some emails about it and realized my error. I claimed that by calling the SSE2 code you had to wait for the CPU to switch over to the correct mode. This is completely incorrect, as I was thinking of something else when I wrote that... I feel really bad that on my first article I made such a large error... Hopefully with the update, I will be misinforming fewer people... The updated version is already available at My Webpage and will hopefully be updated here ASAP.

Just wanted to let you guys know what was going on and apologize to those of you whom I had misled.

Dwiel

 User Rating: 1176

Arise from the dead ye old thread!

I came to the article from Google while learning how to do inline asm on GCC and noticed that you mention that inline asm in GCC cannot interface with local variables. That is not true; you just have to use "Extended inline assembly", which is actually pretty good since it allows the compiler to optimize your inline asm by selecting registers for you if you want it to.

Anyway, if anyone is interested, here is a small guide about inline GCC assembly.
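For reference, a minimal sketch of what extended inline assembly with local variables can look like for an SSE2 instruction. It assumes GCC or Clang on x86 with the default AT&T syntax; the function name `add_pd_asm` is made up for illustration. The `"x"` constraint asks the compiler to place the operand in an XMM register of its choosing, so no register is hard-coded.

```cpp
#include <emmintrin.h>  // __m128d and the SSE2 helper intrinsics

// Adds two pairs of doubles with a single addpd. Both operands are ordinary
// local values; the compiler picks the XMM registers.
//   "+x"(a): a is read and written, held in an XMM register
//   "x"(b):  b is read-only, held in an XMM register
__m128d add_pd_asm(__m128d a, __m128d b) {
    __asm__("addpd %1, %0"  // AT&T order: source first, destination second
            : "+x"(a)
            : "x"(b));
    return a;
}
```

A call such as `add_pd_asm(_mm_set_pd(2.0, 1.0), _mm_set_pd(20.0, 10.0))` yields 11.0 in the low lane and 22.0 in the high lane, since `_mm_set_pd` takes the high element first.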

[Aeon Games][How to initailize a OpenGL window with SDL (multiplatform)][My Public CVS]
[If you rate me down, I will only become stronger. - Obi Wan]


 User Rating: 1567

Sweet. Thanks for the info. I'll try to get it updated ASAP. I forget where I read that you couldn't use non-global vars... I knew that it made no sense...

Thanks again

edit: typo

[Edited by - Dwiel on June 25, 2005 5:37:13 PM]


Dwiel

 User Rating: 1176

No problem, it's a good article


[Aeon Games][How to initailize a OpenGL window with SDL (multiplatform)][My Public CVS]
[If you rate me down, I will only become stronger. - Obi Wan]


 User Rating: 1567

How can I align an array of floats if I am using the Intel compiler?

 User Rating: 1015

I have a question. I am trying to compile this simple program in Visual Studio .NET 2003 and I get this error. What should I do:
#include
#include
#include
int main()
{

_declspec(align(16)) long mul;
_declspec(align(16)) int t1[100000];
_declspec(align(16)) int t2[100000];
__m128i mul1,mul2;
for ( int j= 0 ; j <

 User Rating: 1015

Concerning alignment in gcc:
Have you taken a look at http://www.cs.cmu.edu/cgi-bin/info2www?(gcc.info)Variable%20Attributes (Variable Attributes) ?
You can use them to align variables in gcc. Example:
int x __attribute__ ((aligned (16))) = 0;
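The same attribute works on arrays, which is what SSE loads usually need. A small sketch, assuming GCC or Clang (the names `data` and `sum8` are made up for illustration):

```cpp
#include <xmmintrin.h>  // SSE intrinsics for packed floats

// GCC/Clang attribute syntax: the array starts on a 16-byte boundary.
float data[8] __attribute__ ((aligned (16))) = {1, 2, 3, 4, 5, 6, 7, 8};

// Sums the eight floats using two aligned 4-float loads.
float sum8() {
    __m128 lo = _mm_load_ps(data);      // elements 0..3, aligned load
    __m128 hi = _mm_load_ps(data + 4);  // elements 4..7, still 16-byte aligned
    __m128 s  = _mm_add_ps(lo, hi);     // lane-wise sums: 6, 8, 10, 12

    float out[4] __attribute__ ((aligned (16)));
    _mm_store_ps(out, s);
    return out[0] + out[1] + out[2] + out[3];
}
```

Because each `float` is 4 bytes, stepping four elements at a time keeps every load on a 16-byte boundary as long as the array itself is aligned.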

Greets :)

 User Rating: 1015

All times are ET (US)
