| SSE2 for Dummies (who know C/C++) |
Jx Member since: 11/10/2000 From: United Kingdom |
quote: I believe you meant to use "movupd" there... Are you sure about those figures for switching into SSE2 state on the processor? I was sure that the overhead wasn't as high as 1000-2000+ cycles... And even if it is that high, as long as you're not using it in a loop intermixed with FPU code, who cares about 1000 cycles when we have 3 giga cycles to play with on today's high-end hardware? Note: I said "as long as you're not using it in a loop intermixed with FPU code". I do not advocate sloppy code just because we have faster processors. I'm just saying that if the switch only has to occur every once in a while, then it isn't so bad... But it's good to see an article looking at the somewhat lost art of assembly. Keep it up! Jx [edited by - jx on September 1, 2003 3:16:54 AM] |
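For readers following along: movupd is the unaligned packed-double move, while its aligned counterpart movapd faults on an address that is not a multiple of 16. Here is a minimal sketch of the unaligned form using the SSE2 intrinsics from `<emmintrin.h>`, where `_mm_loadu_pd` corresponds to movupd. The names `sum_doubles` and `check_sum_doubles` are made up for illustration and are not from the article. The scalar fallback is only there so the snippet still builds without SSE2.

```cpp
#include <cassert>
#include <cstddef>
#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>  // SSE2 intrinsics
#define HAVE_SSE2 1
#endif

// Hypothetical helper: sums n doubles starting at p.
// p does not need to be 16-byte aligned.
inline double sum_doubles(const double* p, std::size_t n) {
    double total = 0.0;
    std::size_t i = 0;
#if defined(HAVE_SSE2)
    __m128d acc = _mm_setzero_pd();
    for (; i + 2 <= n; i += 2) {
        // _mm_loadu_pd is the unaligned load (movupd). _mm_load_pd (movapd)
        // would require p + i to sit on a 16-byte boundary.
        acc = _mm_add_pd(acc, _mm_loadu_pd(p + i));
    }
    double lanes[2];
    _mm_storeu_pd(lanes, acc);
    total = lanes[0] + lanes[1];
#endif
    // Leftover element, or every element when SSE2 is not available.
    for (; i < n; ++i) total += p[i];
    return total;
}

// Self-check: start one element into the array so the address is not
// guaranteed to be 16-byte aligned.
inline void check_sum_doubles() {
    const double data[5] = {1.0, 2.0, 3.0, 4.0, 5.0};
    assert(sum_doubles(data + 1, 4) == 14.0);
}
```

The unaligned load is the safe default when you do not control where the data lives. The aligned load is worth it only when the alignment is guaranteed.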
neuralcrisp Member since: 8/31/2003 From: New Zealand |
| Thanks for the article. I have some problems though and perhaps someone could help. Please be warned that I am a newbie in this subject. I tried to use the code provided in MS VC++6. Firstly the asm keyword was not recognised so I changed it to asm_ after searching MSDN. Then VC6 started moaning that it did not recognise the opcodes. How can I get it to recognise the SSE2 opcodes? I have a P4 2.4 so I know that my CPU supports these instructions. Thanks if anyone can help me out there. Burnt neurons on toast. Yum. |
neuralcrisp Member since: 8/31/2003 From: New Zealand |
| I figured it out. For the record: I needed to install the processor pack from MS. |
Anonymous Poster |
| There are 2 errors in your article. 1. Intermixing standard integer code with SSE2 code does not have a penalty at all. Intermixing SSE/SSE2 and FPU code may have an impact on performance, because both use similar units in the CPU, but they do not share registers. In MMX code you had to use emms to clear your MMX registers for the FPU, and the first MMX instruction encountered by the CPU after the last FPU use may impose a great performance degradation. 2. Alignment may be an issue with Intel compilers when you do not watch your step, e.g. struct bla { BYTE x; double y; }; may get you into trouble, because y is not padded to a 16-byte boundary. (You would lose 15 bytes and compatibility with all libraries if it were.) For alignment you use the following (compiler-specific) keywords/functions: _aligned_malloc( size, <alignment> ) and _aligned_free( ... ) to create aligned memory; __declspec( align( <alignment> ) ) to specify a structure/declaration/... as aligned to the value passed. These are specific to MSVC 7.1 (but should also be available in other compilers). That's it for the errors. If you are interested in SSE/SSE2 optimisations, the best current source of hints is the "Intel Architecture Optimization Reference Manual". Here is the link: http://www.intel.com/design/Pentium4/manuals/index.htm PS: For MSVC 6.0 you need the processor pack to use SSE and SSE2, but this may lead to various incompatibilities with other code. (I would suggest updating to the newer version.) |
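To make the alignment point concrete, here is a rough sketch and not the article's code. It assumes a C++11 compiler, so the standard `alignas` stands in for `__declspec(align(16))`. The names `bla16`, `is_aligned`, `alloc_aligned` and `free_aligned` are made up for illustration. On MSVC the wrappers call `_aligned_malloc`/`_aligned_free` as described in the post, and elsewhere they fall back to POSIX `posix_memalign`, which is my substitution.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#if defined(_MSC_VER)
#include <malloc.h>  // _aligned_malloc, _aligned_free
#endif

// The struct from the post: y sits at its natural alignment,
// not on a 16-byte boundary.
struct bla {
    unsigned char x;  // BYTE in the post
    double y;
};

// Same fields, but every object is forced onto a 16-byte boundary.
struct alignas(16) bla16 {
    unsigned char x;
    double y;
};
static_assert(alignof(bla16) == 16, "bla16 must be 16-byte aligned");

// Hypothetical helper: true when p is a multiple of `alignment` bytes.
inline bool is_aligned(const void* p, std::size_t alignment) {
    return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}

// Hypothetical wrapper over the platform's aligned allocator.
// `alignment` must be a power of two (and a multiple of sizeof(void*)
// for posix_memalign). Returns nullptr on failure.
inline void* alloc_aligned(std::size_t size, std::size_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
#if defined(_MSC_VER)
    return _aligned_malloc(size, alignment);
#else
    void* p = nullptr;
    return posix_memalign(&p, alignment, size) == 0 ? p : nullptr;
#endif
}

// Memory from alloc_aligned must be released with the matching call.
inline void free_aligned(void* p) {
#if defined(_MSC_VER)
    _aligned_free(p);
#else
    std::free(p);
#endif
}
```

Usage would look like `double* v = static_cast<double*>(alloc_aligned(64 * sizeof(double), 16));`, followed by `free_aligned(v);` when done. Memory obtained from `_aligned_malloc` should not be passed to plain `free`.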
Dwiel Member since: 3/13/2001 From: Bloomington, IN, United States |
| Thank you for the comments. The reason I implemented things the way I did is that I was using gcc and Linux, and I had a very hard time figuring out how to align the data. I had no problem in VC++ 6.0, but forgot to mention how it is done in the article. Sorry. I'm sorry about the inaccuracy with the switching times. I first found through testing that this overhead is only encountered the first time the instructions are used. This might have been due to my timing method. What have you guys found out? I will try to make some changes and update the article. I am not sure how that is done here at gamedev, as this is my first article. Thanks a lot for the corrections. Dwiel |
Krouer Member since: 11/21/2002 From: France |
| Good article, but I can't compile the samples. I have installed Visual C++ 6.0 Professional, Service Pack 5 and the Processor Pack. Each time I compile, I get this error: error C2400: inline assembler syntax error in 'opcode'; found 'xmm0' where the line is: movaps xmm0, [edx] I have tried putting the inline asm in an asm file, and after I add the .XMM directive, it compiles. I cannot set the .XMM directive in the inline asm; the compiler complains about it. So I think I am missing something. I have never seen this problem before, and I have installed the complete package several times in order to compile AMD or MMX instructions. Please help. |
Dwiel Member since: 3/13/2001 From: Bloomington, IN, United States |
| That is odd... Are you sure your CPU supports SSE2? It should compile even if it does not work on your processor. I just know that with gcc, I had to compile on a machine with SSE2 for the SSE2 stuff to work. I would get nearly the exact same message if I tried to do it on a non-SSE2 computer. From the info that you are giving me, it should be working. I am sorry I can't tell you what is going wrong. If you want to send me the files, I can try compiling for kicks... Dwiel |
Krouer Member since: 11/21/2002 From: France |
| I found it. It was an incompatibility between the VC6 and SP5 languages. I had installed VC6 in French and SP5 in French on an English Windows, and it did not work. I reinstalled VC6 in French and SP5 in English, and it works. I cannot understand why, but that's it. Thanks. |
Dwiel Member since: 3/13/2001 From: Bloomington, IN, United States |
| I have emailed Dave with an update on this article in which I have made a correction in the last paragraph. I had gotten some emails about it and realized my error. I claimed that by calling the SSE2 code you had to wait for the CPU to switch over to the correct mode. This is completely incorrect, as I was thinking of something else when I wrote that... I feel really bad that I made such a large error in my first article... Hopefully with the update, I will be misinforming fewer people... The updated version is already available at My Webpage and will hopefully be updated here ASAP. Just wanted to let you guys know what was going on and apologize to those of you whom I had misled. Dwiel |
Kwizatz GDNet+ Member since: 4/7/2000 From: San Jose, Costa Rica |
Arise from the dead ye old thread! I came to the article from Google while learning how to do inline asm on GCC and noticed that you mention that inline asm in GCC cannot interface with local variables. That is not true; you just have to use "Extended inline assembly", which is actually pretty good, since it allows the compiler to optimize your inline asm by selecting registers for you if you want it to. Anyway, if anyone is interested, here is a small guide about inline GCC assembly. [Aeon Games][How to initailize a OpenGL window with SDL (multiplatform)][My Public CVS] [If you rate me down, I will only become stronger. - Obi Wan] |
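For anyone landing here from a search, this is a minimal sketch of GCC extended inline assembly working on local variables. It assumes GCC or Clang on an x86 target with SSE2 enabled. The `v2df` typedef and the names `add_packed` and `check_add_packed` are made up for illustration. The plain C++ fallback is only there so the snippet still builds on other targets.

```cpp
#include <cassert>

// GCC/Clang vector extension: two doubles packed into one 16-byte value.
typedef double v2df __attribute__((vector_size(16)));

// Adds two packed-double vectors that live in ordinary local variables.
inline v2df add_packed(v2df a, v2df b) {
#if defined(__SSE2__)
    // "+x": a is read and written, in an XMM register chosen by the compiler.
    // "x":  b is read from an XMM register chosen by the compiler.
    // AT&T operand order is "addpd src, dst", so this computes a = a + b.
    __asm__("addpd %1, %0" : "+x"(a) : "x"(b));
    return a;
#else
    return a + b;  // fallback: let the compiler generate the addition
#endif
}

// Self-check using values that are exact in binary floating point.
inline void check_add_packed() {
    v2df a = {1.5, 2.0};
    v2df b = {0.25, 4.0};
    v2df r = add_packed(a, b);
    assert(r[0] == 1.75 && r[1] == 6.0);
}
```

Because the constraints name the operands instead of hard-coding xmm0 or xmm1, the compiler is free to pick registers and to inline the function into its caller.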
Dwiel Member since: 3/13/2001 From: Bloomington, IN, United States |
| Sweet. Thanks for the info. I'll try to get it updated ASAP. I forget where I read that you couldn't use non-global vars... I knew that it made no sense... Thanks again. edit: typo [Edited by - Dwiel on June 25, 2005 5:37:13 PM] Dwiel |
Kwizatz GDNet+ Member since: 4/7/2000 From: San Jose, Costa Rica |
No problem, it's a good article. [Aeon Games][How to initailize a OpenGL window with SDL (multiplatform)][My Public CVS] [If you rate me down, I will only become stronger. - Obi Wan] |
Anonymous Poster |
| How can I align an array of floats if I am using the Intel compiler? |
Anonymous Poster |
| I have a question. I am trying to compile this simple program in Visual Studio .NET 2003 and I get this error. What should I do? #include #include #include int main() { _declspec(align(16)) long mul; _declspec(align(16)) int t1[100000]; _declspec(align(16)) int t2[100000]; __m128i mul1,mul2; for ( int j= 0 ; j < |
Anonymous Poster |
| Concerning alignment in gcc: Have you taken a look at http://www.cs.cmu.edu/cgi-bin/info2www?(gcc.info)Variable%20Attributes (Variable Attributes) ? You can use them to align variables in gcc. Example: int x __attribute__ ((aligned (16))) = 0; Greets :) |
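Here is a small sketch of that attribute applied to arrays, which is what the SSE2 aligned loads need. It assumes GCC or Clang. The names `samples`, `on_16_byte_boundary` and `local_array_is_aligned` are made up for illustration. As far as I know the Intel compiler on Linux accepts the same attribute, but I have not verified that here.

```cpp
#include <cassert>
#include <cstdint>

// A file-scope float array forced onto a 16-byte boundary,
// using the attribute syntax from the post.
static float samples[8] __attribute__((aligned(16))) = {0};

// Hypothetical helper: true when p is a multiple of 16.
inline bool on_16_byte_boundary(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % 16 == 0;
}

// The attribute works on automatic (stack) variables as well.
inline bool local_array_is_aligned() {
    double pair[2] __attribute__((aligned(16))) = {1.0, 2.0};
    return on_16_byte_boundary(pair);
}

// Self-check for both storage classes.
inline void check_alignment() {
    assert(on_16_byte_boundary(samples));
    assert(local_array_is_aligned());
}
```

The attribute only covers variables the compiler lays out itself. Memory that comes from malloc or new needs an aligned allocator instead.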
All times are ET (US) |