Assembly : When is it worth your time?

OpenGL_Guru · 2004-06-13T10:58:29

I have been reading many threads lately, and assembly was just brought up in a previous thread that I started. Since I don't program in assembly (I only did when I was in college), I was wondering who uses assembly and, if you do, is it really worth your time? I guess the next question is: when will it NOT be worth your time? We are looking at projected 5 - 7 GHz machines in the next few years and about 10 GHz machines not too long after that. Yes, with assembly you don't have to worry about buses and transferring data back and forth, but when the CPU gets this fast is it really worth your while to save a few CPU cycles? I don't mean to rant against assembly, it's lightning fast, but I was sincerely wondering when it will be totally useless, or do you see it as something we will always need (at a development level)?

Started by OpenGL_Guru May 25, 2004 02:57 PM

101 comments, last by OpenGL_Guru 19 years, 11 months ago

Red Drake

120

June 08, 2004 12:58 PM

That's the advantage of 3DNow! over SSE: it inherits the instructions of MMX.

In MMX there are quite a lot of shuffling functions (as far as I know).
Thanks for the vocabulary info.

Red Drake

120

June 08, 2004 01:19 PM

quote:Original post by Charles B

I get 18 clock cycles (Athlon, gcc, my own intrinsics).

My own intrinsics?? Clarify please.

Red Drake

AndyTX

807

June 08, 2004 01:42 PM

SSE also inherits all of the MMX instructions (i.e. all MMX instructions work on xmm registers as well)... however I'm unaware of any shuffling instructions in MMX. The ones that they added in SSE2 are something like "pshuf" (with postfixes for data size) as far as I remember (don't have the reference on me).

Red Drake

120

June 08, 2004 01:54 PM

quote:Original post by AndyTX
SSE also inherits all of the MMX instructions (i.e. all MMX instructions work on xmm registers as well)... however I'm unaware of any shuffling instructions in MMX. The ones that they added in SSE2 are something like "pshuf" (with postfixes for data size) as far as I remember (don't have the reference on me).

Aren't these the ones?

Shift operations
psllw/d/q Parallel shift left logical words / dwords / qwords
psraw/d Parallel shift right arithmetic (signed) words / dwords
psrlw/d/q Parallel shift right logical (unsigned) words / dwords / qwords

Or did I misunderstand the word "shuffling"?

Red Drake

AndyTX

807

June 08, 2004 04:14 PM

Yeah, by "shuffling" I meant arbitrary moves. I.e. if you have a 4-element vector you may want another vector to be made up of the values from it, except in a different order... perhaps {2, 1, 1, 4} instead of {1, 2, 3, 4}.

Bit shifting doesn't really do the same thing, as it won't shift "across" elements in the vector. If each of your elements is 8 bits, an 8-bit right shift will zero all of your elements, NOT shift them into the adjacent locations.

Look up the "pshuf" instructions in the Intel Instruction Set Reference for diagrams and a better explanation.
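The difference between a shuffle and a per-element shift can be modeled in plain C without any SIMD hardware. The sketch below is only an illustration: the `vec4u16` type and the `shuffle4` and `shift_right_lanes` helpers are hypothetical names, and it uses four 16-bit lanes (MMX shifts work on words, dwords and qwords rather than bytes), so the lane-sized shift here is 16 bits.

```c
#include <stdint.h>

/* Four 16-bit lanes, standing in for one 64-bit MMX register. */
typedef struct { uint16_t lane[4]; } vec4u16;

/* Shuffle: out.lane[i] = in.lane[sel[i]], each selector taken modulo 4.
   Values move between lanes; no bits are changed. */
static vec4u16 shuffle4(vec4u16 in, const int sel[4]) {
    vec4u16 out;
    for (int i = 0; i < 4; ++i)
        out.lane[i] = in.lane[sel[i] & 3];
    return out;
}

/* Per-lane logical right shift, in the spirit of psrlw: every lane is
   shifted independently, so bits never cross into a neighbouring lane.
   A count of 16 or more clears every lane. */
static vec4u16 shift_right_lanes(vec4u16 in, unsigned count) {
    vec4u16 out;
    for (int i = 0; i < 4; ++i)
        out.lane[i] = (count < 16) ? (uint16_t)(in.lane[i] >> count) : 0;
    return out;
}
```

With zero-based selectors {1, 0, 0, 3}, `shuffle4` turns {1, 2, 3, 4} into {2, 1, 1, 4}, while `shift_right_lanes` with a count of 16 turns the same input into {0, 0, 0, 0}.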

Red Drake

120

June 08, 2004 04:49 PM

Well, there is no problem with this in 3DNow! because you only have 2 numbers per MMX register, so it's a lot easier (at least it seems so).

Red Drake

Dredge-Master

175

June 09, 2004 12:01 AM

quote:Original post by OmniBrain
Just have to say this:

why use 95% more time when i only get 5% speed enhancement?

Think of it this way (this came up in the Carmack's SQRT thread under the Maths and Physics section of this forum).

The base sqrt function is 50 ticks.
The one they refer to as Carmack's inv sqrt on the first page is 30 ticks (before inverting it).
The one using the sub,add 3800000h trick is about 15 ticks.
The routine that I use, which can only be written in assembly without significantly increasing its time, is only 8 ticks.
That's an 84% reduction in time (about 6x faster), not just 5%.
That's not using SSE or 3DNow!
True, if you are only going to gain a 5% increase, rearrange your code. But let's say that, in the example posted above, I unrolled the code, then removed half the memory accesses in assembly. You can't remove all those extra memory accesses without asm (the compiler will whack them back in again).

The other advantage of assembly is using the 3DNow! instructions.

I mean, a sqrt that is 50 times faster than the normal sqrt? Divides, adds, subtracts and multiplies that are all the same speed (i.e. fast divides)? Even faster invsqrt approximations?
Assembly only baby.
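For readers who have not seen the inverse square root trick mentioned above, here is a minimal C sketch of the widely circulated version from the Quake III source. It is not the 8-tick assembly routine described in this post, which the thread does not show, and it is not the "sub,add 3800000h" variant either. The function name `fast_inv_sqrt` is a placeholder, and the code assumes `float` is a 32-bit IEEE 754 value.

```c
#include <stdint.h>
#include <string.h>

/* Approximate 1/sqrt(x) for positive, finite x.
   Assumes float is a 32-bit IEEE 754 single. */
static float fast_inv_sqrt(float x) {
    uint32_t bits;
    float y;

    /* Reinterpret the float's bits as an integer (memcpy avoids
       the aliasing problems of a pointer cast). */
    memcpy(&bits, &x, sizeof bits);

    /* Magic-constant initial guess: halving the bit pattern roughly
       halves the exponent, and the subtraction negates it. */
    bits = 0x5f3759dfu - (bits >> 1);
    memcpy(&y, &bits, sizeof y);

    /* One Newton-Raphson step to refine the guess. */
    y = y * (1.5f - 0.5f * x * y * y);
    return y;
}
```

The result is an approximation (well within 1% after the single refinement step), so it suits things like normalizing vectors for lighting rather than anything that needs full precision.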

Beer - the love catalyst

Anonymous

June 09, 2004 12:34 AM

IMHO ASM is only useful when you're writing device drivers or a library which needs to save every clock cycle it can.

ASM is like regex: it's ummm... easy??? :S to write, but hellishly tiresome and annoying to read and understand, even if it's your own code.

I have used assembly maybe 2 times. The first was when I was learning it and created a tic-tac-toe game.

The second time was for a C++ code profiling class.

I personally feel ASM is now only an educational tool and should not really be considered a development language.

Cheers

Charles B

863

June 09, 2004 06:07 AM

quote:Original post by Red Drake
quote:Original post by Charles B

I get 18 clock cycles (Athlon, gcc, my own intrinsics).

My own intrinsics?? Clarify please.

Example: xor_2i, add_8u8, mul_2f.
I have a layer that wraps the Intel intrinsics and gcc builtins into a more standardized and compatible form. It has type checking. This is explained in my "Horse power math lib (2)" thread. I have also improved the MMX builtins of gcc by replacing them with inline asm.

This way I really code asm in C. This lets the compiler optimize the inlined routine within the given context, which gives the full range of optimizations: for instance, better register allocation across the routines.
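A rough idea of what such a wrapper layer can look like is sketched below. Only the names `xor_2i`, `add_8u8` and `mul_2f` come from the post; the types `v2i`, `v8u8` and `v2f`, the signatures, and the scalar bodies are assumptions made for illustration. A real layer would map each function onto an intrinsic or inline asm; the scalar fallback here just keeps the sketch portable.

```c
#include <stdint.h>

/* Distinct struct types give compile-time type checking: passing a
   v2i where a v2f is expected is a compile error. */
typedef struct { float   f[2]; } v2f;   /* two packed floats           */
typedef struct { int32_t i[2]; } v2i;   /* two packed 32-bit integers  */
typedef struct { uint8_t b[8]; } v8u8;  /* eight packed unsigned bytes */

/* Element-wise multiply of two floats (hypothetical signature). */
static inline v2f mul_2f(v2f a, v2f b) {
    v2f r = {{ a.f[0] * b.f[0], a.f[1] * b.f[1] }};
    return r;
}

/* Bitwise XOR of two packed integers (hypothetical signature). */
static inline v2i xor_2i(v2i a, v2i b) {
    v2i r = {{ a.i[0] ^ b.i[0], a.i[1] ^ b.i[1] }};
    return r;
}

/* Wrapping (non-saturating) add of eight unsigned bytes. */
static inline v8u8 add_8u8(v8u8 a, v8u8 b) {
    v8u8 r;
    for (int k = 0; k < 8; ++k)
        r.b[k] = (uint8_t)(a.b[k] + b.b[k]);
    return r;
}
```

Because the functions are `static inline`, the compiler sees their bodies at each call site and can schedule and allocate registers across several calls, which is the benefit described above.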

"Coding math tricks in asm is more fun than Java"

Charles B

863

June 09, 2004 06:16 AM

@AndyTX, Red Drake

quote:Original post by AndyTX
... however I'm unaware of any shuffling instructions in MMX.

unpack, shift, swap are in practice equivalent to the shuffle instructions of SSE. I call all these instructions swizzling instructions in the context of working on vectors of four floats. The most obvious case of swizzling in linear algebra is the cross product.

Example :

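; mm0 : x y (low dword listed first)
; mm1 : z w
movq mm2, mm0 ; mm2 = x y (copy, because punpck overwrites its destination)
movq mm3, mm0 ; mm3 = x y
punpckldq mm0, mm1 ; mm0 = x z
punpckhdq mm2, mm1 ; mm2 = y w
punpckldq mm3, mm3 ; mm3 = x x, might serve as a scalar later.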
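The dword unpack semantics can be checked with a small C model. This is a sketch under assumptions: `mmx_reg` and `transpose2x2` are hypothetical names, and the two helper functions are named after the actual mnemonics (`punpckldq`, `punpckhdq`) but only emulate them on a pair of 32-bit values. In both instructions the destination operand supplies the low dword of the result and the source supplies the high dword, so operand order matters.

```c
#include <stdint.h>

/* One 64-bit MMX register viewed as two 32-bit dwords. */
typedef struct { uint32_t lo, hi; } mmx_reg;

/* Model of "punpckldq dst, src": result = (dst.lo, src.lo). */
static mmx_reg punpckldq(mmx_reg dst, mmx_reg src) {
    mmx_reg r = { dst.lo, src.lo };
    return r;
}

/* Model of "punpckhdq dst, src": result = (dst.hi, src.hi). */
static mmx_reg punpckhdq(mmx_reg dst, mmx_reg src) {
    mmx_reg r = { dst.hi, src.hi };
    return r;
}

/* Turn (x y), (z w) into (x z), (y w): a 2x2 transpose. */
static void transpose2x2(mmx_reg *r0, mmx_reg *r1) {
    mmx_reg a = *r0, b = *r1;   /* copies, as the asm needs a spare register */
    *r0 = punpckldq(a, b);      /* x z */
    *r1 = punpckhdq(a, b);      /* y w */
}
```

Calling `punpckldq(a, a)` on the same register duplicates its low dword, which gives the "x x" broadcast used as a scalar.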

"Coding math tricks in asm is more fun than Java"

This topic is closed to new replies.