
Archived

This topic is now archived and is closed to further replies.

OklyDokly

How fast really is the VC++ inline assembler?


Recommended Posts

Hi, I'm in the process of learning assembler to help speed up my maths library, and I'm having some problems getting the VC++ inline assembler up to full speed (VC++ .NET 7). I'm using the FPU, and I'm finding that this:
float f = 0.0f;
float result = 0.0f;
for (int i = 0; i < 10000; i ++)
{
result = (float)sqrt( (double)f );
f++;
}
is about 1.5 times faster than this:
for (int i = 0; i < 10000; i ++)
{
__asm
{
fmul ST(1), ST(0);
}
}
Does anyone understand why? Are there any flags I have to set in the project settings to optimise __asm somehow? Thanks

Guest Anonymous Poster
I don't quite understand how you're comparing the code - they do completely different things!

The C code takes the square root of a number that is incremented each iteration. (Incidentally, a good optimiser could probably optimise this away if it is aware of sqrt as an intrinsic with a one-to-one mapping - if it did, it could probably calculate the whole loop in one go. I've had MSVS.NET 2003 do this kind of thing for me.) BTW: someone remind me, is ++ a valid operation on a double? Logically I only really like it on discrete things, e.g. integers or iterators - for floating-point values I tend to prefer x += 1.0; the optimiser should be able to take care of that in the most optimal way, I'm sure.

The assembly code multiplies together two FPU registers that do not appear to be loaded with any content! IIRC you should get a hardware underflow exception, as the FPU has no registers allocated on the FPU stack!!

The x86 asm code that is equivalent to what you wrote in the C code's loop is
__asm {
fld QWORD PTR [f]
fld1
fld ST(1)
fsqrt
fstp DWORD PTR [result]
fadd
fstp QWORD PTR [f]
}
// is equivalent to... (given float result & double f)

result = (float) sqrt( f );
f += 1.0;


BTW: you don't need to put a semicolon after each statement in asm. I believe the instruction delimiter is a new line; a semicolon indicates the start of a comment in asm.

Using assembly for simple math operations won't get you much of a speed-up over the standard C++ optimizer. Compiler developers are pretty smart, and the compilers do most of the obvious optimizations for simple things. Understanding the FPU instruction set is good to know, but don't expect to speed up simple things like z = x + y (or x*x in this case). In fact, most likely your best implementation will be slower than the optimized version your compiler will spit out.

On the other hand, real speed increases will come from using the MMX, SSE, SSE2 (and soon SSE3) instruction sets - I've regularly gotten 10x - 100x speed increases with properly written MMX and SSE assembly. Also, simple comparisons are often misleading, due in part to the aggressive nature of optimizing compilers. As well, memory alignment and cache hits/misses contribute a lot to the overall speed of execution, so testing assembly code in a theoretical case is usually useless.

The best way to write good assembly is to code an algorithm as best as possible using standard C++ (or C, if that's what you're using). Once you've tweaked the algorithm out, try replacing it with a simplified assembly version. Once the assembly version produces correct results (it can take a few tries), start to tweak that until it flies. The speed is there if you need it; you just have to know where to look.
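To make the SIMD point concrete, here is a minimal sketch (my own example, not from the thread) of where the batch-processing gains come from: a single _mm_sqrt_ps computes four square roots at once, where scalar FPU code would need four separate fsqrt operations.

```cpp
#include <xmmintrin.h>  // SSE intrinsics
#include <cassert>
#include <cmath>

// Square roots of four packed floats with one SSE instruction.
void sqrt4(const float in[4], float out[4])
{
    __m128 v = _mm_loadu_ps(in);   // load 4 floats (no alignment required)
    v = _mm_sqrt_ps(v);            // 4 square roots in one instruction
    _mm_storeu_ps(out, v);         // store the 4 results
}
```

Batched over whole arrays (and with 16-byte-aligned data so _mm_load_ps can be used), this is the kind of restructuring behind the large speed-ups the poster describes, rather than hand-translating single scalar operations.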

Thanks for the information. I was originally trying to put John Carmack's InvSqrt() code into assembly, as I was curious to see if I could speed it up that way. I got halfway through writing this when I realised I had made it 3-4 times as slow as performing 1 / sqrt(). The original C version of the InvSqrt code was a lot faster than when I converted half of it into assembly.

So I developed a quick test where I multiplied two empty floating-point registers, as I was wondering about the difference in speed. Performing the assembly loop at the top of this post actually seems about the same speed as the InvSqrt function, which surely shouldn't be the case. I understand now why sqrt() would be faster, due to MMX and SSE instructions, but I'm still wondering why my InvSqrt function slows down when I try converting it to assembly.

Guest Anonymous Poster
Hmm, as far as I remember there's a post with the code, and all you need to do is copy & paste it - I assume that's where you got the idea from.

The 'Carmack' variant of the Newton-Raphson method relies on the fact that everything stays out of the FPU and is done with the normal integer instructions... it is also highly dependent on using floats rather than doubles (it could probably be re-written to deal with doubles, though).
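For readers who haven't seen it, the widely circulated C version of the routine being discussed (the magic-constant trick from the Quake III source) looks roughly like the sketch below; the comments and the memcpy-based bit copy are mine, added to avoid the original's pointer-cast aliasing.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Fast approximate 1/sqrt(x): reinterpret the float's bits as an integer,
// form a rough guess from a magic constant, then refine that guess with one
// Newton-Raphson step - all without touching the FPU's sqrt/divide hardware.
float InvSqrt(float x)
{
    float xhalf = 0.5f * x;
    std::uint32_t i;
    std::memcpy(&i, &x, sizeof i);       // bits of x as an integer
    i = 0x5f3759df - (i >> 1);           // magic initial guess
    std::memcpy(&x, &i, sizeof x);       // back to float
    return x * (1.5f - xhalf * x * x);   // one Newton-Raphson refinement
}
```

The result is only approximate (on the order of 0.2% error after the single refinement step), which is the trade-off the routine makes against an exact 1.0f / sqrt(x).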

I don't quite understand how your test would help, to be honest; to test against the Carmack routine you should test it against

float result, f;
// assign values as you will...

result = 1.0f/sqrt( (double)f );


That is the equivalent code to the Carmack routine & AFAIK should be slower.

quote:
I was originally trying to put John Carmack's InvSqrt() code into assembly, as I was curious to see if I could speed it up this way.

Uh, I don't mean to be rude or anything, but do you really believe that yourself?
A) Carmack is after all not your average coder.
B) You're unlikely to write faster asm than the optimizer generates in any modern compiler.

Again, I don't want to take this away from you, but on modern CPUs you're unlikely to write asm that's faster than what the optimizer can generate.
Learning asm is a good thing™, but that it will "speed up your maths library" is very unlikely; more likely it will slow it down.

To hammer the point in, realize that Carmack has been coding ASM for a long, long time, particularly in the realm of extremely fast graphics calculations. He did almost nothing else during the DOS era. He knows how to optimize code, and he's really ****ing good at it.

quote:
Original post by amag
Learning asm is a good thing™ but that it will "speed up your maths library" is very unlikely, more likely it will slow it down.
[sarcasm]It's the same as when you first learn VB and complain about how slow it is. When you first learn ASM you'll notice how slow it is compared to your C++ code, but since this is ASM, you won't complain that it's slow.[/sarcasm]

In essence, every language has its own way of optimization.

[edited by - alnite on March 30, 2004 6:49:50 PM]

