FADD vs FMUL time


5 replies to this topic

#1 nimrodson   Members   -  Reputation: 275


Posted 17 April 2013 - 06:24 PM

Hi,

 

I've been doing some comparisons in C between the four basic arithmetic operations (+, -, *, /), and surprisingly (to me), the add and multiply operations take the same time. I ran the tests using both int and double data types, and the result is the same.

Analyzing the disassembly generated by gcc (with the -S flag), I noticed that the opcodes used are fadd and fmul. According to Wikipedia, the x87 FPU in the Athlon 64 takes the same time to process both opcodes.

I'd like to know the reason for this curiosity.

 

Thanks.

 

 


Edited by nimrodson, 17 April 2013 - 07:01 PM.



#2 Khatharr   Crossbones+   -  Reputation: 3038


Posted 17 April 2013 - 07:12 PM

A lot of time and effort (and die space) has been spent on optimizing fmul.

Are you asking for the specifics of the floating point ALU? That's getting pretty deep, man. I'd be curious to see it if anyone has access to that info.
void hurrrrrrrr() {__asm sub [ebp+4],5;}

There are ten kinds of people in this world: those who understand binary and those who don't.

#3 Hodgman   Moderators   -  Reputation: 31804


Posted 17 April 2013 - 07:51 PM

Any operation on a floating-point number is complicated -- both addition and multiplication basically require steps that add, multiply, or shift the component parts of the float.
There's fixed-function hardware that's hard-wired to perform each of these operations, and it turns out they can be implemented with similar time constraints. A lot of operations can be hard-wired to complete in a single clock cycle, if you throw enough transistors at it.

#4 nimrodson   Members   -  Reputation: 275


Posted 17 April 2013 - 08:32 PM

Are you asking for the specifics of the floating point ALU? That's getting pretty deep, man. I'd be curious to see it if anyone has access to that info.

 

Maybe: my interest lies in knowing the hardware/algorithmic aspects behind the add and mul operations, regardless of whether those operations are performed in the FPU or not.



#5 nimrodson   Members   -  Reputation: 275


Posted 17 April 2013 - 08:38 PM


A lot of operations can be hard wired to complete in a single clock cycle, if you throw enough transistors at it.

 

The FDIV operation is still much slower than FADD/FMUL. Does that mean FDIV would require too many transistors to approach FADD/FMUL times?



#6 Hodgman   Moderators   -  Reputation: 31804


Posted 17 April 2013 - 08:43 PM

Yes, it's more efficiently implemented with an iterative algorithm, where each clock cycle performs one iteration.
[edit] Internally, the algorithm of course has to use integer division:
http://stackoverflow.com/questions/8401194/the-integer-division-algorithm-of-x86-processors
[/edit]
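As a sketch of what "one iteration per cycle" means, here's the classic restoring shift-and-subtract loop: it produces one quotient bit per iteration, which is why a 32-bit divide can take on the order of 32 cycles while add/mul can be pipelined. (A hypothetical software illustration of the idea, not the actual hardware circuit.)

```c
#include <stdint.h>

/* Restoring division: one quotient bit per loop iteration,
   working from the most significant bit down. */
uint32_t divide_u32(uint32_t n, uint32_t d, uint32_t *rem) {
    uint32_t q = 0, r = 0;
    for (int i = 31; i >= 0; i--) {
        r = (r << 1) | ((n >> i) & 1);    /* bring down the next bit */
        if (r >= d) {                     /* does the divisor fit? */
            r -= d;
            q |= 1u << i;                 /* set this quotient bit */
        }
    }
    if (rem) *rem = r;
    return q;
}
```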

Note that CPUs often have some kind of RCP op, which very quickly computes an approximation to 1/x, rather than y/x. Sometimes 'close enough' results are ok (e.g. In graphics), where you'd use y*rcp(x) instead of y/x.
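In software terms, the rcp-then-refine idea looks like Newton-Raphson iteration on a rough reciprocal guess; each iteration roughly doubles the number of correct bits. (A hypothetical sketch: real rcp instructions produce the rough estimate in hardware, not with frexp.)

```c
#include <math.h>

/* Approximate 1/x (assumes x > 0) by refining a crude guess with
   Newton-Raphson steps: r' = r * (2 - x*r). */
double approx_rcp(double x, int steps) {
    int e;
    frexp(x, &e);                    /* x = m * 2^e, m in [0.5, 1) */
    double r = ldexp(1.0, -e);       /* crude guess: 1 / 2^e */
    for (int i = 0; i < steps; i++)
        r = r * (2.0 - x * r);       /* Newton-Raphson refinement */
    return r;
}
```

With this crude starting guess, about six refinement steps are enough to reach full double precision.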

You can find the human-readable algorithms by searching for "floating point multiplication", etc, and the format's layout is on Wikipedia. I'm not sure about finding details about what the logic-gate/transistor diagrams would look like... the most advanced thing I've drawn in hardware diagrams is an integer adder ;-)
Maybe the famous "What Every Computer Scientist Should Know About Floating-Point Arithmetic" document would be illuminating?



