
FADD vs FMUL time

Recommended Posts

Hi,

 

I've been doing some comparisons in C between the 4 basic arithmetic operations (+ , - , * , / ), and surprisingly (for me), the add and multiply operations take the same time. I ran the tests using int and double data types, and the result is the same.

Analyzing the assembly code generated by gcc (with the -S option), I noticed that the opcodes used are fadd and fmul. According to Wikipedia, the x87 FPU in the Athlon 64 takes the same amount of time to process both opcodes.

I'd like to know the reason behind this.

 

Thanks.

 

 

Edited by nimrodson

A lot of time and effort (and die space) has been spent on optimizing fmul.

Are you asking for the specifics of the floating point ALU? That's getting pretty deep, man. I'd be curious to see it if anyone has access to that info.

Any operation on a floating point number is complicated -- both addition and multiplication break down into steps that add, multiply, or shift the component parts of the float (sign, exponent, significand).
There's fixed-function hardware that's hard-wired to perform each of these operations, and it turns out they can be implemented with similar time constraints. A lot of operations can be hard wired to complete in a single clock cycle, if you throw enough transistors at it.


Are you asking for the specifics of the floating point ALU? That's getting pretty deep, man. I'd be curious to see it if anyone has access to that info.

 

Maybe: I'm interested in the hardware/algorithmic aspects behind the add and mul operations, regardless of whether those operations are performed in the FPU or not.



A lot of operations can be hard wired to complete in a single clock cycle, if you throw enough transistors at it.

 

The FDIV operation is still much slower than FADD/FMUL. Does that mean FDIV would require too many transistors to approach FADD/FMUL times?

Yes, it's more efficiently implemented with an iterative algorithm, where each clock cycle performs one iteration.
[edit] Internally, the algorithm of course has to use integer division:
http://stackoverflow.com/questions/8401194/the-integer-division-algorithm-of-x86-processors
[/edit]

Note that CPUs often have some kind of RCP op, which very quickly computes an approximation to 1/x rather than y/x. Sometimes 'close enough' results are OK (e.g. in graphics), where you'd use y*rcp(x) instead of y/x.

You can find the human-readable algorithms by searching for "floating point multiplication", etc., and the format's layout is on Wikipedia. I'm not sure where to find details of what the logic-gate/transistor diagrams would look like... the most advanced thing I've drawn in hardware diagrams is an integer adder ;-)
Maybe the famous "What Every Computer Scientist Should Know About Floating-Point Arithmetic" paper would be illuminating?
