FADD vs FMUL time

4 comments, last by Hodgman 11 years ago

Hi,

I've been doing some comparisons in C between the four basic arithmetic operations (+, -, *, /), and surprisingly (for me), add and multiply take the same amount of time. I ran the tests using both int and double data types and got the same result.

Analyzing the disassembly generated by gcc (with the -S flag), I noted that the opcodes used are fadd and fmul. According to Wikipedia, the x87 FPU in the Athlon 64 takes the same time to process both opcodes.
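For reference, a minimal loop of the sort I described (not the exact test code, just an illustration of what produces fadd/fmul in the -S output; names and the iteration count are made up) would be something like:

#include <stdio.h>
#include <time.h>

int main(void) {
    /* volatile keeps gcc from folding the loops away entirely */
    volatile double a = 1.000001, acc_add = 0.0, acc_mul = 1.0;
    clock_t t0, t1;
    long i;

    t0 = clock();
    for (i = 0; i < 100000000L; ++i)
        acc_add = acc_add + a;   /* x87 code gen: fadd; SSE2 code gen: addsd */
    t1 = clock();
    printf("add: %f s\n", (double)(t1 - t0) / CLOCKS_PER_SEC);

    t0 = clock();
    for (i = 0; i < 100000000L; ++i)
        acc_mul = acc_mul * a;   /* x87 code gen: fmul; SSE2 code gen: mulsd */
    t1 = clock();
    printf("mul: %f s\n", (double)(t1 - t0) / CLOCKS_PER_SEC);

    return 0;
}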

I'd like to know the reason behind this curiosity.

Thanks.

A lot of time and effort (and die space) has been spent on optimizing fmul.

Are you asking for the specifics of the floating point ALU? That's getting pretty deep, man. I'd be curious to see it if anyone has access to that info.
void hurrrrrrrr() {__asm sub [ebp+4],5;}

There are ten kinds of people in this world: those who understand binary and those who don't.
Any operation on a floating point number is complicated -- both addition and multiplication basically require steps that add and multiply or shift the component parts of the float.
There's fixed-function hardware that's hard-wired to perform each of these operations, and it turns out they can be implemented with similar time constraints. A lot of operations can be hard wired to complete in a single clock cycle, if you throw enough transistors at it.
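To make the "component parts" idea concrete, here's a rough C sketch that splits doubles into mantissa and exponent with frexp/ldexp and multiplies them piecewise. It's only an illustration of the decomposition (the function name is made up, and the mantissa product here still goes through the FPU); real hardware works directly on the significand and exponent bit fields.

#include <math.h>
#include <stdio.h>

/* Multiply two doubles "by parts": multiply the mantissas, add the exponents.
   This mirrors the structure of the hardware algorithm, not its implementation. */
double mul_by_parts(double x, double y) {
    int ex, ey;
    double mx = frexp(x, &ex);       /* x = mx * 2^ex, with 0.5 <= |mx| < 1 */
    double my = frexp(y, &ey);       /* y = my * 2^ey */
    return ldexp(mx * my, ex + ey);  /* (mx*my) * 2^(ex+ey) */
}

int main(void) {
    printf("%g vs %g\n", mul_by_parts(3.5, -2.25), 3.5 * -2.25);  /* both -7.875 */
    return 0;
}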

Are you asking for the specifics of the floating point ALU? That's getting pretty deep, man. I'd be curious to see it if anyone has access to that info.

Maybe: my interest lies in the hardware/algorithmic aspects behind the add and mul operations, regardless of whether those operations are performed in the FPU or not.


A lot of operations can be hard wired to complete in a single clock cycle, if you throw enough transistors at it.

The FDIV operation is still much slower than FADD/FMUL. Does that mean FDIV would require too many transistors to approach FADD/FMUL times?

Yes, it's more efficiently implemented with an iterative algorithm, where each clock cycle performs one iteration.
[edit] internally, the algorithm of course has to use integer division
http://stackoverflow.com/questions/8401194/the-integer-division-algorithm-of-x86-processors
[/edit]
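For illustration, here's the textbook restoring-division loop in C, which produces one quotient bit per iteration. It's just a sketch of the "one iteration per cycle" idea (function name made up, divisor assumed non-zero), not the actual x86 microcode:

#include <stdint.h>
#include <stdio.h>

/* Restoring division: one quotient bit per iteration, which is roughly why
   a divide takes many more cycles than an add or multiply. Assumes d != 0. */
uint32_t divide_restoring(uint32_t n, uint32_t d, uint32_t *rem) {
    uint32_t q = 0, r = 0;
    for (int i = 31; i >= 0; --i) {
        r = (r << 1) | ((n >> i) & 1);  /* bring down the next dividend bit */
        if (r >= d) {                   /* does the divisor fit? */
            r -= d;
            q |= 1u << i;
        }
    }
    *rem = r;
    return q;
}

int main(void) {
    uint32_t r;
    printf("%u rem %u\n", divide_restoring(100, 7, &r), r);  /* 14 rem 2 */
    return 0;
}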

Note that CPUs often have some kind of RCP op, which very quickly computes an approximation to 1/x, rather than y/x. Sometimes "close enough" results are OK (e.g. in graphics), where you'd use y*rcp(x) instead of y/x.
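As a rough sketch of that (assuming x86 with SSE, and a made-up wrapper name): _mm_rcp_ss gives only about 12 bits of precision, so a Newton-Raphson step is often tacked on when a bit more accuracy is wanted.

#include <xmmintrin.h>
#include <stdio.h>

/* Approximate y/x as y * rcp(x). RCPSS is fast but low precision;
   one Newton-Raphson step r = r*(2 - x*r) roughly doubles the precision. */
float fast_div(float y, float x) {
    float r = _mm_cvtss_f32(_mm_rcp_ss(_mm_set_ss(x)));
    r = r * (2.0f - x * r);   /* optional refinement step */
    return y * r;
}

int main(void) {
    printf("%f vs %f\n", fast_div(10.0f, 3.0f), 10.0f / 3.0f);
    return 0;
}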

You can find the human-readable algorithms by searching for "floating point multiplication", etc., and the format's layout is on Wikipedia. I'm not sure where you'd find details of what the logic-gate/transistor diagrams would look like... the most advanced thing I've drawn in hardware diagrams is an integer adder ;-)
Maybe the famous "What Every Computer Scientist Should Know About Floating-Point Arithmetic" document would be illuminating?
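For what it's worth, that integer adder can be expressed in C as a ripple-carry chain of full adders. This is just a software sketch of the hardware structure (function name made up); the point is that each bit's carry feeds the next, which is why real CPUs use carry-lookahead and similar tricks to finish an add in one cycle.

#include <stdint.h>
#include <stdio.h>

/* 32-bit ripple-carry adder built from full adders, using logic ops only. */
uint32_t add_gates(uint32_t a, uint32_t b) {
    uint32_t sum = 0, carry = 0;
    for (int i = 0; i < 32; ++i) {
        uint32_t ai = (a >> i) & 1, bi = (b >> i) & 1;
        uint32_t s = ai ^ bi ^ carry;            /* sum bit */
        carry = (ai & bi) | (carry & (ai ^ bi)); /* carry out to the next bit */
        sum |= s << i;
    }
    return sum;
}

int main(void) {
    printf("%u\n", add_gates(1234, 5678));  /* 6912 */
    return 0;
}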

