Fast casting


Hi, I need to know how to cast a float to an int using FPU code (or any other way). I do loads of casting and I think it's really slowing down my code. I do use fixed-point math for some routines, but I keep reading bad things about it, and it does not speed up my code much either. What do you all use for casting? Thanks a lot. P.S. Anything would be appreciated, thanks.

Guest Anonymous Poster
>What's wrong with:
>
>int x = (int) 9.0f;
>
>Let the compiler deal with it.

What's wrong is that it can be horribly slow, especially inside a big loop.

Try this:

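// IEEE 754 compliant rounding to nearest
// (fistp uses the FPU's current rounding mode, which is round-to-nearest by default)
// MSVC inline assembly, 32-bit x86 only
inline int __stdcall round( float x )
{
    int t;
    __asm fld x
    __asm fistp t
    return t;
}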

With optimization enabled, MSVC compiles this function down to just two instructions.
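For compilers without MSVC-style inline assembly, a rough portable counterpart is `std::lrint` from C++11's `<cmath>`, which also converts using the current rounding mode. This is only a sketch: the name `round_current_mode` is made up for illustration, and whether a given compiler turns it into a single conversion instruction is not something the code guarantees.

```cpp
#include <cmath>

// Hypothetical name, for illustration only.
// std::lrint converts using the current floating-point rounding mode,
// which is round-to-nearest (halfway cases to even) unless it was changed.
inline int round_current_mode(float x)
{
    return static_cast<int>(std::lrint(x));
}
```

Under the default rounding mode, halfway cases go to the nearest even integer, so 2.5f becomes 2 and 3.5f becomes 4.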

Guest Anonymous Poster
Assuming he's targeting Intel architecture.

You want it for Alpha?
        
extern "C" { int64 __asm (char *, ...); };
#pragma intrinsic(__asm)

#ifdef _ALPHA_21264

inline int round( float x )
{
    return (int) __asm("cvttq f16, f0;"
                       "ftois f0 , v0", x);
}
#else
inline int round( float x )
{
    int64 t;
    __asm("cvttq f16, f0;"
          "stt f0 ,(a1)", x, &t);
    return int(t);
}
#endif


Edited by - Serge K on July 18, 2000 12:56:57 AM

--== FLOAT 2 INT ==--

by: Alex Chalfin (aka Phred)
achalfin@one.net



Since the introduction of the Intel Pentium chip, many programmers have
switched from fixed-point math to floating point. This is due to
the superior floating-point unit on the Pentium chip. However, this has left
a small debate within the programming world: what is the best way to perform
a floating-point to integer conversion?


I will present 4 methods for float to int conversion in this document. It is
up to you to decide which is best for you.



Method 1:
---------

Typecasting. This method is a high level language method for converting a
float to an integer. Here is a small piece of code demonstrating it:

MyInt = (int)MyFloat;


Advantages:
- Completely portable and standard.
- Works with float and double without modification.
- Performs correct rounding.

Disadvantages:
- Heavily compiler dependent.
- Tends to be slow (e.g. Watcom's slow typecast).



Method 2:
---------

Explicit FPU instruction to convert to an integer. On the x86 platform, the
instructions take the following form:

fist dword ptr [eax] ; store integer

-or-

fistp dword ptr [eax] ; store integer and pop

Using this form on x86 platforms generally avoids the overhead associated
with the compiler type casting. When compared to the typecasting under the
Watcom 10.6 compiler, the cycle count dropped from 40 to 6.


Advantages:
- Good performance
- Works with float and double without modification.

Disadvantages:
- x86 CPU dependent
- requires assembler (not really a disadvantage)
- requires 6 cycles (6 cycles for 1 instruction is quite a bit)
- Ignores rounding state of the FPU


Method 3:
---------

Magic number/fadd trick. This method uses a trick in the IEEE double format
to perform the typecasting without actual conversion.

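int FLT2INT[2]      = {0, 0x43380000};  // double 1.5 * 2^52
int FLT2FXD24_8[2]  = {0, 0x42B80000};  // double 1.5 * 2^44
int FLT2FXD16_16[2] = {0, 0x42380000};  // double 1.5 * 2^36
int FLT2FXD8_24[2]  = {0, 0x41B80000};  // double 1.5 * 2^28
int TEMP[2]         = {0, 0};


; the value to convert is assumed to be in st(0) already
fadd qword ptr [FLT2INT]
fstp qword ptr [TEMP]
mov eax, [TEMP]     ; the result is the low dword of the stored double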

Advantages:
- Good performance

Disadvantages:
- Dependent on "double" data type, doesn't work on "float".
- Extra constants (has to be stored as a double or two ints).
- Ignores rounding
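
A portable sketch of the magic-number trick may make the idea easier to follow. It assumes IEEE 754 doubles, the default round-to-nearest mode, and arithmetic carried out at double precision; `magic_round` is a name invented for this illustration and is not part of the original text.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical name, for illustration only.
// Adding 1.5 * 2^52 pushes the integer part of x into the low 32 bits
// of the double's mantissa, where it can be read back as an integer.
inline std::int32_t magic_round(double x)
{
    const double magic = 6755399441055744.0;  // 1.5 * 2^52, bit pattern 0x4338000000000000
    double t = x + magic;                     // rounds to an integer in the current mode
    std::uint64_t bits;
    std::memcpy(&bits, &t, sizeof bits);      // well-defined way to read the bit pattern
    // Keep the low 32 bits; the conversion to signed is modular on common compilers.
    return static_cast<std::int32_t>(static_cast<std::uint32_t>(bits));
}
```

The input has to fit in a 32-bit integer. Because the addition does the rounding, halfway cases follow the rounding mode: with the default mode, 2.5 gives 2 and 3.5 gives 4.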


Method 4:
---------

Integer pipeline conversion. This method takes the IEEE float format and uses
it completely to convert to an integer.


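FltInt = *(int *)&MyFloat;

mantissa = (FltInt & 0x07fffff) | 0x800000;   // restore the implicit leading 1
exponent = 150 - ((FltInt >> 23) & 0xff);     // 150 = bias (127) + mantissa width (23)

if (exponent < 0)
    MyInt = (mantissa << -exponent);          // large values: shift left
else if (exponent < 24)
    MyInt = (mantissa >> exponent);           // drop the fractional bits
else
    MyInt = 0;                                // |value| < 1; also avoids an oversized shift count

if (FltInt & 0x80000000)
    MyInt = -MyInt;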


Advantages:
- Good performance
- Pure integer pipeline based (good for pairing with FPU)

Disadvantages:
- Separate routines necessary for floats and doubles.
- Costly jump to handle negatives (can hurt on PPro machines)
- Ignores rounding



Stuff
-----

The main purpose of this document is to introduce the fourth method of float
to int conversion. I had never seen anything like it and I thought it was
pretty cool. Here is how it works in a little bit more detail:


IEEE 32-bit floating point number:

 31 30        23 22                     0
 ________________________________________
| s |    exp    |       mantissa         |
 ----------------------------------------

What this diagram shows is the 23-bit mantissa, the 8-bit exponent, and the
1-bit sign.
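
To make the layout concrete, here is a small sketch that splits a float into those three fields. It assumes a 32-bit IEEE 754 `float`; the names `FloatFields` and `split_float` are made up for this illustration.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper type, for illustration only.
struct FloatFields {
    std::uint32_t sign;      // 1 bit   (bit 31)
    std::uint32_t exponent;  // 8 bits  (bits 30..23), biased by 127
    std::uint32_t mantissa;  // 23 bits (bits 22..0), stored without the leading 1
};

inline FloatFields split_float(float x)
{
    std::uint32_t bits;
    std::memcpy(&bits, &x, sizeof bits);  // copy out the raw bit pattern
    FloatFields f;
    f.sign     = bits >> 31;
    f.exponent = (bits >> 23) & 0xFFu;
    f.mantissa = bits & 0x7FFFFFu;
    return f;
}
```

For 16.0f this gives sign 0, exponent 131 (127 + 4) and mantissa 0.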

The first stage of the conversion is to extract the mantissa. This is done
with simple bit masking.

mantissa = (FltInt & 0x07fffff);

With IEEE numbers, the most significant bit of the mantissa is always assumed to be set. This
is why the mantissa bits are all zeros for numbers which are powers of two
(like 16, 256, etc.). This bit nee



The question is: which sort of floating-point to integer conversion do you want?

Regular C typecasting uses truncation.
I don't know about you, but I usually need round, ceil or floor.

It is very frustrating that C has such poor support for floating-point to integer conversion.
Even Java is better in this area: at least it has a round function.
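
For comparison, the four conversions mentioned above can be sketched with what C++11 and later provide in `<cmath>`. The `to_int_*` names are invented for this illustration, and nothing here says anything about speed.

```cpp
#include <cmath>

// Hypothetical names, for illustration only.
inline int to_int_trunc(float x)   { return static_cast<int>(x); }               // toward zero
inline int to_int_floor(float x)   { return static_cast<int>(std::floor(x)); }   // toward -infinity
inline int to_int_ceil(float x)    { return static_cast<int>(std::ceil(x)); }    // toward +infinity
inline int to_int_nearest(float x) { return static_cast<int>(std::lround(x)); }  // nearest, halfway cases away from zero
```

The difference shows up with negative inputs: for -2.7f, truncation and ceil give -2, while floor and nearest give -3.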


> Method 1:
> ---------
> Typecasting.
...
> Advantages:
...
> - Performs correct rounding.

Hmm, that depends... It performs correct truncation.

> Disadvantages:
> - Heavily compiler dependent.
> - Tends to be slow (e.g. Watcom's slow typecast).


It is slow (on x86) because the x86 FPU can convert floating point to integer only with the current rounding mode.
The normal rounding mode is round to nearest.
To force the FPU to truncate, you have to change the FPU state, and this is very slow.


> Method 2:
....
> Disadvantages:
> - x86 CPU dependent
> - requires assembler (not really a disadvantage)

You just have to write a different implementation of one function for each target platform (and some generic code for all unknown ones).

> - requires 6 cycles (6 cycles for 1 instruction is quite a bit)

Anyway, it is the fastest practically possible method.

> - Ignores rounding state of the FPU

Nonsense. It does use rounding state of the FPU.
(which is usually "round to nearest")


> Method 3:
> ---------
> Magic number/fadd trick. This method uses a trick in the IEEE double format
> to perform the typecasting without actual conversion.
.....
> Advantages:
> - Good performance


It's not so simple...
This code was fast on the Pentium.
But on PPro/PII/PIII it can be even slower than software conversion in pure integer code, and it is slower in my applications.
This happens because of store-to-load forwarding.
If you store a double and then load just the lower 32 bits, you get a memory access stall (the load must wait for the store to write to memory before it can access the required data).
Hmm, I guess it's okay on the Athlon: as I remember, it can forward quickly when the load needs the lower part of the data from the same address as the store.


> Disadvantages:
> - Dependent on "double" data type, doesn't work on "float".


Incorrect. It depends on the FPU's internal precision.
For the fastest code you may set the FPU to single (float) precision.
In that case it doesn't work.

> - Ignores rounding

It uses the current rounding state.


> Method 4:
> ---------
> Integer pipeline conversion. This method takes the IEEE float format and uses it completely to convert to an integer.
...........
> Advantages:
> - Good performance

Not so good. I tried it before. Nothing special.
Slower than Method #2.

> - Pure integer pipeline based (good for pairing with FPU)

Well, maybe - if you really want to write FPU code in pure assembler.

> Disadvantages:
......
> - Costly jump to handle negatives (can hurt on PPro machines)


Hmm, my version had no "costly" (hard to predict) jumps, but did have correct support for overflow, infinities and NaNs:
    
inline int __stdcall trunc( float x )
{
    DWORD e = (0x7F + 31) - ((*(DWORD*)&x & 0x7F800000) >> 23);
    if(e < 32)
    {
        int s = *(int*)&x >> 31;
        return int((((0x80000000 | (*(DWORD*)&x << 8)) >> e) ^ s) - s);
    }
    else
        return (e & 0x80000000);
}
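
For readers without `DWORD` and `__stdcall` at hand, the same idea can be sketched in portable C++. This is a rendition under assumptions, not the poster's exact code: it assumes a 32-bit IEEE 754 `float`, uses `std::memcpy` instead of pointer casts, and the name `trunc_bits` is made up for the illustration.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical name, for illustration only.
inline std::int32_t trunc_bits(float x)
{
    std::uint32_t bits;
    std::memcpy(&bits, &x, sizeof bits);            // raw bit pattern of x

    // Right-shift count that leaves only the integer part of the value.
    std::uint32_t e = (0x7Fu + 31u) - ((bits & 0x7F800000u) >> 23);
    if (e < 32u) {
        std::uint32_t s = 0u - (bits >> 31);        // 0 for positive, all ones for negative
        std::uint32_t m = (0x80000000u | (bits << 8)) >> e;  // leading 1 restored, then shifted
        // (m ^ s) - s negates m when s is all ones, without a branch on the sign.
        return static_cast<std::int32_t>((m ^ s) - s);
    }
    // |x| < 1 gives 0; exponent too large, infinity or NaN gives 0x80000000.
    return static_cast<std::int32_t>(e & 0x80000000u);
}
```

The only branch left is the range check on `e`; the sign is handled with the xor-and-subtract step. The final conversion of an unsigned value to `std::int32_t` is modular on common compilers (and guaranteed so from C++20).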


> - Ignores rounding

It performs truncation.


P.S.
If you are fine with a limited range of integer values, you can use Method #3 with float:

// argument must be in range -0x200000..0x1FFFFF
// note: the addition rounds using the FPU's current rounding mode
const float FLT2INT = 0xC00000;
inline int __stdcall trunc( float x )
{
    float t = x + FLT2INT;
    return ((*(int*)&t)<<10)>>10;
}

This code is just a little bit slower than Method #2 (for me).
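
The float variant can also be sketched without pointer casts. It assumes a 32-bit IEEE 754 `float`, the default round-to-nearest mode, and an arithmetic right shift for signed integers (true on common compilers, guaranteed from C++20); `magic_round_float` is a name invented for this illustration.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical name, for illustration only.
// Valid only for inputs in the range -0x200000..0x1FFFFF.
inline std::int32_t magic_round_float(float x)
{
    const float magic = 12582912.0f;      // 0xC00000 = 1.5 * 2^23
    float t = x + magic;                  // integer part lands in the low mantissa bits
    std::uint32_t bits;
    std::memcpy(&bits, &t, sizeof bits);
    // Move bit 21 up to the sign position, then shift back to sign-extend 22 bits.
    std::int32_t v = static_cast<std::int32_t>(bits << 10);
    return v >> 10;
}
```

As with the double version, the addition does the rounding, so under the default mode 2.5f gives 2 and 3.5f gives 4.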


Edited by - Serge K on July 19, 2000 9:39:37 AM

Hello,
It seems to be three times faster, but when compiler optimizations are turned on it seems to be slightly slower.
I have an AMD K6-2 450, and it might also have something to do with the code I'm testing it with. So I don't know if it is faster or not. What do you all think? And why is casting with the compiler so slow? Surely Microsoft could easily sort it out?

Thanks for all your help.

P.S. I'm using Pro VC++ 4.0
