cyberben

Floating Point vs Integer

cyberben    122
Hi there, I was wondering about a few things. I was looking at BeanDog's SetSpeedFacter procedure, which basically sets a float (double) to some decimal value according to whether the fps is higher or lower than his standard. Every movement is then multiplied by this float. That way, if a unit moves 30 pixels per second at 60 fps, at 30 fps it would still move 30 pixels per second, because its movement each frame is multiplied by this SpeedFacter float.

Now I was thinking about speed, and here's what I thought. Assuming I will have 8 teams in a network game and each team could have up to 300 units or more, I'd be doing a whole lot of floating-point multiplications. So I thought of a few things. First, I could make it a fraction: say I'm getting x fps, under my desired 60 or 75 or something, so you'd multiply (integer math) your movement by 60 or 75 and then divide by however many fps you're getting.

But nowadays fp math has a lot of speed, and a normal mul in assembly will cost about 16 CPU cycles and a normal div will cost 16 to 30 CPU cycles, and you have to clear edx in between those two operations (edx is the remainder in a division and part of the source in a multiplication), which will cost you another CPU cycle. And I can't remember, but I *thought* fmul was somewhere around 40 CPU cycles? Also, I don't need 15-decimal precision, so a single float would be fine, and it is 32 bits and therefore a lot faster.

So in short, is moving to an integer solution with two operations really worth it? And can anyone point me to a reference for CPU cycles per instruction for floating-point math? I looked on the Intel site for over half an hour and figured someone here must know offhand, so I won't waste my time. Thanks! - Ben
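The two options described in this post can be sketched in C roughly as follows. This is only an illustration: TARGET_FPS, speed_factor, scaled_move_float and scaled_move_int are made-up names, not BeanDog's actual code, and the target rate of 60 is an assumption taken from the example above.

```c
#include <assert.h>

#define TARGET_FPS 60  /* assumed design frame rate (hypothetical constant) */

/* Float option: compute one factor per frame from the measured fps. */
float speed_factor(int actual_fps)
{
    assert(actual_fps > 0);
    return (float)TARGET_FPS / (float)actual_fps;
}

/* Float option: one multiply per unit, per frame. */
float scaled_move_float(float move_per_frame, float factor)
{
    return move_per_frame * factor;
}

/* Integer option: multiply by the target rate, then divide by the
   measured rate. The division truncates toward zero, so fractions
   of a pixel are lost. */
int scaled_move_int(int move_per_frame, int actual_fps)
{
    assert(actual_fps > 0);
    return move_per_frame * TARGET_FPS / actual_fps;
}
```

With a movement of 4 pixels per frame at the target rate, both versions give 8 at 30 fps. At 45 fps the integer version gives 5 where the float version gives about 5.33, which is the kind of truncation that fixed point is meant to avoid.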

blue-lightning    122
What you are looking for is fixed point. I personally love fixed point, but many people have been telling me it's slower on P3s. It really depends on how well you know assembly. Sometimes, if you use fixed point, other optimizations become available, but you have to be really good at asm to find them.

By the way, fixed point works by using a multiply followed by a shift.
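As a rough illustration of "multiply followed by a shift", here is a minimal sketch in C using a 16.16 layout (16 integer bits, 16 fraction bits). The layout and the names fix16, FIX_SHIFT, fix_from_int, fix_mul and fix_to_int are assumptions chosen for the example; nothing in the thread prescribes them.

```c
#include <stdint.h>

#define FIX_SHIFT 16                 /* number of fraction bits (assumed) */
#define FIX_ONE   (1 << FIX_SHIFT)   /* the value 1.0 in 16.16 format */

typedef int32_t fix16;               /* hypothetical 16.16 fixed-point type */

/* Convert a small integer to fixed point. */
fix16 fix_from_int(int v)
{
    return (fix16)(v * FIX_ONE);
}

/* Multiply two fixed-point values: widen to 64 bits so the
   intermediate product cannot overflow, then shift the extra
   fraction bits back out. */
fix16 fix_mul(fix16 a, fix16 b)
{
    return (fix16)(((int64_t)a * b) >> FIX_SHIFT);
}

/* Drop the fraction bits. For negative values this relies on an
   arithmetic right shift, which mainstream compilers provide. */
int fix_to_int(fix16 v)
{
    return v >> FIX_SHIFT;
}
```

For example, fix_mul(fix_from_int(30), FIX_ONE / 2) multiplies 30 by 0.5 and converts back to 15 with fix_to_int.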

One advantage of fixed point is that you can sometimes use MMX with it, which lets you do more than one operation at a time. Of course, the floating-point equivalents are 3DNow! and SSE (I bet none of you ever even looked at the 3DNow! opcodes).

cyberben    122
So you're saying do fixed point on a sub-PIII machine and floating point on a PIII? Right, I think that makes sense. I heard somewhere that the difference between the higher-end Celerons and a PIII is that a PIII has the math co-processor on chip, so it's much faster, and a Celeron has the math co-processor off chip, so it is slowed down by the bus...

I'm using CPUID to identify the presence of MMX and an FPU right now, and I'm pretty sure I can also retrieve the "class" of CPU, like Pentium, Pentium Pro, Pentium MMX, Pentium II, Pentium III... but I'm not sure what other crappy clones like AMD and Cyrix machines will return...
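One way to sidestep the question of what each vendor reports as its CPU "class" is to test the individual feature bits instead. The sketch below assumes a GCC or Clang toolchain on x86, where the <cpuid.h> header provides __get_cpuid; the struct and the functions decode_features and query_features are hypothetical names for this example. The bit positions are the standard ones in EDX for CPUID leaf 1 (bit 0 FPU, bit 23 MMX, bit 25 SSE).

```c
#include <stdint.h>

#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
#include <cpuid.h>                 /* GCC/Clang helper for the CPUID instruction */
#define HAVE_CPUID_H 1
#endif

/* Feature bits reported in EDX by CPUID leaf 1. */
#define CPU_FEATURE_FPU (1u << 0)
#define CPU_FEATURE_MMX (1u << 23)
#define CPU_FEATURE_SSE (1u << 25)

typedef struct {
    int fpu;
    int mmx;
    int sse;
} cpu_features;

/* Decode the EDX feature word into individual flags. */
cpu_features decode_features(uint32_t edx)
{
    cpu_features f;
    f.fpu = (edx & CPU_FEATURE_FPU) != 0;
    f.mmx = (edx & CPU_FEATURE_MMX) != 0;
    f.sse = (edx & CPU_FEATURE_SSE) != 0;
    return f;
}

/* Query the running CPU. Returns 1 on success, or 0 when CPUID
   leaf 1 is unavailable or the build is not for x86. */
int query_features(cpu_features *out)
{
#ifdef HAVE_CPUID_H
    unsigned int eax, ebx, ecx, edx;
    if (__get_cpuid(1, &eax, &ebx, &ecx, &edx)) {
        *out = decode_features(edx);
        return 1;
    }
#endif
    (void)out;
    return 0;
}
```

Because the check is per feature, an AMD or Cyrix chip that sets the MMX bit takes the MMX path regardless of what family it reports.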

I thought about SIMD, but I don't want to restrict my game to a PIII. I'm using MMX and I thought about AMD's 3DNow!, but IMHO AMD is striving desperately to achieve what Intel has, and 3DNow! is their attempt to have something that Intel doesn't, and I don't want to help spawn a CPU market with many different standards. I'd like to support the standards, but I'm not going to spend tons of time creating support for AMD's technology. It's your choice if you want to buy a crappy clone or an industry-standard machine, and your performance will reflect it.

I think I'll stick to MMX with fixed point and floating-point math.

Yeah, I understand how shifting works: the point of shifting is to achieve a divide by 2, 4, 8, 16, 32, 64, etc. in 1 CPU cycle, vs. a mul, which requires something like 16 cycles.

Any more opinions or help?
Thanks,
Ben

P.S. Does anyone know where I can look up the CPU cycles for the FPU instructions on a PII and PIII? Thanks - Ben

cyberben    122
Hey, I had another idea: instead of mul and shift, why not shift and divide? If I call my target fps 64, I can shift and then divide by my actual frames per second. So:

shl eax, 6       ; shift left 6 places (multiply by 64)
xor edx, edx     ; clear edx, since div divides edx:eax by the source operand
mov ebx, fps     ; load the fps variable as the divisor
div ebx          ; eax = quotient, edx = remainder

Wouldn't that be about the same? That looks right to me.
- Ben
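For comparison, the shift-and-divide idea above might look like this in C. The function name scale_by_frame_rate and the constant TARGET_FPS_SHIFT are assumptions made for the sketch; the target rate of 64 fps comes from the post.

```c
#include <assert.h>
#include <stdint.h>

#define TARGET_FPS_SHIFT 6   /* target rate of 64 fps, i.e. 1 << 6 */

/* Scale a per-frame movement by 64 / fps using one shift and one
   unsigned divide. Overflows if move is 2^26 or larger. */
uint32_t scale_by_frame_rate(uint32_t move, uint32_t fps)
{
    assert(fps > 0);
    return (move << TARGET_FPS_SHIFT) / fps;
}
```

At 32 fps a movement of 4 becomes 8, and at the target 64 fps it stays 4.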

blue-lightning    122
I don't know about shift and divide. Usually fixed point is multiply and shift, mostly because multiply is faster than divide.
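One way to combine the two ideas, sketched here under assumptions, is to pay for the divide once per frame and then use multiply and shift for every unit. The names frame_factor and scale_move, the 16 fraction bits, and the target rate of 64 are all choices made for this example.

```c
#include <assert.h>
#include <stdint.h>

#define TARGET_FPS 64   /* assumed target frame rate */
#define FIX_SHIFT  16   /* fraction bits in the per-frame factor (assumed) */

/* Once per frame: a single divide gives TARGET_FPS / fps as a
   fixed-point factor with FIX_SHIFT fraction bits. */
uint32_t frame_factor(uint32_t fps)
{
    assert(fps > 0);
    return ((uint32_t)TARGET_FPS << FIX_SHIFT) / fps;
}

/* Once per unit: multiply by the factor and shift the fraction
   bits back out. The 64-bit intermediate avoids overflow. */
uint32_t scale_move(uint32_t move, uint32_t factor)
{
    return (uint32_t)(((uint64_t)move * factor) >> FIX_SHIFT);
}
```

With 2400 units this replaces 2400 divides per frame with one divide and 2400 multiplies. The price is a little extra truncation: scale_move(3, frame_factor(48)) yields 3, whereas dividing directly, (3 * 64) / 48, yields 4.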

AMD isn't that bad now. Their Thunderbird is just as fast as Intel's chips. Also, Intel made SSE to compete with 3DNow!, but it's not nearly as well supported.

The reason floating point is faster on P3s and Athlons is that they are more heavily pipelined.

cyberben    122
Maybe I should just go with floating point, and those people running PIIs will just have to bear the slowdown. I mean, my game isn't going to be released for another year or two, so what do you think the standard will be?
Anyhow thanks for the insight!
See ya,
Ben

blue-lightning    122
I just really like fixed point. The way I see it, if the person has a P3 they don't need more optimizations. P3s still do fixed point fairly well; they just do floating point faster.

The big thing is that fixed point is harder to do fast than floating point.

cyberben    122
Well then, can you suggest some good places to learn about fixed point? I think I could figure it out after a while, but it'd be nice to have a basic tutorial of some sort. I think I'll end up using some floating point eventually anyhow. We'll see.
- Ben

Serge K    154
quote:
Original post by cyberben

And I can't remember, but I *thought* fmul was somewhere around 40 CPU cycles? Also, I don't need 15-decimal precision, so a single float would be fine, and it is 32 bits and therefore a lot faster.



Actually, fmul is somewhere around 3-5 cycles (same for fadd).

Little summary:

        
           fmul   fadd   fdiv      (i)mul  (i)div
PPro/PII : 5/2    3/1    17f/36d   4       -
PIII     : 5/2    3/1    18f/32d   4       -
Athlon   : 4/1    4/1    16f/20d   4-5     ~40
K6(-2)   : 2/2    2/2    -         2       -
Pentium  : 3/1    3/1    -         11      ~40

fmul and fadd: 5/2 means the result is available after 5 cycles, and a new operation may start every second cycle.
fdiv: the delay in cycles for float (f) / double (d).
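To see what the two numbers mean for the 8 teams of 300 units mentioned at the start of the thread, here is a back-of-the-envelope sketch in C. It assumes an idealized pipeline in which all multiplies are independent and nothing else stalls (no memory waits, no dependency chains), so it is a lower bound rather than a prediction. The function name pipelined_cycles is made up for the example.

```c
/* Estimated cycles for n independent operations on a pipelined unit:
   the first result arrives after `latency` cycles, and each further
   operation can be issued `interval` cycles after the previous one. */
unsigned long pipelined_cycles(unsigned long n,
                               unsigned long latency,
                               unsigned long interval)
{
    if (n == 0)
        return 0;
    return latency + (n - 1) * interval;
}
```

For 2400 fmuls at 5/2 (PII/PIII) this gives 5 + 2399 * 2 = 4803 cycles per frame, compared with 2400 * 40 = 96000 cycles if each fmul really took 40 cycles with no overlap.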


Edited by - Serge K on July 21, 2000 7:46:16 AM
