Floating point in 2 bytes...

Started by Goodlife March 21, 2000 10:08 AM

8 comments, last by Goodlife 24 years ago

122

Author

March 21, 2000 10:08 AM

Hi all... First let me just congratulate the designers of all the 3D APIs for making texture coordinates be floating point numbers between zero and one. This has helped me to gain a better understanding of floating point, caused me to code REVOLUTIONARY algorithms to convert a 'normal' coordinate system to an idiotic one (giving me insight into artificial intelligence as well), helped me drop a few extra cycles that were causing my program to run too fast to play (eliminating the need for one of those awful 'slow down the program' timing loops), and, most importantly, helped me to 'bulk up' the disk space of my program, so I can express the crucial idea "it's a big download, so it MUST be good."

Now, on to my question: I really, really, really want to shrink the size of my graphics. When I had my graphics under my old, self-coded system, each vertex was only 6 bytes long: 4 bytes for xyz (10 bits each), and two chars for texture coordinates. A 200-vertex figure was 1200 bytes. Under the exciting new API, it is 12 bytes per vertex: 4 for xyz, and EIGHT!!!! EIGHT!!!! EIGHT!!! for texture coordinates. A 200 VERTEX FIGURE IS 2400 BYTES! FOR THE SIZE OF THE FIGURES I'M USING, THIS IS LARGER THAN IF I USE A DIRECT BITMAP, UNCOMPACTED, WITH!!! ALL!!! THE!!! HEADER!!! INFORMATION!!!!

I want to reduce each floating point coordinate to two bytes at most. I have played around with copying only half the bytes, but the format (is it IEEE still?) apparently requires the whole four bytes to store the number. I don't want to do any division, because we all know how that kills performance. So... I was wondering if anyone knew how to compact a floating point number down (doesn't matter how fast) into a format that uncompacts in under 1-2 cycles? I would like to store these floating point numbers (which never have to be very large) in only two bytes, and yet have them readily available to pass to the (expletive deleted) (expletive deleted) mother(expletive deleted) (expletive deleted) D3D API.

And, incidentally, if anyone tells me that OPENGL already handles this, quickly and easily, or, worse, allows you to directly reference texture coordinates by their xy coordinates, I will personally have you chased to the ends of the earth by the finest demons that thaumaturgy can conjure up. Thank you for your time.

-- Goodlife-----------------------------Those whom the gods would destroy, they first drive mad.--DirectX design team official motto

daveb

122

March 21, 2000 04:47 PM

I would suggest moving away from what you're doing and towards the 4 byte floating point arrangement. Using a horribly compacted and misaligned vertex structure like you have is silly. You're going to lose performance simply because of access times. Not to mention that for practically every modern processor out there, floating point math is just as fast as integer math (the Pentium actually converts integers to floats before operating on them).

If you really _must_ convert to 2 bytes, you should investigate using a 2 byte short as a 12.4 fixed point representation. That is, 12 fixed bits of left-of-the-decimal precision and 4 fixed bits of right-of-the-decimal precision.
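A 16-bit fixed-point layout like the one described above could be sketched roughly as follows. The function names and the FRAC_BITS constant are illustrative, not from the thread; 4 matches the 12.4 split, and for coordinates confined to 0.0-1.0 a larger fraction count would presumably give finer steps.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical constant: number of bits to the right of the binary point.
   4 matches the 12.4 split described above. */
#define FRAC_BITS 4

/* Pack a non-negative float into an unsigned 16-bit fixed-point value,
   rounding to the nearest representable step. */
uint16_t fixed_from_float(float x)
{
    assert(x >= 0.0f && x <= 4095.0f);   /* must fit in 12 integer bits */
    return (uint16_t)(x * (float)(1 << FRAC_BITS) + 0.5f);
}

/* Unpack: one int-to-float conversion and one multiply, no division. */
float fixed_to_float(uint16_t f)
{
    return (float)f * (1.0f / (float)(1 << FRAC_BITS));
}
```

With FRAC_BITS set to 4 the smallest step is 1/16, which is coarse for a texture coordinate; that is a property of the 12.4 split itself, not of the sketch.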

Memory usage shouldn't really be your primary concern for geometry. A single 256x256 16 bit texture is 130k. The space for 200 vertices is insignificant compared to textures. Memory alignment and processor performance issues should be what you focus on. You almost never want to go below native processor word sizes.

One last thing - there's no reason to do any of this or to code "revolutionary algorithms" to do any of this if you're just interested in a quick fix. If you're using 8 bit texture coords, to convert to 0.0-1.0, just use

unsigned char u_in = some_value;   /* unsigned, so the range is 0-255 */
unsigned char v_in = some_value;

float u_out = (float)u_in / 255.0f;   /* 0 maps to 0.0, 255 maps to 1.0 */
float v_out = (float)v_in / 255.0f;

Trivial. But again, I'd recommend against this kind of solution.

Edited by - daveb on 3/21/00 4:51:33 PM

Volition, Inc.

CGameProgrammer

640

March 22, 2000 12:52 AM

Um... I don't know too much about the Pentium architecture, but I wrote a DirectX raycaster, and it ran much faster using fixed-point math than floating-point. And I ran it on a Pentium II MMX. Maybe float-float math is as fast as integer-integer math, but float-integer conversions are apparently very slow, just as they've always been.

~CGameProgrammer( );
Developer Image Exchange -- New Features: Upload screenshots of your games (size is unlimited) and upload the game itself (up to 10MB). Free. No registration needed.

gmcbay

130

March 22, 2000 01:21 AM

float-integer conversions being slow is a good argument for NOT using fixed point when using a graphics API like OpenGL or Direct3D which generally expect you to specify things in floating point anyway.

daveb is right here, this supposed optimization is nothing but added complexity and if anything it makes things worse in the long run.

Please leave fixed point math for (PC) gaming back in 1994 where it belongs.

Edited by - gmcbay on 3/22/00 1:21:39 AM

Anonymous

March 22, 2000 02:00 AM

I agree with you. In any case, use mul instead of div: multiply is faster, or at least it was on Pentiums, and I suppose it still is.
I have never heard of this integer/float conversion. It could perhaps be done at the compiler level, but it seems false to me that the Pentium converts integers to floats.
I mean: if you're writing asm code, how does the CPU know what kind of data you are using? You could have signed or unsigned numbers (of course you specify that when using mul or div (imul, idiv), but what about add and sub?), you could be working with fixed-point integers...
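The multiply-instead-of-divide advice above could be sketched like this for the 8-bit case. The names INV_255 and texcoord_from_byte are made up for illustration; the idea is only that the reciprocal is a constant, so the per-vertex work is one conversion and one multiply.

```c
/* Hypothetical name: reciprocal of 255. Most compilers fold this
   constant expression at compile time, so no division runs per vertex. */
#define INV_255 (1.0f / 255.0f)

/* Map an 8-bit texture coordinate (0-255) to the 0.0-1.0 range. */
float texcoord_from_byte(unsigned char c)
{
    return (float)c * INV_255;
}
```

Because the reciprocal is rounded, results can differ from a true division in the last bit, which should not matter at texture-coordinate precision.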

As for scene size, I think you're right: no one should worry about it. Good engines have much more data for describing a scene than a simple vertex structure. Think about BSP trees and PVSs.

In general memory alignment can improve performance, but I think if you're using vertex buffers, a BSP and a PVS, memory alignment will not dramatically hit performance, as the number of first accesses is really low. This holds, of course, unless you're using a silly data size like fixed point integers.

Now I have a question, which I do not think is off topic (since you said it is better to work on performance optimizations)... if it is, excuse me.
I had a look at the Intel website, but I found no information about instruction timing on the P2/P3. This is probably because instruction timings mean little on a superscalar architecture, and of course nowadays there is generally no need for highly optimized code. Intel reserved names for 6 pipes (U and V were used, and the letters W, X, Y and Z remain), which at the time of the Pentium was considered the maximum number of pipes needed: with more than 6 pipes they would probably never be able to work together, as you would have to reuse the same register (making the CPU stall). I do not know if Intel is still using only the U and V pipes.
Anyhow, I was asking myself, and now I ask you, whether there is some document about P2/P3 instruction timing. Although a simple instruction timing document could be useless (for the reasons I gave above), consider that there are several sqrt approximations, and the last document available for Pentiums stated that fsqrt executes in approx. 80 cycles. Think about how much you use sqrt and you will see why an instruction timing document could be useful for 3D programmers.

Thank you.

quote: Original post by daveb
just interested in a quick fix. If you're using 8 bit texture coords, to convert to 0.0-1.0, just use

unsigned char u_in = some_value;
unsigned char v_in = some_value;

float u_out = (float)u_in / 255.0f;
float v_out = (float)v_in / 255.0f;

Trivial. But again, I'd recommend against this kind of solution.

Lars W.

122

March 22, 2000 03:16 AM

Hi

I think I will just answer your question and let you determine whether it is useful.
This can only work for texture coordinates; maybe it works for other values too, but it could get complicated.
Assume that your floating point numbers lie between 0.0 and 1.0 with a minimum (relevant) step size of 1/2048, as 2048 is currently the maximum texture size. The smallest number in decimal would then be around 0.0005, so you don't need the whole exponent of the floating point number. You just need 5 bits (2^5 = 32 values, more than the 11 positions needed for 2^11 = 2048), because that is the farthest you will ever shift your mantissa. You also know that the exponent is always negative, so take the two's complement of the exponent and store the positive value (don't forget the bias which is added to the exponent in the IEEE standard). Now you can make the mantissa a bit smaller. For that, you have to round your floating point number at its least important decimal digits: because of the 0.0005 minimum, you only need 4 exact decimal digits to represent it. So you need 15 bits. OK, you could reduce this to 11 to get your 2 bytes, but that would not be very accurate. But maybe you are always using textures smaller than 2048x2048; then it would get smaller. The lossy compression of those numbers can take a while (some cycles), but decompression should go faster: just shift the mantissa, then take the two's complement of the exponent and add the bias.
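One way to read the scheme above is as an unsigned 16-bit float with a 5-bit exponent magnitude and an 11-bit mantissa. The sketch below is written under that assumption; the field layout, the use of exponent value 31 to mean zero, truncation instead of rounding, and all names are choices made here for illustration, and it assumes 32-bit IEEE-754 floats.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Pack a float in [0.0, 1.0] into 16 bits:
   bits 15..11 = magnitude of the (negative) exponent, bits 10..0 = mantissa. */
uint16_t pack_unit_float(float x)
{
    uint32_t bits;
    assert(x >= 0.0f && x <= 1.0f);
    memcpy(&bits, &x, sizeof bits);

    uint32_t biased = (bits >> 23) & 0xFFu;   /* IEEE exponent with bias 127 */
    uint32_t e = 127u - biased;               /* 0 for 1.0, 1 for [0.5, 1), ... */
    if (e >= 31u)
        return 0xF800u;                       /* too small: store as zero */

    uint32_t m = (bits >> 12) & 0x7FFu;       /* top 11 mantissa bits, truncated */
    return (uint16_t)((e << 11) | m);
}

/* Unpack: rebuild the IEEE bit pattern with shifts and one subtraction. */
float unpack_unit_float(uint16_t p)
{
    uint32_t e = (uint32_t)p >> 11;
    if (e == 31u)
        return 0.0f;

    uint32_t bits = ((127u - e) << 23) | ((uint32_t)(p & 0x7FFu) << 12);
    float x;
    memcpy(&x, &bits, sizeof x);
    return x;
}
```

Values such as 0.5, 0.75 and 1/2048 survive the round trip exactly; anything needing more than 11 mantissa bits is truncated towards zero.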

I hope you have understood everything, my English is a bit bad :-)

Lars

--------> http://www.larswolter.de <---------

daveb

122

March 22, 2000 09:37 AM

2048x2048 textures? You're a madman!

I still like to wage my personal war against > 256x256 textures. Everyone thinks they're cool, but they're not :p

Volition, Inc.

Lars W.

122

March 22, 2000 10:09 AM

Hi

Maybe not at the moment, but with texture compression nearly standard, it is no problem. I am using 2048x2048 textures in my game, and with mipmapping the slowdown isn't that bad (OK, only one big texture), and I use an old-fashioned TNT 1!

Lars

--------> http://www.larswolter.de <---------

Goodlife

122

Author

March 23, 2000 09:35 AM

Hi all--
Thanks for your replies, but I think the point of my question was missed. I do not wish to do any conversions along the lines of shifting bits, or multiplying, or dividing.

I store my floating point in four bytes. What I would do,
ideally is this:

float realnumber;
short holdit;

holdit=*(short*)&realnumber;   /* read the first two bytes of the float */
*(short*)&realnumber=holdit;   /* write them back */

I was HOPING that somebody knew more about floating point than me, and could tell me if I could organize things so that the HIWORD of the floating point would not be needed. Whenever I play with it, I see significant bits strung all through it:

.1 = 01000000 01101000 100000000 00000010

I was hoping someone would tell me if there's a range of numbers, or whatnot, that would yield a result like:

.? = 00000000 00000000 000100000 00110001

With the HIWORD being blank and discardable.
As for speed, I'm not too worried about it. With my compacted form (and this, especially for the fellow who told me that it would slow everything down) I get a speed boost just because the whole graphic fits in the cache.

I am actually considering using a 128-member array (none of my textures are larger than 128x128) and sending my two numbers across, and referencing them in the array. The problem is, when I use a smaller texture, what was .0075 for a 128x128 texture is a little off, so I was just wondering if anyone knew any secret tricks.
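The array idea above might look something like the following sketch. The table size of 129 (so that index 128 maps to exactly 1.0) and the names are assumptions made here; the table is filled once at startup, so the division never runs per vertex.

```c
#include <assert.h>

/* Hypothetical constant: the largest texture edge, in texels. */
#define MAX_TEX_SIZE 128

/* One entry per texel position, plus one so that 128 maps to 1.0. */
static float texcoord_table[MAX_TEX_SIZE + 1];

/* Fill the table once at startup; the division happens only here. */
void init_texcoord_table(void)
{
    for (int i = 0; i <= MAX_TEX_SIZE; ++i)
        texcoord_table[i] = (float)i / (float)MAX_TEX_SIZE;
}

/* Per-vertex cost is a single array read. */
float texcoord_lookup(unsigned char index)
{
    assert(index <= MAX_TEX_SIZE);
    return texcoord_table[index];
}
```

For a smaller power-of-two texture, one option is to pre-scale the stored index when the model is exported (for example, position p in a 64x64 texture stored as p * 2), since p/64 and 2p/128 are the same number. That is a suggestion of this sketch, not something proposed in the thread.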

I know fixed point rather well. I had always used it, and would still be using it if the (expletive deleted) APIs didn't insist that I slow-format floating numbers.

-- Goodlife-----------------------------Those whom the gods would destroy, they first drive mad.--DirectX design team official motto

SiCrane

11,840

March 23, 2000 10:05 AM

Except for a discrete range of subnormal or denormalized numbers, the two most significant bytes are always non-discardable.

In the .1 example:

.1 = 01000000 01101000 100000000 00000010

The first 24 bits are the binary digits following the first 1, and the last 8 bits are the exponent information.

In order to take just two bytes of information, without bit shifts, you'd need to take the first byte and the last byte. This can still be done by funky casts. However you'd still lose 16 binary digits of precision, and it'd still be faster using bitshifts (actually a single rotate rather than a shift would be fastest).

And if you want to try to use the de-norm range, you'd have to do floating point multiplies to get your numbers into the de-norm range and you'd still lose 16 bits of precision.
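To make the point about the significant bytes concrete, here is a small sketch that assumes 32-bit IEEE-754 single precision, where the most significant 16 bits hold the sign, the 8 exponent bits and the top 7 mantissa bits. It keeps only that upper half and restores a float with a single shift. The function names are invented for this example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Return the raw bit pattern of a float (assumes 32-bit IEEE-754). */
uint32_t float_bits(float x)
{
    uint32_t bits;
    assert(sizeof(float) == sizeof(uint32_t));
    memcpy(&bits, &x, sizeof bits);
    return bits;
}

/* Keep sign, exponent and the top 7 mantissa bits; drop the low 16 bits. */
uint16_t pack_high_word(float x)
{
    return (uint16_t)(float_bits(x) >> 16);
}

/* Restore a float by shifting the stored half back into place. */
float unpack_high_word(uint16_t h)
{
    uint32_t bits = (uint32_t)h << 16;
    float x;
    memcpy(&x, &bits, sizeof x);
    return x;
}
```

Under that assumption 0.1f has the bit pattern 0x3DCCCCCD, so its high word is 0x3DCC, which decodes to 0.099609375. Just below 1.0 the step between representable values is 1/256, so whether the lost 16 bits matter depends on the texture size.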

This topic is closed to new replies.
