
Software rasterizer and float/int color


rumble    118
For a 3D software rasterizer nowadays, is there much speed difference between using floating point or integers for color? For example, simplified implementations of two color classes are below. The float one is easier to implement and use, but I don't know how much slower it would be.


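[code]class Color3f
{
public:
    float r, g, b;

    Color3f(float r_, float g_, float b_) : r(r_), g(g_), b(b_) {}

    // Component-wise multiply (modulate), e.g. texel color * light color
    Color3f mix(const Color3f& c1) const
    {
        return Color3f(r * c1.r, g * c1.g, b * c1.b);
    }

    // Component-wise add; values may go above 1.0 (useful for HDR), nothing wraps around
    Color3f add(const Color3f& c1) const
    {
        return Color3f(r + c1.r, g + c1.g, b + c1.b);
    }
};

class Color3i
{
public:
    int rgba; // packed as 0x00RRGGBB; the alpha byte is unused here

    explicit Color3i(int packed) : rgba(packed) {}

    int getRed() const   { return (rgba >> 16) & 0xFF; }
    int getGreen() const { return (rgba >> 8) & 0xFF; }
    int getBlue() const  { return rgba & 0xFF; }

    // Component-wise multiply; dividing by 255 (not 256) keeps white * white == white
    Color3i mix(const Color3i& c1) const
    {
        return Color3i(((getRed()   * c1.getRed()   / 255) << 16) |
                       ((getGreen() * c1.getGreen() / 255) << 8)  |
                        (getBlue()  * c1.getBlue()  / 255));
    }

    // An add() function would have to clamp each channel to 255 to avoid overflow. Omitted here.
};[/code]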
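The add() omitted above needs per-channel saturation so that one channel's overflow doesn't carry into its neighbour. A minimal sketch, assuming the same 0x00RRGGBB packing; addSaturate is a hypothetical helper name, not something from the thread:

```cpp
#include <cstdint>

// Hypothetical helper: saturating add of two colors packed as 0x00RRGGBB.
// Each 8-bit channel is added separately and clamped to 255 instead of
// carrying into the neighbouring channel.
constexpr std::uint32_t addSaturate(std::uint32_t a, std::uint32_t b)
{
    std::uint32_t out = 0;
    for (int shift = 0; shift <= 16; shift += 8)
    {
        const std::uint32_t sum = ((a >> shift) & 0xFFu) + ((b >> shift) & 0xFFu);
        out |= (sum > 255u ? 255u : sum) << shift;
    }
    return out;
}
```

The per-channel loop is the readable version; packed tricks (or SSE's saturating byte adds) can do the same thing without the loop.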

Sirisian    2263
I'd just use floats. They make things like HDR easier to handle, I'd imagine.
[quote name='rumble' timestamp='1306203844' post='4814848']
...
[/quote]
I get that that was probably pseudo-code, but you might want to store the channels in arrays, like uint8_t rgba[4]; or float rgba[4];, and plan on using SSE intrinsics a lot if you're worried about speed.
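A rough sketch of the float rgba[4] plus SSE idea (x86 only; Color4f and mix are illustrative names, not from the thread): with a 16-byte-aligned array, one SSE multiply modulates all four channels at once.

```cpp
#include <xmmintrin.h> // SSE: _mm_load_ps, _mm_mul_ps, _mm_store_ps

// Illustrative type: four float channels, aligned so _mm_load_ps/_mm_store_ps are legal.
struct alignas(16) Color4f
{
    float rgba[4];
};

// Component-wise multiply of all four channels with a single SSE multiply.
inline Color4f mix(const Color4f& a, const Color4f& b)
{
    Color4f out;
    _mm_store_ps(out.rgba, _mm_mul_ps(_mm_load_ps(a.rgba), _mm_load_ps(b.rgba)));
    return out;
}
```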

rarelam    100
I would use integers unless you need the extra precision of floats. One of the largest bottlenecks in a software rasterizer is memory bandwidth, and using floats instead of bytes costs 4x the bandwidth (even 16 bits per channel would be an advantage over floats). It does make the code more complicated to read; one thing you can do is use floats for your calculations and just pack to ints for storage.
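The "compute in float, store packed" suggestion could look roughly like this. The 0xAARRGGBB layout, the [0, 1] range and round-to-nearest are assumptions for illustration; the function names are hypothetical.

```cpp
#include <cstdint>

// Unpack a 0xAARRGGBB pixel into floats in [0, 1] (order: r, g, b, a) for computation.
inline void unpackRGBA8(std::uint32_t c, float out[4])
{
    const float inv255 = 1.0f / 255.0f;
    out[0] = static_cast<float>((c >> 16) & 0xFFu) * inv255;
    out[1] = static_cast<float>((c >> 8) & 0xFFu) * inv255;
    out[2] = static_cast<float>(c & 0xFFu) * inv255;
    out[3] = static_cast<float>((c >> 24) & 0xFFu) * inv255;
}

// Clamp a float channel to [0, 1] and round it to the nearest 8-bit value.
inline std::uint32_t toByte(float v)
{
    v = v < 0.0f ? 0.0f : (v > 1.0f ? 1.0f : v);
    return static_cast<std::uint32_t>(v * 255.0f + 0.5f);
}

// Pack r, g, b, a floats back into a 32-bit 0xAARRGGBB pixel for storage.
inline std::uint32_t packRGBA8(const float in[4])
{
    return (toByte(in[3]) << 24) | (toByte(in[0]) << 16) |
           (toByte(in[1]) << 8) | toByte(in[2]);
}
```

Only the 4-byte packed form lives in textures and the framebuffer; the 16-byte float form exists only in registers or a small working set while shading.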

clashie    632
I've found integers the simplest, and I don't think there should be any real speed problems. The only thing I can think of is that if I used floats for color, I'd have to convert back to int before finally drawing to my backbuffer. There are some things floats might slightly simplify for me, but whatever. My renderer is super simple though, and I don't do anything fancy in the slightest.

I just do something like this.
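[code]#include <cstdint>

typedef std::uint8_t  u8;
typedef std::uint32_t u32;

struct Color
{
    Color() : c32(0) {}
    Color(u32 c) : c32(c) {}
    // Set the packed 0xAARRGGBB value rather than the individual bytes (initializing several
    // union members at once is rejected by some compilers); on a little-endian CPU its bytes
    // line up with blue, green, red, alpha below.
    Color(u8 r, u8 g, u8 b) : c32(0xFF000000u | (u32(r) << 16) | (u32(g) << 8) | b) {}
    Color(u8 a, u8 r, u8 g, u8 b) : c32((u32(a) << 24) | (u32(r) << 16) | (u32(g) << 8) | b) {}

    union
    {
        struct // anonymous struct: a widely supported compiler extension, not standard C++
        {
            u8 blue;
            u8 green;
            u8 red;
            u8 alpha;
        };

        u8  c8[4];
        u32 c32;
    };
};[/code]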

My surfaces are aligned and make use of SSE. Works for me.

rumble    118
[quote name='rarelam' timestamp='1306222851' post='4815004']
I would use a integer's unless you need the extra precision of floats. One of the largest bottlenecks in a software rasterizer is memory bandwidth and using floats instead of bytes is 4x the bandwidth(Even using 16bits per channel would be an advantage over floats). It does make the code more complicated to read, one thing you can do is use floats for your calculations and just pack to ints for storing.
[/quote]


I don't get the point about memory bandwidth. Do you mean things like whether texture data is in cache? That is, if texture color is int rather than float, you get more data loaded in cache?
If I limit my scene to one .3ds file (400k-2000k in size) and twelve 512x512 images, would memory bandwidth issues be insignificant?

At what memory size does this start to matter?

Hodgman    51231
Yes, using smaller data means you can fit more in the CPU's cache at once.

Moving data from RAM into the cache is the most expensive operation you can perform ([i]10-1000 times slower than mathematical operations[/i]).

Your L1 cache is probably about 32KB, and your L2 cache around 2MB. If you're operating on more data than that and you're trying to optimise for performance, then you really want to think about how much data you're transferring between RAM and cache.
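To put numbers on this for rumble's scene (twelve 512x512 textures, ignoring mipmaps and the model), here is the arithmetic as a compile-time C++ sketch; textureBytes is a hypothetical helper:

```cpp
#include <cstddef>

// Bytes for `count` square textures with side `size`, at `bytesPerTexel` each (no mipmaps).
constexpr std::size_t textureBytes(std::size_t count, std::size_t size, std::size_t bytesPerTexel)
{
    return count * size * size * bytesPerTexel;
}

// Twelve 512x512 textures.
constexpr std::size_t asRGBA8   = textureBytes(12, 512, 4);  // 4 x uint8 per texel
constexpr std::size_t asRGBA32F = textureBytes(12, 512, 16); // 4 x float per texel

static_assert(asRGBA8 == 12u * 1024 * 1024, "12 MiB as 8-bit RGBA");
static_assert(asRGBA32F == 48u * 1024 * 1024, "48 MiB as float RGBA");
```

Either way the texture set is far larger than a ~2MB L2, so it will not sit in cache as a whole; the 4x factor shows up in how many bytes have to be streamed in for the texels actually touched.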

rarelam    100
[quote name='rumble' timestamp='1306237855' post='4815073']
I don't get about the memory bandwidth. Do you mean things like if texture data is in cache? That is, if texture color is int rather than float, you get more data loaded in cache?
If I limit my scene to have one .3ds file (size 400k-2000k) and twelve 512x512 images, would memory bandwidth issues be insignificant?

At what size of memory will this matter?
[/quote]

Not just texture data: writing to the framebuffer too. The less reading from and writing to memory, the better.

The size of the textures does not have much impact if you are using mipmapping. What will make a difference is how you store the data: if it is bytes rather than floats, you will need to read a lot less from main memory.

I would have thought it quite unlikely that you would fit all your data in cache at one time, but that is kind of irrelevant, as you only need the data you are about to access to be in cache. The smaller the data you are using, the less data needs to be prefetched.

rumble    118
[quote name='Krypt0n' timestamp='1306234232' post='4815060']
storage using as small data as possible, computations using float (SIMDfied)
[/quote]
In my scene, I typically have cube maps, so I use reflection vectors to index into the cube map. If the cube map stores integer RGBAs, do you think the operations to convert each RGBA into a float4 are worth it? That's essentially an extra four multiplications by 1/255.0 each time a texel is read.

This makes me want to convert everything to 16:16 fixed point math. But, similar to the int/float dilemma with color, I ask myself: is it worth doing?
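For reference, a minimal 16:16 fixed-point sketch (the Fixed16 type and helper names are hypothetical). Each multiply needs a 64-bit intermediate and a shift, which is part of the per-channel cost discussed in the replies.

```cpp
#include <cstdint>

// 16:16 fixed point: 16 integer bits and 16 fractional bits stored in an int32.
using Fixed16 = std::int32_t;

constexpr Fixed16 toFixed(float f) { return static_cast<Fixed16>(f * 65536.0f); } // truncates toward zero
constexpr float toFloat(Fixed16 x) { return static_cast<float>(x) / 65536.0f; }

// Multiply: widen to 64 bits so the product cannot overflow, then drop 16 fraction bits.
// (Right-shifting a negative value is arithmetic on mainstream compilers and guaranteed since C++20.)
constexpr Fixed16 fixedMul(Fixed16 a, Fixed16 b)
{
    return static_cast<Fixed16>((static_cast<std::int64_t>(a) * b) >> 16);
}
```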




[quote name='Hodgman' timestamp='1306238802' post='4815079']
Your L1 cache is probably about 32KB, and your L2 cache around 2MB. If you're operating on more data than that and you're trying to optimise for performance, then you really want to think about how much data you're transferring between RAM and cache.
[/quote]

Would there be a worthwhile speed increase if we restrict uncompressed texture/model data to under 2MB?

If the rasterizer were written in Java or some other managed language where you don't know what's going on with memory, perhaps this suggestion is even less workable?


Krypt0n    4721
[quote name='rumble' timestamp='1306264981' post='4815261']
[quote name='Krypt0n' timestamp='1306234232' post='4815060']
storage using as small data as possible, computations using float (SIMDfied)
[/quote]
In my scene, typically I have cube maps. So I use reflection vectors to index into the cube map. If cube map stores integer RGBAs, you think the operations to convert this RGBA into a float4 is worth it? Essentially extra four multiplications by 1/255.0 each time a texel is read.
[/quote]Nobody forces you to multiply by 1/255.0; I don't see the point of scaling your values by a constant.

You need floats if you want to interpolate, e.g., vertex colors perspective-correctly; doing so with integers might be a headache. You might also want gamma-correct blending etc., and with integers you'd lose quite a bit of performance there.
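To illustrate the perspective-correct case, a scalar sketch (the function name is hypothetical): c/w and 1/w vary linearly in screen space, but the attribute itself does not, so each pixel needs a divide and a fractional range that is awkward to handle in integer math.

```cpp
// Perspective-correct interpolation of one attribute (e.g. a vertex color channel)
// between two vertices. t is the linear, screen-space parameter along the span;
// w0 and w1 are the vertices' clip-space w values.
inline float perspectiveLerp(float c0, float w0, float c1, float w1, float t)
{
    const float cOverW = (1.0f - t) * (c0 / w0) + t * (c1 / w1); // linear in screen space
    const float oneOverW = (1.0f - t) / w0 + t / w1;             // linear in screen space
    return cOverW / oneOverW;                                    // recover the attribute
}
```

With c0 = 0, w0 = 1, c1 = 1, w1 = 3 and t = 0.5, this gives 0.25 rather than the 0.5 a plain screen-space lerp would produce.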


In addition, loading 32 bits and converting to 4 floats (SIMD) isn't really more work than working with fixed point (from a performance point of view). But as soon as you do some math (interpolation, filtering, blending), you can use simple SIMD instructions, while with fixed point you'll probably do slow mul/div + shifts on every color channel.

That's why I say: work with float (SIMDfied), and keep your data as small as possible (for cache and memory bandwidth).
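The "load 32 bits, convert to 4 floats" step can be done with SSE2 alone. A minimal sketch (x86; the function name is hypothetical). In line with the point about not scaling by 1/255.0, it leaves the values in 0..255.

```cpp
#include <cstdint>
#include <emmintrin.h> // SSE2

// Expand one packed 8-bit-per-channel pixel into four floats in 0..255.
// Lane order follows byte order in memory (B, G, R, A for a little-endian 0xAARRGGBB pixel).
inline __m128 unpackToFloat4(std::uint32_t pixel)
{
    const __m128i zero = _mm_setzero_si128();
    __m128i v = _mm_cvtsi32_si128(static_cast<int>(pixel)); // 4 bytes in the low 32 bits
    v = _mm_unpacklo_epi8(v, zero);  // zero-extend bytes to 16-bit words
    v = _mm_unpacklo_epi16(v, zero); // zero-extend words to 32-bit ints
    return _mm_cvtepi32_ps(v);       // convert the four int32 lanes to float
}
```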

rarelam    100
[quote name='Krypt0n' timestamp='1306268482' post='4815288']
In addition, loading 32bit, converting to 4 floats (SIMD) isn't really more work than working with fixed point (from performance point of view). but as soon as you do some math (interpolation, filtering, blending), you can use simple SIMD instructions, while with fixed point you'll probably do slow mul/div + shifts on every color channel.
[/quote]

Fixed point data can be handled very efficiently with SSE: linear interpolation can be done using pmaddubsw and pmaddwd (_mm_maddubs_epi16 and _mm_madd_epi16). Using those 2 madd instructions you can do bilinear interpolation between 4 values in 2 instructions, and you can use 8-bit color data directly. Doing the same with SIMD floats, the equivalent interpolation would be 9 instructions; you would still need to convert your data to floats in the first place, and you are going to use a lot more registers if you start off by converting everything to floats. The result of the fixed point bilinear interpolation is in four 32-bit ints, so it can easily be converted to 4 floats afterwards.

Look at the pipeline for the pixels and try to figure out at what point you need float precision, if at all.


Krypt0n    4721
[quote name='rarelam' timestamp='1306310742' post='4815485']
Fixed point data can be handled very efficiently with SSE, linear interpolation can be done using pmaddubsw and pmaddwd (_mm_maddubs_epi16 & _mm_madd_epi16).

[/quote]

Those are both 2D dot products, not the usual madd you could use for linear interpolation.

[quote]

using the 2 madd instructions you can bilinear interpolation between 4 values in 2 instructions, and you can use 8bit color data directly.

[/quote]

You can of course try to use dot product instructions to linearly interpolate, but two madds are not enough, as:

- you need to calculate the opposite weight (1-t)

- first you'd need to interleave your registers

- every dot product results in twice the bit width (e.g. 8bit * 8bit -> 16bit), and you need to compact that back into 8bit again


[quote]

Doing the same in SIMD floats the equivalent interpolation would be 9 instructions,

[/quote]

sub+mul+add are 3 instructions for doing a lerp like you want to, but that wouldn't be perspective correct.

[quote]you would still need to convert your data to floats in the first place, and you are going to use alot more registers if you start off by converting everything to floats.[/quote]

The register count should be the same, as the conversion can be done in place, while the fixed point approach needs interleaving, which demands temporary registers.

Converting vertices once is nothing compared to the rest of the triangle rasterization.



[quote]The result of the fixed point bilinear interpolation is in 4 32bit ints, so can easily be converted to 4 floats afterwards.


Look at the pipeline for the pixels and try and figure at what point you need float precision if at all.



[/quote]It's not just about precision; it's about range and simplicity, especially if you want to rasterize perspective-correctly.
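For comparison, the float path under discussion, sketched with SSE intrinsics (x86; names hypothetical). A lerp is one sub, one mul and one add; bilinear filtering of four texels that are already floats is three lerps, i.e. 9 arithmetic instructions plus broadcasting the weights. That matches the 9-instruction figure above. Like the fixed point version, it is a plain screen-space lerp, which is fine for filtering texels.

```cpp
#include <xmmintrin.h> // SSE

// lerp(a, b, t) = a + (b - a) * t : one sub, one mul, one add per call.
inline __m128 lerp4(__m128 a, __m128 b, __m128 t)
{
    return _mm_add_ps(a, _mm_mul_ps(_mm_sub_ps(b, a), t));
}

// Bilinear filter of four float4 texels (t00 top-left, t10 top-right,
// t01 bottom-left, t11 bottom-right) with fractional offsets fx, fy in [0, 1].
inline __m128 bilinear4(__m128 t00, __m128 t10, __m128 t01, __m128 t11, float fx, float fy)
{
    const __m128 tx = _mm_set1_ps(fx);
    const __m128 top = lerp4(t00, t10, tx);
    const __m128 bottom = lerp4(t01, t11, tx);
    return lerp4(top, bottom, _mm_set1_ps(fy));
}
```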

Vilem Otte    2938
My opinion is to work with floats. Why? I don't know how good SSE optimisations are for integer numbers, but for floats, if you write nice C/C++ code, the compiler will generate nice SSE code (not the best!) and you'll be very happy with the speedup you get.

Anyway, I'd first be concerned with actually making a rasterizer, and then decide on color storage (it can be changed afterwards, and it can also be optimised a bit more later).
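As an example of the kind of plain C/C++ this means: a simple loop over contiguous float channels, with no intrinsics. Optimizing compilers will often auto-vectorize a loop like this into SSE multiplies, but whether they do depends on the compiler, flags and aliasing analysis, so it is worth checking the generated assembly rather than assuming. The function name is hypothetical.

```cpp
#include <cstddef>

// Modulate n color channels: dst[i] = src[i] * light[i].
// Branch-free with contiguous access: a good candidate for auto-vectorization.
void modulateChannels(float* dst, const float* src, const float* light, std::size_t n)
{
    for (std::size_t i = 0; i < n; ++i)
        dst[i] = src[i] * light[i];
}
```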

rarelam    100
[quote name='Krypt0n' timestamp='1306323319' post='4815540']
it's not just about precision, it's about range and simplicity, especially if you want to rasterize perspective correct
[/quote]

It depends on what you need to interpolate. If you are bilinear filtering a texel, you do not require perspective correction on the actual colors, only on the coordinates, so it makes perfect sense to use the madd instructions, for instance:


lerpTableX - lookup table of weight and opposing weight as bytes (read as signed by pmaddubsw, so each must be <= 127)
lerpTableY - lookup table of weight and opposing weight as 16-bit words

texel - contains the four texels to interpolate between, laid out as RRRRGGGGBBBBAAAA
lerpX - index of the horizontal interpolation weights
lerpY - index of the vertical interpolation weights

[code]#include <tmmintrin.h> // SSSE3, for _mm_maddubs_epi16
extern __m128i lerpTableX[]; // byte pairs (w0, w1); maddubs treats these as signed bytes
extern __m128i lerpTableY[]; // 16-bit pairs (w0, w1)
__m128 bilinear( const __m128i &texel, int lerpX, int lerpY ) {
    __m128i output = _mm_maddubs_epi16( texel, lerpTableX[lerpX] ); // horizontal: top/bottom word per channel
    output = _mm_madd_epi16( output, lerpTableY[lerpY] );          // vertical: one int32 per channel
    return _mm_cvtepi32_ps( output ); // still scaled by sum(X weights) * sum(Y weights)
}
//texel is created by reading 4 memory locations and interleaving the results
//lerp simd values can be calculated as well using gpr, movd and a shuffle[/code]

This is just an alternative which, for bilinear filtering, is significantly faster than interpolating with floats and uses fewer registers.
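The two setup steps left as comments (interleaving four texels, and building the weight vectors) could be sketched with plain SSE2 as follows. This is an illustration under assumptions, not the poster's code: pixels are 0xAARRGGBB, the X weights sum to 127 because _mm_maddubs_epi16 reads its second operand as signed bytes, and the Y weights sum to 256, so the final floats come out scaled by 127 * 256. All function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <emmintrin.h> // SSE2

// Group the channels of four texels (p00 top-left, p10 top-right, p01 bottom-left,
// p11 bottom-right) as B00 B10 B01 B11 | G00 G10 G01 G11 | R... | A...
inline __m128i interleaveTexels(std::uint32_t p00, std::uint32_t p10,
                                std::uint32_t p01, std::uint32_t p11)
{
    const __m128i top = _mm_unpacklo_epi8(_mm_cvtsi32_si128(static_cast<int>(p00)),
                                          _mm_cvtsi32_si128(static_cast<int>(p10)));
    const __m128i bottom = _mm_unpacklo_epi8(_mm_cvtsi32_si128(static_cast<int>(p01)),
                                             _mm_cvtsi32_si128(static_cast<int>(p11)));
    return _mm_unpacklo_epi16(top, bottom); // pair the top and bottom rows per channel
}

// Horizontal weights as byte pairs (left, right) repeated 8 times; fx is in 0..127.
inline __m128i makeWeightsX(int fx)
{
    assert(fx >= 0 && fx <= 127);
    return _mm_set1_epi16(static_cast<short>((127 - fx) | (fx << 8)));
}

// Vertical weights as 16-bit pairs (top, bottom) repeated 4 times; fy is in 0..256.
inline __m128i makeWeightsY(int fy)
{
    assert(fy >= 0 && fy <= 256);
    return _mm_set1_epi32((256 - fy) | (fy << 16));
}
```

With these ranges the intermediate values stay within what pmaddubsw and pmaddwd can hold (at most 255 * 127 per 16-bit word, and at most 255 * 127 * 256 per 32-bit result).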


I agree, using floats is absolutely fine as well.
