Byte manipulation question. I *should* know this...

Started by
10 comments, last by MarkS 14 years, 1 month ago
So, I'm trying to optimize texture access. Currently, I'm doing four memory accesses per texel to get the red, green, blue and alpha components. It then occurred to me that I'm fetching four bytes out of what is basically an unsigned long and then recombining them. Seems wasteful. I still need to extract the components for alpha blending and such, but I thought I could do it faster. Anyway, the current (working) code looks like this:

// "texture" is of type unsigned char...
offset = ((tv * tex_width) + tu) * 4;
tr = texture[offset];
tg = texture[offset + 1];
tb = texture[offset + 2];
ta = texture[offset + 3];
.
.
.
//"buffer" is of type unsigned long...
*buffer = (ta << 24) | (tr << 16) | (tg << 8) | tb;




What I thought would work and be faster was this:

// "texture" is of type unsigned long...
offset = (tv * tex_width) + tu;
t_val = texture[offset];
tr = (unsigned char)(t_val & 0x000000FF);
tg = (unsigned char)(t_val & 0x0000FF00);
tb = (unsigned char)(t_val & 0x00FF0000);
ta = (unsigned char)(t_val & 0xFF000000);
.
.
.
//"buffer" is of type unsigned long.
*buffer = (ta << 24) | (tr << 16) | (tg << 8) | tb;




Oddly, this doesn't work. I've played around with the byte masks, but the texture is always rendered in one of the component colors and never a combination. I posted this here, instead of Graphics Programming and Theory, because this is more about byte manipulation than graphics. The byte masks may be accessing the wrong components, but that wouldn't cause the texture to be rendered in one component color. This makes me think that I'm not accessing the data like I want. Byte and bit manipulation has always been a weakness for me (not sure why...). Any ideas? [Edited by - maspeir on March 8, 2010 11:34:15 PM]

No, I am not a professional programmer. I'm just a hobbyist having fun...

Quote:Original post by maspeir
ta = (unsigned char)(t & 0xFF000000);

You're tossing out all but the top eight bits with that bitwise and... then you're tossing out all but the bottom eight bits with the cast. That equals zero. You need to shift down before you cast.
You need to shift the values before assigning.

Instead of
tr = (unsigned char)(t & 0x000000FF);
tg = (unsigned char)(t & 0x0000FF00);
tb = (unsigned char)(t & 0x00FF0000);
ta = (unsigned char)(t & 0xFF000000);

you need to do
tr = (unsigned char)(t & 0x000000FF);
tg = (unsigned char)((t & 0x0000FF00)>>8);
tb = (unsigned char)((t & 0x00FF0000)>>16);
ta = (unsigned char)((t & 0xFF000000)>>24);
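To see why the mask-only version renders a single colour, here is a minimal C sketch for the green channel. It assumes 32-bit texels with red in the lowest byte, as in the thread, and uses uint32_t instead of unsigned long (an assumption, since unsigned long is 64 bits on some platforms). The function names are made up for illustration.

```c
#include <stdint.h>

/* Mask only: the green bits stay in positions 8..15, and the cast to
   unsigned char keeps only positions 0..7, so the result is always 0. */
static unsigned char green_masked_only(uint32_t t)
{
    return (unsigned char)(t & 0x0000FF00u);
}

/* Mask, then shift down by 8 so the green bits sit in positions 0..7
   before the cast discards everything above them. */
static unsigned char green_masked_and_shifted(uint32_t t)
{
    return (unsigned char)((t & 0x0000FF00u) >> 8);
}
```

Only the red channel, which already sits in the lowest byte, survives the mask-only form, which matches the single-colour rendering described in the first post.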
Shoot Pixels Not People
Thanks! I knew I had to be doing something wrong.

Huh. It works, but I'm seeing a 50% or more DROP in frame rate. Odd... I figured array access would be slower.

No, I am not a professional programmer. I'm just a hobbyist having fun...

Quote:Original post by maspeir
Thanks! I knew I had to be doing something wrong.

Huh. It works, but I'm seeing a 50% or more DROP in frame rate. Odd... I figured array access would be slower.


Array access is only slow when it forces a cache miss. For what you're doing, you're getting one cache miss and then everything else is on the same cache line. So loads on green, blue, and alpha should be extremely fast.
Quote:Original post by Drakonite
You need to shift the values before assigning.

Instead of
tr = (unsigned char)(t & 0x000000FF);
tg = (unsigned char)(t & 0x0000FF00);
tb = (unsigned char)(t & 0x00FF0000);
ta = (unsigned char)(t & 0xFF000000);

you need to do
tr = (unsigned char)(t & 0x000000FF);
tg = (unsigned char)((t & 0x0000FF00)>>8);
tb = (unsigned char)((t & 0x00FF0000)>>16);
ta = (unsigned char)((t & 0xFF000000)>>24);
Except that you're better off doing the shifts first to reduce the size of the masking constants.
tr = (unsigned char)((t) & 0xFF);
tg = (unsigned char)((t>>8) & 0xFF);
tb = (unsigned char)((t>>16) & 0xFF);
ta = (unsigned char)((t>>24) & 0xFF);
And in this case you'll then notice that anding with the masks is redundant anyway thanks to the casts. Thus:
tr = (unsigned char)t;
tg = (unsigned char)(t>>8);
tb = (unsigned char)(t>>16);
ta = (unsigned char)(t>>24);
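Putting the shift-and-cast form together with the repacking step from the first post gives something like the following C sketch. The Texel struct and both function names are hypothetical, and uint32_t/uint8_t stand in for the thread's unsigned long and unsigned char (an assumption about the intended widths). The explicit uint32_t casts in the packing step are an addition: without them the 8-bit channel is promoted to int before the shift, and shifting a value such as 0xFF left by 24 can overflow a signed int.

```c
#include <stdint.h>

/* Hypothetical container for the four channels of one texel. */
typedef struct { uint8_t r, g, b, a; } Texel;

/* Shift each channel down to the low byte; the cast to uint8_t then
   discards the higher bits, so no explicit mask is needed. */
static Texel unpack_texel(uint32_t t)
{
    Texel out;
    out.r = (uint8_t)t;
    out.g = (uint8_t)(t >> 8);
    out.b = (uint8_t)(t >> 16);
    out.a = (uint8_t)(t >> 24);
    return out;
}

/* Repack in the order used in the first post: alpha in the top byte,
   then red, green, and blue in the lowest byte. */
static uint32_t pack_argb(Texel c)
{
    return ((uint32_t)c.a << 24) | ((uint32_t)c.r << 16)
         | ((uint32_t)c.g << 8)  |  (uint32_t)c.b;
}
```

Note that the unpack and pack orders differ, so a round trip swaps the red and blue bytes, exactly as the original code does.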

The next thing then is to discover that several operations such as alpha blending can be done without fully separating the 32-bit colours down into their individual channels. I.e. you can blend the red and the green at the same time etc - the double blend trick.
Or if you use MMX/SSE etc., it gets much quicker still.
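A rough C sketch of the kind of trick described above, not taken from the poster's library. This common variant pairs red with blue rather than red with green, because those two channels are separated by an 8-bit gap that absorbs the intermediate products. The function name, the 0..256 alpha range, and the choice to leave the top byte zero are all assumptions made for illustration.

```c
#include <stdint.h>

/* Blend src over dst with a single alpha in the range 0..256
   (256 means all src, 0 means all dst). Red and blue are multiplied
   together in one 32-bit operation; green is handled separately.
   The top (alpha) byte of the result is left as zero. */
static uint32_t blend_rgb(uint32_t src, uint32_t dst, uint32_t alpha)
{
    uint32_t inv = 256u - alpha;

    /* Red and blue: each product is at most 255 * 256, which fits in
       the 16 bits available to each channel, so they cannot collide. */
    uint32_t rb = (((src & 0x00FF00FFu) * alpha
                  + (dst & 0x00FF00FFu) * inv) >> 8) & 0x00FF00FFu;

    /* Green on its own, masked back into bits 8..15. */
    uint32_t g  = (((src & 0x0000FF00u) * alpha
                  + (dst & 0x0000FF00u) * inv) >> 8) & 0x0000FF00u;

    return rb | g;
}
```

This does two multiplies per side instead of three, and never splits the pixel into separate byte variables.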

I've got a ton of optimised pixel manipulation stuff like this on my website in the Useful Classes section.
"In order to understand recursion, you must first understand recursion."
My website dedicated to sorting algorithms
Thanks for the info, iMalk. I do have a question about one function on the stereopsis page. What are the expected values of "xp" and "yp" in the Bilerp32 function? Texel or pixel coordinates or something else?

No, I am not a professional programmer. I'm just a hobbyist having fun...

Quote:Original post by maspeir
Thanks for the info, iMalk. I do have a question about one function on the stereopsis page. What are the expected values of "xp" and "yp" in the Bilerp32 function? Texel or pixel coordinates or something else?
Those are the off-pixel-centre proportion amounts in the x and y directions. They're both 0 to 255, and control the weighting of the blending of the four texel values. 0 for xp means weight 100% towards 'a' and 'c', and 255 means weight as much as possible towards 'b' and 'd'. yp then determines how to weight between those two.
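As an illustration of that weighting only (this is not the Bilerp32 function itself), here is a minimal single-channel C sketch. It assumes 'a' and 'b' are the top-left and top-right texels, 'c' and 'd' the bottom-left and bottom-right, that xp weights horizontally and yp vertically, and that weights are applied out of 256. The function name is hypothetical.

```c
#include <stdint.h>

/* Bilinear blend of one 8-bit channel from four neighbouring texels.
   xp and yp are 0..255; the weights are taken out of 256, so 255
   leans almost, but not quite, fully towards b/d (or the bottom row). */
static uint8_t bilerp_channel(uint8_t a, uint8_t b, uint8_t c, uint8_t d,
                              uint32_t xp, uint32_t yp)
{
    /* Horizontal blends: a towards b on the top row, c towards d below. */
    uint32_t top    = (a * (256u - xp) + b * xp) >> 8;
    uint32_t bottom = (c * (256u - xp) + d * xp) >> 8;

    /* Vertical blend between the two intermediate results. */
    return (uint8_t)((top * (256u - yp) + bottom * yp) >> 8);
}
```

With xp = 0 and yp = 0 the result is exactly 'a'. Because 255 is the largest weight, the far texels can never be weighted a full 100%.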
"In order to understand recursion, you must first understand recursion."
My website dedicated to sorting algorithms
Ah. Thanks. And thanks for the links.

No, I am not a professional programmer. I'm just a hobbyist having fun...

Why not simply do this?

offset = ((tv * tex_width) + tu) * 4;
memcpy(buffer, &texture[offset], 4);

or even

UINT *pData = (UINT*)&texture[offset];
*buffer = *pData;
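A small C sketch of the memcpy variant, with two caveats worth stating. First, copying four bytes preserves their order in memory, so the resulting 32-bit value depends on the machine's byte order and is not rearranged into the alpha/red/green/blue packing used in the first post. Second, the memcpy form avoids the alignment and aliasing questions that the pointer cast can raise. The function name is hypothetical and uint32_t stands in for UINT (an assumption).

```c
#include <stdint.h>
#include <string.h>

/* Load one 4-byte texel from a byte array without casting the pointer.
   Compilers typically turn this fixed-size memcpy into a single load. */
static uint32_t load_texel(const unsigned char *texture, size_t offset)
{
    uint32_t value;
    memcpy(&value, &texture[offset], sizeof value);
    return value;
}
```

On a little-endian machine the bytes 0x11, 0x22, 0x33, 0x44 load as 0x44332211, so the first byte in memory ends up in the lowest byte of the value.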

