Sign in to follow this  
AssDruid

The fastest way from R5G6B5 to unsigned short value?

Recommended Posts

I'm currently using this:
unsigned short rgb(short r,short g,short b)
{
	unsigned short PixelCol;
	PixelCol = (unsigned short) (r & 0xff) << 11 | (g & 0xff) << 6 | (b & 0xff);
    
	return PixelCol;
}

But speed is REALLY important, and if I call this function something about 10,000 times so it's really slowing down...

Share this post


Link to post
Share on other sites
Hi,

Are you sure that it's this function that is slowing you down? That's about as fast as you can get, not to mention that the compiler probably optimizes it further using it's knowledge of the target architecture.

Vovan

Share this post


Link to post
Share on other sites
Quote:
Original post by vovansim
Hi,

Are you sure that it's this function that is slowing you down? That's about as fast as you can get, not to mention that the compiler probably optimizes it further using it's knowledge of the target architecture.

Vovan

Yes I'm sure.
If this the fastest way to do it, so there is nothing to do...

Share this post


Link to post
Share on other sites
also, shouldn't the masks be 0x1F 0x3F 0x1F instead of 0xFF all around? you could try bumping down the inputs to char sizes... that would make all arguments fit in a single word...

The only thing faster you'll find would be:
1) Removing the & ___ masks, since anything out of range won't display the expected output anyways (since at least 1 value is in error)

2) Assmebly instructions specifically targeted for multimedia.

Share this post


Link to post
Share on other sites
For tiny functions like this, I would (in a header file, else inline expansion wont work):


inline unsigned short rgb(short r,short g,short b)
{
return (unsigned short) (r & 0xff) << 11 | (g & 0xff) << 6 | (b & 0xff);;
}



Alan

Share this post


Link to post
Share on other sites
Assuming you are on a PC, then it is not a good idea to use shorts. Use ints, or chars - they will be faster. The masking is probably not necesary if there is only 8 bits per component.

But more importantly, I dont think that will work at all. You are trying to squeeze 24 bits in to 16 without discarding the extra bits. Here's what I use..


Packed= (int)( ( (uint32)Col.R ) >> 3UL ) |
(int)( ( (uint32)Col.G & 0xfc ) << 3UL ) |
(int)( ( (uint32)Col.B & 0xf8 ) << 8UL );


EDIT: I think I misunderstood the requirements ;) I thought you were trying to pack 3 8-bit colour components in to 16 bit.

Share this post


Link to post
Share on other sites
Your masks are wrong and you shouldn't have to do bit depth conversion in any time critical situation - you should do it while loading. Redesigning your system is probably the best optimization.

And if this is being done during loading and loading is too slow, make smaller graphics. You should have a system in place to warn you when stuff takes too long to load and loading being too slow isn't only because of code - the size of the data files is important too.

Have you profiled your code?

Share this post


Link to post
Share on other sites
Are you sure there's no tricks to make it so you're not doing this 10,000 times? Or perhaps move the execution to a different spot in the program so the user effect is less?

If my guess about what you're doing is correct, there's likely quite a few work arounds to "solve" your problem.

Share this post


Link to post
Share on other sites
Quote:
Original post by AssDruid
Yes I'm sure.


Have you profiled your code to make sure that this is where your performance is being killed?

Also, you could consider sacrificing 131,072 bytes of RAM to store an array of all possible values for a 16-bit color value:


unsigned short g_color[32][64][32];

FillColors()
{
for(int r=0; r<32; r++)
for(int g=0; g<64; g++)
for(int b=0; b<32; b++)
g_color[r][g][b] = RGB_COLOR_VALUE; // calculate like you already are
}

// this function wouldn't really be needed (unless you want to put all this in a class)
unsigned short rgb(short r,short g,short b)
{
return g_color[r][g][b];
}




- Mike

Share this post


Link to post
Share on other sites
Also, if speed is REALLY important, you should not declare any variables inside your function.

A slightly faster way to code your posted source would be:


unsigned short rgb(short r,short g,short b)
{
return (unsigned short) (r & 0xff) << 11 | (g & 0xff) << 6 | (b & 0xff);
}



Of course, your compiler may optimize this kind of stuff for you - I'm not really sure.

If you must declare variables in your function, you could try making them static (in C) or you could declare them as globals or class member variables.

Share this post


Link to post
Share on other sites
declaring locals does not slow things down. The compiler is going to be creating a temporary anyways to hold the return value, which will live in a register.

If you declare a local, set it, and return that, you should get the same code out.

Making it static is worse, as that guarantees that it's going in memory.

If you guarantee that all values passed in are in-range, and aren't trying to convert from RGB888 to RGB565,
return r << 11 | g << 5 | b;
is going to be as fast or faster than the array lookup. If it's a static 3-dimensional array, you do about as much math to compute the array index as you do to compute the color. On top of that, a 128K array will hose your cache, and run slower than just computing it.

If the values aren't guaranteed to be in range, or you have to mask, it will take slightly longer depending on the compiler. If the inputs are all 0-255, the proper code would be:
return ((r << 8) & 0xF800) | ((g << 3) & 0x7E0) | ((b >> 3) & 0x1F);


Also, put this in a header file, and make it inline. The function call overhead for a 10 instruction or less function is a bit painful.

Share this post


Link to post
Share on other sites
Guest Anonymous Poster
I would write it like this:
inline unsigned short rgb(short r,short g,short b) {
return (unsigned short) (((r<<6)|g)<<5)|b;
}That might make for better register usage.

You should be making sure that your parameters are clipped to 0x1F, 0x3F, and 0x1F before calling this function. as you will not always need to do it here.

The other option is to make this as a #define as that can beat inlining for speed, which might be the way to go here.#define rgb(r,g,b) ((unsigned short)((((r)<<6)|(g))<<5)|(b))

Share this post


Link to post
Share on other sites
Guest Anonymous Poster
I would write it like this:
inline unsigned short rgb(short r,short g,short b) {
return (unsigned short) (((r<<6)|g)<<5)|b;
}
That might make for better register usage.

You should be making sure that your parameters are clipped to 0x1F, 0x3F, and 0x1F before calling this function. as you will not always need to do it here.

The other option is to make this as a #define as that can beat inlining for speed, which might be the way to go here.
#define rgb(r,g,b) ((unsigned short)((((r)<<6)|(g))<<5)|(b))

-----
iMalc

Share this post


Link to post
Share on other sites
Quote:
Original post by AssDruid
this is for a Pocket PC, Windows CE, not for PC.


Oh man, why didn't you say so? Color conversion is a cycle killer. Hand-code this puppy in your native assembler. I work on video decoders on embedded systems, and color conversion is a significant portion of our execution time.

Share this post


Link to post
Share on other sites
Don't PocketPCs run a variety of processors (x86, SH, Mips, ARM, and PowerPC - or is that just what WinCE supports)? Which processor does your machine use? Maybe it has an integer SIMD instruction set.

Share this post


Link to post
Share on other sites
Quote:
Original post by Stoffel
Hand-code this puppy in your native assembler.


Oh come on. Do you *really* think that there will be significant optimizations at the assembly level for something this simple?

Also I wish someone had pointed out that most compilers will inline something like this all by themselves.


Edit:
Preprocess your data if it's too slow. Better yet, preprocess it anyway.

Share this post


Link to post
Share on other sites
Guest Anonymous Poster
probably some mmx stuff you could do to speed this up, but i dont know it. I'd suggest just using an existing library, like SDL, and relying on its implementation to be best. That way, if it isnt, and you have a faster way, you can fix it and then everyone benefits.

Share this post


Link to post
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now

Sign in to follow this