The fastest way from R5G6B5 to unsigned short value?

18 comments, last by GameDev.net 19 years, 8 months ago
Are you sure there are no tricks to avoid doing this 10,000 times? Or could you move the work to a different spot in the program, where the user notices it less?

If my guess about what you're doing is correct, there are likely quite a few workarounds to "solve" your problem.
Quote:Original post by AssDruid
Yes I'm sure.


Have you profiled your code to make sure that this is where your performance is being killed?

Also, you could consider sacrificing 131,072 bytes of RAM to store an array of all possible values for a 16-bit color value:

unsigned short g_color[32][64][32];

/* Fill the table once at startup. */
void FillColors(void)
{
    for (int r = 0; r < 32; r++)
        for (int g = 0; g < 64; g++)
            for (int b = 0; b < 32; b++)
                g_color[r][g][b] = (unsigned short)((r << 11) | (g << 5) | b);  // or calculate like you already are
}

// this function wouldn't really be needed (unless you want to put all this in a class)
unsigned short rgb(short r, short g, short b)
{
    return g_color[r][g][b];
}


- Mike
Also, if speed is REALLY important, you should not declare any variables inside your function.

A slightly faster way to code your posted source would be:

unsigned short rgb(short r, short g, short b)
{
    return (unsigned short)(((r & 0x1f) << 11) | ((g & 0x3f) << 5) | (b & 0x1f));
}


Of course, your compiler may optimize this kind of stuff for you - I'm not really sure.

If you must declare variables in your function, you could try making them static (in C) or you could declare them as globals or class member variables.
Declaring locals does not slow things down. The compiler is going to create a temporary anyway to hold the return value, and that temporary will live in a register.

If you declare a local, set it, and return that, you should get the same code out.

Making it static is worse, as that guarantees that it's going in memory.

If you guarantee that all values passed in are in-range, and aren't trying to convert from RGB888 to RGB565,
return r << 11 | g << 5 | b;
is going to be as fast or faster than the array lookup. If it's a static 3-dimensional array, you do about as much math to compute the array index as you do to compute the color. On top of that, a 128K array will hose your cache, and run slower than just computing it.
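
To make the "about as much math" point concrete: for an array declared as [32][64][32], the flattened element index works out to the same number as the packed color, so the lookup computes the answer just to go fetch it from memory. A minimal sketch, assuming in-range inputs; the names pack565 and table_index are made up for illustration and are not from the original post:

```c
#include <assert.h>
#include <stdint.h>

/* Pack in-range components (r, b: 0..31, g: 0..63) into R5G6B5. */
static inline uint16_t pack565(unsigned r, unsigned g, unsigned b)
{
    assert(r < 32 && g < 64 && b < 32);   /* caller guarantees the range */
    return (uint16_t)((r << 11) | (g << 5) | b);
}

/* Flat element index of g_color[r][g][b] in a [32][64][32] array
   (hypothetical helper, shown only to compare the arithmetic). */
static inline unsigned table_index(unsigned r, unsigned g, unsigned b)
{
    return (r * 64u + g) * 32u + b;
}
```

Since 64 and 32 are powers of two, the multiplies in table_index are the same shifts that pack565 does.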

If the values aren't guaranteed to be in range, or you have to mask, it will take slightly longer depending on the compiler. If the inputs are all 0-255, the proper code would be:
return ((r << 8) & 0xF800) | ((g << 3) & 0x7E0) | ((b >> 3) & 0x1F);


Also, put this in a header file, and make it inline. The function call overhead for a 10 instruction or less function is a bit painful.
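
Something like the following could go in that header. It is only a sketch of the 0-255 case above: the name rgb888_to_565 and the use of stdint types are my own choices, and the assert is just a debug-build check on the assumed input range.

```c
#include <assert.h>
#include <stdint.h>

/* Convert 8-bit-per-channel color (each component 0..255) to R5G6B5
   by keeping the top 5, 6, and 5 bits of red, green, and blue. */
static inline uint16_t rgb888_to_565(unsigned r, unsigned g, unsigned b)
{
    assert(r <= 255 && g <= 255 && b <= 255);   /* assumed input range */
    return (uint16_t)(((r << 8) & 0xF800u) |    /* red   -> bits 15..11 */
                      ((g << 3) & 0x07E0u) |    /* green -> bits 10..5  */
                      ((b >> 3) & 0x001Fu));    /* blue  -> bits 4..0   */
}
```

For example, pure red (255, 0, 0) comes out as 0xF800 and white (255, 255, 255) as 0xFFFF.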
I would write it like this:
inline unsigned short rgb(short r, short g, short b)
{
    return (unsigned short)((((r << 6) | g) << 5) | b);
}
That might make for better register usage.

You should be making sure that your parameters are clipped to 0x1F, 0x3F, and 0x1F before calling this function, as you will not always need to do it here.

The other option is to make this a #define, as that can beat inlining for speed, which might be the way to go here.
#define rgb(r,g,b) ((unsigned short)(((((r)<<6)|(g))<<5)|(b)))
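
For what it's worth, here is a rough sketch of the clip-then-pack idea. The helper name clip and the function name rgb_nested are hypothetical, and the assert is only a debug check that the caller really did clip first. The nested shift gives the same result as (r << 11) | (g << 5) | b, because shifting r left by 6 and then by 5 is a shift by 11.

```c
#include <assert.h>
#include <stdint.h>

/* Clip a component to its field maximum (0x1F for red/blue, 0x3F for green). */
static inline unsigned clip(int v, unsigned max)
{
    if (v < 0)
        return 0;
    return (unsigned)v > max ? max : (unsigned)v;
}

/* Nested-shift packing; expects already-clipped components. */
static inline uint16_t rgb_nested(unsigned r, unsigned g, unsigned b)
{
    assert(r <= 0x1F && g <= 0x3F && b <= 0x1F);
    return (uint16_t)((((r << 6) | g) << 5) | b);
}
```

A caller with untrusted values would write rgb_nested(clip(r, 0x1F), clip(g, 0x3F), clip(b, 0x1F)), while a caller that already knows its values are in range skips the clipping entirely.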

-----
iMalc
Quote:Original post by AssDruid
this is for a Pocket PC, Windows CE, not for PC.


Oh man, why didn't you say so? Color conversion is a cycle killer. Hand-code this puppy in your native assembler. I work on video decoders on embedded systems, and color conversion is a significant portion of our execution time.
Don't PocketPCs run a variety of processors (x86, SH, MIPS, ARM, and PowerPC - or is that just what WinCE supports)? Which processor does your machine use? Maybe it has an integer SIMD instruction set.
Quote:Original post by Stoffel
Hand-code this puppy in your native assembler.


Oh come on. Do you *really* think that there will be significant optimizations at the assembly level for something this simple?

Also I wish someone had pointed out that most compilers will inline something like this all by themselves.


Edit:
Preprocess your data if it's too slow. Better yet, preprocess it anyway.
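
As a sketch of what preprocessing might look like, assuming the source data is tightly packed 8-bit R, G, B triples (the thread never says what the input format actually is), you would convert the whole buffer once at load time and keep only the 16-bit result. The function name convert_rgb888_buffer is made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Convert n pixels of packed 8-bit R,G,B triples to R5G6B5 in one pass.
   dst must have room for n 16-bit pixels. */
static void convert_rgb888_buffer(const uint8_t *src, uint16_t *dst, size_t n)
{
    assert(src != NULL && dst != NULL);
    for (size_t i = 0; i < n; i++) {
        unsigned r = src[3 * i];
        unsigned g = src[3 * i + 1];
        unsigned b = src[3 * i + 2];
        /* keep the top 5, 6, and 5 bits of each channel */
        dst[i] = (uint16_t)(((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3));
    }
}
```

After that, the per-frame code only copies 16-bit values and never calls the conversion at all.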
There's probably some MMX stuff you could do to speed this up, but I don't know it. I'd suggest just using an existing library, like SDL, and relying on its implementation to be the best. That way, if it isn't, and you have a faster way, you can fix it and then everyone benefits.

