# Convert RGBA buffer to ARGB


## Recommended Posts

Hello

I'm using LodePNG to load and decode PNGs, but the library returns an RGBA buffer.

For my interface rendering I need an ARGB buffer, so I'm wondering how I can convert my RGBA buffer to ARGB?

Here is my code so far:

    std::vector<unsigned char> image; // the raw pixels
    unsigned width, height;

    // decode
    unsigned error = lodepng::decode(image, width, height, fileName);

    // if there's an error, display it
    if (error)
    {
        WriteError("%s decoding error %u : %s", fileName, error, lodepng_error_text(error));
        return FALSE;
    }

    char* rgbaBuffer = reinterpret_cast<char*>(&image[0]);


##### Share on other sites

Re-order the pixel components as needed.

The simplest solution would perhaps be this one, where the vector's content is re-ordered in place using operator[] for access:

    for (std::size_t pos = 0; pos < image.size(); pos += 4) {
        unsigned char r = image[pos+0];
        unsigned char g = image[pos+1];
        unsigned char b = image[pos+2];
        unsigned char a = image[pos+3];
        image[pos+0] = a;
        image[pos+1] = r;
        image[pos+2] = g;
        image[pos+3] = b;
    }


It relies on the image providing 4 components per pixel. You could use the safer but less efficient vector::at routine instead of vector::operator[], of course, or resort to some index or pointer-casting trickery.
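For illustration, the "pointer casting trickery" could look something like this (my sketch, not part of the original post; it assumes the buffer holds 4 bytes per pixel and that its size is a multiple of 4):

```cpp
#include <vector>

// In-place RGBA -> ARGB byte re-ordering using a raw pointer instead of
// operator[]. Saves A, then shifts R, G, B one slot to the right.
void rgba_to_argb_inplace(std::vector<unsigned char>& image)
{
    unsigned char* p = &image[0];
    unsigned char* end = p + image.size();
    for (; p < end; p += 4) {
        unsigned char a = p[3];
        p[3] = p[2]; // B moves to the last byte
        p[2] = p[1]; // G
        p[1] = p[0]; // R
        p[0] = a;    // A moves to the front
    }
}
```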

Edited by haegarr

##### Share on other sites

Oh, swapping pixels. Such joy.

A slightly fancier version, with some trickery that can make it up to 4x faster on some systems:

    unsigned int* buf = reinterpret_cast<unsigned int*>(&image[0]);
    unsigned int* end = buf + image.size() / 4;
    while (buf < end) {
        unsigned int pixel = *buf;
        *buf = ((pixel & 0xffffff00) >> 8) | ((pixel & 0xff) << 24); // move A into the top byte
        buf++;
    }


Edit: did the wrong conversion at first...
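A caveat worth noting (my addition, not part of the post above): reading a pixel as a 32-bit integer makes the result depend on machine byte order. On a little-endian machine the RGBA bytes load as 0xAABBGGRR, so producing the A,R,G,B byte layout of the per-byte loop earlier in the thread means rotating in the other direction. A sketch, assuming little-endian:

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Little-endian variant: RGBA memory bytes load as 0xAABBGGRR, and an
// 8-bit rotate left yields 0xBBGGRRAA, which is A,R,G,B back in memory.
void rgba_to_argb_le(std::vector<unsigned char>& image)
{
    for (std::size_t i = 0; i + 4 <= image.size(); i += 4) {
        std::uint32_t pixel;
        std::memcpy(&pixel, &image[i], 4);    // memcpy avoids aliasing trouble
        pixel = (pixel << 8) | (pixel >> 24); // rotate left by one byte
        std::memcpy(&image[i], &pixel, 4);
    }
}
```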

Edited by Olof Hedman

Thank you guys !

##### Share on other sites

Not looked at "LodePNG", but if it uses libpng, libpng already provides all the transformations I have ever needed, like where to put the alpha bits (or adding some fixed value if the source image had no alpha channel), so I would hope LodePNG lets you specify them.

Edited by SyncViews

##### Share on other sites

> Not looked at "LodePNG", but if it uses libpng, libpng already provides all the transformations I have ever needed, like where to put the alpha bits (or adding some fixed value if the source image had no alpha channel), so I would hope LodePNG lets you specify them.

LodePNG only consists of a .hpp and a .cpp file and doesn't use libpng.

##### Share on other sites

I'd also suggest using libpng. Otherwise, Olof is definitely on the right track. You don't want to vectorize this kind of thing.

Although, for the sake of being a bit more arcane...

    uint32_t* pix = reinterpret_cast<uint32_t*>(&image[0]);
    uint32_t* end = pix + image.size() / 4;
    while (end != pix) {
        --end;
        *end = (*end >> 8) | (*end << 24); // walk backwards, rotating each pixel
    }

##### Share on other sites

Hm... and if you want it really fast, do SSE versions:

SSSE3 version (really dumb and fast):

__m128i shuffler;

void rgba_to_argb(const int* in, int* out)
{
    const __m128i pixel = _mm_loadu_si128(reinterpret_cast<const __m128i*>(in));
    const __m128i tmp = _mm_shuffle_epi8(pixel, shuffler);
    _mm_storeu_si128(reinterpret_cast<__m128i*>(out), tmp);
}


Where:

    shuffler = _mm_setr_epi8(0x01, 0x02, 0x03, 0x00,
                             0x05, 0x06, 0x07, 0x04,
                             0x09, 0x0a, 0x0b, 0x08,
                             0x0d, 0x0e, 0x0f, 0x0c);


Of course you can upgrade it to support batch processing, making it even more effective.

Adapting the code for lower SSE versions means emulating the _mm_shuffle_epi8 instruction (which is possible with little overhead).
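For what it's worth, this particular shuffle is just a per-pixel byte rotation, so on pre-SSSE3 hardware it can be emulated with two plain SSE2 shifts and an OR (my sketch, following the same pixel-as-0xRRGGBBAA convention as the scalar shift version earlier in the thread):

```cpp
#include <cstddef>
#include <cstdint>
#include <emmintrin.h> // SSE2 only, no SSSE3 required

// Emulates the fixed _mm_shuffle_epi8 above: rotates each 32-bit pixel
// right by 8 bits, i.e. (pixel >> 8) | (pixel << 24), four pixels at a
// time. len is the pixel count and must be a multiple of 4.
void rgba_to_argb_sse2(std::uint32_t* pixels, std::size_t len)
{
    for (std::size_t i = 0; i < len; i += 4) {
        __m128i v = _mm_loadu_si128(reinterpret_cast<const __m128i*>(pixels + i));
        __m128i rotated = _mm_or_si128(_mm_srli_epi32(v, 8),   // pixel >> 8
                                       _mm_slli_epi32(v, 24)); // pixel << 24
        _mm_storeu_si128(reinterpret_cast<__m128i*>(pixels + i), rotated);
    }
}
```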

##### Share on other sites

This is also embarrassingly parallel, so we could do some concurrent looping, or use the GPU. A pixel shader could do it in no time.

##### Share on other sites

> This is also embarrassingly parallel, so we could do some concurrent looping, or use the GPU. A pixel shader could do it in no time.

Computationally, it's a good fit, but going across the PCIe bus and back again would probably kill any performance win unless the image is really huge. It's probably better to leave it in system memory and use SSE. Some of the newer hardware that supports compute applications in system memory would probably win, but those are few and far between right now. (FYI, all but the most recent integrated graphics solutions that share memory with the host need to redundantly copy the data into the GPU's carved-off portion of memory, because the CPU and GPU have independent address spaces. Even now, if you have the right hardware, you need the right software and drivers too.)

If you've passed the data up to a pixel shader unconverted, however, it's easy to just swizzle the components as the shader does its normal thing (assuming that all textures suffer this affliction, that is).

##### Share on other sites

Well, doing it in a pixel shader for every pixel rendered will likely end up costing more in total than just converting the thing once with the basic loop. I'm not sure I would even bother with SSE for this (and I don't recall libpng bothering either), since I generally have no need to convert thousands of images a second. I am also sure loading and decompressing the PNG data takes far longer.

Edited by SyncViews

##### Share on other sites

You could also swap the channels of the image ahead of time in GIMP or Photoshop. It probably won't get faster than that.

##### Share on other sites

> This is also embarrassingly parallel, so we could do some concurrent looping, or use the GPU. A pixel shader could do it in no time.

> Computationally, it's a good fit, but going across the PCIe bus and back again would probably kill any performance win unless the image is really huge.

Sir, you're missing the point. We're no longer optimizing. We've entered the realm of derptimizing. The objective is to over-engineer until no one can read the code.

Really, I don't think any kind of optimization is necessary here unless the images actually are incredibly large. Concurrent looping might also create more overhead than it saves. This is a once-per-resource-load thing, so even if it took a full tenth of a second, you would only pay that cost very occasionally. I think Olof's method is probably best; let the compiler give it a little speed, though libpng can just unpack it properly in the first place.