Efficiently obtaining Red Channel from BGRA Bitmap

gpu_noob · 2013-01-15T21:45:04

byte* bgra = byte array of a BGRA formatted bitmap image;
byte* r = new byte[Height*Width];
for (int i = 0; i < Height; i++)
{
    for (int j = 0; j < Width; j++)
    {
        int offset = i*Width + j;
        r[offset] = bgra[offset*4 + 2];
    }
}
delete[] r;

I'm using the above code to obtain red channel values from a byte array of a BGRA bitmap image.

The image is formatted as: B G R A B G R A... (Size of W*H*4)
I want to obtain a byte array of R R R R... (Size of W*H)

Is there a more efficient way of doing this without using for loops?

Started by gpu_noob January 13, 2013 10:00 AM

21 comments, last by Adam_42 11 years, 3 months ago

alvaro

21,604

January 15, 2013 06:26 PM

I'm pretty surprised too that you're measuring 5ms for copying that much data. Are you on some old hardware?

I got similar timings on my laptop, which is about one year old.

Ravyne

14,306

January 15, 2013 07:31 PM

So, if you're not already, what you probably want to do in this case is modify your loop to compute all 4 R, G, B, and A arrays (I'll call these planes) -- I presume you'll need the Green and Blue channels at some point too, for YUV you may or may not need A (which I assume remains alpha).

It seems likely to me that the real bottleneck here is the copy from GPU to system memory -- by doing all 4 planes per loop iteration, you'll make efficient use of cache, and since the source array is already transferred, you aren't paying that penalty again. Whereas the red channel alone has a cost of around 6ms, I'd wager you can easily get the whole set for under 10ms.

Something like:


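// Assumed, not defined in this post: bgra is a 4-byte pixel type, and
// RED/GRN/BLU/ALP read one channel from the pixel that a bgra* points at.
int size        = Height * Width;

bgra* src       = /* the BGRA formatted bitmap image */;
bgra* end       = src + size;

byte* r         = new byte[size];
byte* g         = new byte[size];
byte* b         = new byte[size];
byte* a         = new byte[size];

// Write cursors, so r/g/b/a keep pointing at the start of each plane.
byte* r_dst     = r;
byte* g_dst     = g;
byte* b_dst     = b;
byte* a_dst     = a;

while (src < end)
{
  // Store one channel per plane, then advance that plane's cursor.
  *r_dst++ = RED(src);
  *g_dst++ = GRN(src);
  *b_dst++ = BLU(src);
  *a_dst++ = ALP(src);

  src++;
}

// ... use the r, g, b and a planes here ...

delete[] a;
delete[] b;
delete[] g;
delete[] r;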

And then everything I said before applies -- unroll loop, coalesce writes, drop to SSE/AVX.
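As a rough illustration of the "drop to SSE" step, here is a minimal sketch for the red plane only. It assumes little-endian BGRA bytes and uses only SSE2 intrinsics, falling back to the plain loop when SSE2 is unavailable. `ExtractRed` and `HAVE_SSE2` are hypothetical names made up for this example, and whether it actually beats the compiler's own output needs to be measured.

```cpp
#include <cstddef>
#include <cstdint>
#if defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#include <emmintrin.h>
#define HAVE_SSE2 1
#endif

// Hypothetical helper: copies the red byte of each BGRA pixel into dst.
void ExtractRed(const uint8_t* bgra, uint8_t* dst, size_t pixelCount)
{
    size_t i = 0;
#ifdef HAVE_SSE2
    const __m128i lowByte = _mm_set1_epi32(0xFF);
    // Handle 16 pixels (64 source bytes) per iteration.
    for (; i + 16 <= pixelCount; i += 16)
    {
        const __m128i* p = reinterpret_cast<const __m128i*>(bgra + i * 4);
        // Each 32-bit lane is one little-endian pixel: A<<24 | R<<16 | G<<8 | B.
        // Shift red down to the low byte and mask off alpha.
        __m128i p0 = _mm_and_si128(_mm_srli_epi32(_mm_loadu_si128(p + 0), 16), lowByte);
        __m128i p1 = _mm_and_si128(_mm_srli_epi32(_mm_loadu_si128(p + 1), 16), lowByte);
        __m128i p2 = _mm_and_si128(_mm_srli_epi32(_mm_loadu_si128(p + 2), 16), lowByte);
        __m128i p3 = _mm_and_si128(_mm_srli_epi32(_mm_loadu_si128(p + 3), 16), lowByte);
        // Narrow 32 -> 16 -> 8 bits. Values are 0..255, so saturation never triggers.
        __m128i lo = _mm_packs_epi32(p0, p1);
        __m128i hi = _mm_packs_epi32(p2, p3);
        _mm_storeu_si128(reinterpret_cast<__m128i*>(dst + i), _mm_packus_epi16(lo, hi));
    }
#endif
    // Scalar tail (or the whole image when SSE2 is not available).
    for (; i < pixelCount; ++i)
        dst[i] = bgra[i * 4 + 2];
}
```

Calling `ExtractRed(bgra, r, Width * Height)` would replace the nested loops from the original question. The unaligned load/store intrinsics are used so the buffers need no special alignment.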

Another thought -- also look into the restrict keyword and make sure your pointers are const-correct. Without restrict/const correctness, it's possible (if not likely) that the compiler can't optimize this code, because it won't know whether your pointers alias each other or not.
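A minimal sketch of what that might look like. Standard C++ has no `restrict` keyword, so this assumes the `__restrict` extension that GCC, Clang and MSVC accept. `SplitPlanes` is a hypothetical name for this example.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: splits interleaved BGRA bytes into four planes.
// __restrict promises the compiler that none of these buffers overlap,
// and const marks the source as read-only.
void SplitPlanes(const uint8_t* __restrict src,
                 uint8_t* __restrict r,
                 uint8_t* __restrict g,
                 uint8_t* __restrict b,
                 uint8_t* __restrict a,
                 size_t pixelCount)
{
    for (size_t i = 0; i < pixelCount; ++i)
    {
        b[i] = src[i * 4 + 0];
        g[i] = src[i * 4 + 1];
        r[i] = src[i * 4 + 2];
        a[i] = src[i * 4 + 3];
    }
}
```

The promise is only valid if the buffers really are distinct; passing overlapping pointers to a function declared this way is undefined behaviour.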

Adam_42

3,664

January 15, 2013 09:45 PM

Your best bet performance wise is probably to get the GPU to do as much of the work as possible. It should be fairly simple to write a shader that does the colour space conversion and outputs the data in the format you need. The only awkwardness is that there are no one byte per pixel render target formats, so you'll have to use RGBA and process four source pixels for each destination one (and ensure the source image is a multiple of 4 pixels wide).
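To make the four-into-one packing concrete, here is a CPU-side reference of the layout such a shader would be expected to produce. It is only a sketch: `PaddedWidth` and `PackRedCpuReference` are hypothetical names, and which shader output channel ends up in byte k of a texel depends on the render target format's memory order, which is assumed rather than known here.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Round a width up to the next multiple of 4 so every output texel is full.
inline int PaddedWidth(int width) { return (width + 3) & ~3; }

// CPU reference of the packed result. width must already be a multiple of 4.
// Output texel (x, y) holds the red values of source pixels 4x..4x+3 on row y.
std::vector<uint8_t> PackRedCpuReference(const uint8_t* bgra, int width, int height)
{
    const int outWidth = width / 4;  // four source pixels per 4-byte texel
    std::vector<uint8_t> out(static_cast<size_t>(outWidth) * height * 4);
    for (int y = 0; y < height; ++y)
        for (int x = 0; x < outWidth; ++x)
            for (int k = 0; k < 4; ++k)
            {
                const size_t srcPixel = static_cast<size_t>(y) * width + 4 * x + k;
                const size_t dstByte  = (static_cast<size_t>(y) * outWidth + x) * 4 + k;
                out[dstByte] = bgra[srcPixel * 4 + 2];  // +2 selects red in BGRA
            }
    return out;
}
```

Read back as raw bytes, the packed target is the same R R R R... array the question asks for, so the CPU loop goes away and the readback is a quarter of the size.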

In addition to that, don't lock the texture on the same frame as you call GetRenderTargetData() - double buffer it and you'll be blocking waiting for the GPU less often.

This topic is closed to new replies.
