Efficiently obtaining Red Channel from BGRA Bitmap

Started by
21 comments, last by Adam_42 11 years, 3 months ago
I'm pretty surprised too that you're measuring 5ms for copying that much data. Are you on some old hardware?

I got similar timings on my laptop, which is about one year old.

So, if you're not already doing so, what you probably want in this case is to modify your loop to fill all four R, G, B, and A arrays (I'll call these planes) -- I presume you'll need the green and blue channels at some point too; for YUV you may or may not need A (which I assume remains alpha).

It seems likely to me that the real bottleneck here is the copy from GPU to system memory -- by doing all 4 planes per loop iteration, you'll make efficient use of cache, and since the source array is already transferred, you aren't paying that penalty again. Whereas the red channel alone has a cost of around 6ms, I'd wager you can easily get the whole set for under 10.

Something like:


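typedef unsigned char byte;
struct bgra { byte b, g, r, a; };  // assumed layout: 4 bytes per pixel, B, G, R, A in memory

// Placeholder channel accessors for the names used in the loop below.
#define RED(p) ((p)->r)
#define GRN(p) ((p)->g)
#define BLU(p) ((p)->b)
#define ALP(p) ((p)->a)

int size        = Height * Width;

// pixels: hypothetical name for the byte array of a BGRA formatted bitmap image
const bgra* src = reinterpret_cast<const bgra*>(pixels);
const bgra* end = src + size;

byte* r         = new byte[size];
byte* g         = new byte[size];
byte* b         = new byte[size];
byte* a         = new byte[size];

// Write cursors, so r/g/b/a keep pointing at the start of each plane.
byte* r_dst     = r;
byte* g_dst     = g;
byte* b_dst     = b;
byte* a_dst     = a;

while (src < end)
{
  // Store through the cursor, then advance it.
  *r_dst++ = RED(src);
  *g_dst++ = GRN(src);
  *b_dst++ = BLU(src);
  *a_dst++ = ALP(src);

  src++;
}

// ... use the planes here ...

delete[] a;
delete[] b;
delete[] g;
delete[] r;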

And then everything I said before applies -- unroll loop, coalesce writes, drop to SSE/AVX.
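To make "unroll loop, coalesce writes" a bit more concrete, here is a rough sketch for the red plane only. The Bgra struct and the extract_red_unrolled name are my own placeholders, not anything from the original code, and I haven't measured this against the simple loop, so treat it as an illustration of the idea rather than a guaranteed win.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical pixel layout: 4 bytes per pixel in B, G, R, A memory order.
struct Bgra { std::uint8_t b, g, r, a; };

// Copies the red channel of `count` pixels into `dst`.
// The main loop is unrolled to 4 pixels per iteration and gathers the four
// red bytes into one 4-byte chunk, so the output is written with one
// 4-byte copy instead of four separate 1-byte stores.
void extract_red_unrolled(const Bgra* src, std::uint8_t* dst, std::size_t count)
{
    std::size_t i = 0;
    for (; i + 4 <= count; i += 4)
    {
        const std::uint8_t packed[4] = {
            src[i].r, src[i + 1].r, src[i + 2].r, src[i + 3].r
        };
        // memcpy of a fixed 4-byte size avoids alignment and aliasing problems;
        // optimizing compilers usually turn it into a single 32-bit store.
        std::memcpy(dst + i, packed, sizeof packed);
    }

    // Scalar tail for pixel counts that are not a multiple of 4.
    for (; i < count; ++i)
        dst[i] = src[i].r;
}
```

The same shape extends to the other three planes, and it is also the natural starting point for an SSE/AVX version, where each iteration would handle 16 or 32 source bytes instead.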

Another thought -- also look into the restrict keyword and make sure your pointers are const-correct. Without restrict/const correctness, it's possible (if not likely) that the compiler can't fully optimize this code, because it won't know whether your pointers alias each other or not.
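A sketch of what that could look like. Note that restrict itself is a C99 keyword and not standard C++; the __restrict spelling used here is a compiler extension that GCC, Clang and MSVC all accept. The Bgra struct and the split_planes name are placeholders I made up for the example.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical pixel layout: 4 bytes per pixel in B, G, R, A memory order.
struct Bgra { std::uint8_t b, g, r, a; };

// `const` on src says the function never writes to the source image.
// `__restrict` (compiler extension) promises that none of the five buffers
// overlap. Byte pointers are otherwise allowed to alias anything, so without
// the promise the compiler has to assume a store to r[i] might change src.
void split_planes(const Bgra* __restrict src,
                  std::uint8_t* __restrict r,
                  std::uint8_t* __restrict g,
                  std::uint8_t* __restrict b,
                  std::uint8_t* __restrict a,
                  std::size_t count)
{
    for (std::size_t i = 0; i < count; ++i)
    {
        r[i] = src[i].r;
        g[i] = src[i].g;
        b[i] = src[i].b;
        a[i] = src[i].a;
    }
}
```

The promise is on you: if the buffers do overlap, the behaviour is undefined, so only add it when the planes really are separate allocations.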


Your best bet performance-wise is probably to get the GPU to do as much of the work as possible. It should be fairly simple to write a shader that does the colour space conversion and outputs the data in the format you need. The only awkwardness is that there are no one-byte-per-pixel render target formats, so you'll have to use RGBA and process four source pixels for each destination one (and ensure the source image width is a multiple of 4 pixels).
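The four-into-one packing is easier to see on the CPU first. The sketch below is not shader code, just a C++ reference for the layout such a shader would be expected to produce, which can be handy for checking the GPU output. It uses the raw red value as a stand-in for whatever the colour space conversion computes, and the Bgra struct and pack_red_4to1 name are my own placeholders.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical pixel layout: 4 bytes per pixel in B, G, R, A memory order.
struct Bgra { std::uint8_t b, g, r, a; };

// CPU reference for the packing: destination texel (x, y) is 4 bytes wide and
// holds one value from each of the source pixels (4x+0 .. 4x+3, y).
// Returns an empty vector if the width is not a multiple of 4.
std::vector<std::uint8_t> pack_red_4to1(const Bgra* src,
                                        std::size_t width,
                                        std::size_t height)
{
    if (width % 4 != 0)
        return {};

    const std::size_t packed_width = width / 4;  // destination width in texels
    std::vector<std::uint8_t> dst(packed_width * height * 4);

    for (std::size_t y = 0; y < height; ++y)
        for (std::size_t x = 0; x < packed_width; ++x)
            for (std::size_t k = 0; k < 4; ++k)  // k selects the destination channel
                dst[(y * packed_width + x) * 4 + k] = src[y * width + x * 4 + k].r;

    return dst;
}
```

Read back as raw bytes, the packed target is then simply a tightly packed one-byte-per-pixel plane, which is the whole point of the trick.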

In addition to that, don't lock the texture in the same frame as you call GetRenderTargetData(). Double buffer it (lock the copy you requested on the previous frame) and you'll be blocked waiting for the GPU less often.

This topic is closed to new replies.
