Efficiently obtaining Red Channel from BGRA Bitmap


Old topic!
Guest, the last post of this topic is over 60 days old and at this point you may not reply in this topic. If you wish to continue this conversation start a new topic.

22 replies to this topic

#21 Álvaro   Crossbones+   -  Reputation: 12919

Posted 15 January 2013 - 12:26 PM

I'm pretty surprised too that you're measuring 5ms for copying that much data. Are you on some old hardware?

I got similar timings on my laptop, which is about one year old.

Edited by Álvaro, 15 January 2013 - 12:27 PM.


#22 Ravyne   Crossbones+   -  Reputation: 7120

Posted 15 January 2013 - 01:31 PM

So, if you're not already, what you probably want to do in this case is modify your loop to fill all four arrays, R, G, B, and A (I'll call these planes). I presume you'll need the Green and Blue channels at some point too; for YUV you may or may not need A (which I assume remains alpha).

It seems likely to me that the real bottleneck here is the copy from GPU to system memory. By doing all 4 planes per loop iteration, you'll make efficient use of cache, and since the source array is already transferred, you aren't paying that penalty again. Whereas the red channel alone has a cost of around 6ms, I'd wager you can easily get the whole set for under 10.

 

Something like:

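typedef unsigned char byte;
struct bgra { byte b, g, r, a; };  // one pixel, in BGRA memory order

int size        = Height * Width;

// 'pixels' is a placeholder for the byte array of a BGRA formatted bitmap image
const bgra* src = reinterpret_cast<const bgra*>(pixels);
const bgra* end = src + size;

byte* r         = new byte[size];
byte* g         = new byte[size];
byte* b         = new byte[size];
byte* a         = new byte[size];

byte* r_dst     = r;
byte* g_dst     = g;
byte* b_dst     = b;
byte* a_dst     = a;

while (src < end)
{
  // write through each destination pointer, then advance it
  *r_dst++ = src->r;
  *g_dst++ = src->g;
  *b_dst++ = src->b;
  *a_dst++ = src->a;

  src++;
}

// ... use the r, g, b and a planes here ...

delete[] a;
delete[] b;
delete[] g;
delete[] r;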

 

And then everything I said before applies: unroll the loop, coalesce writes, drop to SSE/AVX.
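
To make "unroll and coalesce writes" a bit more concrete, here is a minimal portable sketch (no intrinsics). The function name split_bgra_x4 and its parameters are made up for illustration, and it assumes tightly packed 4-byte BGRA pixels. It handles four pixels per iteration and stores each plane with one 4-byte copy, which compilers typically turn into a single 32-bit store; whether that actually beats the simple loop is something to measure on your own hardware.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: splits tightly packed BGRA pixels into four planes.
// Four pixels are handled per iteration, and each plane receives one
// 4-byte write instead of four separate 1-byte writes.
void split_bgra_x4(const std::uint8_t* src, std::size_t pixels,
                   std::uint8_t* r, std::uint8_t* g,
                   std::uint8_t* b, std::uint8_t* a)
{
    std::size_t i = 0;
    for (; i + 4 <= pixels; i += 4)
    {
        const std::uint8_t* p = src + i * 4;  // first byte of pixel i

        // Gather the same channel from four consecutive pixels.
        const std::uint8_t b4[4] = { p[0], p[4], p[8],  p[12] };
        const std::uint8_t g4[4] = { p[1], p[5], p[9],  p[13] };
        const std::uint8_t r4[4] = { p[2], p[6], p[10], p[14] };
        const std::uint8_t a4[4] = { p[3], p[7], p[11], p[15] };

        // One 4-byte store per plane.
        std::memcpy(b + i, b4, 4);
        std::memcpy(g + i, g4, 4);
        std::memcpy(r + i, r4, 4);
        std::memcpy(a + i, a4, 4);
    }

    // Leftover pixels when the count is not a multiple of four.
    for (; i < pixels; ++i)
    {
        const std::uint8_t* p = src + i * 4;
        b[i] = p[0];
        g[i] = p[1];
        r[i] = p[2];
        a[i] = p[3];
    }
}
```

An SSE/AVX version would follow the same shape, just with wider loads and a byte shuffle in place of the manual gather.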

 

Another thought: also look into the restrict keyword and make sure your pointers are const-correct. Without restrict/const correctness, it's possible (if not likely) that the compiler can't optimize this code, because it won't know whether your pointers alias each other or not.
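
As a rough sketch of what that looks like: restrict is a C99 keyword and not part of standard C++, but GCC, Clang and MSVC all accept __restrict as an extension. The function name extract_red below is made up for illustration, and it assumes the source and destination buffers never overlap, which is the promise __restrict makes to the compiler.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: copies the red byte of each BGRA pixel into dst.
// 'const' documents that src is only read; '__restrict' (a compiler
// extension, not standard C++) promises that src and dst do not alias,
// so the compiler is free to reorder or vectorize the loads and stores.
void extract_red(const std::uint8_t* __restrict src,
                 std::uint8_t* __restrict dst,
                 std::size_t pixels)
{
    for (std::size_t i = 0; i < pixels; ++i)
        dst[i] = src[i * 4 + 2];  // byte 2 of a BGRA pixel is red
}
```

If the buffers can overlap, drop the qualifier; breaking the no-alias promise is undefined behaviour.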



#23 Adam_42   Crossbones+   -  Reputation: 2457

Posted 15 January 2013 - 03:45 PM

Your best bet performance-wise is probably to get the GPU to do as much of the work as possible. It should be fairly simple to write a shader that does the colour space conversion and outputs the data in the format you need. The only awkwardness is that there are no one byte per pixel render target formats, so you'll have to use RGBA and process four source pixels for each destination one (and ensure the source image is a multiple of 4 pixels wide).
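
To illustrate the four-into-one packing, here is a CPU-side reference of the layout such a shader could produce, written in C++ rather than shader code. It is only a sketch: the function name pack_red_4to1 is made up, it packs the red channel instead of a converted colour value, and it assumes output channels are stored in the order they are written (which byte each shader channel lands in depends on the render target format).

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical CPU reference for the packed layout: each 4-byte output
// texel holds the red values of four horizontally adjacent source pixels.
// Returns false if the source width is not a multiple of 4.
bool pack_red_4to1(const std::uint8_t* bgra, std::size_t width,
                   std::size_t height, std::uint8_t* out)
{
    if (width % 4 != 0)
        return false;

    const std::size_t out_width = width / 4;  // output texels per row
    for (std::size_t y = 0; y < height; ++y)
    {
        for (std::size_t ox = 0; ox < out_width; ++ox)
        {
            std::uint8_t* texel = out + (y * out_width + ox) * 4;
            for (std::size_t c = 0; c < 4; ++c)
            {
                const std::size_t sx = ox * 4 + c;           // source x
                texel[c] = bgra[(y * width + sx) * 4 + 2];   // red byte
            }
        }
    }
    return true;
}
```

With this layout the output buffer, read linearly, is already a one byte per pixel plane, so no further repacking is needed on the CPU.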

 

In addition to that, don't lock the texture on the same frame as you call GetRenderTargetData(); double buffer it and you'll be blocking waiting for the GPU less often.





