Efficiently obtaining Red Channel from BGRA Bitmap

General and Gameplay Programming Programming

Started by gpu_noob January 13, 2013 10:00 AM

21 comments, last by Adam_42 11 years, 3 months ago

gpu_noob

114

Author

January 13, 2013 10:00 AM


byte* bgra = <byte array of a BGRA formatted bitmap image>;
byte* r    = new byte[Height*Width];
			
for (int i = 0; i < Height; i++)
{
	for (int j = 0; j < Width; j++)
	{
		int offset = i*Width + j;
		r[offset] = bgra[offset*4 + 2];
	}
}
delete[] r;

I'm using the above code to obtain red channel values from a byte array of a BGRA bitmap image. The image is formatted as:

B G R A B G R A... (Size of W*H*4)

I want to obtain a byte array of

R R R R... (Size of W*H)

Is there a more efficient way of doing this without using for loops?

Zaoshi Kaba

8,470

January 13, 2013 10:06 AM

You could probably eliminate that *4 but that's insignificant.

It's possible to achieve the same using shaders: output the red color into a single-channel render target, but the latency will kill any performance you gained.

C0lumbo

4,415

January 13, 2013 10:10 AM


byte* bgra = <byte array of a BGRA formatted bitmap image>;
byte* r    = new byte[Height*Width];
byte *rSource = bgra+2;
int iPixels = Height*Width;

for (int i=0;i<iPixels;i++,r++,rSource+=4)
{    
    *r = *rSource;
}

I'd probably do something like this. I doubt it'd make much difference in the grand scheme of things.

Rapture - World Conquest

gpu_noob

114

Author

January 13, 2013 03:50 PM

You could probably eliminate that *4 but that's insignificant.

It's possible to achieve the same using shaders: output the red color into a single-channel render target, but the latency will kill any performance you gained.

Would it be possible to somehow transfer the single-channel render target data to system memory with D3D10 CopyResource or D3D9 GetRenderTargetData? I need to be able to access the red channel on the CPU.

Zaoshi Kaba

8,470

January 13, 2013 04:29 PM

I'm afraid not. It copies the whole resource or part of it and doesn't pick individual bytes. You'd have to render a quad with a pixel shader, then copy the red render target into RAM to have CPU access.

gpu_noob

114

Author

January 13, 2013 05:58 PM

I'm afraid not. It copies the whole resource or part of it and doesn't pick individual bytes. You'd have to render a quad with a pixel shader, then copy the red render target into RAM to have CPU access.

I'm not sure what you mean by copying the red render target to RAM. I thought render targets have to be 32-bit aligned. Is there an example of how to extract only the red channel from a render target texture using pixel shaders?

alvaro

21,604

January 13, 2013 06:23 PM

Is there a more efficient way of doing this without using for loops?

You seem to be under the [incorrect] assumption that for loops are somehow slow. Chances are that code is perfectly fast. You might be able to save a bit in the pointer arithmetic, since you don't really need to compute the offset from scratch each time: It's just one more than the value it was in the previous iteration of the loop, so you can do it with a counter. But even that probably won't matter much.

You should generally only worry about performance when you have evidence that this operation is taking too much time in your program.
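One way to gather that evidence is to time the loop in isolation before changing anything. The following is only a minimal sketch, assuming a tightly packed image (no row padding) held in std::vector buffers; time_extract_red_ms is a made-up helper name, not something from this thread.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: runs one pass of the red-channel copy and returns
// the elapsed wall-clock time in milliseconds.
// Assumes bgra holds exactly 4 bytes (B, G, R, A) per output pixel.
double time_extract_red_ms(const std::vector<std::uint8_t>& bgra,
                           std::vector<std::uint8_t>& r)
{
    assert(bgra.size() == r.size() * 4);

    const auto start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < r.size(); ++i)
        r[i] = bgra[4 * i + 2];              // red is the third byte of each pixel
    const auto stop = std::chrono::steady_clock::now();

    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

Call it a few times on an image of the size you actually use, in an optimized build, and compare the result with your per-frame budget. A single run is noisy, so an average or minimum over several runs is probably more useful than one number.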

iMalc

2,466

January 14, 2013 06:28 AM

byte* bgra = <byte array of a BGRA formatted bitmap image>;
byte* r    = new byte[Height*Width];
byte *rSource = bgra+2;
int iPixels = Height*Width;

for (int i=0;i<iPixels;i++,r++,rSource+=4)
{    
    *r = *rSource;
}
I'd probably do something like this. I doubt it'd make much difference in the grand scheme of things.

You can go further than that, 'i' is not needed:

byte *bgra    = <byte array of a BGRA formatted bitmap image>;
byte *r       = new byte[Height*Width];
byte *rbegin  = r;
byte *rend    = r + Height*Width;
byte *rSource = bgra+2;

while (rbegin < rend)
{    
    *rbegin++ = *rSource;
    rSource += 4;
}

"In order to understand recursion, you must first understand recursion."
My website dedicated to sorting algorithms

alvaro

21,604

January 14, 2013 03:10 PM

I just tried all of the solutions given above, and they have the exact same performance. So just write whatever is easiest to read. I personally would write this:

  byte *bgra = <byte array of a BGRA formatted bitmap image>;
  byte *r = new byte[Height*Width];

  for (int i=0; i<Height*Width; ++i)
    r[i] = bgra[4*i+2];
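For anyone who wants to paste the simple loop into a project, here is a self-contained sketch of the same idea. It assumes the pixels are tightly packed (a locked surface with a row pitch larger than Width*4 would need a per-row loop instead), and extract_red is a hypothetical name chosen for illustration. Using std::vector and std::size_t avoids the manual delete[] and the int overflow on very large images.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: copies the red byte of every BGRA pixel into a
// tightly packed buffer of one byte per pixel.
std::vector<std::uint8_t> extract_red(const std::vector<std::uint8_t>& bgra)
{
    assert(bgra.size() % 4 == 0);            // whole pixels only

    const std::size_t pixels = bgra.size() / 4;
    std::vector<std::uint8_t> r(pixels);
    for (std::size_t i = 0; i < pixels; ++i)
        r[i] = bgra[4 * i + 2];              // byte order is B, G, R, A
    return r;
}
```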

Ravyne

14,306

January 14, 2013 03:57 PM

All of the vanilla C++ that's been posted is about as efficient as you're going to get.

However, if you can prove that this is still a bottleneck for you, you could further try:

Pre-warm the cache by reading ahead (depends on cache-line size, but probably 8 or 16 source pixels)

Unroll loop x4 (read), coalesce writes (need to add some code to deal with non-multiple-of-4 source data).

Drop down to SSE or AVX assembly/intrinsics (coalesce more writes, using shuffle instructions)

I would try those things in that order, but remember: fast for fast's sake is a silly goal unless it's an academic exercise. In "the real world" the best solution is usually the simplest one which is fast enough. Optimizing without profiling is the coding equivalent of shooting first and asking questions later.
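To make the unrolling and SIMD suggestions above concrete, here is a rough sketch of both. Everything in it is an assumption for illustration: the function names are made up, the image is taken to be tightly packed, and the SIMD path uses SSE2 shift, mask and pack instructions rather than the shuffle instructions mentioned above, because byte shuffles need SSSE3 while SSE2 is available on every x86-64 CPU. On other targets the second function falls back to the plain loop. Whether either one beats the simple loop is something only a profiler can tell you.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>                       // SSE2 intrinsics
#define EXTRACT_RED_HAVE_SSE2 1
#endif

// Hypothetical name. Unrolled x4: gathers four red bytes, then writes them
// with one 4-byte copy, which compilers typically turn into a single store.
void extract_red_unrolled(const std::uint8_t* bgra, std::uint8_t* r,
                          std::size_t pixels)
{
    assert(bgra != nullptr && r != nullptr);
    std::size_t i = 0;
    for (; i + 4 <= pixels; i += 4) {
        const std::uint8_t* p = bgra + 4 * i;
        const std::uint8_t quad[4] = { p[2], p[6], p[10], p[14] };
        std::memcpy(r + i, quad, 4);
    }
    for (; i < pixels; ++i)                  // tail: counts not divisible by 4
        r[i] = bgra[4 * i + 2];
}

// Hypothetical name. SSE2 path: 16 pixels in, 16 red bytes out per iteration.
void extract_red_sse2(const std::uint8_t* bgra, std::uint8_t* r,
                      std::size_t pixels)
{
    assert(bgra != nullptr && r != nullptr);
    std::size_t i = 0;
#ifdef EXTRACT_RED_HAVE_SSE2
    const __m128i mask = _mm_set1_epi32(0xFF);
    for (; i + 16 <= pixels; i += 16) {
        const __m128i* src = reinterpret_cast<const __m128i*>(bgra + 4 * i);
        // Each 32-bit lane is one little-endian pixel, 0xAARRGGBB:
        // shift right by 16 and mask to leave only the red byte.
        const __m128i a = _mm_and_si128(_mm_srli_epi32(_mm_loadu_si128(src + 0), 16), mask);
        const __m128i b = _mm_and_si128(_mm_srli_epi32(_mm_loadu_si128(src + 1), 16), mask);
        const __m128i c = _mm_and_si128(_mm_srli_epi32(_mm_loadu_si128(src + 2), 16), mask);
        const __m128i d = _mm_and_si128(_mm_srli_epi32(_mm_loadu_si128(src + 3), 16), mask);
        // Narrow 32 -> 16 -> 8 bits; values are 0..255 so nothing saturates.
        const __m128i ab = _mm_packs_epi32(a, b);
        const __m128i cd = _mm_packs_epi32(c, d);
        _mm_storeu_si128(reinterpret_cast<__m128i*>(r + i), _mm_packus_epi16(ab, cd));
    }
#endif
    for (; i < pixels; ++i)                  // tail, or everything without SSE2
        r[i] = bgra[4 * i + 2];
}
```

Both functions take a pixel count rather than Width and Height, so a caller would pass Width*Height for a tightly packed image. The unaligned load and store intrinsics are used so the buffers need no special alignment.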

throw table_exception("(ノ ゜Д゜)ノ ︵ ┻━┻");

This topic is closed to new replies.