Archived

This topic is now archived and is closed to further replies.

chbfiv

Speed up RGB image functions


void CImage::Fill(Color3ub color) {
	for(int i=0;i < img_size;i+=3) {
		Image.imageData[i] = color.r;
		Image.imageData[i+1] = color.g;
		Image.imageData[i+2] = color.b;
	}
}

void CImage::SetPixel(GLuint x,GLuint y,Color3ub color) {
	GLuint row = x*(Image.bpp/8);
	GLuint column = y*((Image.bpp/8)*Image.width);
		Image.imageData[column+row] = color.r;
		Image.imageData[column+row+1] = color.g;
		Image.imageData[column+row+2] = color.b;
}
I need SetPixel to be as fast as it can be. Fill is no biggie, but I'm sure someone has a few pointers. x and y are a pixel position on the image, of course; Image is the texture, and bpp is the bits per pixel. Please explain if you don't mind =)

The biggest boost will probably come from using 32-bit pixels, even if you "waste" 8 bits (or use them for alpha).
The worst thing you can do is access the memory byte-wise for each pixel. You end up with 3 accesses per pixel instead of 1.
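
A rough sketch of the 32-bit idea, in case it helps. `Image32` and `PackRGBA` are made-up names (not part of the original CImage class), and the 0xAARRGGBB channel order is only an assumption. Each pixel becomes one 32-bit store instead of three byte stores:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical 32-bpp image; not the original CImage class.
struct Image32 {
    std::uint32_t width;
    std::uint32_t height;
    std::vector<std::uint32_t> pixels;  // one 32-bit word per pixel

    Image32(std::uint32_t w, std::uint32_t h)
        : width(w), height(h), pixels(static_cast<std::size_t>(w) * h, 0) {}

    // One store per pixel instead of three byte stores.
    void SetPixel(std::uint32_t x, std::uint32_t y, std::uint32_t packed) {
        assert(x < width && y < height);  // debug-only bounds check
        pixels[static_cast<std::size_t>(y) * width + x] = packed;
    }
};

// Packs four channels as 0xAARRGGBB (assumed channel order).
inline std::uint32_t PackRGBA(std::uint8_t r, std::uint8_t g,
                              std::uint8_t b, std::uint8_t a = 0xFF) {
    return (std::uint32_t(a) << 24) | (std::uint32_t(r) << 16) |
           (std::uint32_t(g) << 8) | std::uint32_t(b);
}
```

Packing the color once, outside the inner loop, keeps SetPixel down to an index calculation and a single write.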

Even better, if you have MMX and SSE or an equivalent, use them.
If you are using a PC, Intel has a number of articles on writing fast graphics code, and AMD has some fast blitting code. The Intel articles and AMD libraries can be of great interest even if you are not using that architecture, because they reveal some interesting optimizations, both mathematical and SIMD-oriented.

If you need to maintain portability, Endurian's suggestion is the best way.
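
To give an idea of what the SIMD route can look like, here is a minimal sketch of a fill over 32-bit pixels using SSE2 intrinsics, with a plain loop as the portable fallback. `Fill32` and the `FILL_USE_SSE2` macro are made-up names, and this is not taken from the Intel or AMD material mentioned above:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#if defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#include <emmintrin.h>  // SSE2 intrinsics
#define FILL_USE_SSE2 1
#endif

// Fills `count` 32-bit pixels starting at `dst` with `color`.
inline void Fill32(std::uint32_t* dst, std::size_t count, std::uint32_t color) {
    assert(dst != nullptr || count == 0);
    std::size_t i = 0;
#ifdef FILL_USE_SSE2
    // Broadcast the color into all four 32-bit lanes of one register.
    const __m128i v = _mm_set1_epi32(static_cast<int>(color));
    for (; i + 4 <= count; i += 4) {
        // Unaligned store, so dst needs no special alignment.
        _mm_storeu_si128(reinterpret_cast<__m128i*>(dst + i), v);
    }
#endif
    // Remaining tail pixels, or the whole buffer when SSE2 is unavailable.
    for (; i < count; ++i) dst[i] = color;
}
```

Whether this beats what the compiler generates for the plain loop depends on the compiler and flags, so it is worth measuring.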

If you expect to write high-performance code for this kind of operation, set your compiler up to dump the asm, or use a debugger with an asm view, so you can see what is going on.
Always check that the compiler does the optimizations you think it does.
Use 32 bpp instead of 24, as was already mentioned.
SIMD code can speed things up to the point where your main concern is going to be memory bandwidth.
The Intel compiler can vectorize code and generate SSE for loops. Try it.
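
If the existing data is 24 bpp, moving to 32 bpp needs a one-time conversion. A minimal sketch, assuming tightly packed R, G, B bytes with no row padding and a 0xFFRRGGBB target layout (`Expand24To32` is a made-up helper name):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Expands tightly packed R,G,B bytes into one 0xFFRRGGBB word per pixel.
inline std::vector<std::uint32_t> Expand24To32(const std::vector<std::uint8_t>& rgb) {
    assert(rgb.size() % 3 == 0);  // whole pixels only
    const std::size_t pixelCount = rgb.size() / 3;
    std::vector<std::uint32_t> out(pixelCount);
    for (std::size_t i = 0; i < pixelCount; ++i) {
        const std::uint8_t* p = &rgb[i * 3];
        out[i] = 0xFF000000u | (std::uint32_t(p[0]) << 16) |
                 (std::uint32_t(p[1]) << 8) | std::uint32_t(p[2]);
    }
    return out;
}
```

A simple loop like this is also a good candidate for checking in the asm output whether the compiler vectorized it.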

A few suggestions that might help

1. Precompute values:
Precompute pixelSize = Image.bpp / 8;
Precompute rowSize = Image.bpp / 8 * Image.width;

This way you can rewrite

row = x*(Image.bpp/8);
column = y*((Image.bpp/8)*Image.width);

to

row = x * pixelSize;
column = y * rowSize;

2. Stop accessing the array as often.
I am not sure, but this might be faster. Instead of saying:
pixel[i] = red;
pixel[i+1] = green;
pixel[i+2] = blue;

Store your RGB values as longs (32 bit, 0x00RRGGBB). Your pixel buffer can still be 24 bit. Using a 32-bit pointer instead of a byte array:

offset = row + column;
pixel32 = (unsigned long *)(pixelData + offset);	// view the pixel as one 32-bit value
*pixel32 &= 0xFF000000;	// keep the byte that belongs to the neighbouring pixel
*pixel32 |= color;	// write all three channels at once

If you used 32 bit colors (as mentioned by janos) you could just do a straight assignment.

3. Write a special Get/SetPixel function for each bit depth.
Create a SetPixel function for each color depth you support.
Set8BitPixel, Set16BitPixel, Set24BitPixel, etc.. Make your generic SetPixel a function pointer, and point it to the correct function when the image has been initialized. This will save you quite a few calculations. This will preserve your generic Set/GetPixel interface.

4. Don't use a general interface.
The best thing to do would be to NOT use a SetPixel function. Access the memory directly as needed and optimize what you can.
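
Suggestions 1 and 3 can be combined roughly as follows. This is only a sketch: `Surface`, `MakeSurface` and the exact signatures are made-up stand-ins for the image class, and only the 8-bit and 24-bit cases are shown. The strides are computed once, and the generic setPixel pointer is bound to the right routine when the image is created:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Surface;  // hypothetical stand-in for the image class
typedef void (*SetPixelFn)(Surface&, unsigned x, unsigned y, std::uint32_t color);

struct Surface {
    unsigned width, height, bpp;
    std::size_t pixelSize;  // bytes per pixel, computed once
    std::size_t rowSize;    // bytes per scan line, computed once
    std::vector<std::uint8_t> data;
    SetPixelFn setPixel;    // chosen once, when the image is created
};

// color is 0x00RRGGBB; bytes are stored in R, G, B order.
inline void Set24BitPixel(Surface& s, unsigned x, unsigned y, std::uint32_t color) {
    std::uint8_t* p = &s.data[y * s.rowSize + x * s.pixelSize];
    p[0] = std::uint8_t(color >> 16);
    p[1] = std::uint8_t(color >> 8);
    p[2] = std::uint8_t(color);
}

// color is a palette index or gray level in the low byte.
inline void Set8BitPixel(Surface& s, unsigned x, unsigned y, std::uint32_t color) {
    s.data[y * s.rowSize + x] = std::uint8_t(color);
}

inline Surface MakeSurface(unsigned width, unsigned height, unsigned bpp) {
    assert(bpp == 8 || bpp == 24);  // only the depths sketched here
    Surface s;
    s.width = width;
    s.height = height;
    s.bpp = bpp;
    s.pixelSize = bpp / 8;
    s.rowSize = s.pixelSize * width;
    s.data.assign(s.rowSize * height, 0);
    s.setPixel = (bpp == 24) ? Set24BitPixel : Set8BitPixel;
    return s;
}
```

Callers then write `s.setPixel(s, x, y, color)` and never branch on the bit depth per pixel. The indirect call has its own cost, which is part of why suggestion 4 exists.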


Hope this was useful,
Will

That was a lot of help, thanks.
This is what I have now, in case you want to see.

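struct Color {
	Color(GLubyte red=0x00, GLubyte green=0x00, GLubyte blue=0x00, GLubyte alpha=0x00)
		: r(red), g(green), b(blue), a(alpha), color(0)
	{
		// Pack as 0xAARRGGBB. Alpha has to be shifted into the top byte;
		// OR-ing it in unshifted would corrupt the blue channel.
		color = ((GLuint)a << 24) | ((GLuint)r << 16) | ((GLuint)g << 8) | (GLuint)b;
	}
	GLubyte r;
	GLubyte g;
	GLubyte b;
	GLubyte a;
	GLuint color;
};
void CImage::SetPixel(GLuint x, GLuint y, Color rgba) {
	// row and column are presumably the precomputed bytes per pixel and bytes per scan line
	offset = (x*row) + (y*column);
	// Access the pixel as one 32-bit value; imageData is indexed in bytes
	GLuint *pixel = (GLuint *)(Image.imageData + offset);
	*pixel &= 0xFF000000;			// keep the top byte
	*pixel |= (rgba.color & 0x00FFFFFF);	// write only the 24 color bits
}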



Thanks a ton
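
One thing to watch with a masked 32-bit write into a 24-bit buffer: it touches four bytes per pixel, so the last pixel reaches one byte past the end unless the buffer has padding, and the byte order in memory depends on endianness. A minimal sketch, assuming a little-endian CPU and at least one spare byte at the end of the buffer (`WritePixel24` is a made-up helper name):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Writes a 0x00RRGGBB color over the 3 bytes at `offset` using one 32-bit
// read-modify-write. The buffer must have at least offset + 4 valid bytes.
inline void WritePixel24(std::uint8_t* buffer, std::size_t offset, std::uint32_t color) {
    assert(buffer != nullptr);
    std::uint32_t word;
    std::memcpy(&word, buffer + offset, sizeof word);  // safe unaligned load
    word &= 0xFF000000u;            // little-endian: keeps the 4th byte in memory
    word |= (color & 0x00FFFFFFu);  // low 3 bytes land as B, G, R in memory
    std::memcpy(buffer + offset, &word, sizeof word);  // safe unaligned store
}
```

On little-endian machines this stores the channels as B, G, R, which is not the R, G, B byte order the original byte-wise SetPixel produced, so the color packing may need to swap red and blue.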

Guest Anonymous Poster
Just thought you might be interested in some old-time x86 asm that operates on 32-bit DIBs. This is from a source-to-dest paste function, but IIRC you can replace the "rep movsd" with a "rep stosd" or something like that (sorry, it's been too long since I did x86 asm) to do a fill instead of a paste. Note that the m_ vars are members of the Dib class:

void Dib::RectPaste( Dib *pSrcDib, int x, int y )
{
	// Clip Rect
	int ipx = (x >= 0) ? x : 0;
	int ipy = (y >= 0) ? y : 0;
	int idx = ((x + pSrcDib->m_szSize.cx) < m_szSize.cx) ? pSrcDib->m_szSize.cx : m_szSize.cx - x;
	int idy = ((y + pSrcDib->m_szSize.cy) < m_szSize.cy) ? pSrcDib->m_szSize.cy : m_szSize.cy - y;

	idx = (x >= 0) ? idx : idx + x;
	idy = (y >= 0) ? idy : idy + y;

	// Return if nothing to do
	if( (idx <= 0) || (idy <= 0) ) return;

	// Prepare buffer addresses
	COLORREF *src = pSrcDib->m_pBits + ((ipy - y)*pSrcDib->m_szSize.cx) + ipx - x;
	COLORREF *dst = m_pBits + (ipy*m_szSize.cx) + ipx;

	int iws = pSrcDib->m_szSize.cx;
	int iwd = m_szSize.cx;

#ifndef _WINDOWS
	// Portable fallback: copy one scan line at a time
	while( idy-- ) {
		for( int i = 0; i < idx; i++ ) {
			dst[i] = src[i];
		}
		src += iws;
		dst += iwd;
	}
#else
	__asm {
		cld			; upward direction
		mov ecx, idy		; pre-load # scan lines
		mov ebx, iws		; pre-load source scan line width
		shl ebx, 2		; source scan line width in bytes
		mov edx, iwd		; pre-load destination scan line width
		shl edx, 2		; destination scan line width in bytes
		mov esi, src		; pre-load source address
		mov edi, dst		; pre-load destination address
	yloop:	push ecx		; save # scan lines left to process
		push esi		; save current source address
		push edi		; save current destination address
		mov ecx, idx		; # words / scan line
		rep movsd		; move 'em from source to destination
		pop edi			; restore current destination address
		pop esi			; restore current source address
		pop ecx			; restore # scan lines left to process
		add esi, ebx		; point to next source scan line
		add edi, edx		; point to next destination scan line
		loop yloop		; loop until all scan lines processed
	}
#endif
}
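
For the fill variant mentioned above, a portable C++ sketch might look like the following. `Bitmap32` and `RectFill` are made-up names rather than the poster's Dib class, and `std::fill` per scan line plays the role that "rep stosd" would play in the asm version:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical 32-bit bitmap; not the Dib class from the post.
struct Bitmap32 {
    int cx, cy;                       // size in pixels
    std::vector<std::uint32_t> bits;  // cx * cy pixels, row by row
    Bitmap32(int w, int h) : cx(w), cy(h) {
        assert(w >= 0 && h >= 0);
        bits.assign(static_cast<std::size_t>(w) * h, 0);
    }
};

// Fills the w-by-h rectangle at (x, y) with color, clipped to the bitmap.
inline void RectFill(Bitmap32& dst, int x, int y, int w, int h, std::uint32_t color) {
    const int x0 = std::max(x, 0);
    const int y0 = std::max(y, 0);
    const int x1 = std::min(x + w, dst.cx);
    const int y1 = std::min(y + h, dst.cy);
    if (x0 >= x1 || y0 >= y1) return;  // nothing to do

    for (int line = y0; line < y1; ++line) {
        std::uint32_t* row =
            dst.bits.data() + static_cast<std::size_t>(line) * dst.cx + x0;
        std::fill(row, row + (x1 - x0), color);  // one scan line
    }
}
```

The clipping here is written as a min/max on both edges rather than the conditional form used in RectPaste; the intent is the same.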
