Speed up RGB image functions

Started by
7 comments, last by chbfiv 20 years, 6 months ago

void CImage::Fill(Color3ub color) {
	for(int i=0;i < img_size;i+=3) {
		Image.imageData[i] = color.r;
		Image.imageData[i+1] = color.g;
		Image.imageData[i+2] = color.b;
	}
}

void CImage::SetPixel(GLuint x,GLuint y,Color3ub color) {
	GLuint row = x*(Image.bpp/8);
	GLuint column = y*((Image.bpp/8)*Image.width);
		Image.imageData[column+row] = color.r;
		Image.imageData[column+row+1] = color.g;
		Image.imageData[column+row+2] = color.b;
}
I need SetPixel to be as fast as it can be. Fill is no biggie; I'm just sure someone will have a few pointers. x,y is a pixel on the image of course, Image is the texture, and bpp is the bits per pixel. Please explain if you don't mind =)
-BourkeIV
The biggest boost you'll probably get is from using 32-bit pixels, even if you "waste" 8 bits (or use them for alpha).
The worst thing you can do is access the memory byte-wise for a pixel. You end up with 3 accesses for 1 pixel instead of 1.
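
A minimal sketch of the 32-bit idea, for illustration only. Image32 and PackARGB are made-up names (they are not part of the original CImage), and the 0xAARRGGBB channel order is an assumption. The point is that each pixel becomes one index computation and one 32-bit store instead of three byte stores:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical 32-bit image; the names are illustrative, not from the thread.
struct Image32 {
    int width;
    int height;
    std::vector<std::uint32_t> pixels;  // one 32-bit word per pixel

    Image32(int w, int h)
        : width(w), height(h),
          pixels(static_cast<std::size_t>(w) * static_cast<std::size_t>(h), 0) {}
};

// Pack the channels into one word as 0xAARRGGBB (assumed layout).
inline std::uint32_t PackARGB(std::uint8_t r, std::uint8_t g, std::uint8_t b,
                              std::uint8_t a = 0xFF) {
    return (std::uint32_t(a) << 24) | (std::uint32_t(r) << 16) |
           (std::uint32_t(g) << 8) | std::uint32_t(b);
}

// One index computation and one 32-bit store per pixel.
inline void SetPixel(Image32& img, int x, int y, std::uint32_t color) {
    assert(x >= 0 && x < img.width && y >= 0 && y < img.height);
    img.pixels[static_cast<std::size_t>(y) * img.width + x] = color;
}
```

Typical use would be something like SetPixel(img, x, y, PackARGB(r, g, b)), with the color packed once and reused when many pixels share it.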

Fruny: Ftagn! Ia! Ia! std::time_put_byname! Mglui naflftagn std::codecvt eY'ha-nthlei!,char,mbstate_t>

even better, if you have mmx and sse or equivalent, use them.
if you are using a PC, intel has a number of articles on writing fast gfx code, and amd has some fast blitting code. the intel articles and amd libraries can be of great interest even if you are not using that architecture, because they reveal some interesting optimizations, both mathematical and simd-oriented.

if you need to maintain portability, endurian's suggestion is the best way.
The best way to do it is not to have your own SetPixel routine at all, but just do whatever drawing directly from the routine you're running

- JQ
~phil
The use of pointers in the Fill function would improve performance as well.
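
A rough sketch of what a pointer-walking Fill could look like for the 24-bit buffer. Fill24 is a made-up free function, `data` and `sizeInBytes` stand in for Image.imageData and img_size, and the Color3ub layout is assumed. Modern compilers often produce the same machine code for the indexed and the pointer form, so it is worth measuring before relying on this:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Mirrors the thread's Color3ub; the exact layout is an assumption.
struct Color3ub { std::uint8_t r, g, b; };

// Fill a tightly packed 24-bit RGB buffer by walking a pointer
// instead of computing three array indices per pixel.
inline void Fill24(std::uint8_t* data, std::size_t sizeInBytes, Color3ub color) {
    assert(sizeInBytes % 3 == 0);  // whole pixels only
    std::uint8_t* p = data;
    std::uint8_t* const end = data + sizeInBytes;
    while (p != end) {
        *p++ = color.r;
        *p++ = color.g;
        *p++ = color.b;
    }
}
```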

-------
The problem with common sense is that it is never common.
-------homepage - email
If you expect to write high performance code for this kind of operation, set your compiler up to dump the asm, or use a debugger with an asm view, so you can see what is going on.
Always check that the compiler does the optimizations you think it does.
Use 32 bpp instead of 24, as was mentioned.
SIMD code can speed things up to the point where your main concern is going to be mem bandwidth.
The Intel compiler can vectorize code and generate SSE for loops. Try it.
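
To make the SIMD point concrete, here is a small sketch of a 32 bpp fill using SSE2 intrinsics, with a plain loop as the fallback when SSE2 is not available. Fill32 and the FILL32_HAVE_SSE2 macro are made-up names, and this is only one way to write it. A good compiler may vectorize the plain loop on its own, which is another reason to look at the generated asm:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#if defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#include <emmintrin.h>  // SSE2 intrinsics
#define FILL32_HAVE_SSE2 1
#endif

// Fill `count` 32-bit pixels with `color`.
inline void Fill32(std::uint32_t* dst, std::size_t count, std::uint32_t color) {
    assert(dst != nullptr || count == 0);
    std::size_t i = 0;
#ifdef FILL32_HAVE_SSE2
    // Broadcast the color into all four lanes, then store 4 pixels per iteration.
    const __m128i v = _mm_set1_epi32(static_cast<int>(color));
    for (; i + 4 <= count; i += 4) {
        _mm_storeu_si128(reinterpret_cast<__m128i*>(dst + i), v);  // unaligned store
    }
#endif
    // Scalar tail (or the whole fill when SSE2 is not available).
    for (; i < count; ++i) {
        dst[i] = color;
    }
}
```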
Act of War - EugenSystems/Atari
A few suggestions that might help

1. Precompute values:
Precompute pixelSize = Image.bpp / 8;
Precompute rowSize = Image.bpp / 8 * Image.width;

This way you can rewrite

row = x*(Image.bpp/8);
column = y*((Image.bpp/8)*Image.width);

to

row = x * pixelSize;
column = y * rowSize;

2. Stop accessing the array as often.
I'm not sure, but this might be faster. Instead of saying:
pixel[i] = red;
pixel[i+1] = green;
pixel[i+2] = blue;

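Store your RGB values as longs (32 bit, 0x00RRGGBB). Your pixel buffer can still be 24 bit. Using a 32-bit pointer instead of indexing the byte array:

offset = row + column;
GLuint *pixel32 = (GLuint *)(pixelData + offset); // view the 4 bytes at this pixel as one 32-bit value
*pixel32 &= 0xFF000000; // keep the byte that belongs to the next pixel (little-endian layout assumed)
*pixel32 |= color;      // write the three color bytes in one go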

If you used 32 bit colors (as mentioned by janos) you could just do a straight assignment.

3. Write a special Get/SetPixel function for each bit depth.
Create a SetPixel function for each color depth you support.
Set8BitPixel, Set16BitPixel, Set24BitPixel, etc.. Make your generic SetPixel a function pointer, and point it to the correct function when the image has been initialized. This will save you quite a few calculations. This will preserve your generic Set/GetPixel interface.

4. Don't use a general interface.
The best thing to do would be to NOT use a SetPixel function. Access the memory directly as needed and optimize what you can.
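
Suggestions 1, 3 and 4 above can be sketched together roughly like this. Everything here is hypothetical (the Image struct, the 0x00RRGGBB color convention, HLine24); it only shows the shape of the idea: strides computed once, a per-depth writer picked once through a function pointer, and a routine that walks a row pointer itself instead of calling SetPixel per pixel:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical image type; names are illustrative, not the thread's CImage.
struct Image {
    using SetPixelFn = void (*)(Image&, int x, int y, std::uint32_t color);

    int width, height;
    std::size_t pixelSize;  // bytes per pixel, computed once
    std::size_t rowSize;    // bytes per row, computed once
    std::vector<std::uint8_t> data;
    SetPixelFn setPixel;    // chosen once, when the image is created

    Image(int w, int h, int bpp);
};

// 24-bit writer: color is 0x00RRGGBB, stored as R, G, B bytes.
inline void Set24BitPixel(Image& img, int x, int y, std::uint32_t color) {
    std::uint8_t* p = &img.data[y * img.rowSize + x * img.pixelSize];
    p[0] = static_cast<std::uint8_t>(color >> 16);
    p[1] = static_cast<std::uint8_t>(color >> 8);
    p[2] = static_cast<std::uint8_t>(color);
}

// 8-bit writer: color is a palette index or grey level.
inline void Set8BitPixel(Image& img, int x, int y, std::uint32_t color) {
    img.data[y * img.rowSize + x] = static_cast<std::uint8_t>(color);
}

inline Image::Image(int w, int h, int bpp)
    : width(w), height(h),
      pixelSize(static_cast<std::size_t>(bpp) / 8),
      rowSize(pixelSize * static_cast<std::size_t>(w)),
      data(rowSize * static_cast<std::size_t>(h), 0),
      setPixel(bpp == 8 ? &Set8BitPixel : &Set24BitPixel) {
    assert(bpp == 8 || bpp == 24);  // only two depths in this sketch
}

// Suggestion 4: skip the per-pixel call and walk a row pointer directly.
inline void HLine24(Image& img, int x0, int x1, int y, std::uint32_t color) {
    assert(img.pixelSize == 3 && x0 >= 0 && x0 <= x1 && x1 < img.width);
    std::uint8_t* p = &img.data[y * img.rowSize + x0 * img.pixelSize];
    for (int x = x0; x <= x1; ++x, p += 3) {
        p[0] = static_cast<std::uint8_t>(color >> 16);
        p[1] = static_cast<std::uint8_t>(color >> 8);
        p[2] = static_cast<std::uint8_t>(color);
    }
}
```

Callers keep a generic interface through img.setPixel(img, x, y, color), while hot loops such as HLine24 bypass it. The indirect call still has a cost, so whether the function pointer beats a simple branch is something to measure.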


Hope this was useful,
Will
------------------http://www.nentari.com
That was a lot of help, thanks.
This is what I have now, just if you want to see.
struct Color {
	Color(GLubyte red=0x00,GLubyte green=0x00,GLubyte blue=0x00,GLubyte alpha=0x00)
		:r(red),g(green),b(blue),a(alpha),color(0)
	{
		GLuint tmp_red = r;
		tmp_red = tmp_red << 16;
		GLuint tmp_green = g;
		tmp_green = tmp_green << 8;
		color = a | tmp_red | tmp_green | b;
	}
	GLubyte r;
	GLubyte g;
	GLubyte b;
	GLubyte a;
	GLuint color;
};

void CImage::SetPixel(GLuint x,GLuint y,Color rgba) {
	offset = (x*row) + (y*column);
	*(Image.imageData + offset) &= 0xFF000000;
	*(Image.imageData + offset) |= rgba.color;
}

Thanks a ton
-BourkeIV
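
For anyone trying the masked 32-bit write on a 24-bit buffer, two things are easy to trip over: a 4-byte access at the very last pixel runs one byte past the buffer, and the result of a raw pointer cast depends on alignment and byte order. A hedged sketch that sidesteps both is below. SetPixel24 is a made-up free function, `data` and `sizeInBytes` stand in for Image.imageData and its size, and the in-memory order R, G, B is an assumption:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Write one RGB pixel into a tightly packed 24-bit buffer using a single
// 32-bit read-modify-write where that is safe.
inline void SetPixel24(std::uint8_t* data, std::size_t sizeInBytes, std::size_t offset,
                       std::uint8_t r, std::uint8_t g, std::uint8_t b) {
    assert(offset + 3 <= sizeInBytes);
    if (offset + 4 > sizeInBytes) {
        // Last pixel: a 4-byte access would run past the buffer, so store bytes.
        data[offset] = r;
        data[offset + 1] = g;
        data[offset + 2] = b;
        return;
    }
    // Build the color and the keep-mask from bytes, so the result does not
    // depend on the machine's byte order.
    const std::uint8_t colorBytes[4] = {r, g, b, 0x00};
    const std::uint8_t keepBytes[4] = {0x00, 0x00, 0x00, 0xFF};  // neighbour's first byte
    std::uint32_t color, keep, word;
    std::memcpy(&color, colorBytes, 4);
    std::memcpy(&keep, keepBytes, 4);
    std::memcpy(&word, data + offset, 4);  // memcpy avoids an unaligned pointer cast
    word = (word & keep) | color;
    std::memcpy(data + offset, &word, 4);
}
```

Compilers usually turn the fixed-size memcpy calls into plain loads and stores, but that is worth confirming in the asm output.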
just thought you might be interested in some old time x86 asm that operates on 32bit DIBs. this is from a source-to-dest paste function, but iirc you can replace the "rep movsd" with a "rep stosd" or something like that (sorry it's been too long since i did x86 asm) to do a fill instead of a paste. note that m_ vars are members of the Dib class:
void Dib::RectPaste( Dib *pSrcDib, int x, int y )
{
    // Clip Rect
    int ipx = (x >= 0) ? x : 0;
    int ipy = (y >= 0) ? y : 0;
    int idx = ((x + pSrcDib->m_szSize.cx) < m_szSize.cx) ? pSrcDib->m_szSize.cx : m_szSize.cx - x;
    int idy = ((y + pSrcDib->m_szSize.cy) < m_szSize.cy) ? pSrcDib->m_szSize.cy : m_szSize.cy - y;
    idx = (x >= 0) ? idx : idx + x;
    idy = (y >= 0) ? idy : idy + y;

    // Return if nothing to do
    if( (idx <= 0) || (idy <= 0) ) return;

    // Prepare buffer addresses
    COLORREF *src = pSrcDib->m_pBits + ((ipy - y)*pSrcDib->m_szSize.cx) + ipx - x;
    COLORREF *dst = m_pBits + (ipy*m_szSize.cx) + ipx;
    int iws = pSrcDib->m_szSize.cx;
    int iwd = m_szSize.cx;

#ifndef _WINDOWS
    while( idy-- ) {
        for( int i = 0; i < idx; i++ ) {
            dst[i] = src[i];
        }
        src += iws;
        dst += iwd;
    }
#else
    __asm {
                cld             ; upward direction
                mov  ecx, idy   ; pre-load # scan lines
                mov  ebx, iws   ; pre-load source scan line width
                shl  ebx, 2     ; source scan line width in bytes
                mov  edx, iwd   ; pre-load destination scan line width
                shl  edx, 2     ; destination scan line width in bytes
                mov  esi, src   ; pre-load source address
                mov  edi, dst   ; pre-load destination address
        yloop:  push ecx        ; save # scan lines left to process
                push esi        ; save current source address
                push edi        ; save current destination address
                mov  ecx, idx   ; # words / scan line
                rep  movsd      ; move 'em from source to destination
                pop  edi        ; restore current destination address
                pop  esi        ; restore current source address
                pop  ecx        ; restore # scan lines left to process
                add  esi, ebx   ; point to next source scan line
                add  edi, edx   ; point to next destination scan line
                loop yloop      ; loop until all scan lines processed
    }
#endif
}

This topic is closed to new replies.
