# glDrawPixels slow?!

Hi everyone! I am using glDrawPixels to draw a 2D surface to the screen, but I am getting lacklustre performance. I didn't want to go through the 3D pipeline, but if I cannot get the performance I want from glDrawPixels, I will use a textured quad (by the way, the image has non-power-of-two dimensions; I will figure out a way, since I need it stretched anyway). I am using it in the following fashion:
glDrawPixels(256, 192, GL_BGRA_EXT, GL_UNSIGNED_BYTE, surfs[index].psurfacemem);

I tried both a Radeon 9550 (with a 9600 BIOS) and a 16MB SuperSavage in my T23, and performance was bad on both (more so on the SuperSavage, but I kind of expected that). Both looked as if they were scaling and drawing the image in software rather than in hardware (a big indicator was the lack of filtering). Could this be because I am using GL_BGRA_EXT? The way I generate the image doesn't allow me to use one of the non-extension formats. In case someone can help me set it up: my pixel format is a DWORD laid out as 0x00RRGGBB (which appears in memory as the bytes 0xBB 0xGG 0xRR 0x00). Also, if I am forced to go the textured-quad route, how can I get the image onto the screen properly without doing some scaling in software (which would definitely kill performance)? Thanks a bunch in advance :D
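For reference, the 0x00RRGGBB layout described above can be repacked into R, G, B, A byte order on the CPU if a plain GL_RGBA / GL_UNSIGNED_BYTE upload turns out to be faster. This is only a minimal sketch: the function name `xrgb_to_rgba` is hypothetical, and forcing alpha to 0xFF is an assumption (the source pixels carry 0x00 in the top byte).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Repack `count` pixels from 0x00RRGGBB DWORDs into R,G,B,A bytes.
 * Shifts are used instead of byte access, so the result does not
 * depend on the host's endianness. */
void xrgb_to_rgba(const uint32_t *src, uint8_t *dst, size_t count)
{
    assert(src != NULL && dst != NULL);
    for (size_t i = 0; i < count; ++i) {
        uint32_t p = src[i];
        dst[4 * i + 0] = (uint8_t)((p >> 16) & 0xFF); /* red   */
        dst[4 * i + 1] = (uint8_t)((p >> 8) & 0xFF);  /* green */
        dst[4 * i + 2] = (uint8_t)(p & 0xFF);         /* blue  */
        dst[4 * i + 3] = 0xFF;                        /* assumed opaque alpha */
    }
}
```

The extra pass costs CPU time on every frame, so it is only worth trying if the driver really does handle GL_RGBA faster than GL_BGRA_EXT.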

##### Share on other sites
Hey man.

As I have read somewhere (the OpenGL Red Book, maybe?), the functions that manipulate pixels (like drawing and copying) are VERY slow.
The realistic option is a screen-aligned quad with a power-of-two texture.

About the other question, I didn't understand it well, but you can scale, translate, rotate, skew and do whatever you want with the quad, since it is just two triangles and the hardware will rasterize them.

##### Share on other sites
Try compiling the glDrawPixels command into a display list. This will increase performance
if the images do not change frequently. If you do want to change the image, you will have to rebuild the display list.

##### Share on other sites
Been looking at this myself recently, although I found performance pretty good when just writing to the colour buffers (drawing to the depth buffer was horrendously slow, under 0.1 fps).

Whilst reading up on it I came across the fact that glDrawPixels still goes through the basic 3D pipeline, meaning that the per-fragment operations are still applied to its pixels. I can't find the source now, but I'm positive it suggested disabling states like the depth test, alpha test etc. before calling glDrawPixels to improve performance.

I can't say for sure whether these actually have an impact, as I have only just started looking into it, but in the MSDN OpenGL docs I found this:

'These pixel fragments are then treated just like the fragments generated by rasterizing points, lines, or polygons. The glDrawPixels function applies texture mapping, fog, and all the fragment operations before writing the fragments to the framebuffer.'

As to using a screen aligned quad, you could look into the Non-Power of 2 texture extension.

[Edited by - noisecrime on August 31, 2005 9:36:11 AM]

##### Share on other sites
from http://www.opengl.org/resources/faq/technical/performance.htm:

"Why are glDrawPixels() and glReadPixels() so slow?

[...]

First, all glPixelTransfer() state should be set to their default values. Also, glPixelStore() should be set to its default value, with the exception of GL_PACK_ALIGNMENT and GL_UNPACK_ALIGNMENT (whichever is relevant), which should be set to 8. Your data pointer will need to be correspondingly double-word aligned.

Second, examine the parameters to glDrawPixels() or glReadPixels(). Do they correspond to the framebuffer layout? Think about how the framebuffer is configured for your application. For example, if you know you're rendering into a 24-bit framebuffer with eight bits of destination alpha, your format parameter should be GL_RGBA, and your type parameter should be GL_UNSIGNED_BYTE. If your format and type parameters don't correspond to the framebuffer configuration, it's likely you'll suffer a performance hit due to the per pixel processing that's required to translate your data between your parameter specification and the framebuffer format.

Finally, make sure you don't have unrealistic expectations. Know your system bus and memory bandwidth limitations."

I suggest you try GL_RGBA instead of GL_BGRA_EXT to see if this is the cause of the sluggish performance.
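To make the alignment advice in the FAQ excerpt concrete: with byte-sized components, each row of pixel data is expected to start on a multiple of GL_UNPACK_ALIGNMENT bytes, so the row stride is the row size rounded up to that multiple. The helper below is a sketch only; `row_stride` is a hypothetical name, not an OpenGL function.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes from the start of one row to the start of the next when rows
 * are padded to `alignment` bytes (OpenGL accepts 1, 2, 4 or 8).
 * Assumes one byte per component, e.g. GL_UNSIGNED_BYTE data. */
size_t row_stride(size_t width, size_t bytes_per_pixel, size_t alignment)
{
    assert(alignment == 1 || alignment == 2 ||
           alignment == 4 || alignment == 8);
    size_t row_bytes = width * bytes_per_pixel;
    /* Round up to the next multiple of the alignment. */
    return (row_bytes + alignment - 1) / alignment * alignment;
}
```

For the 256-pixel-wide, 4-bytes-per-pixel image in the original post, each row is 1024 bytes, which is already a multiple of 8, so an alignment of 8 adds no padding as long as the buffer itself starts on an 8-byte boundary.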

##### Share on other sites
glDrawPixels will always be slow, no matter what format you're using. The reason is that every call has to transfer the image data from system RAM (the *pixels parameter) to the video card. A textured quad, on the other hand, keeps the texture resident in VRAM, so drawing it is much faster. You don't need to learn everything about the 3D pipeline if you just want 2D capabilities; you only have to set the correct orthographic projection and deal with 2D coordinates.

##### Share on other sites
Doesn't compiling glDrawPixels into a display list improve performance?

##### Share on other sites
No, as it's not an execution-time problem; it's a data readback and pipeline issue.

Display lists aren't a magic fix for everything in OpenGL.

##### Share on other sites
Does anyone here know how I could efficiently save/restore a power-of-two depth/stencil screen rectangle?
Don't ask me why I need this! :)
The problem is how to efficiently save and restore depth and stencil info. I tried to do it with ReadPixels/DrawPixels, taking into account all the issues I found in the specification and the ones you were talking about, but performance is still very poor.

--- edit
Oops! I found what I need: WGL_ARB_buffer_region, a very helpful extension if you have a similar problem. It is much faster than ReadPixels/DrawPixels because it avoids video card/RAM transfers; fortunately, there are no transfers at all.

[Edited by - Jackis on September 1, 2005 11:53:21 AM]

##### Share on other sites
Quote:
 Original post by Jackis
 Oops! I found what I need: WGL_ARB_buffer_region, a very helpful extension if you have a similar problem. It is much faster than ReadPixels/DrawPixels because it avoids video card/RAM transfers; fortunately, there are no transfers at all.

Apart from the fact that it's not supported on a large range of ATI cards.

Quote:
 Original post by noisecrime
 Apart from the fact that it's not supported on a large range of ATI cards.

You're right, but we have GL_KTX_buffer_region for such cards (http://www.delphi3d.net/hardware/extsupport.php?extension=GL_KTX_buffer_region). It is essentially the same extension, but it is "hidden" (as far as I could find, it was created a long time ago for 3DS MAX and the like). nvoglnt.dll has pretty much the same procedures in it:
glBufferRegionEnabled
glDrawBufferRegion
glDeleteBufferRegion
glNewBufferRegion
So, for ATI cards we can do the same thing.

##### Share on other sites
Nice one Jackis,

It must be one of the better-kept secrets of OpenGL, because I'd never heard of it, even after searching for information about glDrawPixels and WGL_ARB_buffer_region, and I rarely get that far down the extensions list at delphi3d ;)

I tried to find the spec on delphi3d, but it links to a dead page. Is it exactly equivalent to WGL_ARB_buffer_region? Do you have a copy of the spec around that you could post?

In your experience:

a. Can you just use GL_KTX_buffer_region for all cards?
b. Do you think it will always be available? I'm wary of using a specific extension if it's not ARB, although its use by something like 3DS MAX would suggest it will be supported in the future.

##### Share on other sites
Here is what I've found on this extension (thanks to Google!).
As described below, this extension is not exactly equivalent to the ARB one; for example, we can't create a single buffer region object to restore colour and depth/stencil info simultaneously.

http://www.west.net/~brittain/3dsmax2.htm#OpenGL%20Buffer%20Region%20Extension

OpenGL Buffer Region Extension

The OpenGL extension described below, if present, will be used by MAX to implement dual planes under OpenGL. As with all OpenGL extensions under Windows NT, the functions are imported into MAX by calling wglGetProcAddress, and the functions themselves are implemented with the __stdcall calling convention. The presence of this extension is indicated by the keyword "GL_KTX_buffer_region" being present in the string returned by glGetString(GL_EXTENSIONS).

In an optimal implementation of this extension, the buffer regions are stored in video RAM so that buffer data transfers do not have to cross the system bus. Note that no data in the backing buffers is ever interpreted by MAX – it is just returned to the active image and/or Z buffers later to restore a partially rendered scene without having to actually perform any rendering. Thus, the buffered data should be kept in the native display card format without any translation.

GLuint glNewBufferRegion(GLenum type)

This function creates a new buffer region and returns a handle to it. The type parameter can be one of GL_KTX_FRONT_REGION, GL_KTX_BACK_REGION, GL_KTX_Z_REGION or GL_KTX_STENCIL_REGION. These symbols are defined in the MAX gfx.h header file, but they are simply mapped to 0 through 3 in the order given above. Note that the storage of this region data is implementation specific and the pixel data is not available to the client.

void glDeleteBufferRegion(GLuint region)

This function deletes a buffer region and any associated buffer data.

void glReadBufferRegion(GLuint region, GLint x, GLint y, GLsizei width, GLsizei height)

This function reads buffer data into a region specified by the given region handle. The type of data read depends on the type of the region handle being used. All coordinates are window-based (with the origin at the lower-left, as is common with OpenGL) and attempts to read areas that are clipped by the window bounds fail silently. In MAX, x and y are always 0.

void glDrawBufferRegion(GLuint region, GLint x, GLint y, GLsizei width, GLsizei height, GLint xDest, GLint yDest)

This copies a rectangular region of data back to a display buffer. In other words, it moves previously saved data from the specified region back to its originating buffer. The type of data drawn depends on the type of the region handle being used. The rectangle specified by x, y, width, and height will always lie completely within the rectangle specified by previous calls to glReadBufferRegion. This rectangle is to be placed back into the display buffer at the location specified by xDest and yDest. Attempts to draw sub-regions outside the area of the last buffer region read will fail (silently). In MAX, xDest and yDest are always equal to x and y, respectively.

GLuint glBufferRegionEnabled(void)

This routine returns 1 (TRUE) if MAX should use the buffer region extension, and 0 (FALSE) if MAX shouldn't. This call is here so that if a single display driver supports a family of display cards with varying functionality and onboard memory, the extension can be implemented yet only used if a given display card could benefit from its use. In particular, if a given display card does not have enough memory to efficiently support the buffer region extension, then this call should return FALSE. (Even for cards with lots of memory, whether or not to enable the extension could be left up to the end-user through a configuration option available through a manufacturer's addition to the Windows tabbed Display Properties dialog. Then, those users who like to have as much video memory available for textures as possible could disable the option, or other users who work with large scene databases but not lots of textures could explicitly enable the extension.)

Notes:

Buffer region data is stored per window. Any context associated with the window can access the buffer regions for that window. Buffer regions are cleaned up on deletion of the window.

MAX uses the buffer region calls to squirrel away complete copies of each viewport’s image and Z buffers. Then, when a rectangular region of the screen must be updated because "foreground" objects have moved, that subregion is moved from "storage" back to the image and Z buffers used for scene display. MAX then renders the objects that have moved to complete the update of the viewport display.
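The presence check described in the quoted spec (looking for "GL_KTX_buffer_region" in the string returned by glGetString(GL_EXTENSIONS)) is easy to get subtly wrong with a plain substring search, because one extension name can be a prefix of another. Below is a minimal sketch of a whole-token match. `has_extension` is a hypothetical helper, and the extension list is passed in as an ordinary C string so the function does not need a GL context.

```c
#include <assert.h>
#include <string.h>

/* Return 1 if `name` occurs as a complete space-delimited token in
 * `ext_list` (the format of the GL_EXTENSIONS string), else 0. */
int has_extension(const char *ext_list, const char *name)
{
    assert(ext_list != NULL && name != NULL);
    size_t len = strlen(name);
    if (len == 0)
        return 0;

    const char *p = ext_list;
    while ((p = strstr(p, name)) != NULL) {
        /* The match must begin at the list start or after a space... */
        int starts_token = (p == ext_list) || (p[-1] == ' ');
        /* ...and end at a space or at the end of the list. */
        int ends_token = (p[len] == ' ') || (p[len] == '\0');
        if (starts_token && ends_token)
            return 1;
        p += len; /* keep searching after this partial match */
    }
    return 0;
}
```

In a real program the first argument would be the result of glGetString(GL_EXTENSIONS) cast to `const char *`, and the entry points would then be fetched with wglGetProcAddress as the spec text says.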

##### Share on other sites
Oops, guys, sorry, the last message by Anonymous Poster was mine :) I forgot to log in.

##### Share on other sites
Thanks for the info, it's something to consider.

Sadly, having checked my OpenGL extensions, it doesn't appear to be available for my Radeon 9800 using the latest 5.8 drivers, despite what it says at Delphi3D.

##### Share on other sites
DrawPixels performance is not optimal, but it shouldn't be extremely slow. Please help out a bit by answering the following questions:

Are you using pixelzoom? Mentioning scaling the image indicates to me that you are.
Can you change the format that you're using? Switching formats may eliminate the need for CPU side processing.
Does the pixel data change often (or does it never change?)
Do you need to support very old systems (like the SuperSavage)?

As for saving and restoring depth and stencil data, that's more difficult. Ideally, you want to keep the data on the GPU, without having to read it back to the CPU at all. This isn't really possible with existing core OpenGL functions (glCopyTexImage2D can only read from color buffers). At first glance, it may be possible using FBOs, but that will not be supportable on the older hardware.

##### Share on other sites
RichardS,
Yes, you're absolutely right about the glDrawPixels issues with pixel zoom etc.
Right now we are discussing a possible way around the difficulty of saving/restoring depth and stencil. As for me, I've already implemented this using the WGL_ARB_buffer_region extension (by the way, it is _much_ faster than the Draw/ReadPixels implementation on my 6800GT). The problem is that this extension is not supported everywhere. So we are trying to learn something more about the older extension, GL_KTX_buffer_region, which is a "secret" OpenGL extension, and, unfortunately, noisecrime said that his Radeon doesn't support it either.

##### Share on other sites
Both DrawPixels performance and the depth/stencil problem can be solved portably using ARB_pixel_buffer_object.

For glDrawPixels: Replace the DrawPixels with a textured quad. Use PBO for async streaming updates when the data changes. If it doesn't change, then PBOs are unnecessary, and it can be done easily with texturing alone. This will also give better sampling.

For Depth/Stencil saving/restoration, create a buffer object with usage=STREAM_COPY_ARB, and use ReadPixels and DrawPixels to and from that.

##### Share on other sites
Using the ARB_DEPTH_TEXTURE extension, you can copy depth data using glCopyTexImage2D.

##### Share on other sites
OK, with the help of you guys I have come up with a solution (hopefully it will work, as I haven't tried it yet). I am programming an SMS / Genesis (and hopefully a CPS2 or Neo Geo) emulator. My SMS and Genesis emulators are working fine, but as you know I need a non-power-of-two texture to output the rendered images (frames). The reason I want to use OpenGL instead of DirectDraw for image scaling is that my SuperSavage doesn't support BLTSTRETCH in hardware. I used to have a Savage4, but I didn't do much DirectDraw when I had it and I don't know if it behaves the same.
My Radeons (9100, 9550, 9600XT; since writing the original post I have upgraded to a 9600XT) have no problem with DirectDraw.
But my SuperSavage can stretch an image perfectly well as a texture. So here is what I decided to do.

Resolutions I want to support: 320x240, 320x224, 256x192 (and the CPS2 or Neo Geo resolution, which is fairly close to one of those listed).

1. Buffer the image to the closest power-of-two dimensions (e.g. 256x192 is rendered to a 256x256 surface, or 320x240 to a 512x256 surface).

2. Create a textured quad under an orthographic projection (this quad's coordinates will not change until the window is resized).

3. Using the texture matrix, I can centre the image portion of the texture on the quad and scale it to fit the entire quad. (Can I use glScale to scale the texture image? I believe I can, but has anyone else tried it?)

4. Apply the texture to the quad and hopefully it will be OK.
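The sizes in steps 1 and 3 can be sketched as below. This is only an illustration: `next_pow2`, `TexFit` and `fit_texture` are hypothetical names, and the sketch assumes the frame is stored in the corner of the texture at texel (0, 0) rather than centred, so the only numbers needed are the fractions of the texture that the frame covers.

```c
#include <assert.h>

/* Smallest power of two that is >= n. */
unsigned next_pow2(unsigned n)
{
    assert(n > 0 && n <= 0x80000000u);
    unsigned p = 1;
    while (p < n)
        p <<= 1;
    return p;
}

/* Padded texture size plus the texture-space extent of the frame. */
typedef struct {
    unsigned tex_w, tex_h; /* power-of-two texture dimensions     */
    float u_max, v_max;    /* fraction of the texture that is used */
} TexFit;

TexFit fit_texture(unsigned img_w, unsigned img_h)
{
    TexFit fit;
    fit.tex_w = next_pow2(img_w);
    fit.tex_h = next_pow2(img_h);
    fit.u_max = (float)img_w / (float)fit.tex_w;
    fit.v_max = (float)img_h / (float)fit.tex_h;
    return fit;
}
```

For 320x240 this gives a 512x256 texture with u_max = 0.625 and v_max = 0.9375. Those two values can be used directly as the quad's texture coordinates, or applied once through the texture matrix with glScalef(u_max, v_max, 1.0f) so the quad can keep coordinates from 0 to 1.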

I will have to update the texture data 60 times a second, which should consume no more than 60-80MB/sec on the bus. I can change my rendering code to generate plain RGBA instead of BGRA. I also want to do some neat effects with the image while the emulator is paused, so it would be easier to do it in the 3D realm.
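As a rough check on that bandwidth figure, the upload cost is just width x height x bytes per pixel x frames per second. The helper below is a hypothetical sketch of that arithmetic; it says nothing about driver overhead or format conversion.

```c
#include <assert.h>
#include <stdint.h>

/* Raw bytes per second needed to re-upload a whole image every frame. */
uint64_t upload_bytes_per_second(uint32_t width, uint32_t height,
                                 uint32_t bytes_per_pixel, uint32_t fps)
{
    assert(bytes_per_pixel > 0);
    /* Widen first so the product cannot overflow 32 bits. */
    return (uint64_t)width * height * bytes_per_pixel * fps;
}
```

Uploading the full 512x256 texture at 4 bytes per pixel and 60 frames per second comes to 31,457,280 bytes per second (30 MiB/s). Updating only the 320x240 region that actually changes, for example with glTexSubImage2D, would need 18,432,000 bytes per second. Both are comfortably under the 60-80MB/sec estimate.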

I will also try to use a compiled vertex array instead of glVertex to pass the data, since I won't be changing the vertex data.

What do you guys think of my method? Could it work? Any changes you would make? I would like your input on it.

Thanks a lot guys.

PS: I believe the co-founders of this site wrote a book called Beginning OpenGL programming. I bought it and find it a very handy reference. Thanks for the great book.

##### Share on other sites
No comments on my particular method? I will try it some time soon and let you know of the results, I can either post here or create a new thread whichever you guys prefer. Thanks.
