# OpenGL Getting depth values

##### Share on other sites
I don't think feedback mode is what you want (I can see it being a lot slower).
I just ran a very old benchmark of mine on my NVIDIA GeForce 7600 GS.
I'm getting ~150 million pixels/sec with glReadPixels() reading GL_DEPTH_COMPONENT,
i.e. 640x480 at > 400 fps.

Have you looked into PBOs? (There's info and a demo on the NVIDIA developer site.)

##### Share on other sites
zeds,

That's an interesting result. When I remove my glReadPixels() call I get ~70fps, when I add it in framerate drops to ~12fps.

-If you take out your call to glReadPixels() what kind of performance increase do you get on your benchmark framerate i.e. is it anything like my jump of about 5x, above?

-Could you tell me how you're calling glReadPixels()? How many depth values does your benchmark code read per call? My code is below; I'm trying to take all ~300,000 depth values in the window at once.

Here's how I make my call:
float *fmem = malloc(640*480*sizeof(float));
glReadPixels(0, 0, 640, 480, GL_DEPTH_COMPONENT, GL_FLOAT, fmem);

I think the PBO idea is a good one, but I want to make sure of some things before I move on from glReadPixels(). My fps results above are based on a 1000 frame long test, where the 1000 glReadPixels() calls add 70 seconds in total, versus a run where they aren't called. That looks like under 5 million pixels per second coming back to the app.

-Could I be suffering from the lack of a decent graphics card here? Or perhaps I'm making my call to glReadPixels() incorrectly?
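The throughput figures quoted so far follow from simple arithmetic. As a rough sketch (the helper names `mpix_per_sec` and `readbacks_per_sec` are invented here for illustration and come from neither poster's code):

```c
#include <assert.h>

/* Hypothetical helper: millions of pixels read back per second, given the
   window size, the number of glReadPixels() calls, and the extra seconds
   those calls added to the run. */
double mpix_per_sec(int width, int height, int calls, double seconds)
{
    assert(seconds > 0.0);
    return (double)width * height * calls / seconds / 1e6;
}

/* Hypothetical helper: how many full-window readbacks per second a given
   throughput (in Mpixels/sec) would sustain. */
double readbacks_per_sec(double mpix, int width, int height)
{
    assert(width > 0 && height > 0);
    return mpix * 1e6 / ((double)width * height);
}
```

With the numbers from the posts, `mpix_per_sec(640, 480, 1000, 70.0)` comes to about 4.4, which matches "under 5 million pixels per second", and `readbacks_per_sec(150.0, 640, 480)` comes to about 488, consistent with zeds's "> 400 fps".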

##### Share on other sites
Still don't know what graphics card I have in this machine. But using GPUBench I get the following results for glReadPixels() (http://graphics.stanford.edu/projects/gpubench/test_readback.html has details). They don't read GL_DEPTH_COMPONENT but I was still interested to see them (window size for the test is 512*512 by default):

Fixed Hostmem GL_RGBA Mpix/sec: 46.54 MB/sec: 177.53
Fixed Hostmem GL_ABGR_EXT Mpix/sec: 1.48 MB/sec: 5.66
Fixed Hostmem GL_BGRA Mpix/sec: 46.23 MB/sec: 176.36
Float Hostmem GL_RGBA Mpix/sec: 12.55 MB/sec: 191.48
Float Hostmem GL_ABGR_EXT Mpix/sec: 0.47 MB/sec: 7.11
Float Hostmem GL_BGRA Mpix/sec: 12.47 MB/sec: 190.22
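
The two columns above appear to be linked by the pixel size: the figures are consistent with 4 bytes per pixel for the fixed-point formats, 16 bytes per pixel for the float formats, and "MB" meaning 2^20 bytes. That reading is inferred from the posted numbers rather than taken from the GPUBench documentation, and `mb_per_sec` is a made-up helper name:

```c
#include <assert.h>

/* Convert Mpixels/sec to MB/sec, assuming "MB" means 2^20 bytes
   (an inference from the posted figures, not a documented fact). */
double mb_per_sec(double mpix, int bytes_per_pixel)
{
    assert(bytes_per_pixel > 0);
    return mpix * 1e6 * bytes_per_pixel / (1024.0 * 1024.0);
}
```

`mb_per_sec(46.54, 4)` gives roughly 177.5 and `mb_per_sec(12.55, 16)` roughly 191.5, close to the table. Read this way, the float path moves about the same number of bytes per second as the fixed path; its lower Mpix/sec figure reflects the four-times-larger pixels.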

I've looked at the GPUBench source code, and made some very slight changes to my glReadPixels() calls to bring my code in line with theirs. My performance is pretty much unchanged, however.

I think I will give feedback mode a try before I move on. I'll post if I conclude anything other than what zeds predicted above.

Regarding PBOs, I'm concerned that all they will give me is the potential for a non-blocking call to read the depth info. As I don't have much work I can give the app to do in the meantime (before I actually try to use the depth data), I don't think I have much chance of a performance increase. The quote below from Dominik Göddeke's tutorial might be interesting to anyone else considering this approach:

"Conventional transfers require a pipeline stall on the GPU to ensure that the data being read back is synchronous with the state of computations. PBO-accelerated transfers are NOT able to change this behaviour, they are only asynchronous on the CPU side. This behaviour cannot be changed at all due to the way the GPU pipeline works. This means in particular that PBO transfers from the GPU will not deliver any speedup with the application covered in this tutorial, they might even be slower than conventional ones. They are however asynchronous on the CPU: If an application can schedule enough work between initiating the transfer and actually using the data, true asynchronous transfers are possible and performance might be improved in case the data format allows this. ... To benefit from PBO acceleration, a lot of independent work needs to be scheduled between initiating the transfer and requesting the data".

Full tutorial available at http://www.mathematik.uni-dortmund.de/~goeddeke/gpgpu/tutorial3.html
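
One way to picture the point made in the quote is an idealised per-frame cost model on the CPU side. This is only an illustration under simplifying assumptions: it ignores driver overhead and the GPU pipeline stall that the tutorial says remains in either case, and the function names are hypothetical.

```c
#include <assert.h>

/* Blocking readback: the CPU waits for the transfer, then does its
   other work. All times are in milliseconds per frame. */
double blocking_frame_ms(double readback_ms, double other_work_ms)
{
    assert(readback_ms >= 0.0 && other_work_ms >= 0.0);
    return readback_ms + other_work_ms;
}

/* Idealised asynchronous readback: the transfer and the independent CPU
   work proceed side by side, so the longer of the two dominates. */
double overlapped_frame_ms(double readback_ms, double other_work_ms)
{
    assert(readback_ms >= 0.0 && other_work_ms >= 0.0);
    return readback_ms > other_work_ms ? readback_ms : other_work_ms;
}
```

Taking the 70 ms per frame implied by the 70-second overhead over 1000 frames, and no independent work to overlap, both functions return 70: nothing is gained, which is the concern raised above. With, say, 50 ms of independent work the model gives 120 ms blocking versus 70 ms overlapped.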

##### Share on other sites
Quote:
 Here's how I make my call:
 float *fmem = malloc(640*480*sizeof(float));
 glReadPixels(0, 0, 640, 480, GL_DEPTH_COMPONENT, GL_FLOAT, fmem);

I hope you're not doing that each frame, i.e. allocating the memory.

My results are from an old benchmarking app I wrote many years ago (from memory, even my GeForce2 MX at the time did > 10 million pixels/sec).

1000 readpixels of 640x480 GL_DEPTH_COMPONENT with GL_FLOAT should take nowhere near 70 secs.

Here's the output from my testing (as you can see, depth values should be pretty close to color values).
Thus if you have
Fixed Hostmem GL_RGBA Mpix/sec: 46.54 MB/sec: 177.53
you should be seeing something similar WRT depth (which you're not).
Try removing everything except for the readpixels and see if that's truly the bottleneck.

glReadPixels: DEPTH_COMPONENT -- UNSIGNED_BYTE 170.111 Mpixels/sec
glReadPixels: DEPTH_COMPONENT -- UNSIGNED_SHORT 170.111 Mpixels/sec
glReadPixels: DEPTH_COMPONENT -- FLOAT 145.572 Mpixels/sec
glReadPixels: DEPTH_COMPONENT -- UNSIGNED_INT 140.837 Mpixels/sec
glReadPixels: DEPTH_STENCIL_NV -- UNSIGNED_INT_24_8_NV 150.722 Mpixels/sec
---
glReadPixels: LUMINANCE -- UNSIGNED_BYTE 144.398 Mpixels/sec
glReadPixels: LUMINANCE -- UNSIGNED_SHORT 23.865 Mpixels/sec
glReadPixels: LUMINANCE -- UNSIGNED_INT 16.529 Mpixels/sec
glReadPixels: LUMINANCE -- FLOAT 25.871 Mpixels/sec
glReadPixels: ALPHA -- UNSIGNED_BYTE 186.673 Mpixels/sec
glReadPixels: ALPHA -- UNSIGNED_SHORT 184.746 Mpixels/sec
glReadPixels: ALPHA -- UNSIGNED_INT 184.746 Mpixels/sec
glReadPixels: ALPHA -- FLOAT 175.333 Mpixels/sec
glReadPixels: RED -- UNSIGNED_BYTE 171.744 Mpixels/sec
glReadPixels: RED -- UNSIGNED_SHORT 144.398 Mpixels/sec
glReadPixels: RED -- UNSIGNED_INT 119.305 Mpixels/sec
glReadPixels: RED -- FLOAT 150.722 Mpixels/sec
glReadPixels: RGB -- UNSIGNED_BYTE 141.954 Mpixels/sec
glReadPixels: BGR -- UNSIGNED_BYTE 163.580 Mpixels/sec
glReadPixels: RGBA -- UNSIGNED_BYTE 149.380 Mpixels/sec
glReadPixels: BGRA -- UNSIGNED_BYTE 165.191 Mpixels/sec
glReadPixels: RGB -- FLOAT 45.222 Mpixels/sec
glReadPixels: BGR -- FLOAT 46.668 Mpixels/sec
glReadPixels: RGB -- UNSIGNED_SHORT_5_6_5 154.718 Mpixels/sec
glReadPixels: RGB -- UNSIGNED_SHORT_5_6_5_REV 148.061 Mpixels/sec
glReadPixels: RGBA -- FLOAT 38.000 Mpixels/sec
glReadPixels: BGRA -- FLOAT 37.680 Mpixels/sec
glReadPixels: RGBA -- UNSIGNED_INT_8_8_8_8 166.834 Mpixels/sec
glReadPixels: BGRA -- UNSIGNED_INT_8_8_8_8 142.029 Mpixels/sec
glReadPixels: RGBA -- UNSIGNED_INT_8_8_8_8_REV 149.380 Mpixels/sec
glReadPixels: BGRA -- UNSIGNED_INT_8_8_8_8_REV 166.730 Mpixels/sec
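
The suggestion to time the readback on its own could be sketched along these lines. This is not zeds's benchmark code; `work_fn`, `time_calls` and `count_call` are hypothetical names, and the harness assumes a C11 compiler for `timespec_get`.

```c
#include <assert.h>
#include <time.h>

/* Signature of the code under test; ctx carries whatever it needs. */
typedef void (*work_fn)(void *ctx);

/* Wall-clock seconds taken by `iterations` calls of `work`. */
double time_calls(work_fn work, void *ctx, int iterations)
{
    struct timespec start, end;
    assert(work != NULL && iterations > 0);
    timespec_get(&start, TIME_UTC);
    for (int i = 0; i < iterations; ++i)
        work(ctx);
    timespec_get(&end, TIME_UTC);
    return (double)(end.tv_sec - start.tv_sec)
         + (double)(end.tv_nsec - start.tv_nsec) / 1e9;
}

/* Stand-in workload so the harness can be tried without a GL context.
   In a real test this function would issue the glReadPixels() call,
   probably after a glFinish() so that pending rendering is not billed
   to the readback. */
void count_call(void *ctx)
{
    ++*(int *)ctx;
}
```

Dividing the pixels read (width * height * iterations) by the returned time gives a pixels/sec figure that can be compared directly with the table above.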

##### Share on other sites
I'm not doing the malloc each frame. Sorry, that is misleading.
The good benchmark is what is leaving me so confused. I know you're right that it should be much faster. If I remove just the one glReadPixels() line _only_, then the 1000 frame run does indeed complete 70 seconds faster (about 15sec in total). There's something wrong here.

I found out yesterday that the card in this machine is an ATI EAX300SE 128MB PCIe.

The only explanation I can come up with at the moment is an ATI driver problem for Linux. (Now I've said that it's bound to be me making a stupid coding mistake).

1) My benchmarks were indeed good, but they were run under Windows.
2) I do all my OpenGL work in Debian Linux.
3) I have seen people mention ATI Linux driver problems on other forums, specifically mentioning glReadPixels() e.g. http://www.gpgpu.org/forums/viewtopic.php?t=3353&view=previous&sid=3f7fb23c04d396ca28cd5493ff624753

I don't know what the best next step is. I have an NVidia GeForce 6 Series 6600GT PCIe sitting on my desk, but switching them over could be a problem as I don't own this machine. I've yet to look at whether any more recent ATI drivers are available.

##### Share on other sites
Found another PC running Debian Linux, very similar spec _but_ with an NVidia graphics card. I ran exactly the same code on both my PC (ATI card) and the alternative machine (NVidia card); results are below.

1000 frame test, duration:

ATI:
Window size 214*512: 1.23sec (readback on)

NVidia:
Window size 214*512: 10sec (readback on)

[The readback off cases aren't entirely fair as I also dropped a big array loop every frame, which I shouldn't have done. To give an idea, ATI would be 12sec with readback off and the array loop left in. So you could scale up the 4sec NVidia result a little.]

But regardless of that, and of the fact that I don't know what model the NVidia card is (it appears faster in general rendering than the ATI), I'm sure that there is some problem with the ATI card's readback under Linux. See the jump up to 3 mins 32secs: an overhead of ~200 seconds. [I was wrong to quote an overhead of 70sec on readback for 1000*glReadPixels(0,0,640,480,...) in earlier posts. It was for 1000*glReadPixels(0,0,214,512,...).]

Perhaps this could be helpful info if someone is struggling with slow glReadPixels() under Linux in the future.

##### Share on other sites
Is it possible to upload the video evidence to the GPU and do the comparison there instead? That would possibly yield an increase in speed.

##### Share on other sites
Jerax,
Yes, I think that's a nice idea. Looking at gpgpu.org the kind of techniques I'd need to employ for general purpose GPU computations look relatively tough (to me, at least) but I think you're right that it's the way to go for performance increases. I'll be testing the approach further using the readback technique for now, but if it's successful then I'll look again at this option.

Re. glReadPixels(), I've replaced my machine's ATI EAX300SE 128MB PCIe with the NVidia GeForce 6 Series 6600GT 128MB PCIe. The final result for my benchmark under Linux is now:

1000 frame test, duration:

NVidia 6600GT 128MB PCIe:
Window size 640*512: 16sec (readback on)

This is manageable for my application.
