Dybbuk

OpenGL Videocard readbuffer performance


I'm working on a project which requires the readback of large buffers. I once wrote a realtime raytracer that used OpenGL for first-hit optimisation, among some other things. To do this I had to read back the framebuffer using glReadPixels. Unfortunately I hit the limit of the readback speed on my GF4: at 1024x768 it was hardly realtime anymore. I looked around and read that readback is slow. For my current project I need to read back large panorama views, which are much larger than 1024x768. So my questions are:

1. Are there any cards which are suitable for this problem?
2. Is there any resource on the internet about this (specs)?
3. Is there any other way?

Disclaimer: I'm pretty sure this is accurate, but not 100%, so please feel free to correct me if I'm wrong.

It's not really the card itself that makes reading back a framebuffer so slow, although it does stall the pipeline. The biggest problem is that AGP is terrible at transferring data to system RAM. PCI Express, which is much faster both for sending to VRAM and to system RAM, is slowly becoming more widespread.

Your best bet is to start getting into Pixel Shader 3.0, which should let you write a raytracer in a pixel shader. At the link below there is an example from NVIDIA named "Raytrace" that can give you a head start (scroll down the page).

http://developer.nvidia.com/object/fx_composer_shaders.html

[edit] If you are looking for a cheap 3.0 video card, NVIDIA just came out with the 6200, a cheap low-end PS/VS 3.0 model. It comes in both AGP and PCI Express versions.

Thanks for your reply, but the bit about the raytracer was just background to introduce my problem. I need to read back a large buffer for a new project. I haven't started coding yet, but given my earlier experience I expect to run into problems.

It's for a commercial project, so I'm not really on a tight budget, but I will need some numbers or specs to convince my boss to buy one.

[EDIT added:]
I think the readback bandwidth of the AGP bus is rather slow, but most cards can't come close to filling even that; I don't have any numbers though.
So if anybody knows the exact readback bandwidth of the AGP bus and of PCI Express, please let me know.
(Googling for it myself at the moment.)
[/EDIT]

From my experience, I can WRITE 175 screens of 640x480 per second on a GeForce2 with AGP 2x. At 1024x768, if I remember correctly, I get 50-60. (Note that Quake 2 in software ran at 45 fps at 1024x768 with an Athlon 1.5GHz on the same bus. ;)
I think read operations are similar in speed; if not, they are slower.

Hope that helps you :)

See GL_EXT_pixel_buffer_object.
The question is what you want to do with the data; if it's quite simple (and can run in a fragment program), then you could just copy the buffer into a texture instead.

Quote:
Original post by zedzeek
See GL_EXT_pixel_buffer_object.
The question is what you want to do with the data; if it's quite simple (and can run in a fragment program), then you could just copy the buffer into a texture instead.


Pixel Shader 3.0 supports Multiple Render Targets, so you could easily use this to perform multiple tasks within the fragment/pixel shader all in one pass. You could even do very complex stuff. I haven't tried it because I've stuck with PS 2.0, but I want to move to 3.0 for this very reason: store info in multiple render targets and read the output back via CPU/GPU. And now that 3.0 will probably become mainstream in a year or so (once the new consoles come out), it's something worth programming with now. This is something I would seriously consider, especially if you want to manipulate pixels.

Quote:
Pixel Shader 3.0 uses Multiple Render Targets, so you could easily use this to perform multiple tasks within the fragment/pixel shader all in one pass.
MRTs are supported with PS 2.0 as well, but the GeForce FX does not support them; ATI cards do.

@Fruny
I know most cards aren't built for it, but I really NEED to read back those pixels.

@zedzeek
GL_EXT_pixel_buffer_object can keep the data on the card, but how do I get it into CPU RAM or onto disk?

@all
Thanks for the help so far.

Your best bet is a PCI-Express card. The main difference between AGP and PCIe is that PCIe has the same bandwidth for reading as for writing.

Unfortunately I don't know of any benchmarks for this kind of thing, since it's a relatively new use of the GPU, and most drivers and cards are not optimized for it. AGP cards would almost be suitable if it were not for the poor drivers/hardware.

On my 6800 PCIe card I get about 600MB/sec throughput using floating-point buffers. That is about 39 Mpixels/sec, or 148 frames per second assuming a 512x512 area. The benchmark does nothing but read from the buffer, so this may or may not be useful information: the result can change based on drivers and on what else you have the card do. The benchmark is here: http://graphics.stanford.edu/projects/gpubench/

Also, going up in buffer size (i.e. 1024 or 2048) did not change the throughput at all. At that fixed throughput this works out to about 37 frames per second at 1024x1024, or about 9 frames per second at 2048x2048.

Warning: the above benchmark uses glReadPixels(), which may or may not be the most optimal way to read the buffer. There may be extensions that are more optimized, or possibly D3D methods that are faster. I only mention this because it is not a definitive benchmark by any means; it's just a number to show that realtime may be possible with the better cards.

Also check out the ATI line of PCIe cards, since they may have better support. The NVIDIA cards are tricky, since not all of them are PCIe-native: some (like the 6800 Ultra) use a bridge, which means the card is basically an AGP card running on a PCIe bus. The 6600GT is a native PCIe card, so it should have faster readback than the bridged cards NVIDIA puts out. Some standard 6800s are native and some are not; I think all the 6xxx cards will eventually become native, it's just a matter of time. You don't want to take that chance, so be careful if you choose an NVIDIA card.

I suggest some research, and possibly creating a benchmark demo for people to test their configurations, so you can decide on the card to get.

[Edited by - qazlop on February 16, 2005 12:44:44 AM]

NVIDIA improved their readback performance by a factor of 4-5 between the GeForce 5 and GeForce 6.

I've benchmarked a GF6800 GT AGP and got readback of ~1GB/sec, compared to ~200MB/sec on a GF5900.

Until their current generation, ATI have had lower readback performance than NVIDIA. I'm not sure what speed is achievable on their latest cards.



Thanks for the replies so far. I've heard some numbers and have a faint idea of what to expect, but I would like some more, so I will make a little demo with a massive buffer to read back.

I didn't find anything new about extensions, so I'll stick to glReadPixels. I also read about DirectDraw, but I have to develop on Linux, so that won't work.
I think I'd better make the benchmark for Windows so more people will be able to run it (and maybe a Linux demo too, for comparison).

I'll get back to it later today.

This benchmark with source code might help you.

http://www.mars3d.com/Software/PixPerf.zip

Run it as follows:

pixperf -read -type ubyte -format bgra -size 128
pixperf -read -type ubyte -format bgra -size 128 -readpdr

Guest Anonymous Poster
I'm not sure if it makes any difference, but you might get better performance by rendering to a texture directly, instead of rendering to the framebuffer and then reading that back into a texture. They might be the same internally, but it's worth checking into.
