Teaching advice: glBufferData or glMapBuffer

Started by
10 comments, last by cgrant 8 years, 10 months ago

Hi everybody, I am teaching an introductory class on OpenGL and to keep things simple the curriculum introduces geometry specification with client side arrays. Later in the course we touch on VAOs and VBOs filled with glBufferData.

I'm thinking of modernizing the material by (obviously) totally scrapping client side arrays and (not-so-obviously) also scrapping glBufferData in favor of glMapBuffer or glMapBufferRange.

It is a short class, and I think the students can conceptually handle any of these options, but I wonder whether it is better to teach glBufferData and let them figure out glMapBuffer for themselves, or to teach glMapBuffer and let them figure out glBufferData for themselves.

Thanks,
Nick


::glBufferData() is still modern and compatible with OpenGL ES 2.0, which would be useful to your students who want to develop on their mobile devices.
As an introductory-level course it should be fine enough to stick with ::glBufferData().
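For reference, the glBufferData path is short enough to show in full. This is a minimal sketch, assuming a valid GL context is already current; the triangle data and the function name `create_triangle_vbo` are just illustrative.

```c
/* Minimal VBO setup with glBufferData (works on desktop GL and ES 2.0).
   Assumes a valid GL context is current; the data is illustrative. */
#include <GL/glew.h>   /* or the ES 2.0 headers on mobile */

GLuint create_triangle_vbo(void)
{
    static const GLfloat verts[] = {
        -0.5f, -0.5f, 0.0f,
         0.5f, -0.5f, 0.0f,
         0.0f,  0.5f, 0.0f,
    };
    GLuint vbo = 0;
    glGenBuffers(1, &vbo);
    glBindBuffer(GL_ARRAY_BUFFER, vbo);
    /* Allocates the buffer's storage AND fills it in one call. */
    glBufferData(GL_ARRAY_BUFFER, sizeof(verts), verts, GL_STATIC_DRAW);
    return vbo;
}
```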


L. Spiro

I restore Nintendo 64 video-game OST’s into HD! https://www.youtube.com/channel/UCCtX_wedtZ5BoyQBXEhnVZw/playlists?view=1&sort=lad&flow=grid

Keep in mind that glBufferData and glMapBuffer[Range] don't do the same thing. glBufferData actually allocates the memory for the buffer, as well as setting the contents whereas glMapBuffer(Range) only sets the data (once the buffer memory has been allocated). glBufferData is good enough for introductory material but I would also just mention glMapBuffer[Range] for the curious students who may want to know more.
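The allocate-vs-fill split described above can be sketched as follows; this assumes a current context, a bound GL_ARRAY_BUFFER, and hypothetical `size`/`vertices` variables.

```c
/* glBufferData allocates storage (and optionally fills it);
   glMapBuffer only writes into storage that already exists.
   Sketch: assumes a bound GL_ARRAY_BUFFER and size/vertices defined. */
glBufferData(GL_ARRAY_BUFFER, size, NULL, GL_DYNAMIC_DRAW); /* allocate only */

void *ptr = glMapBuffer(GL_ARRAY_BUFFER, GL_WRITE_ONLY);
if (ptr) {
    memcpy(ptr, vertices, size);   /* fill the existing storage */
    glUnmapBuffer(GL_ARRAY_BUFFER);
}
```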

Ok, thanks a lot! I'll stick with glBufferData() as my buffer-filling function in the lecture.

P.S. For my own personal curiosity: does the "proper" way to fill ever-changing vertex data with glMapBuffer() still require a glBufferData(NULL) call beforehand to orphan the buffer? Is all this rigamarole avoided by using glMapBufferRange() with the invalidate flag instead?

I doubt my students will get that detailed, but I would like to know the real-world usage patterns in addition to the dry API descriptions.

glBufferData is used to actually allocate the buffer (whether you fill it with data or pass NULL at that point is up to you).

glMapBuffer[Range] is similar to glBufferSubData in that it allows you to update the contents of the buffer. However, neither glMapBuffer nor glBufferSubData works if you haven't called glBufferData to actually allocate the buffer.

You should be teaching both, as both are very important when working with buffer objects.

Lastly, the invalidate bit used in the map operations simply tells OpenGL that the existing contents of the buffer do not need to be preserved. This allows the GPU to avoid stalling if it is still using the existing buffer for drawing operations.
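The two per-frame update idioms mentioned in this thread (explicit orphaning vs. the invalidate flag) look roughly like this. A sketch only, assuming the buffer was allocated earlier with glBufferData, is currently bound, and `size`/`new_verts` are defined.

```c
/* (a) Orphan with glBufferData(NULL), then map and write: */
glBufferData(GL_ARRAY_BUFFER, size, NULL, GL_STREAM_DRAW); /* orphan */
void *p = glMapBuffer(GL_ARRAY_BUFFER, GL_WRITE_ONLY);
memcpy(p, new_verts, size);
glUnmapBuffer(GL_ARRAY_BUFFER);

/* (b) Or skip the explicit orphan and let glMapBufferRange invalidate: */
p = glMapBufferRange(GL_ARRAY_BUFFER, 0, size,
                     GL_MAP_WRITE_BIT | GL_MAP_INVALIDATE_BUFFER_BIT);
memcpy(p, new_verts, size);
glUnmapBuffer(GL_ARRAY_BUFFER);
```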
Check out https://www.facebook.com/LiquidGames for some great games made by me on the Playstation Mobile market.


P.S. For my own personal curiosity: does the "proper" way to fill ever-changing vertex data with glMapBuffer() still require a glBufferData(NULL) call beforehand to orphan the buffer? Is all this rigamarole avoided by using glMapBufferRange() with the invalidate flag instead?

I doubt my students will get that detailed, but I would like to know the real-world usage patterns in addition to the dry API descriptions.

You should read this book chapter carefully: http://www.seas.upenn.edu/~pcozzi/OpenGLInsights/OpenGLInsights-AsynchronousBufferTransfers.pdf

Excellent book IMO, and lucky for you the sample chapter is on this topic.

SlimDX | Ventspace Blog | Twitter | Diverse teams make better games. I am currently hiring capable C++ engine developers in Baltimore, MD.

Wow, thanks for all the help, everybody. I do understand that glBufferData is needed for allocating the buffer object... I am asking which function I should teach for filling it up. I guess I'm asking whether glBufferData is deprecated as a filling-up function nowadays.

Promit, thanks so much for that link. I actually did stumble across that in my research and would like to ask a question about a passage -- I hope I read it carefully enough:

Pinned memory is standard CPU memory and there is no actual transfer to device memory: in this case, the device will use data directly from this memory location. The PCI-e bus can access data faster than the device is able to render it, so there is no performance penalty for doing this, but the driver can change that at any time and transfer the data to device memory.

So when I've taught (and been taught) VBOs in the past, they are often rationalized with a statement along the lines of "we used to keep vertex data in slow client-side memory, but with VBOs we can keep vertex data in superfast video card memory!"

I just did some double-checking before I posted this, even opengl.org frames the advantages of VBOs in terms of which memory they are stored in:

A Vertex Buffer Object (VBO) is a memory buffer in the high speed memory of your video card designed to hold information about vertices.

However, if I'm reading Hrabcak and Masserann correctly, they are saying that VBOs can actually live in client-side memory, and additionally that keeping vertex data in client-side memory isn't necessarily that slow.

This is very different from what I thought! I can only imagine that the performance improvement of a VBO stored in client-side memory comes from not memcpy-ing your data over to the driver every frame. Is this true? Is presenting VBOs as superior because they avoid that client-to-client copy every frame the way to go pedagogically?

Thanks again,

Nick

Ok so it's been a while since I've reviewed the hardware side of this but from what I remember-

The GPU has a DMA engine on board, that allows it to access client memory without the CPU's intervention. However this memory has to be mapped very particularly for this to be possible, and there are hard limits on how much memory can be used this way. In the AGP days it was something like 64 MB, but I don't know what it is now. I know that this number is definitely less than a 32 bit address space (4 GB) and that space is also shared amongst all devices that might like to initiate DMA transfers. There's also a question of how much space the driver is willing to set aside. Also keep in mind that the actual memory allocation is fluid and constantly adapted by the driver. Any uploads to device memory (BufferData, TexImageND, etc) would certainly require allocations in that space, which may have performance side effects.

Apart from that, I find their assertion that the bus is fast enough for it to make no difference to be suspect. By the numbers, there's tons of PCIe bandwidth, no doubt. But I don't know what kinds of internal overheads and latencies exist in fetching this data. Client memory doesn't have the bandwidth of GPU memory either. I would want this information before attempting to use client memory. This is something you'd have to go to the GPU hardware or driver people to get a clear answer on.

All in all, I would stick to the simple device/client memory model for most practical purposes. The rest should be treated as internal driver and hardware optimization, not part of the model for how things work.

Coming back to your original question - if you want to allocate a buffer in the first place, you either have to use BufferData or BufferStorage. Storage would be considered the more sophisticated way to handle it, but it's not available everywhere and it's a considerably more rigid function in usage. So I would take it as a given that you HAVE to teach them BufferData. Don't forget also that BufferSubData is hiding in there and it is NOT the same thing (indeed it's more your 'filling up function' than the regular BufferData). MapBufferRange is again a rather sophisticated function, and it isn't immediately obvious what it's capable of. It's up to you whether you want to teach the regular MapBuffer on top of BufferData, which is really going to be down to your course pacing. MapBuffer isn't actually mandatory for anything, and isn't even in ES 2.0. (It's an extension on iOS devices.)
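The two allocation paths named above differ mainly in mutability. A hedged sketch, assuming a bound GL_ARRAY_BUFFER and a `size` variable; glBufferStorage requires GL 4.4 or ARB_buffer_storage.

```c
/* Mutable allocation -- can be re-specified (orphaned) later with
   another glBufferData call: */
glBufferData(GL_ARRAY_BUFFER, size, NULL, GL_DYNAMIC_DRAW);

/* Immutable allocation -- size and usage flags are fixed for the
   buffer's lifetime, declared up front: */
glBufferStorage(GL_ARRAY_BUFFER, size, NULL,
                GL_MAP_WRITE_BIT | GL_MAP_PERSISTENT_BIT |
                GL_MAP_COHERENT_BIT);
```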

Strictly looking at performance concerns and ease of use be damned: buffer storage, persistent mapping+memcpy, manual double/triple buffering, and fences will blow the doors off all other transfer methods. This is the expected approach in the new APIs.
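That persistent-mapping approach might be sketched as below. This is one possible shape, not a definitive implementation; it assumes GL 4.4+, a bound buffer, and hypothetical `section_size`/`new_verts` variables.

```c
/* Persistent mapping with manual triple buffering and fences.
   Setup (once): */
#define SECTIONS 3
GLbitfield flags = GL_MAP_WRITE_BIT | GL_MAP_PERSISTENT_BIT |
                   GL_MAP_COHERENT_BIT;
glBufferStorage(GL_ARRAY_BUFFER, SECTIONS * section_size, NULL, flags);
char *base = (char *)glMapBufferRange(GL_ARRAY_BUFFER, 0,
                                      SECTIONS * section_size, flags);
GLsync fences[SECTIONS] = {0};
int frame = 0;

/* Each frame: */
int slot = frame % SECTIONS;
if (fences[slot]) {   /* wait until the GPU is done with this slot */
    glClientWaitSync(fences[slot], GL_SYNC_FLUSH_COMMANDS_BIT,
                     1000000000ull /* 1 s timeout */);
    glDeleteSync(fences[slot]);
}
memcpy(base + slot * section_size, new_verts, section_size);
/* ... issue draws sourcing from this slot ... */
fences[slot] = glFenceSync(GL_SYNC_GPU_COMMANDS_COMPLETE, 0);
frame++;
```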

SlimDX | Ventspace Blog | Twitter | Diverse teams make better games. I am currently hiring capable C++ engine developers in Baltimore, MD.

The choice of what method to use to fill the buffer depends on the use case. If the buffer is going to be static and never change for the lifetime of the application, then I don't really see a benefit in mapping the buffer and copying the data; you might as well supply the data at buffer creation time. If the buffer contents change on a regular basis, then you would have to see whether mapping and copying is more efficient than sub-uploading coupled with orphaning. There really is no right or wrong answer, as only you know your specific use case.

Whew okay... think I have my answer here. I was mostly worried about misleading the students by omitting the "standard fast" way, but it seems there is still a lot of flexibility in vertex upload functions.

One last technical question though (I can't help it) -- I didn't realize that glMapBuffer existed on iOS. Since iOS devices use a shared-memory GPU, doesn't that mean that our client-side memory and our GPU memory are the same memory? In that case, wouldn't there be no difference between writing to unified memory with glBufferData/glBufferSubData versus writing to unified memory with glMapBuffer/glMapBufferRange?

Thanks for all the patience and answers,

Nick

