Does glMapBuffer() Allocate Client-Side Memory?

4 comments, last by Vincent_M 9 years, 4 months ago

I was learning about glMapBuffer(), and it appears that it returns a pointer upon success. At first, I thought: "Great! Direct access to GPU memory, and I no longer need to allocate a separate pool of memory client-side only to copy it into the GPU with glBufferSubData(). I can literally even load raw data from files using glMapBuffer()." Then, I saw that I needed to call glUnmapBuffer() once I was finished, albeit to commit my changes. I got the suspicion that glMapBuffer() is really copying whatever data to a client-side pool, then copying the modified data back, and destroying the temporary pool once glUnmapBuffer()'s called. Then, I read glMapBuffer()'s description on OpenGL.org's website:

glMapBuffer maps to the client's address space the entire data store of the buffer object currently bound to target

Interesting wording there... glMapBuffer() "maps" to the client's address space? At first, I thought glMapBuffer() actually returned a pointer to the GPU's memory, but now it sounds like glMapBuffer()'s doing behind-the-scenes client-side copying, depending on the driver. Is my suspicion correct?

I thought operating systems typically give ALL memory, regardless of where in the system it's located, its own unique range of memory addresses. For example, memory addresses 0x00000000 to 0x20000000 point to main system memory while 0x20000001 to 0x28000000 all point to the GPU's memory. These memory ranges are dictated by the amount of recognized system memory and GPU memory (including virtual memory stored in page files). Of course, firmware in simpler devices, such as older consoles, would have a fixed, possibly documented, memory map, since consoles' hardware can almost never be upgraded.


Okay, let's break this down.


At first, I thought: "Great! Direct access to GPU memory

Wrong.


I got the suspicion that glMapBuffer() is really copying whatever data to a client-side pool, then copying the modified data back, and destroying the temporary pool once glUnmapBuffer()'s called.

Close.


At first, I thought glMapBuffer() actually returned a pointer to the GPU's memory, but now it sounds like glMapBuffer()'s doing behind-the-scenes client-side copying, depending on the driver. Is my suspicion correct?

Mostly.

So here's the deal: MapBuffer returns a pointer which you can write to. What this pointer actually refers to is at the driver's discretion, but it's going to be client memory as a practical matter. (The platforms that can return direct memory won't do it through GL.) This may be memory that the GPU can access via DMA, which means that the GPU can initiate and manage the copy operation without the CPU/driver's participation. The driver also doesn't necessarily need to allocate this memory fresh, as it can keep reusing the same block of memory over and over as long as you remember to Unmap.
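To make that description concrete, here is a toy Python model of one way a driver might service map/unmap: it hands out the same client-side staging block every time, and the data only reaches the "GPU" copy at unmap. Everything here (ToyDriver, map_buffer, unmap_buffer, the vram/staging fields) is hypothetical and only models write-only access; real drivers differ and are free to do something else entirely.

```python
class ToyDriver:
    """Toy model of one way a driver *might* implement map/unmap.

    Not a real driver or GL binding: `vram` stands in for the GPU-side data
    store and `staging` for a client-side block the GPU could read via DMA.
    Models write-only access (like GL_WRITE_ONLY), so no read-back is done.
    """

    def __init__(self, size):
        self.vram = bytearray(size)     # stand-in for the buffer's GPU-side store
        self.staging = bytearray(size)  # client memory handed out by map_buffer()
        self.mapped = False

    def map_buffer(self):
        if self.mapped:
            raise RuntimeError("buffer is already mapped")
        self.mapped = True
        # The same staging block is handed out on every map, not reallocated.
        return memoryview(self.staging)

    def unmap_buffer(self):
        if not self.mapped:
            raise RuntimeError("buffer is not mapped")
        self.mapped = False
        # Stand-in for the copy (e.g. a DMA transfer) that makes writes visible.
        self.vram[:] = self.staging


drv = ToyDriver(8)
ptr = drv.map_buffer()
ptr[0:4] = b"abcd"            # writes land in client memory only
print(bytes(drv.vram[:4]))    # b'\x00\x00\x00\x00' (nothing uploaded yet)
drv.unmap_buffer()
print(bytes(drv.vram[:4]))    # b'abcd'
```

This is also why forgetting to unmap matters in this model: until unmap_buffer() runs, the "GPU" never sees the writes.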


I thought operating systems typically give ALL memory, regardless of where in the system it's located, its own unique range of memory addresses. For example, memory addresses 0x00000000 to 0x20000000 point to main system memory while 0x20000001 to 0x28000000 all point to the GPU's memory. These memory ranges are dictated by the amount of recognized system memory and GPU memory (including virtual memory stored in page files).

Not so much. Windows Kernel 6.x (Vista) gained the ability to map GPU memory into the virtual address space of a particular process, but that's more about internal management of multitasking with the GPU than having much to do with application code. It's not going to live in the same physical memory address space used for main system memory, though, and you can't read/write to it arbitrarily.


I thought operating systems typically give ALL memory, regardless of where in the system it's located, its own unique range of memory addresses. For example, memory addresses 0x00000000 to 0x20000000 point to main system memory while 0x20000001 to 0x28000000 all point to the GPU's memory.

Regarding PHYSICAL RAM, maybe... But we work with VIRTUAL memory at all times these days.
If your process needs to access some physical RAM, the OS has to give you a range of virtual addresses, and then 'map' those virtual addresses to the physical resources you've allocated.
By default, there will be no virtual addresses corresponding to any VRAM. Also, a quirk of modern desktop OSes means only a small portion of VRAM can be mapped to CPU-side virtual addresses at any one time (hence all the unmapping).
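As an analogy for the virtual-memory point above (ordinary system RAM only; Python offers no way to map VRAM), here is a minimal sketch using Python's standard mmap module: the process asks the OS for an anonymous mapping, gets a range of virtual addresses back, uses it, and gives it back with close(). The helper name map_scratch_pages is hypothetical.

```python
import mmap


def map_scratch_pages(n_pages):
    """Ask the OS for n_pages of fresh virtual address space (anonymous mapping).

    The process only ever sees the virtual addresses; which physical pages back
    them, and when, is the OS's decision (typically on first touch).
    """
    return mmap.mmap(-1, n_pages * mmap.PAGESIZE)


region = map_scratch_pages(2)
region[0:5] = b"hello"   # writes go through the virtual mapping
print(region[0:5])       # b'hello'
region.close()           # "unmap": the address range is handed back to the OS
```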

In practice, if you're using the no-overwrite/unsynchronized map flags/hints, you've got the best chance of being given an actual pointer to VRAM! If so, writes to those addresses will skip the CPU's caches and go via write-combining buffers for maximum throughput (another reason for the mandatory unmap: in this case, the driver needs to flush the CPU's write-combining buffers), but if you read from that pointer, well, it's going to be dog slow (no cache + non-local resource = bad).

With any other map flags (except perhaps in write-discard/orphaning situations), the driver will almost certainly internally allocate some extra CPU-side RAM, and copy through to the GPU itself.
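Since reads through a write-combined mapping can be very slow, the usual advice is to treat a mapped pointer as write-only and fill it in one forward pass, avoiding read-modify-write patterns such as `dst[i] += x`. Below is a minimal sketch of that access pattern in Python, assuming a hypothetical x/y/z float32 vertex layout. The bytearray only stands in for the mapped pointer, so this shows the pattern, not the performance.

```python
import struct

# Hypothetical vertex layout for illustration: position x, y, z as float32.
VERTEX = struct.Struct("<3f")


def fill_vertices(dst, vertices):
    """Fill a mapped range front to back without ever reading from it.

    `dst` is any writable buffer; here a bytearray stands in for the pointer
    returned by a map call. Returns the number of bytes written.
    """
    offset = 0
    for v in vertices:
        VERTEX.pack_into(dst, offset, *v)  # write-only: no read-modify-write
        offset += VERTEX.size
    return offset


mapped = bytearray(3 * VERTEX.size)  # stand-in for the mapped pointer
written = fill_vertices(mapped, [(0.0, 1.0, 0.0), (-1.0, -1.0, 0.0), (1.0, -1.0, 0.0)])
print(written)  # 36
```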

I don't think any modern OS allows direct physical access to memory; that's just asking for trouble. As others have mentioned, memory access is virtualized.

glMapBuffer maps to the client's address space the entire data store of the buffer object currently bound to target



Interesting wording there... glMapBuffer() "maps" to the client's address space? At first, I thought glMapBuffer() actually returned a pointer to the GPU's memory, but now it sounds like glMapBuffer()'s doing behind-the-scenes client-side copying, depending on the driver. Is my suspicion correct?

What the paragraph means by "maps to the client's address space" is that the memory can be accessed by the client in its own address space.
This could be either because through virtual memory the address actually translates directly to GPU memory (yay!), or as you say, the driver allocated some memory CPU side and will copy it later to the GPU (ouch).

The wording is carefully crafted (cryptic!) like that because OpenGL has a "client-server" architecture. In OpenGL's early days, an application could issue commands via OpenGL, and those commands would be carried over a network to a rendering workstation/farm.
Obviously, you can't map server memory directly from the client when the data travels through an Ethernet cable, so the driver would allocate its own memory and send the commands later.
But when dealing with modern systems, the wording also allows virtual memory to directly map to GPU memory, which is what you want.

What actually happens heavily relies on the GL implementation; it's like Promit says.
Your best bets are with glMapBufferRange, instead of glMapBuffer. GL_MAP_UNSYNCHRONIZED_BIT increases the chances of getting a pointer directly mapped to GPU memory. But you then have to synchronize all access yourself (see ARB_sync and apitest).

Using GL_MAP_PERSISTENT_BIT increases the chances a lot more, since that's the whole point of persistent mapping.

But, as Promit said, it's not a guarantee.
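With unsynchronized or persistent mappings the driver no longer protects you from overwriting data the GPU is still reading, so that synchronization becomes your job. One common approach (not the only one) is to split the buffer into a few regions used round-robin and fence each region. Below is a toy Python model of just that bookkeeping. The class and its methods are hypothetical, and the GL calls named in the comments (glBufferStorage, glMapBufferRange, glFenceSync, glClientWaitSync) only mark where real code would go.

```python
class PersistentRing:
    """Toy model of the persistent-mapping + fence pattern (not a GL binding).

    In real GL the storage would come from glBufferStorage + glMapBufferRange
    with GL_MAP_PERSISTENT_BIT, mapped once. Here a bytearray stands in for it,
    and a frame number stands in for the sync object that glFenceSync returns.
    """

    def __init__(self, region_size, regions=3):
        self.storage = bytearray(region_size * regions)
        self.region_size = region_size
        self.regions = regions
        self.fences = [None] * regions  # frame whose draws last read each slice
        self.frame = 0

    def begin_frame(self, gpu_completed_frame):
        """Return (slot, must_wait, view) for the slice this frame may write.

        `gpu_completed_frame` is the newest frame the GPU has finished; in real
        code must_wait=True is where glClientWaitSync would block.
        """
        slot = self.frame % self.regions
        fence = self.fences[slot]
        must_wait = fence is not None and fence > gpu_completed_frame
        start = slot * self.region_size
        view = memoryview(self.storage)[start:start + self.region_size]
        return slot, must_wait, view

    def end_frame(self, slot):
        # After issuing the draws that read this slice, record a fence for it.
        self.fences[slot] = self.frame
        self.frame += 1
```

With three regions, the CPU only has to wait when it wraps around to a slice whose frame the GPU has not finished yet, which is why this pattern is often described as triple buffering.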

glMapBufferRange() sounds like it'd be a hassle haha... Is glMapBuffer() generally faster than glBufferSubData(), or is it mainly there for convenience? It doesn't seem like it would cut down on any overhead.

glMapBufferRange() sounds like it'd be a hassle haha... Is glMapBuffer() generally faster than glBufferSubData(), or is it mainly there for convenience? It doesn't seem like it would cut down on any overhead.

Not generally, no. glMapBuffer is indeed slower on at least one major IHV (source: Cass Everitt's slides on persistent mappings). In any case, mapping is the only viable way to scale the workload of filling buffers across several CPU cores (shared contexts are a theoretical alternative, but they're actually slower than skipping the threading stuff altogether).

Also note that glMapBufferRange with the unsynchronized bit (see post by Matias Goldberg) is much faster than simple glMapBuffer as it avoids at least one sync.
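To illustrate the point about spreading buffer filling across cores: the thread that owns the GL context maps once, worker threads each fill a disjoint slice of the returned pointer, and the owner unmaps after joining them. The sketch below shows only that partitioning, with a bytearray standing in for the mapped pointer and a hypothetical helper name (parallel_fill). Note that CPython's GIL means pure-Python workers like these will not actually run the writes in parallel.

```python
import threading


def parallel_fill(mapped, n_workers, produce):
    """Split one mapped range into disjoint slices, one per worker thread.

    Hypothetical helper. Only the thread that owns the GL context would map
    and unmap; workers only write through their own slice, so they make no GL
    calls and never touch each other's bytes.
    """
    view = memoryview(mapped)
    chunk = -(-len(view) // n_workers)  # ceiling division
    threads = []
    for w in range(n_workers):
        lo = min(w * chunk, len(view))
        hi = min(lo + chunk, len(view))
        t = threading.Thread(target=produce, args=(view[lo:hi], lo))
        threads.append(t)
        t.start()
    for t in threads:
        t.join()  # all writes finish before the owner thread would unmap
```

Each produce(slice, offset) call receives its own slice plus its starting offset, so workers can compute their data independently without any locking.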

This topic is closed to new replies.
