Problem updating UBO that is currently bound on Intel HD Graphics

8 comments, last by DragonJoker 6 years, 11 months ago
As the title says, it seems that updating the contents of a UBO while it is currently bound to a slot doesn't work on Windows 10 + Intel HD Graphics.

I can reproduce this issue on several machines with a clean Windows 10 install, fully updated (including the 1607 update) and the graphics driver that the OS installed by itself - each having an Intel HD Graphics card, either an Intel HD Graphics 4600 or an Intel HD Graphics 5300.

My rendering schedule is basically the following:
  1. Activate the appropriate shader program.
  2. Attach UBOs to the appropriate slots of the shader program.
  3. Activate VAO for model 1.
  4. Update UBOs with the appropriate parameters (matrices, light parameters).
  5. Draw call for model 1.
  6. Activate VAO for model 2.
  7. Update UBOs with new parameters.
  8. Draw call for model 2.
  9. Repeat steps #6-8 for models 3, 4, ..., N-1, N (it is the same mesh, just using different data in the UBO).
The above scheme appears to work just fine on a set of Nvidia and AMD graphics cards that I've tried, both on Windows and Linux; it also works on Intel HD Graphics cards under Linux. However, it doesn't seem to work on Intel HD Graphics under Windows 10 with the driver installed by the OS. The contents of the UBO don't seem to update in step #7 and keep the old data uploaded in step #4.

I have different code paths for "EXT_direct_state_access", "ARB_direct_state_access" and a non-DSA approach, but the issue is exactly the same (on Intel HD Graphics cards under Windows 10, "ARB_direct_state_access" is actually not exposed, so I'm not using it there).

Basically, the non-DSA code that exhibits the issue on that configuration is:
// VAO is activated before this.

// (a) create UBO (note: this chunk of code is called at startup, it is not part of rendering loop)
glGenBuffers(1, &bufferHandle);
glGetIntegerv(bufferTargetToBinding(bufferTarget), reinterpret_cast<GLint*>(&previousBinding)); // simulate DSA way
glBindBuffer(bufferTarget, bufferHandle); // bufferTarget is GL_UNIFORM_BUFFER
glBufferStorage(bufferTarget, bufferSize, nullptr, GL_MAP_WRITE_BIT);
glBindBuffer(bufferTarget, previousBinding);

// (b) bind UBO
glBindBufferRange(bufferTarget, bufferChannel, bufferHandle, bufferOffset, bufferSize); // bufferOffset is 0

// (c) update UBO
glGetIntegerv(bufferTargetToBinding(bufferTarget), reinterpret_cast<GLint*>(&previousBinding)); // simulate DSA way
glBindBuffer(bufferTarget, bufferHandle);
mappedBits = glMapBufferRange(bufferTarget, mapOffset, mapSize, GL_MAP_WRITE_BIT | GL_MAP_INVALIDATE_BUFFER_BIT); // mapOffset == 0, mapSize == bufferSize
std::memcpy(mappedBits, data, mapSize);
glUnmapBuffer(bufferTarget);
glBindBuffer(bufferTarget, previousBinding);

// (d) draw call
glDrawArrays(topology, baseVertex, vertexCount);

// (e) Repeat (c) and (d) for other models.
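For comparison, on drivers that do expose "ARB_direct_state_access", steps (a) and (c) boil down to the same sequence without the bind/restore dance. This is a simplified sketch using the GL 4.5 entry points, not my exact framework code:

// (a') create UBO, DSA style
glCreateBuffers(1, &bufferHandle);
glNamedBufferStorage(bufferHandle, bufferSize, nullptr, GL_MAP_WRITE_BIT);

// (c') update UBO, DSA style - no binding to GL_UNIFORM_BUFFER needed
mappedBits = glMapNamedBufferRange(bufferHandle, mapOffset, mapSize, GL_MAP_WRITE_BIT | GL_MAP_INVALIDATE_BUFFER_BIT);
std::memcpy(mappedBits, data, mapSize);
glUnmapNamedBuffer(bufferHandle);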
If instead of "glMapBufferRange" I use "glBufferSubData" to update the contents, the issue is less pronounced, but it still exists (some models seem to jump back and forth between the old and new positions specified in the UBO).
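Roughly, that variant of step (c) looks like this (sketch; for this path the storage is created with GL_DYNAMIC_STORAGE_BIT, since glBufferSubData is not allowed on immutable storage without it):

// (c'') update UBO via glBufferSubData instead of mapping
glGetIntegerv(bufferTargetToBinding(bufferTarget), reinterpret_cast<GLint*>(&previousBinding)); // simulate DSA way
glBindBuffer(bufferTarget, bufferHandle);
glBufferSubData(bufferTarget, mapOffset, mapSize, data); // mapOffset == 0, mapSize == bufferSize
glBindBuffer(bufferTarget, previousBinding);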
Note that the issue occurs only on Windows 10 / Intel HD Graphics cards, not anywhere else.

I have found two workarounds: one is to call "glFinish" right after "glDrawArrays", which seems to fix the problem; another workaround is to call "glBindBufferBase" to unbind UBO before updating its contents, then bind it again:
glBindBufferBase(bufferTarget, bufferChannel, 0); // unbind UBO
// Update UBO contents here as in code above, step (c)
glBindBufferRange(bufferTarget, bufferChannel, bufferHandle, bufferOffset, bufferSize); // bind buffer back
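The "glFinish" workaround is simply a full pipeline sync after each draw, e.g.:

// (d') draw call followed by a full sync - works around the issue, but stalls
glDrawArrays(topology, baseVertex, vertexCount);
glFinish();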
However, both of these workarounds seem to impact the performance. I couldn't find anything in the GL spec mentioning that buffer objects need to be unbound, or "glFinish" called, before updating their contents.

So my question would be: is the issue I'm experiencing just a driver bug, or should buffer objects really be unbound before updating their contents?

P.S. I'm using very similar code to update VBOs as well, and they also exhibit the same issue on Intel HD Graphics cards and Windows 10, albeit to a lesser degree, simply because I don't update them often.

If even glBufferSubData has the issue and you are 100% sure your update data is OK, then this is clearly a driver bug. Did you already try installing a current Intel driver yourself?

And have you contacted Intel dev support?

"Those who would give up essential liberty to purchase a little temporary safety deserve neither liberty nor safety." --Benjamin Franklin

Thank you for the replies. I've tried installing an updated driver from Intel on the machine with the Intel HD Graphics 4600, but the result was the same.
I'll try to contact Intel dev support with a simple example reproducing the issue.

One more thing about your "DSA"-like getter/setter approach: first of all, glGetIntegerv should not be used in performance-critical code paths. I'm not sure how exactly the Intel drivers behave, but in the case of Nvidia and AMD it would sync your application thread and the driver thread, and that is very bad for performance.

You should exclusively send information to OpenGL (except during initialisation and debugging), and read data streams back to the client (e.g. reading out images or buffers) only after fences get signaled.
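For example, a readback guarded by a fence could look roughly like this (sketch; clientSideCopy is just a placeholder destination, and the buffer is assumed to still be bound to GL_UNIFORM_BUFFER):

// insert a fence right after the commands that produce the data
GLsync fence = glFenceSync(GL_SYNC_GPU_COMMANDS_COMPLETE, 0);

// ... one or more frames later ...
GLenum waitState = glClientWaitSync(fence, 0, 0); // timeout of 0: just poll
if (waitState == GL_ALREADY_SIGNALED || waitState == GL_CONDITION_SATISFIED)
{
    glDeleteSync(fence);
    glGetBufferSubData(GL_UNIFORM_BUFFER, 0, bufferSize, clientSideCopy); // safe now, no stall
}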

Just store the current state yourself somewhere, and try not to use anything that asks OpenGL for values.
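E.g. keep a shadow copy of the binding instead of asking GL for it (rough sketch, names are just placeholders):

// shadow the last-known binding instead of querying it back from GL
GLuint currentUniformBufferBinding = 0;

void bindUniformBuffer(GLuint handle)
{
    if (handle != currentUniformBufferBinding)
    {
        glBindBuffer(GL_UNIFORM_BUFFER, handle);
        currentUniformBufferBinding = handle;
    }
}

// the update then restores from the cached value - no glGetIntegerv involved
GLuint previous = currentUniformBufferBinding;
bindUniformBuffer(bufferHandle);
// ... map / memcpy / unmap as before ...
bindUniformBuffer(previous);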

I would actually expect that for trivial cases where the driver could cache the information locally, it probably will, particularly for GL-specific weirdness that doesn't actually exist in hardware (such as bind-to-modify behaviour); but otherwise yes - there is nothing in the GL spec that makes this promise and it shouldn't be relied on.


Time for science again. Run some simple test with mass draw calls and changing buffer and texture IDs, and after every draw call read those IDs back via glGetIntegerv. I could only measure a ~1% difference in performance on Mesa and on the AMD blob drivers on Windows 7.
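Something like this, timed with and without the query (rough sketch; the counts and arrays are placeholders):

for (GLuint i = 0; i < drawCallCount; ++i)
{
    glBindBuffer(GL_ARRAY_BUFFER, vertexBuffers[i % bufferCount]);
    glBindTexture(GL_TEXTURE_2D, textures[i % textureCount]);
    glDrawArrays(GL_TRIANGLES, 0, vertexCount);

    GLint boundBuffer = 0;
    glGetIntegerv(GL_ARRAY_BUFFER_BINDING, &boundBuffer); // the query under test
}
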
I have separate code paths for ARB_direct_state_access (either under OpenGL 4.5 or when exposed as an extension), EXT_direct_state_access, and a "non-DSA" way which simulates DSA by preserving the state. However, I mostly focus on maintaining the DSA versions, as I haven't found any OpenGL 3.3 hardware (which is the minimum for my framework) that doesn't support it. The issue I'm having on the Intel driver occurs both with EXT_direct_state_access (exposed by Intel's drivers on Windows) and with the non-DSA approach.

Regarding "glGetIntegerv performance impact" - I have macros to disable that alongside with "glGetError". However, in my own performance benchmarks that I've performed on 7 different AMD/Nvidia graphics card each, and Intel graphics card ranging from OpenGL 3.3 to 4.5 support on Windows and Linux, I've found negligible performance difference when massively updating buffers and issuing drawing calls, less than 1/10th of 1%. If you know a specific combination of a graphics card, OS and driver version where this does make any significant difference - I would definitely be interested in testing it.

I've read somewhere that in OpenGL ES, especially on older devices with OpenGL ES 2, "glGet" calls may indeed hurt performance, but at least on desktop this doesn't seem to be an issue. And even if it were, the design of my framework requires not modifying OpenGL state unless it is part of the method's purpose (e.g. activate shader, bind texture, etc.), which is why I'm focusing on the DSA way almost exclusively.

As I said, my expectation is that if a certain piece of state can reasonably be cached locally by the driver, it will be cached locally by the driver, and a glGet call for it will not incur a performance penalty. Less than 1/10 of 1% is statistically "in the noise", and transient background conditions on your PC can cause a higher performance impact than that.

So: glGetInteger, glGetFloat, etc are in general safe to call.

Something like glGetTexImage or glGetBufferSubData, however, would require a round-trip to the GPU and should not be used in a performance-critical code path.

Unfortunately it's not possible to give an absolute ruling on this in OpenGL. Being able to recognise which state is likely a GL software construct (and is therefore most likely stored locally by the driver and would not require a round-trip to the GPU) is something that comes with experience of GL and of other APIs. Hence I would advise "don't rely on it" rather than an absolute "don't do it"; even if it works just fine on all current drivers, because OpenGL doesn't specify this behaviour you have no guarantee that it won't be different on future drivers.


Well, even if you are right about that negligible impact on the tested configurations, you still say "in general safe to call", and there may be exceptions.
Since it is easy (at least for most glGet... calls) to get rid of them, you should get rid of them, so you don't run into problems on those exceptions.


This topic is closed to new replies.
