
Matias Goldberg

Member Since 02 Jul 2006
Offline Last Active Yesterday, 03:47 PM

#5194676 Vertex buffer object issue

Posted by Matias Goldberg on Yesterday, 03:47 PM

If you are on GL3 or newer, you should create a VAO (create it only once!) and bind it before the glVertexAttribPointer and glBindBuffer calls.

GLuint vaoName;
glGenVertexArrays( 1, &vaoName );
glBindVertexArray( vaoName );

#5193363 Multiple VAOs and VBOs

Posted by Matias Goldberg on 17 November 2014 - 08:52 PM

Over a few frames things settle down and the driver is no longer allocating new blocks of memory, but is instead just handing back blocks of memory that had previously been used. In other words, it's not necessary to do your own multi-buffering, because the driver itself is automatically multi-buffering for you behind the scenes.
At this stage it's worth highlighting that this buffer update model is well-known and widely used in D3D-land (where it's called "discard/no-overwrite") and has existed since D3D8, if not earlier; i.e. it has close to 15 years of real-world usage behind it. So it's not some kind of voodoo magic that you may not be able to rely on; it's a well-known and widely understood usage pattern that driver writers anticipate and optimize around.

Both DX12 and GL4 are moving away from this pattern and towards explicit low-level memory management: fences, unsynchronized access, and persistent mapping.

Drivers may optimize for the discard/map-no-overwrite pattern, but the higher-level application has much more information than the driver about how to deal with memory and access hazards. Driver optimizations can only go so far.
But with great power comes great responsibility.

#5193360 Multiple VAOs and VBOs

Posted by Matias Goldberg on 17 November 2014 - 08:44 PM

I'm sort of having trouble understanding how mixing these could be bad. Not saying that is what you mean, but it is implied that way to me. I would think this would act as an extra safety net to the whole unsynchronized methodology.

It's not that it's bad (by the way, mixing both doesn't act as a safety net). The thing is that you needlessly create a problem for yourself when you need to render with VAOs.

The VAO records which VBO is bound.
If you use three VBOs, you either modify the state of the VAO every frame, or you keep three VAOs (one per VBO). With just one VBO used at different ranges, you need only one VAO, and there is no need to modify it in any frame.

I basically see using glMapBufferRange + the unsynchronized flag as 'Put this data in the VBO right now, I do not care how or what you do, just put it in there'.

That's correct.


Which could lead to things not drawing right if you accidentally map to a VBO that is being used in drawing.

That's the best thing that can happen. The worst thing that can happen is a full system crash (BSOD, or not even that: a DOS-style lockup needing a hard reset). It depends on the GPU architecture and motherboard (bus).
You must synchronize.


Note that for dynamic content (and assuming you will be discarding the whole contents every frame), you just need one fence per frame. You don't need one fence per buffer.

If I use round robin with 3 VBOs or more and they all get mapped with glMapBufferRange + the unsynchronized flag, I would think that the only way it would fail is if my GPU is falling behind really, really badly or something is seriously wrong.

If you don't use round robin, it's the same thing, because in the example I gave you, you would be writing to a region that the GPU is not reading right now.
Remember, the problem is not that the VBO is in use by the GPU while you're writing from the CPU. What's important is that the dword (4 bytes) you're writing from the CPU is not currently being read by the GPU (to avoid a crash), and that the region of memory you're writing to has already been read by all the GPU commands issued so far (to avoid a race condition causing graphics corruption).

Is mixing those two methods just overkill?

Yes, because you gain nothing from mixing them, and you complicate things for yourself by having more VBOs and more VAOs.

#5192371 Function definition for interest rate.

Posted by Matias Goldberg on 11 November 2014 - 11:32 PM

This belongs to mathematical finance.

There are two types of interest rates: simple interest and compound interest. The example you gave is compound interest. This is because after a year the interest becomes part of the capital, and starts generating interest on its own. Back in the Middle Ages this was forbidden because the Church considered it usury, which is a sin (TBH, I don't think it's so far off...).


Simple interest's formula is of the form f(i) = C * (1 + i * N)

Where C is the original capital, i the interest rate, and N is time (could be in days, months, years; if you change N from e.g. years to months, you will need to adjust i by dividing it by 12).


Compound interest's formula is of the form f(i) = C * (1 + i)^N

This answers your question. If you change N's unit of measurement (e.g. years to months), adjusting i is a bit trickier. You actually need to do i' = (1 + i)^(1/12) - 1


Knowing that compound interest is the result of interest getting capitalized is very important. You may notice that while N is smaller than 1, simple interest is bigger than compound interest. In real life (and this depends on legislation: whether interest is considered to become part of the capital day by day, or only after the whole year has passed), compound interest is often just a piecewise-linear interest function.

During the first 12 months the capital may grow at C * (1 + i * N), where N is in months; the second year it grows at C2 * (1 + i * N), where C2 = C * (1 + i * 12) (in other words, C2 is the money you have after 1 year has passed).

This can be expressed as f(i) = C * (1 + i)^N, but only as long as N is a natural number (and not a real number; unless the legislation considers the capitalization to happen day by day).


Well, that's enough for this post. Compound interest in real-life situations can get very complex; it is in fact the subject of an entire term of study. You've got enough keywords now. Google is your friend.

#5191840 Multiple VAOs and VBOs

Posted by Matias Goldberg on 08 November 2014 - 03:13 PM

However, no one seems to be interested in doing anything about it.

Post it to Timothy Lottes/Graham Sellers/Christophe Riccio's Twitter. See what they say...

They are aware of it.

#5191810 OpenGL - layout(shared) ?

Posted by Matias Goldberg on 08 November 2014 - 12:03 PM

When using the shared layout, you have to query OpenGL for the offset of each element.

For example, there could be hidden padding for alignment reasons. This padding depends on each GL driver; it could even change between driver versions.


std140 layout avoids the need to query the offsets, as there are fixed rules on how the memory is laid out.

But its memory layout rules are very conservative so that they work on every possible piece of hardware, and these rules are insane. Most importantly, they're so hard to follow that many driver implementations get them wrong (e.g. a vec3 always gets promoted to a vec4, and four aligned consecutive floats should be contiguous, but I've seen drivers wrongly promote each float to a vec4).


I prefer std140 because it avoids querying and the offsets are known beforehand, while my GLSL declarations are all of type vec4 or mat4, with #define macros when the name of a variable is too important for readability. Example:

layout(std140) uniform ubContext
{
	mat4	projectionViewMatrix;
	mat4	projectionMatrix;
	mat4	viewMatrix;
	vec4	times; //times.z and times.w are not used (padding)
} Context;

#define currentTime Context.times.x
#define cosTime Context.times.y

This way you don't need to query, the offsets are known beforehand, and you're safe from most driver implementation problems.




#5191754 Multiple VAOs and VBOs

Posted by Matias Goldberg on 07 November 2014 - 10:24 PM

However, no one seems to be interested in doing anything about it. It's very frustrating that something that should be faster (multiple VAOs), by its very design, is instead significantly slower.

I think the reason this isn't getting attention is probably that, by following AZDO practices, switching VAOs should be very rare, rarely enough not to make a difference.

It's still not an excuse, though. There's a lot of "legacy" (i.e. non-AZDO) code that would benefit from a driver that actually takes advantage of VAOs instead of being super slow, as you've found out.

#5191752 Multiple VAOs and VBOs

Posted by Matias Goldberg on 07 November 2014 - 10:16 PM


Assuming that is the case I want to be doing this right?

glBindBuffer(GL_ARRAY_BUFFER, vbo[index]);
GLvoid * data = glMapBufferRange(/*Desired params. Write Bit and Unsynchronized Bit */)
/*add VBO data. repeat for all the dynamic data I need to draw per frame */
/*----- Later on, after I have all the data I want ----*/
/*Render setup. Bind a shader program, uniforms, and etc */
//All the VAOs look the same / have the same attribute pointers
if(index > maxVBOCount)
   index = 0;

Also what do you guys think about when it comes to writing to buffer n-1 and using glDrawElements on buffer n? Assuming the same approach described above is used


This would prevent GPU stalls right?


You're mixing the round-robin method (using two or more VBOs) with unsynchronized access.

Normally you use round robin when you don't have unsynchronized mapping available.


When you've got unsynchronized access, you use one VBO but allocate twice (or three times) the size you need. When mapping, you lock a different subregion each frame. For example, if you need to update 32MB, create a 96MB buffer. On frame 0 you lock region [0; 32), on frame 1 you lock [32; 64), on frame 2 you lock [64; 96), and on frame 3 you lock region [0; 32) again, and so on. It's still a multi-buffering scheme, but with one buffer (in my example a triple-buffer scheme, which is often what's recommended).


VERY IMPORTANT: Your code is broken, because you're using unsynchronized flags without fencing. That means you may write to a buffer while the GPU is still using it, so glitches or crashes can happen (or even a full system hang/BSOD).

//Wait for the fence to complete before touching this region again
if( fences[index] )
{
    GLbitfield waitFlags    = 0;
    GLuint64 waitDuration   = 0;
    while( true )
    {
        GLenum waitRet = glClientWaitSync( fences[index],
                                           waitFlags, waitDuration );
        if( waitRet == GL_ALREADY_SIGNALED || waitRet == GL_CONDITION_SATISFIED )
        {
            glDeleteSync( fences[index] );
            fences[index] = 0;
            break;
        }

        if( waitRet == GL_WAIT_FAILED )
        {
            //Fatal error! (Out of memory? Driver error? GPU was removed?)
        }

        const GLuint64 kOneSecondInNanoSeconds = 1000000000;
        //After the first try, we need to start flushing, and wait for a looong time.
        waitFlags    = GL_SYNC_FLUSH_COMMANDS_BIT;
        waitDuration = kOneSecondInNanoSeconds;
    }
}

glBindBuffer( GL_ARRAY_BUFFER, vbo );
//Map only this frame's region: offset = index * size
GLvoid *data = glMapBufferRange( GL_ARRAY_BUFFER, index * size, size,
                                 GL_MAP_WRITE_BIT | GL_MAP_UNSYNCHRONIZED_BIT );

// ....

//The fence needs to be created after you're done issuing the commands that read the VBO.
//Noob mistake: don't create the fence right after you've unmapped it if you will
//still issue another command that reads from it (like a draw call).
fences[index] = glFenceSync( GL_SYNC_GPU_COMMANDS_COMPLETE, 0 );

index = (index + 1) % 3; //Triple-buffer scheme

The apitest project has sample code showing how to do this (note that most of it is written with GL4 in mind).

#5191330 Multiple VAOs and VBOs

Posted by Matias Goldberg on 05 November 2014 - 08:59 AM


glMapBufferRange has been available since OpenGL 3.0 which is enough to pull this off. If you specify the GL_MAP_UNSYNCHRONIZED_BIT flag then the driver doesn't need to wait for that region of the buffer to be made available and it's almost the same as having a persistently mapped buffer (AFAIK).
Not quite. GL_MAP_UNSYNCHRONIZED_BIT avoids a CPU-GPU sync, but it still requires a server/client sync. Client and server here roughly mean the application's thread and the driver's thread (or threads), respectively.


See page 22 here, or pages 6 to 14 here for a good writeup.


This is true whatever mapping method you use, except for persistent mapping. You'll have this problem whether you use AZDO or not.


Draw-call overhead in AZDO has 3 key elements (I'll talk only about draw calls, excluding textures, setting shader constants, etc.):

  1. Batching all meshes into one same VBO so that glDraw* calls can happen without needing to call glBindBuffer( GL_ARRAY_BUFFER ) or glBindVertexArray again.
  2. Unsynchronized buffer access so that you can access different regions of the same VBO without causing a full stall. This has benefits for both static and dynamic buffers.
  3. Persistent mapping so that calls to the expensive glMap* can be avoided.

Points 1 & 2 are available to both GL3 (via GL_MAP_UNSYNCHRONIZED_BIT) and GL4 hardware; point 3 is exclusive to GL4 hardware.

When it comes to filling static buffers (i.e. level loading), only points 1 & 2 are needed. You may even hurt performance by trying to use persistent mapping.

When it comes to filling dynamic data (i.e. dynamic vertex buffers, constant buffers, texture buffers), you can use points 1 & 2 on GL3 hardware, and points 1, 2 & 3 on GL4 hardware.


With some clever design, you can support both hardware generations. For filling dynamic data on GL3, call glMapBufferRange with GL_MAP_UNSYNCHRONIZED_BIT once per frame (at the beginning of the frame; you may need 1 to 4 of these calls per frame; issue them all together), fill your pointers, then unmap, and execute your draw calls afterwards.

For GL4, do exactly the same, except that you skip the per-frame glMapBufferRange and unmap calls (the buffer stays persistently mapped).


If you were to code only for GL4 hardware, the advantage is that you can fill the buffer pointers and immediately afterwards issue a glDraw* call (in GL3, you can't issue draw calls while the buffer is still mapped). But with some clever design, you can defer the draw calls until after you're done filling the buffers, and you end up with an engine compatible with both GL3 & GL4.


Filling dynamic data in pre-GL4 hardware is always going to need glMapBufferRange, so taking advantage of GL_MAP_UNSYNCHRONIZED_BIT is still a huge win.

#5191252 Multiple VAOs and VBOs

Posted by Matias Goldberg on 04 November 2014 - 08:30 PM

There is contradictory information regarding multiple VAOs vs one single VAO and rebind.

Now, there is a 3rd option: AZDO. AZDO is about using one huge single VBO (or very few of them) with manual management via unsynchronized mapping and fences (it's very Mantle-like behaviour). Place all the meshes in the same VBO at different regions (offsets) of the memory. When you've exhausted the VBO pool, create another VBO.


Because there are very few VBOs, VAOs become more a matter of a "vertex format layout" specification, so you only need one VAO per vertex format (though there can be two VAOs for the same vertex format if you end up needing a few more VBOs).


If you sort your draw calls by vertex format, the draw-call overhead approaches zero, as you will rarely need to switch VAOs or VBOs (or respecify any attribute).

#5191250 Problems with windows KB2670838 update

Posted by Matias Goldberg on 04 November 2014 - 08:18 PM

I'm trying to patch a dx10 game so it will work regardless of KB2670838, but I hit a dead end. Nothing helps, not even moving to the newest Windows SDK.
Is there any solution to this, other than moving to dx11?

If you're into hacking, someone has already hacked PIX to work with the new updates; the idea would be to do the same to fix those games you mentioned.


If you're not into hacking and looking for a solution, I'm afraid you're in the wrong forum. This site is for people who develop their own games, we can't do much for released games for which we don't have legal access to the source code.

#5188531 Request for Advice - Needing two OpenGL windows to display to...

Posted by Matias Goldberg on 22 October 2014 - 09:33 AM

You can have one context and two windows.
Just switch the hDC (1st argument) in the wglMakeCurrent/glXMakeCurrent call, but use the same context (2nd argument).

It works **much** better than having two contexts.
The tricky part, though, is VSync. You probably want to call SwapBuffers once, not twice (i.e. not once per window). This tends to be driver specific (do it wrong and the framerate will be 30fps; do it right and it will be 60fps). You'll have to do some experimentation.

For this method to work, both windows must have the exact same pixel format and antialiasing options (they can have different resolutions, though); otherwise the wglMakeCurrent call will fail when you try the second hDC.

#5188425 Is there a way for me to see my game`s performance or frame rate?

Posted by Matias Goldberg on 21 October 2014 - 05:58 PM

Just curious, why implement it when there's already great software that does that? RenderDoc for GPU, Visual studio for CPU, I can't see how rolling your own can be better in any scenario with such advanced tools available?

These tools, while great, have some level of inaccuracy. E.g. the CPU profilers use sampling-based profiling, which is basically a statistical collection of where the program spends most of its time.
Statistics are averages and have standard error. Furthermore, these tools don't work outside the dev environment (the tool is not available to the end user, VTune costs money, the PDBs need to tag along, etc.). Not to mention these tools may have trouble hooking into your app if you do something unusual (DRM, a problematic device driver, the program running inside a virtual machine, etc.).

They also won't tell you how long a specific component takes unless it's statistically relevant, which matters when you're trying to build a frame budget.

Another reason is that not all platforms can use these tools; and while it's great to have them on PC, it's not so great when you have to deal with other devices where these profilers either don't exist or have poor support.

#5188330 Is there a way for me to see my game`s performance or frame rate?

Posted by Matias Goldberg on 21 October 2014 - 10:28 AM

There are two ways to measure framerate, each one with its advantages, disadvantages and different levels of accuracy:

1. Implement it yourself. Measure the time taken between frames using a high-resolution timer like QueryPerformanceCounter or rdtsc. You can measure the frame rate by comparing the timestamp of the previous call against the current one, then saving the current value for the next frame's comparison. Alternatively, you can use the profiler pattern, where you surround a given function with calls like beginProfile()/endProfile() so you know how long a specific function or module takes.

2. Use third-party profilers like CodeAnalyst, VTune, PerfStudio, nSight, Intel GPA, PIX and the Visual Studio graphics debugger. Each of them has different compatibility when hooked up to your application, may have vendor-specific features (e.g. CodeAnalyst works on non-AMD machines, but many profiling methods will be unavailable), or may diagnose the wrong place as a hotspot (rare).

Ultimately a good dev will use a combination of all of the above, not just one tool.

#5187855 screen resolution, 1360x768 no longer possible?

Posted by Matias Goldberg on 18 October 2014 - 11:54 AM

Looks like you've got a VGA LCD/LED monitor.

These monitors need to calibrate themselves to the signal. The monitor will remember a few calibrations for given combinations of resolution and refresh rate. Probably your monitor saw that the frequency changed (e.g. 60 vs 75Hz, or 59.9 vs 60Hz) when you switched GPUs, and thus got the calibration wrong.


There should be an "auto" button on your monitor that runs the calibration procedure, which usually takes 1 to 5 seconds. You'll see the screen stretch itself until it aligns properly.

Run the "auto" procedure while there's plenty of colour on the screen. If you run it while the screen is mostly black, the calibration will go wrong (e.g. 10% of the screen will end up "outside").


If there's no "auto" button, consult your monitor's manual. It could be an OSD option, or you may have to hold a particular key combination.