GL_MAP_PERSISTENT_BIT performance problem


Old topic!
Guest, the last post of this topic is over 60 days old and at this point you may not reply in this topic. If you wish to continue this conversation start a new topic.

10 replies to this topic

#1 Prune   Members   -  Reputation: 218

Posted 20 January 2014 - 02:15 PM

I create a buffer object with GL_MAP_WRITE_BIT | GL_MAP_PERSISTENT_BIT and then I map it with GL_MAP_WRITE_BIT | GL_MAP_PERSISTENT_BIT | GL_MAP_INVALIDATE_BUFFER_BIT | GL_MAP_UNSYNCHRONIZED_BIT | GL_MAP_FLUSH_EXPLICIT_BIT.

 

According to usage examples I've seen, I write to the pointer and do a flush. No unmapping. I was initially doing this for uniform buffer objects holding mainly transforms, so I'd be writing data multiple times per frame. I didn't notice any problems.

 

However, I tried to use this with pixel buffer objects and I have a big problem. In my setup, I have shared memory into which another application is drawing stuff, and I load it as a texture in my application. To make the upload asynchronous, I use the standard way of ping-ponging two PBOs. Initially, I wasn't using the persistent and flush bits. I would map one PBO, write data, then unmap it, while the other one would have an asynchronous texture subimage operation in flight. The next frame they switch.

When I tried to change the map-write-unmap operation to map once with persistence and then write-flush, the framerate of everything dropped very significantly. How is it that doing this once per frame had so much impact, when it seems to work fine with UBOs as in my first use case? Is there an issue with persistent mapping and simply the amount of memory being transferred (an HD-resolution texture during most frames)? I assume I'm probably doing something wrong with the way I'm using it, perhaps in when I call the flush operation (right now it's immediately after the write), but I really don't know. The feature is fairly new to OpenGL, so perhaps drivers aren't as well optimized for it, but that doesn't seem likely (I'm on a GTX 680). Any suggestions? I was hoping to actually get an improvement by saving on the map/unmap calls...
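As a sanity check on the flag combination described above, here is a minimal sketch that models a few of the error rules for glBufferStorage and glMapBufferRange as I understand them from the ARB_buffer_storage text. It is deliberately partial, it runs without a GL context, and the helper names (storage_flags_ok, map_access_ok) are hypothetical. The constants mirror the values of the corresponding GL_MAP_* enums but drop the GL_ prefix so the snippet does not need the GL headers.

```cpp
#include <cassert>
#include <cstdint>

// Same numeric values as the GL_MAP_* enums in the OpenGL headers.
constexpr std::uint32_t MAP_READ_BIT              = 0x0001;
constexpr std::uint32_t MAP_WRITE_BIT             = 0x0002;
constexpr std::uint32_t MAP_INVALIDATE_RANGE_BIT  = 0x0004;
constexpr std::uint32_t MAP_INVALIDATE_BUFFER_BIT = 0x0008;
constexpr std::uint32_t MAP_FLUSH_EXPLICIT_BIT    = 0x0010;
constexpr std::uint32_t MAP_UNSYNCHRONIZED_BIT    = 0x0020;
constexpr std::uint32_t MAP_PERSISTENT_BIT        = 0x0040;
constexpr std::uint32_t MAP_COHERENT_BIT          = 0x0080;

// Hypothetical helper: checks two glBufferStorage rules (not the full list).
inline bool storage_flags_ok(std::uint32_t flags) {
    const bool read_or_write = (flags & (MAP_READ_BIT | MAP_WRITE_BIT)) != 0;
    // Persistent storage must be mappable for reading or writing.
    if ((flags & MAP_PERSISTENT_BIT) != 0 && !read_or_write) return false;
    // Coherent only makes sense together with persistent.
    if ((flags & MAP_COHERENT_BIT) != 0 && (flags & MAP_PERSISTENT_BIT) == 0) return false;
    return true;
}

// Hypothetical helper: checks some glMapBufferRange access rules against the
// flags the storage was created with (again, not the full list).
inline bool map_access_ok(std::uint32_t storage, std::uint32_t access) {
    const std::uint32_t must_match_storage =
        MAP_READ_BIT | MAP_WRITE_BIT | MAP_PERSISTENT_BIT | MAP_COHERENT_BIT;
    const std::uint32_t write_only_hints =
        MAP_INVALIDATE_RANGE_BIT | MAP_INVALIDATE_BUFFER_BIT | MAP_UNSYNCHRONIZED_BIT;

    // At least one of read/write must be requested.
    if ((access & (MAP_READ_BIT | MAP_WRITE_BIT)) == 0) return false;
    // Read/write/persistent/coherent may only be requested if the storage has them.
    if ((access & must_match_storage & ~storage) != 0) return false;
    // Invalidate and unsynchronized cannot be combined with reading.
    if ((access & MAP_READ_BIT) != 0 && (access & write_only_hints) != 0) return false;
    // Explicit flush requires a write mapping.
    if ((access & MAP_FLUSH_EXPLICIT_BIT) != 0 && (access & MAP_WRITE_BIT) == 0) return false;
    return true;
}
```

Under these modeled rules the combination from the post passes, which would be consistent with the symptom being a performance drop rather than a GL error. That says nothing about whether the combination is a good idea for performance.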


Edited by Prune, 20 January 2014 - 02:18 PM.

"But who prays for Satan? Who, in eighteen centuries, has had the common humanity to pray for the one sinner that needed it most?" --Mark Twain

~~~~~~~~~~~~~~~Looking for a high-performance, easy to use, and lightweight math library? http://www.cmldev.net/ (note: I'm not associated with that project; just a user)

#2 phantom   Moderators   -  Reputation: 7394

Posted 21 January 2014 - 06:03 AM

It might not be the persistent bit which is the problem but the unsynchronized bit. Have a look at http://www.slideshare.net/CassEveritt/beyond-porting and see if the advice early on in it helps you out.

#3 Prune   Members   -  Reputation: 218

Posted 27 January 2014 - 03:02 PM

I understand the part about GL_MAP_UNSYNCHRONIZED_BIT causing the client and server threads to synchronize, but I don't understand exactly how GL_MAP_COHERENT_BIT works. Looking at http://www.opengl.org/wiki/GLAPI/glMapBufferRange, there's also a comment that seems to contradict the claim of the presentation: "Obviously, there's a reason why you don't get the coherent behavior by default. That reason being performance. You should try to live with the explicit synchronization mechanisms if it is at all possible." So which is it? And, in the context of this coherent flag, what would the effect of the flags (1) GL_MAP_INVALIDATE_BUFFER_BIT and (2) GL_MAP_FLUSH_EXPLICIT_BIT be? My best guess is that, without the unsynchronized bit, I'd use either the coherent bit or explicit flush, but not both (but then what does the invalidate bit do in the former case)?

#4 richardurich   Members   -  Reputation: 1187

Posted 27 January 2014 - 06:22 PM

No contradiction. GL_MAP_COHERENT_BIT performs badly, just better than the alternative. Notice that in the presentation they used 3x the buffer size to help. Also notice the "Not a good fit for:" section in the talk, which lists where coherent performs worse than the alternatives.

 

1. GL_MAP_INVALIDATE_BUFFER_BIT causes data to be undefined unless written after the invalidate. Performance would depend on the implementation, but I'd suspect no performance benefit if you're using coherent since you would have already taken the hit before you can invalidate the data.

2. GL_MAP_FLUSH_EXPLICIT_BIT causes any modifications not flushed to be undefined. It sounds like this would fully override coherent, although I might be missing something about coherent since I haven't ever used it yet.
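The "3x the buffer size" idea mentioned above amounts to treating one persistently mapped buffer as a ring of three regions and writing each frame into a different one. A minimal sketch of the offset arithmetic follows, assuming one region per frame; RingLayout, make_ring and align_up are hypothetical names. The alignment is passed in as a parameter because, for uniform buffers bound by range, real code would query it (GL_UNIFORM_BUFFER_OFFSET_ALIGNMENT via glGetIntegerv) rather than hard-code it.

```cpp
#include <cassert>
#include <cstddef>

// Round n up to the next multiple of align (align must be a power of two).
inline std::size_t align_up(std::size_t n, std::size_t align) {
    assert(align != 0 && (align & (align - 1)) == 0);
    return (n + align - 1) & ~(align - 1);
}

// Hypothetical description of a buffer split into equally sized regions.
struct RingLayout {
    std::size_t region_size;   // aligned size of one frame's worth of data
    std::size_t region_count;  // 3 for the triple-sized buffer

    // Size to allocate for the whole buffer.
    std::size_t total_size() const { return region_size * region_count; }

    // Byte offset into the mapped pointer (and into the buffer) for a frame.
    std::size_t offset_for_frame(std::size_t frame) const {
        return (frame % region_count) * region_size;
    }
};

inline RingLayout make_ring(std::size_t bytes_per_frame,
                            std::size_t alignment,
                            std::size_t regions = 3) {
    assert(regions > 0);
    return RingLayout{align_up(bytes_per_frame, alignment), regions};
}
```

For example, 200 bytes per frame with an assumed alignment of 256 gives three 256-byte regions at offsets 0, 256 and 512, and frame 3 wraps back to offset 0. The extra regions only reduce how often the CPU has to wait; they do not replace the fence that protects a region before it is rewritten.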



#5 Prune   Members   -  Reputation: 218

Posted 27 January 2014 - 06:28 PM

So what do you think is the best approach for 1) frequently modified uniforms, such as transforms, and 2) per-frame modified large amount of data, such as large textures?

 

[Edit:] I have additional confusion due to OpenGL Insights http://www.seas.upenn.edu/~pcozzi/OpenGLInsights/OpenGLInsights-AsynchronousBufferTransfers.pdf recommending the use of GL_MAP_UNSYNCHRONIZED_BIT with multiple buffers or multiple ranges, instead of orphaning or round-robin...


Edited by Prune, 27 January 2014 - 06:37 PM.

#6 richardurich   Members   -  Reputation: 1187

Posted 27 January 2014 - 07:22 PM

Does OpenGL Insights even account for the existence of GL_MAP_COHERENT_BIT? I'd suspect uniforms that are modified every frame would be well-suited for coherent since you aren't making a separate call to sync/orphan. A lot of the benefits they describe for AMD pinned memory apply to GL_MAP_COHERENT_BIT as well, although there is still a sync, just without a separate call.

 

I wouldn't want to guess what is best performance on large amounts of data changed every frame since processing a huge chunk of data is obviously less likely to bottleneck on the OpenGL call.

 

If you're at the optimization stage, it's definitely worth investigating coherent's performance. Just be sure to wrap ClientWaitSync like the presentation suggested.



#7 Prune   Members   -  Reputation: 218

Posted 05 February 2014 - 04:54 PM

Just to be clear, should the glFenceSync() be placed right after the last GPU command that uses the written data? And when I'm doing this explicit fencing, I should not also be orphaning with ...INVALIDATE..., because the orphaning might reallocate another chunk of memory for the buffer?

 

Also, if manually flushing, should glFlushMappedBufferRange() be called right after modifying the data, or just before using the data? It's not clear to me from the description which says that it "indicates" the data has been modified. Does this in essence start the DMA transfer, or wait for its completion on the GPU side?


Edited by Prune, 05 February 2014 - 05:40 PM.

#8 richardurich   Members   -  Reputation: 1187

Posted 05 February 2014 - 08:32 PM

glFenceSync injects a sync object into the command buffer. That syncObj is basically set to a value of false (if it were a bool), and when every command preceding syncObj in the command buffer finishes, syncObj gets set to true. Somewhere later in your code, you have "glSyncWait(syncObj,0,GL_TIMEOUT_IGNORED);" and that instruction does nothing other than wait until syncObj is true. Orphaning is not used with this method, since you're responsible for making sure the memory isn't being used for 2 different things at once.

 

glFlushMappedBufferRange() should be called when you want OpenGL to know there are changes it must pick up before executing instructions using the data. You want to use it as early as possible so you don't stall waiting to get the data, but not so early that you'd need to flush constantly. I guess use it after making a block of updates, but not after each individual update. In essence, it queues the DMA transfer and returns. It does not wait for the data transfer to complete.
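The fence placement described above can be sketched as a small ring of regions, each guarded by its own fence: wait on a region's fence before the CPU rewrites it, and insert a new fence right after the last GL command that reads it. To stay runnable without a GL context, the GL calls are replaced by a hypothetical SyncBackend of callbacks; the comments name the real calls that would go there. FencedRing and its method names are illustrative only, and integer ids stand in for GLsync handles.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>

// Hypothetical stand-ins for the GL sync calls.
struct SyncBackend {
    // Real code: glFenceSync(GL_SYNC_GPU_COMMANDS_COMPLETE, 0)
    std::function<int()> insert_fence;
    // Real code: block in glClientWaitSync(...) until signaled, then glDeleteSync(...)
    std::function<void(int)> wait_and_delete;
};

constexpr int kNoFence = -1;  // marks a region with no pending GPU work

template <std::size_t N>
class FencedRing {
public:
    explicit FencedRing(SyncBackend backend) : backend_(std::move(backend)) {
        fences_.fill(kNoFence);
    }

    // Call before the CPU writes into the next region. Blocks only if the GPU
    // may still be reading what was written to this region N frames ago.
    std::size_t begin_write() {
        if (fences_[index_] != kNoFence) {
            backend_.wait_and_delete(fences_[index_]);
            fences_[index_] = kNoFence;
        }
        return index_;  // the caller writes (and flushes, if explicit) this region
    }

    // Call right after issuing the last GL command that reads the region.
    void end_frame() {
        assert(fences_[index_] == kNoFence);  // begin_write() must come first
        fences_[index_] = backend_.insert_fence();
        index_ = (index_ + 1) % N;
    }

private:
    SyncBackend backend_;
    std::array<int, N> fences_{};
    std::size_t index_ = 0;
};
```

With three regions, the first three frames never wait; the fourth frame waits on the fence inserted at the end of the first. Nothing here orphans or invalidates the buffer, which matches the point above that with this method the application itself guarantees a region is not in use before rewriting it.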



#9 Prune   Members   -  Reputation: 218

Posted 06 February 2014 - 01:20 PM

Do you mean glClientWaitSync() rather than glSyncWait() (actually, glWaitSync())? I do, after all, need to block the client thread before it does the memcpy.

#10 richardurich   Members   -  Reputation: 1187

Posted 06 February 2014 - 02:47 PM

Prune wrote: "Do you mean glClientWaitSync() rather than glSyncWait() (actually, glWaitSync())? I do, after all, need to block the client thread before it does the memcpy."

Yea, sorry about that. glSyncWait() isn't even a function. You definitely want glClientWaitSync() when you need your client to wait. I tried quickly looking up the functions online to give you more accurate code, but I think I just made it less accurate. Wrap this stuff with anything that will help you debug, use a timeout instead of blocking forever, etc. Often you'll set a tight timeout during development, so you notice every time you're waiting and can adjust the buffer size or whatever for performance, then loop with a more generous timeout waiting for the sync.
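The "tight timeout first, then loop with a generous one" wrapper suggested above might look roughly like this. It is a sketch under assumptions: wait_with_budget, WaitReport and WaitResult are hypothetical names, and the actual wait is passed in as a callback so the snippet runs without a GL context. In real code the callback would call glClientWaitSync(sync, GL_SYNC_FLUSH_COMMANDS_BIT, timeout_ns) and translate its four return values (GL_ALREADY_SIGNALED, GL_CONDITION_SATISFIED, GL_TIMEOUT_EXPIRED, GL_WAIT_FAILED).

```cpp
#include <cassert>
#include <cstdint>
#include <functional>

// Mirrors the four results glClientWaitSync can return.
enum class WaitResult { AlreadySignaled, ConditionSatisfied, TimeoutExpired, WaitFailed };

struct WaitReport {
    bool signaled;  // true once the fence is known to have been reached
    bool stalled;   // true if the tight first attempt timed out
    int  attempts;  // number of wait calls made
};

// Hypothetical wrapper: try once with a tight timeout so stalls are visible
// during development, then retry with a more generous timeout.
inline WaitReport wait_with_budget(
        const std::function<WaitResult(std::uint64_t)>& wait_once,
        std::uint64_t tight_ns,
        std::uint64_t generous_ns,
        int max_retries) {
    assert(max_retries >= 0);
    WaitReport report{false, false, 0};
    std::uint64_t timeout_ns = tight_ns;
    for (int i = 0; i <= max_retries; ++i) {
        ++report.attempts;
        const WaitResult result = wait_once(timeout_ns);
        if (result == WaitResult::AlreadySignaled ||
            result == WaitResult::ConditionSatisfied) {
            report.signaled = true;
            return report;
        }
        if (result == WaitResult::WaitFailed) {
            return report;  // give up; the caller should check for GL errors
        }
        report.stalled = true;     // timed out: log or count this in debug builds
        timeout_ns = generous_ns;  // later attempts use the generous timeout
    }
    return report;  // still not signaled after all retries
}
```

A caller could log whenever stalled comes back true and use that as the cue to enlarge the buffer or add another region, which is the tuning loop described in the post.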

 

If you read stuff from other people, be warned that I'm not the only one extremely sloppy with my notation on syncs. You might see glFence when someone means sync since it's how you generate the sync object. You might see "sync" meaning anything related to the entire process. You might even see a sequence of code with sync in the wrong place (before/after where it should be). It's because they're just reminding you that you need to do the sync. The whole thing feels the same as locks where people just throw the word "lock" around to say "and whatever I'm talking about will need locks done right, but I'm not doing that for you."

 

Best of luck, and don't be afraid to ask questions if you run into trouble. I won't pretend it's the easiest and most straightforward way to render things, and I'm probably not very good at explaining it either =)



#11 Prune   Members   -  Reputation: 218

Posted 17 February 2014 - 11:16 PM

Just a note: looks like the fence is needed even if you use explicit flush. That's not what I expected...


Edited by Prune, 20 February 2014 - 01:08 PM.