Are two OpenGL contexts still necessary for concurrent copy and render?



#1 Prune

Posted 13 May 2014 - 04:55 PM

Looking at http://on-demand.gputechconf.com/gtc/2012/presentations/S0356-Optimized-Texture-Transfers.pdf

Are the two contexts required? Will rendering not occur while the DMA transfer is proceeding unless I do the upload in another thread, even with last-gen NVIDIA cards? If so, how does that make sense? It seems like an artificial limitation, as the hardware can obviously handle it (even the consumer-level cards with a single copy engine) if you have another thread.

 

(If it matters, I'm using persistently mapped PBOs.)


Edited by Prune, 13 May 2014 - 05:15 PM.

"But who prays for Satan? Who, in eighteen centuries, has had the common humanity to pray for the one sinner that needed it most?" --Mark Twain

~~~~~~~~~~~~~~~Looking for a high-performance, easy to use, and lightweight math library? http://www.cmldev.net/ (note: I'm not associated with that project; just a user)


#2 lask1

Posted 16 May 2014 - 06:14 PM

I only took a short look, but it seemed to be multi-threaded? In that case, yes, the two contexts are required... though they wouldn't be required in DirectX. Sometimes that makes me want to learn DirectX.



#3 L. Spiro

Posted 16 May 2014 - 09:41 PM

Before you run off and actually use a second context, we just had this discussion: http://www.gamedev.net/topic/656684-cases-for-multithreading-opengl-code/


L. Spiro
It is amazing how often people try to be unique, and yet they are always trying to make others be like them. - L. Spiro 2011
I spent most of my life learning the courage it takes to go out and get what I want. Now that I have it, I am not sure exactly what it is that I want. - L. Spiro 2013
I went to my local Subway once to find some guy yelling at the staff. When someone finally came to take my order and asked, “May I help you?”, I replied, “Yeah, I’ll have one asshole to go.”
L. Spiro Engine: http://lspiroengine.com
L. Spiro Engine Forums: http://lspiroengine.com/forums

#4 Aks9

Posted 17 May 2014 - 02:32 AM

> Looking at http://on-demand.gputechconf.com/gtc/2012/presentations/S0356-Optimized-Texture-Transfers.pdf
>
> Are the two contexts required? Will rendering not occur while the DMA transfer is proceeding, unless I do the upload in another thread, even with last-gen NVIDIA cards? If so, how does that make sense? It seems an artificial limitation, as the hardware obviously can handle it (even in the single copy engine consumer-level cards) if you have another thread.
>
> (If it matters, I'm using persistently mapped PBOs.)

What's wrong with the answer I've already given you?

Or do you want to hear another formulation of the same answer?

"No, it is not strictly necessary, but that is how NVIDIA drivers have been designed so far, and there is no other way to turn on the copy engine."



#5 samoth

Posted 17 May 2014 - 05:08 AM

Using two contexts is unlikely to help you since it's a driver issue. Or rather, a deliberate driver "feature".

 

In theory, transfers are CPU/GPU asynchronous (and asynchronous with rendering); in practice, they are only client/server asynchronous.

 

I can't quote a source right now and might be wrong on the exact hardware generation (though I believe it was in Cozzi and Riccio's book?). Basically, pre-Kepler (or was it Fermi? I think it was Kepler) hardware has one dedicated DMA unit that runs independently of the stream processors, so it can perform one DMA operation while rendering without you doing anything special. However, only Quadro drivers actually use this feature; consumer-level drivers stall rendering while doing DMA. Kepler and later have two DMA units and could do DMA uploads and downloads in parallel while rendering, but again, only Quadro drivers use the hardware's full potential.

 

As far as I remember (not 100% sure on that), AMD has a similar issue, but there it's an actual hardware limitation rather than a driver one.



#6 Aks9

Posted 17 May 2014 - 06:01 AM

> I can't quote a source right now and might be wrong on the exact hardware generation (though I believe it was in Cozzi and Riccio's book?). Basically, the thing is that pre-Kepler (or was it Fermi? I think it was Kepler) hardware has one dedicated DMA unit that runs independently of the stream processors, so it can do one DMA operation while it is rendering, without you doing anything special. However, only Quadro drivers actually use this feature, consumer-level drivers stall rendering while doing DMA. Kepler and later have 2 DMA units and could do DMA uploads and downloads in parallel while rendering, but again, only Quadro drivers use the hardware's full potential.

This drives me crazy because I've already answered Prune's question in another forum.

Pre-Fermi GPUs do not allow overlapping rendering with data downloads/uploads. Fermi was the first NVIDIA architecture where this is enabled. High-end Quadro cards have two separate DMA channels that can overlap, while GeForce cards have just one (or at least only one is enabled). It is not clear whether the two channels can transfer data in the same direction simultaneously (I guess not, but it would be quite reasonable). This is known as the (Dual) Copy Engine. Kepler has the same capability as Fermi as far as the copy engine is concerned. Activating the copy engine is not free, so by default it is turned off. NVIDIA drivers use heuristics to activate it (there is no special command to turn it on), and the trigger is a separate context doing only data transfer. That's why the second context is necessary.
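The hand-off described above (a second context on its own thread doing nothing but the upload, with the rendering context waiting on a fence before using the result) can be sketched as a CPU-side model. This is a hedged illustration of the synchronization structure only, not GL or driver code: the hypothetical `UploadFence` class stands in for a sync object created with `glFenceSync` in the upload context and waited on with `glWaitSync` in the rendering context, and the plain byte copy stands in for the actual texture upload. Whether the driver then routes that upload through the copy engine is the heuristic described in the post and cannot be modeled here.

```cpp
#include <algorithm>
#include <condition_variable>
#include <cstdint>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for a GL sync object shared between two contexts.
// Real GL: the upload context calls glFenceSync (then glFlush), and the
// rendering context calls glWaitSync before drawing with the texture.
class UploadFence {
public:
    void signal() {
        std::lock_guard<std::mutex> lock(mutex_);
        signaled_ = true;
        cv_.notify_all();
    }

    void wait() {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return signaled_; });
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    bool signaled_ = false;
};

// Models one frame: an "upload context" thread fills a shared texture while the
// "render context" (the calling thread) is free to do unrelated work, then
// waits on the fence before reading. Returns the sum of the texels it read.
std::uint64_t uploadAndRender(const std::vector<std::uint8_t>& staging) {
    std::vector<std::uint8_t> sharedTexture(staging.size());
    UploadFence fence;

    std::thread uploader([&] {
        // Stand-in for the upload (e.g. a PBO-sourced texture update)
        // issued from the second context.
        std::copy(staging.begin(), staging.end(), sharedTexture.begin());
        fence.signal();
    });

    // Draw calls that do not touch sharedTexture would be issued here.

    fence.wait();  // never read the texture before the upload has signaled
    std::uint64_t sum = 0;
    for (std::uint8_t texel : sharedTexture) sum += texel;

    uploader.join();
    return sum;
}
```

For example, `uploadAndRender({1, 2, 3})` returns 6: the render side only reads the texels after the upload thread has signaled.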

 

Please correct me if I'm wrong.

 

This is probably my last post about the (Dual) Copy Engine, since I'm really tired of repeating the same thing.


Edited by Aks9, 17 May 2014 - 06:02 AM.


#7 Prune

Posted 19 May 2014 - 12:22 PM

 

> What's wrong with the answer I've already given to you?

Where?


Edited by Prune, 19 May 2014 - 12:32 PM.

#8 Prune

Posted 20 May 2014 - 06:40 PM

 

> What's wrong with the answer I've already given to you?
>
> Or do you want to hear another formulation of the same answer?
>
> This drives me crazy because I've already answered Prune's question in another forum.
> ...
>
> This is probably my last post about the (Dual) Copy Engine since I'm really tired of repeating the same thing.

 

How about the fact that the two threads, one here and one on the other forum, were made within a few minutes of each other, before any answer was posted? You owe me an apology. Of course, the fact that you haven't yet responded to my "Where?" question indicates mens rea: you realized this already. One can only surmise that your ego is simply too large for you to admit that you were wrong.


Edited by Prune, 20 May 2014 - 06:42 PM.

#9 Aks9

Posted 21 May 2014 - 03:32 AM

 

OK! My apologies!

But you are wrong about my ego. As you can see, it is not so large. :)

I didn't notice the time of the post in this forum. I regularly check the OpenGL forums, and I only noticed this thread after posting several answers on the other one, which made me think you were not satisfied with them. That was my mistake.

Btw, you didn't acknowledge my posts, which also made me think you were not satisfied and were looking for other opinions. That's why I overreacted. Mea culpa! (I also attended gymnasium and studied Latin, so I hope we can understand each other perfectly. ;) )



#10 Prune

Posted 21 May 2014 - 11:16 AM

If I reacted strongly, it was because you didn't reply the first time I called you out on it :P, not because you made a mistake; hell, I make more mistakes than most. But, moving on... vita brevis.

 

Any implied dissatisfaction isn't with your answer, but with the state of affairs (i.e., the inability to trigger the copy engine simply by starting a large enough upload in the same thread/context). I wasn't blaming the messenger. I didn't respond because I was waiting to see whether someone else might know an alternative workaround before going to the bother of testing with two contexts and posting back in the thread.


Edited by Prune, 21 May 2014 - 11:17 AM.

#11 Prune

Posted 21 May 2014 - 11:49 AM

What's still not clear to me is whether that presentation and the copy engine discussion apply specifically, in the upload case, to direct uploads to textures with glTexSubImage*(). Is it the same for all other transfers? For example, I normally stream into buffer objects that are persistently mapped with GL_MAP_PERSISTENT_BIT. When uploading to a buffer, is the behavior: a) the same as in the texture upload case, i.e., the transfer only runs concurrently with rendering when initiated from another context; b) concurrent with rendering even when done in the same thread (until the appropriate barrier), as seems to be implied in http://www.slideshare.net/CassEveritt/approaching-zero-driver-overhead; or c) always serialized on the GPU side?

 

For example, I keep transform and material data in buffer objects, one set per shader program, and render with glMultiDrawElementsIndirectCountARB() once per shader. The buffer objects are treated as triple-sized ring buffers. My rendering thread has been doing the following:

1. glClientWaitSync on the buffer ranges to be updated.
2. Write the updated data into the mapped buffers.
3. Possibly do some other rendering work not involving the updated data.
4. glMemoryBarrier(GL_BUFFER_UPDATE_BARRIER_BIT).
5. Issue the draw calls using the buffer objects.
6. glFenceSync, and advance the triple-buffer indexes.

I had assumed that the DMA transfer would start after the "write updated data" step and proceed in parallel with any other GL work, up until the memory barrier. Is this incorrect? Does it mean that, just like with glTexSubImage*(), I'd instead have to move the buffer upload to another thread/context for it to actually be a DMA transfer that doesn't prevent the GPU from rendering in the meantime? Or is it never in parallel at all, because only glTexSubImage*() triggers the copy engine?
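The per-frame sequence described above can be modeled on the CPU to make the ring-buffer bookkeeping explicit. This is a minimal sketch under stated assumptions, not GL code: `TripleRing` and `GpuTimeline` are hypothetical names, each section's fence is modeled as the frame number recorded at `glFenceSync` time, and `canWrite` plays the role of `glClientWaitSync` reporting the fence as signaled. It captures only the CPU/GPU synchronization; it cannot answer whether the driver overlaps the actual transfer with rendering, which is the open question in the post.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for the GPU's progress as seen through fences.
struct GpuTimeline {
    std::uint64_t completedFrame = 0;  // last frame the "GPU" has finished
};

// CPU-side model of a triple-buffered ring over one persistently mapped buffer.
class TripleRing {
public:
    static constexpr std::size_t kSections = 3;

    explicit TripleRing(std::size_t sectionBytes) : sectionBytes_(sectionBytes) {}

    // Stand-in for glClientWaitSync on the fence guarding the current section:
    // the CPU may write only once the GPU has finished the frame that last used it.
    bool canWrite(const GpuTimeline& gpu) const {
        return fenceFrame_[index_] <= gpu.completedFrame;
    }

    // Byte offset of the current section inside the mapped buffer; draws for
    // this frame would use it as their base offset.
    std::size_t currentOffset() const { return index_ * sectionBytes_; }

    // Stand-in for glFenceSync after the draws that read this section,
    // followed by advancing to the next section.
    void fenceAndAdvance(std::uint64_t submittedFrame) {
        fenceFrame_[index_] = submittedFrame;
        index_ = (index_ + 1) % kSections;
    }

    std::size_t index() const { return index_; }

private:
    std::size_t sectionBytes_;
    std::size_t index_ = 0;
    std::array<std::uint64_t, kSections> fenceFrame_{};  // 0 = never fenced
};
```

With three sections, the CPU can have at most three frames in flight: the fourth write lands back in section 0 and `canWrite` stays false until the GPU has completed the frame that last used that section.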


Edited by Prune, 21 May 2014 - 11:53 AM.

#12 Prune

Posted 03 June 2014 - 03:33 PM

Any comments on the last post? Does this only apply to glTexSubImage*(), or are other buffer transfers, especially to mapped buffers, handled the same way, requiring a second context to run concurrently with rendering? Or can only texture uploads trigger the copy engine?