Are two OpenGL contexts still necessary for concurrent copy and render?

Started by
9 comments, last by Prune 9 years, 11 months ago

Looking at http://on-demand.gputechconf.com/gtc/2012/presentations/S0356-Optimized-Texture-Transfers.pdf

Are the two contexts required? Will rendering not occur while the DMA transfer is proceeding, unless I do the upload in another thread, even with last-gen NVIDIA cards? If so, how does that make sense? It seems an artificial limitation, as the hardware obviously can handle it (even in the single copy engine consumer-level cards) if you have another thread.

(If it matters, I'm using persistently mapped PBOs.)

"But who prays for Satan? Who, in eighteen centuries, has had the common humanity to pray for the one sinner that needed it most?" --Mark Twain

~~~~~~~~~~~~~~~Looking for a high-performance, easy to use, and lightweight math library? http://www.cmldev.net/ (note: I'm not associated with that project; just a user)

I only took a short look, but it seemed to be multi-threaded? In that case, yes, the two contexts are required. It wouldn't be required in DirectX, which sometimes makes me want to learn DirectX.

Before you run off and actually use a second context, we just had this discussion: http://www.gamedev.net/topic/656684-cases-for-multithreading-opengl-code/


L. Spiro

I restore Nintendo 64 video-game OST’s into HD! https://www.youtube.com/channel/UCCtX_wedtZ5BoyQBXEhnVZw/playlists?view=1&sort=lad&flow=grid

What's wrong with the answer I've already given to you?

Or you want to hear another formulation of the same answer?

"No, it is not necessary, but that is how NV drivers are designed so far and there is no other way to turn on copy engine".

Using two contexts is unlikely to help you since it's a driver issue. Or rather, a deliberate driver "feature".

In theory, transfers are CPU/GPU asynchronous (and render-asynchronous); in practice, they are only client/server asynchronous.

I can't quote a source right now and might be wrong on the exact hardware generation (though I believe it was in Cozzi and Riccio's book?). Basically, pre-Kepler (or was it Fermi? I think it was Kepler) hardware has one dedicated DMA unit that runs independently of the stream processors, so it can do one DMA operation while it is rendering, without you doing anything special. However, only Quadro drivers actually use this feature; consumer-level drivers stall rendering while doing DMA. Kepler and later have two DMA units and could do DMA uploads and downloads in parallel while rendering, but again, only Quadro drivers use the hardware's full potential.

As far as I remember (not 100% sure on that), AMD has a similar issue, but there it's not because of the driver; it's an actual hardware limitation.

This drives me crazy because I've already answered Prune's question in another forum.

Pre-Fermi GPUs do not allow overlapping rendering with data uploads or downloads. Fermi was the first NV architecture where this is enabled. High-end Quadro cards have two separate DMA channels that can overlap, while GeForce cards have (or at least have enabled) just one. It is not clear whether the two channels can transfer data in the same direction simultaneously (I guess not, though it would be quite reasonable). This is known as the (Dual) Copy Engine. Kepler has the same capability as Fermi as far as the way the copy engine works. Activating the copy engine is not free, so it is turned off by default. NV drivers use heuristics to activate it (there is no special command to turn it on), and the trigger is a separate context doing only data transfer. That's why the second context is necessary.

Please correct me if I'm wrong.

This is probably my last post about (Dual) Copy Engine since I'm really tired of repeating the same thing.
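
For readers who want to see the shape of the "separate context doing only data transfer" setup, here is a minimal sketch of the thread handoff only. It makes no OpenGL calls, so it compiles without a GPU: the byte vectors stand in for textures shared between two contexts, and `Channel` and `run_pipeline` are hypothetical names invented for this illustration. The comments mark where the real calls (`glTexSubImage2D`, `glFenceSync`, `glFlush`, `glWaitSync`) would presumably go. Whether the driver actually engages the copy engine for such a setup is, as discussed above, up to its heuristics.

```cpp
#include <algorithm>
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <numeric>
#include <queue>
#include <thread>
#include <vector>

// Hypothetical helper: a blocking FIFO used to pass slot indices between threads.
template <typename T>
class Channel {
public:
    void send(T value) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            items_.push(value);
        }
        ready_.notify_one();
    }

    T receive() {
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [this] { return !items_.empty(); });
        T value = items_.front();
        items_.pop();
        return value;
    }

private:
    std::mutex mutex_;
    std::condition_variable ready_;
    std::queue<T> items_;
};

// Simulates `frames` uploads on a dedicated transfer thread while the calling
// thread "renders". Returns one checksum per rendered frame.
std::vector<long> run_pipeline(int frames, std::size_t texels_per_frame) {
    constexpr std::size_t kSlots = 2;  // double buffering (an assumption of this sketch)
    std::vector<std::vector<std::uint8_t>> textures(
        kSlots, std::vector<std::uint8_t>(texels_per_frame));

    Channel<std::size_t> free_slots;    // render thread -> upload thread
    Channel<std::size_t> filled_slots;  // upload thread -> render thread
    for (std::size_t s = 0; s < kSlots; ++s) free_slots.send(s);

    // Upload thread: in real code it would make the second (shared) context
    // current here and issue nothing but transfers.
    std::thread uploader([&] {
        for (int f = 0; f < frames; ++f) {
            std::size_t slot = free_slots.receive();
            // Real code: glTexSubImage2D sourcing from the bound PBO, then
            // glFenceSync + glFlush so the render context can wait on it.
            std::fill(textures[slot].begin(), textures[slot].end(),
                      static_cast<std::uint8_t>(f));
            filled_slots.send(slot);
        }
    });

    // Render thread (the caller): owns the main context.
    std::vector<long> checksums;
    for (int f = 0; f < frames; ++f) {
        std::size_t slot = filled_slots.receive();
        // Real code: glWaitSync on the fence from the upload context, then draw.
        checksums.push_back(
            std::accumulate(textures[slot].begin(), textures[slot].end(), 0L));
        free_slots.send(slot);  // slot may be overwritten by the next upload
    }

    uploader.join();
    return checksums;
}
```

Calling `run_pipeline(3, 4)` fills each four-texel stand-in texture with the frame index, so the per-frame checksums come back as 0, 4 and 8. The two channels ensure the upload thread never overwrites a slot the render thread is still reading, which is the job the fence and a free-list would do in a real two-context setup.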

What's wrong with the answer I've already given to you?

Where?

What's wrong with the answer I've already given to you?

Or you want to hear another formulation of the same answer?

This drives me crazy because I've already answered Prune's question in another forum.
...

This is probably my last post about (Dual) Copy Engine since I'm really tired of repeating the same thing.

How about the fact that the two threads, one here and one on the other forum, were made within a few minutes of each other, before any answer was posted? You owe me an apology. Of course, the fact that you haven't yet responded to my "Where?" question indicates mens rea: you realized this already. One can only surmise that your ego is simply too large for you to admit that you were wrong.

OK! My apologies!

But you are wrong about my ego. As you can see it is not so large.

I didn't notice the time of the post in this forum. I regularly check the OpenGL forums, and I noticed this thread only after posting several of my answers on the other one, which made me think you were not satisfied with them. That was my mistake.

Btw, you didn't acknowledge my posts, which also made me think you were not satisfied and were searching for other opinions. That's why I overreacted. Mea culpa! (I also attended gymnasium and had Latin, so I hope we can understand each other perfectly.)

If I reacted strongly, it was because you didn't reply the first time I called you out on it, not because you made a mistake; hell, I make more mistakes than most. But, moving on... vita brevis.

Any implied dissatisfaction isn't with your giving an answer, but with the state of affairs (i.e., inability to trigger a copy engine by simply starting a large enough upload in the same thread/context). I wasn't blaming the messenger. I didn't respond as I was waiting to see if someone else might know an alternate workaround before I'd go to the bother of testing with two contexts and posting back in the thread.
