No, I don't think I'm inadvertently reading from the mapped write buffer, although that is a really good tip. I'll have to remember that for the future.
I've been thinking about my implementation and have a few candidate troublemakers.
The first is that at the end of any given render frame, I map a buffer and allow sub-threads to write to it for (up to) the duration of the frame. Then, at the end of the next render frame, if writing has been signaled as complete, I unmap it and immediately call glTexSubImage2D to ask OpenGL to start transferring its contents to the texture(s). I wonder if I should be deferring this and allowing some time (i.e. 1 frame) between my unmap and my call to glTexSubImage2D? I had assumed OpenGL would handle this nicely on its own, but now I'm not so sure how the texture access is handled internally. This leads me to my next question...
Has anyone tried creating multiple sets of destination textures and copying to/rendering from a different set each frame? I wonder if I would see an improvement in performance if I mirrored my 'back buffer' PBOs with 'back buffer' textures? For example:
Setup:
- Init texture_A through PBO_A
Frame 1:
- Render texture_A
- Map PBO_B, copy data into it, unmap it
- Init transfer to texture_B
Frame 2:
- Render texture_B
- Map PBO_A, copy data into it, unmap it
- Init transfer to texture_A
Then repeat from Frame 1.
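
To make the alternation concrete, here is a minimal sketch of the schedule I have in mind. It only models which set is rendered and which is uploaded on a given frame; it makes no GL calls. The names FramePlan and planFrame are made up for illustration, set 0 stands for PBO_A/texture_A and set 1 for PBO_B/texture_B, and I'm treating the initial texture_A upload as setup and counting render frames from 1.

```cpp
#include <cassert>

// Hypothetical model of the ping-pong scheme: two PBO/texture pairs,
// set 0 = PBO_A/texture_A, set 1 = PBO_B/texture_B.
struct FramePlan {
    int renderSet;  // set whose texture is drawn this frame
    int uploadSet;  // set whose PBO is filled and whose texture transfer is started
};

// Setup (before frame 1) is assumed to have initialised texture_A through PBO_A.
FramePlan planFrame(int frame) {
    assert(frame >= 1);
    int renderSet = (frame - 1) % 2;        // frame 1 -> A, frame 2 -> B, frame 3 -> A, ...
    int uploadSet = 1 - renderSet;          // always the set that is NOT being rendered
    return FramePlan{renderSet, uploadSet};
}

// Where the real GL work would go for a given plan p (not executed here):
//   1. Draw using texture[p.renderSet], which was uploaded during the previous frame.
//   2. Bind pbo[p.uploadSet] to GL_PIXEL_UNPACK_BUFFER, map it, copy data in, unmap it.
//   3. With that PBO still bound, call glTexSubImage2D on texture[p.uploadSet];
//      the final pointer argument is then interpreted as a byte offset into the PBO.
```

The point of the split is that the texture being sampled in a frame is never the one being written by that frame's transfer, so in principle the driver has no reason to stall the draw on the upload. Whether that actually helps in practice is exactly what I'm asking.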