I just started making a game in OpenGL, but everything OpenGL-specific lives in one tiny C module so that I can easily switch to Vulkan once more graphics cards support it on Linux.
I want faster texture uploads so that I can draw many tiny sprites on the CPU with full control over the depth buffer, and then add deferred normal mapping, global volume lighting, turbulence, bloom, fog, water and gamma correction on the GPU. The problem is that sampling a texture uploaded from the CPU is very slow, probably because the OpenGL driver stores it in write-optimized memory; read-optimized memory would instead be slow to upload to.
CPU rasterization alone, without any GPU upload, takes 0.3 ms with hard clipping and 2.0 ms with alpha filtering. This is without multi-threading or SIMD optimizations.
Software rasterization + upload + sampling from write-optimized memory on the GPU takes 10.0 ms, which barely makes the 15 ms deadline.
GPU-only rendering with static textures takes 4.0 ms, which is okay for OpenGL, but then I cannot write freely to the depth buffer unless there is an extension for that. Copying back from fake depth buffers every frame would stall the GPU while it waits for the previous output to become the next input texture.
Is there a memory trick I can use in OpenGL to avoid stalling when sampling an uploaded texture?
Right now I just upload the software-rasterized result to an existing texture ID using glTexImage2D.
Before you point out the obvious: yes, my game would probably be much faster with hand-coded DSP assembly on a Snapdragon 820 SoC, with its unified memory architecture and HVX-capable mDSP, but I don't even like playing mobile games, and the code would have to be signed as firmware by the hardware vendor to go beyond root access.