Fastest way to get all distinct colors in a texture

Hi,
I am baking occlusion culling data by rendering the scene from certain points into six textures (basically a cubemap). The scene is rendered with a special shader which takes the ID of each object and encodes that into an RGBA color which is then written to the texture. After the texture is rendered, I loop over each pixel in the texture and reconstruct the ID of the object from the pixel color. This way I get a list of all objects visible in the rendered image, without having to sort the objects based on distance to camera or anything, but the problem is that iterating over all pixels in the texture is fairly slow.
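For illustration, a rough GLSL sketch of this kind of encode/decode (the uniform name and packing order are just an example, not the actual shader):

```glsl
// Shader side: pack a 32-bit object ID into an RGBA8 color output (GLSL 1.30+).
uniform int objectID;

vec4 encodeID(int id)
{
    return vec4(float( id        & 0xFF),
                float((id >>  8) & 0xFF),
                float((id >> 16) & 0xFF),
                float((id >> 24) & 0xFF)) / 255.0;
}

// CPU side: reconstruct the ID from an 8-bit-per-channel pixel (r,g,b,a in 0..255):
//     id = r | (g << 8) | (b << 16) | (a << 24);
```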
How would you speed up the process of getting a list of all distinct colors from the texture?
Cheers!

That's an interesting question. What do you have at your disposal? Perhaps compute shaders and atomics could do the job; they are fast enough to be used for order-independent transparency. Building the acceleration structure seems to be the most important part.
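For instance, a rough GLSL compute sketch of the atomics idea (all names are made up; the "seenBits" buffer holds one bit per possible ID and must be zeroed before the dispatch):

```glsl
#version 430
layout(local_size_x = 16, local_size_y = 16) in;

layout(binding = 0) uniform sampler2D idTexture;              // the rendered ID image
layout(std430, binding = 1) buffer Seen { uint seenBits[]; }; // 1 bit per possible ID

void main()
{
    ivec2 p = ivec2(gl_GlobalInvocationID.xy);
    if (any(greaterThanEqual(p, textureSize(idTexture, 0)))) return;

    // reconstruct the object ID from the RGBA8 texel
    uvec4 c  = uvec4(texelFetch(idTexture, p, 0) * 255.0 + 0.5);
    uint  id = c.r | (c.g << 8u) | (c.b << 16u) | (c.a << 24u);

    atomicOr(seenBits[id >> 5u], 1u << (id & 31u));           // mark the ID as seen
}
```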

You could do something like a matrix transpose in Local Data Share... if the API you're using supports it (I'm not sure DirectCompute does).

Assuming you use DirectCompute or some shader model 5 profile, you could use a write barrier to gather information from the various work-items in a workgroup, reduce the resulting lists, and then spill them out to a global buffer.

This global buffer would still not be unique but it would likely be much smaller.
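A rough sketch of that reduce-and-spill pass in GLSL compute, with made-up names (each workgroup scans 64 texels, de-duplicates locally, then appends to the global list; spillCount must be zeroed before the dispatch):

```glsl
#version 430
layout(local_size_x = 64) in;

layout(binding = 0) uniform sampler2D idTexture;
// the smaller, still-not-unique output list
layout(std430, binding = 1) buffer Spill { uint spillCount; uint spill[]; };

shared uint localIDs[64];

void main()
{
    ivec2 size = textureSize(idTexture, 0);
    // clamp rather than return, so control flow stays uniform for barrier()
    uint  i = min(gl_GlobalInvocationID.x, uint(size.x * size.y) - 1u);
    ivec2 p = ivec2(int(i) % size.x, int(i) / size.x);

    uvec4 c = uvec4(texelFetch(idTexture, p, 0) * 255.0 + 0.5);
    localIDs[gl_LocalInvocationIndex] = c.r | (c.g << 8u) | (c.b << 16u) | (c.a << 24u);

    memoryBarrierShared();
    barrier();                          // every ID in the workgroup is now visible

    if (gl_LocalInvocationIndex == 0u)  // one invocation reduces the group's list
    {
        for (uint j = 0u; j < 64u; ++j)
        {
            bool dup = false;
            for (uint k = 0u; k < j && !dup; ++k)
                dup = (localIDs[k] == localIDs[j]);
            if (!dup)
                spill[atomicAdd(spillCount, 1u)] = localIDs[j];
        }
    }
}
```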

Previously "Krohm"

No, if you are just given the texture then you are out of luck. There is no way to magically determine which colors are in a texture without examining at least N pixels, where N is the number of colors corresponding to your object identifiers. And if the texture has no particular structure, i.e. it is just a random frame with arbitrary geometry in it, then you are looking at a best case of N lookups (if all your colors are neatly tucked into the top-left corner of the texture) and a worst case of checking every single pixel (if one color is at the bottom right of the texture, if your colors are distributed randomly across the image, etc.).

So, you might be able to speed it up a bit with a clever parallel reduction method (assuming you don't have too many object IDs), but in general the complexity will still be at least proportional to the number of pixels in the texture. Perhaps the approach can be changed to allow this to work efficiently, though. All you need to do is make your shader output the occlusion data in a form the CPU can actually consume efficiently; this could be as simple as a 1xN black-and-white texture which the pixel shaders write to whenever they get to render a particular object (one pixel for each object). There might be synchronization issues, though. I'm sure there is a lot of literature on fast occlusion query techniques. I think you are on the right track by rendering ID "colors" like this, but you need to output them differently, because as you've seen, scanning the texture to extract the occlusion info the shaders produced is way too expensive.
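For example, a rough GLSL sketch of that output path, assuming image load/store is available (the "presence" image and "objectID" uniform are illustrative):

```glsl
#version 420
layout(early_fragment_tests) in;   // only depth-passing fragments run the store

// bound alongside the normal render target: one texel per object, cleared to 0
layout(binding = 0, r8) writeonly uniform image2D presence;
uniform int objectID;              // set once per draw call

out vec4 fragColor;

void main()
{
    // any fragment of this object that survives the depth test marks it visible
    imageStore(presence, ivec2(objectID, 0), vec4(1.0));
    fragColor = vec4(0.0);         // the color target itself is unused
}
```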

“If I understand the standard right it is legal and safe to do this but the resulting value could be anything.”

I am working with Unity, so I don't have direct access to many interfaces. (Yes, I know that Unity ships with Umbra; this is for procedural levels.)

One thing I did, which provided a speedup of over 4x in certain cases, is simply to split the texture into four parts and hand them off to four threads. This is good enough for now, since it brings the bake time down considerably.

I was originally thinking of writing a number to a lookup texture only when the pixel shader fired, but that would have required me to sort the objects based on distance to the camera. Also, more importantly, MRT does not seem to be well supported in Unity.

Out of curiosity, can you make do with just 6 renders per cell? I was never sure how that would cover all the viewing positions/angles the player might use.

Basically, you want a histogram with two special conditions: 1) you don't care about the actual count, you only need to know zero or non-zero, and 2) you have a lot of bins.

Even without compute shaders, this is something that can conveniently run on the GPU. If you have shader storage buffer objects, use these, and nothing extra is needed. Just render a quad the same size as the texture (possibly with an empty FBO, too) and for each fragment read a texel, mash together the four 8-bit values to get the ID, and set the respective index in the shader storage object. You could do that with a simple loop too (rendering only one fragment total), but then of course you don't use the GPU's capability to do the task massively in parallel and to hide memory/texture latencies by switching threads in and out.
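A rough GLSL sketch of that full-screen pass (assuming the "seen" buffer is zero-filled beforehand and sized to the highest object ID in use; all names are made up):

```glsl
#version 430
// full-screen pass: one fragment per texel of the ID texture
layout(binding = 0) uniform sampler2D idTexture;
layout(std430, binding = 1) buffer Seen { uint seen[]; }; // zero-filled beforehand

void main()
{
    // reconstruct the object ID from the RGBA8 texel under this fragment
    uvec4 c  = uvec4(texelFetch(idTexture, ivec2(gl_FragCoord.xy), 0) * 255.0 + 0.5);
    uint  id = c.r | (c.g << 8u) | (c.b << 16u) | (c.a << 24u);

    seen[id] = 1u;  // presence only, so a plain store will do; no atomics needed
}
```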

Otherwise, if no shader storage is available, you must work around the fact that you only have a gather, but no scatter operation. The GPU Gems 3 article on histograms comes to mind, or you could just abuse the vertex shader for most of the heavy lifting.

As a quick-sketch idea, render width*height vertices (point mode) without inputs and let it read one texel per vertex from the texture according to gl_VertexID, then output a vertex with coordinates that correspond to that texel's value in the output texture (trivial fragment shader, writing out constant 1.0). This has quite some overdraw, but you could use the vertex ID as depth, so early Z will kill most of the overdraw.
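Roughly, that vertex shader could look like this (paired with a trivial fragment shader writing a constant 1.0; the uniform names are made up, and this sketch assumes 24-bit IDs so a 4096x4096 target suffices, as discussed below):

```glsl
#version 330
// one point per texel, no vertex attributes; draw srcWidth*srcHeight points
uniform sampler2D idTexture;  // the rendered ID texture
uniform ivec2 dstSize;        // output target, e.g. 4096x4096 for 24-bit IDs

void main()
{
    ivec2 src = textureSize(idTexture, 0);
    ivec2 p   = ivec2(gl_VertexID % src.x, gl_VertexID / src.x);

    uvec4 c  = uvec4(texelFetch(idTexture, p, 0) * 255.0 + 0.5);
    uint  id = c.r | (c.g << 8u) | (c.b << 16u);   // 24-bit IDs in this sketch

    // scatter: move this point to the output texel that corresponds to the ID
    vec2 dst = vec2(float(id % uint(dstSize.x)) + 0.5,
                    float(id / uint(dstSize.x)) + 0.5);
    vec2 ndc = dst / vec2(dstSize) * 2.0 - 1.0;

    // vertex ID as depth, so early Z rejects most of the overdraw
    float depth = float(gl_VertexID) / float(src.x * src.y);
    gl_Position = vec4(ndc, depth, 1.0);
}
```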

Then, to find out whether an object ID is present anywhere in those 6 textures, you simply need to look up its index in the output texture; that should be very fast.

Reading back the result probably takes longer than it takes the GPU to perform the calculation!

As for the huge number of bins, you probably don't have 4 billion objects (likely not even 16.7 million), so most of the 32 bits in a pixel will be unused. Support for 8192x8192 textures is pretty much omnipresent, and looking at what GL_MAX_VIEWPORT_DIMS tells me, it's no trouble using that as a render target either. A 4096x4096 target can already encode 16.7 million object IDs (and 8192x8192 around 67 million), so either way it should be no issue.

Out of curiosity, can you make do with just 6 renders per cell? I was never sure how that would cover all the viewing positions/angles the player might use.

The game is tile based, so I have to iterate over all the "walkable" tiles and render the cubemap to see which other tiles I need to render when the player is standing on the current tile.

In the case of Rage and the feedback system that tells the engine which portions of the megatexture need to be streamed, they render a low-res color-coded image, where the color signifies which coordinates of the virtual texture to load. Your current problem seems related, so perhaps it could benefit from using smaller textures to minimize the time it takes to detect the colors?

