I'm having trouble getting Occlusion Queries to work properly in XNA. I can't tell if it's because of the sheer number of potential queries, or if I'm doing something incorrectly. A typical scene in my current tests includes around 15k cubes.
With occlusion queries off and only frustum culling enabled, I get 60 fps (limited by v-sync) when looking away from the scene, and about 12 fps when viewing all cubes at once.
When I use occlusion queries in addition to frustum culling, my frame rate tops out at about 10 fps no matter what is on screen, and I have the additional problem of cubes disappearing permanently once they are occluded. If I sort the models by distance, the frame rate drops to about 5 fps.
I have been failing at fixing this problem for over a week now, though I have managed to improve it. Previously, I was at a max of 2 fps, which makes me cry inside. My hardware is a 2.8GHz i7, 16GB RAM, and a Radeon 6700 series card with 1GB VRAM.
Are only the occlusion test models cubes or are the actual models cubes as well?
Speed-wise, occlusion queries only really make sense if the cost of doing the query is a lot less than the cost of just drawing the thing(s) it represents in the first place. For example, doing an occlusion query where you draw a simple bounding box (very cheap) in order to check if you need to draw an entire city block (very expensive if the geometry is complicated) is probably a good trade-off. Doing an occlusion query where you draw a single cube in order to avoid drawing another single cube, repeated up to 15,000 times, is probably not going to be effective. Even if you have more complicated models, the sheer overhead of executing thousands of occlusion queries could make the speed suffer.
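To put rough numbers on that trade-off, here is a small back-of-the-envelope sketch in Python. The cost units, the object counts, and the function names are all made up for illustration; they are not measurements from XNA or from your scene.

```python
def cost_with_queries(n_objects, n_visible, query_cost, draw_cost):
    """Every object pays for one query; only the visible ones are then drawn."""
    return n_objects * query_cost + n_visible * draw_cost


def cost_without_queries(n_objects, draw_cost):
    """No queries: every object is simply drawn."""
    return n_objects * draw_cost


# Cube-vs-cube: the query proxy costs as much as the real thing (assumed 1 unit each).
cubes_with = cost_with_queries(15000, 5000, query_cost=1, draw_cost=1)
cubes_without = cost_without_queries(15000, draw_cost=1)

# Box-vs-city-block: cheap proxy (1 unit), expensive geometry (assumed 100 units).
city_with = cost_with_queries(10, 3, query_cost=1, draw_cost=100)
city_without = cost_without_queries(10, draw_cost=100)

print(cubes_with, cubes_without)  # queries make the cube case more expensive
print(city_with, city_without)    # queries make the city-block case cheaper
```

Under those assumed numbers the cube case gets worse with queries even when two thirds of the cubes are hidden, while the city-block case gets much cheaper.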
If you are trying for speed when looking at the entire scene, I would first look into an instancing strategy to achieve better batching. 15,000 draw calls are going to be inherently slow even if you are drawing simple things. Getting occlusion queries right is tricky. If you still want to do the occlusion queries, it would probably be more worthwhile to do one occlusion query for a group of many nearby models, by drawing the group's bounding box. That way, even if the cost of drawing a model is the same as the cost of a query, you can potentially spend 1 draw to save 100 draws.
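As a rough sketch of the grouping idea (Python, so it runs standalone; not XNA code), you could bucket cubes by integer grid position and compute one bounding box per bucket. The function name, the `group_size` of 8, and the assumption that cubes sit at integer coordinates with unit size are mine, not something from your code.

```python
from collections import defaultdict


def group_bounds(cube_positions, group_size=8, cube_size=1):
    """Bucket cubes into group_size^3 cells and return one bounding box per bucket.

    cube_positions: iterable of (x, y, z) integer minimum corners (assumed layout).
    Returns {bucket_key: (min_corner, max_corner, cubes_in_bucket)}.
    """
    buckets = defaultdict(list)
    for (x, y, z) in cube_positions:
        key = (x // group_size, y // group_size, z // group_size)
        buckets[key].append((x, y, z))

    bounds = {}
    for key, cubes in buckets.items():
        mins = tuple(min(c[i] for c in cubes) for i in range(3))
        # Add cube_size so the box covers the far side of the outermost cubes.
        maxs = tuple(max(c[i] for c in cubes) + cube_size for i in range(3))
        bounds[key] = (mins, maxs, cubes)
    return bounds


groups = group_bounds([(0, 0, 0), (1, 2, 3), (9, 0, 0)])
for key, (mins, maxs, cubes) in sorted(groups.items()):
    print(key, mins, maxs, len(cubes))
```

Each box would then be the thing you draw inside the query, and a zero-pixel result lets you skip every cube in that bucket at once.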
Some other miscellaneous thoughts:
Is there more to your drawing code? It looks like you draw your models inside the occlusion query Begin/End, which doesn't make much sense to me.
You don't want to be making occlusion queries unless you know the object in question is within your view frustum, otherwise the query is doomed from the start.
I'm confused why you are drawing a wireframe for your occlusion test object. You should probably be drawing a solid bounding volume or else you are going to get some cases where the occlusion query reports zero pixels when there really might be some. I can see why you would draw the final objects as wireframes though, just to see if the occlusion tests are actually working.
I was playing around with hardware instancing before occlusion, but ran into a depth problem between batches with different textures. For instance, if I issued one instanced draw for grass and then one for dirt, the dirt would draw on top of the grass even where the grass should have occluded it, simply because the dirt draw call came second.
So I moved on to occlusion culling. The idea was to test the bounding box (supposedly simpler than the textured cube) only every few frames: assume at first that the object is visible, then let the actual occlusion result arrive with a latency of one frame. I modeled my code after NVIDIA's example from GPU Gems 3.
I have also considered ray-casting from the camera to each block, but hoped occlusion queries would be faster because they are built into XNA. Also, I didn't really know where to start with ray-casting.
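To make that scheme concrete, here is a minimal sketch of the per-object bookkeeping, written in Python so it runs on its own. It is not my XNA code and not the GPU Gems code; `TrackedObject`, `update_visibility`, and `retest_interval` are illustrative names, and `query_done` / `pixel_count` stand in for whatever the graphics API reports for the previously issued query.

```python
class TrackedObject:
    def __init__(self):
        self.visible = True         # assume visible until a query says otherwise
        self.pending = False        # a query was issued and its result not yet read
        self.frames_since_test = 0


def update_visibility(obj, query_done, pixel_count, retest_interval=4):
    """Advance one frame; return True if a new query should be issued now."""
    if obj.pending and query_done:
        # The result arrives a frame or more after the query was issued.
        obj.visible = pixel_count > 0
        obj.pending = False
        obj.frames_since_test = 0

    obj.frames_since_test += 1
    # Hidden objects are re-tested too, otherwise they could never reappear.
    if not obj.pending and obj.frames_since_test >= retest_interval:
        obj.pending = True
        return True
    return False


obj = TrackedObject()
issued = [
    update_visibility(obj, False, 0, retest_interval=2),   # frame 1
    update_visibility(obj, False, 0, retest_interval=2),   # frame 2: issue query
    update_visibility(obj, True, 0, retest_interval=2),    # frame 3: result says hidden
]
hidden_after_result = not obj.visible
issued.append(update_visibility(obj, False, 0, retest_interval=2))   # frame 4: re-test
issued.append(update_visibility(obj, True, 12, retest_interval=2))   # frame 5: visible again
print(issued, hidden_after_result, obj.visible)
```

The point of the sketch is only the state handling: the object is drawn while `visible` is true, and the bounding volume keeps being tested on a timer whether or not the object itself is drawn.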
I am interested in your suggestion for batching multiple cubes in one draw call, but I have found that hard to do using a model. Would it be wiser to just use primitives since I'm only using simple cubes?
Indeed it does mean I have 15k cubes. The count varies from about 10k to 20k with Simplex noise; usually I'm seeing closer to 15k-16k blocks in a 32x32x32 chunk. I am limiting the draw calls to what is visible in the viewing frustum, but with my current method I'm far from ever achieving the complexity I would like to have, seeing as a single chunk makes the game nearly unplayable.
As an aside, I did move my query begin and end inside the frustum culling, and changed the test from the bounding box to the actual cube, and definitely got better results on the accuracy of occlusion. The frame rate is now back up to 60 fps when not looking at the chunk, but it's down to 4 fps when looking at the entire chunk.
Any advice on instance/batching, or possibly meshing? And do you think ray-casting would be worth a shot?
Forget about occlusion queries in that case. You are limited by draw calls, so most likely by the CPU. Adding occlusion queries actually increases the number of draw calls, so it is even counterproductive. If you use Minecraft-style chunks, don't render every single cube. Only render the surface. For that, you insert quads into a vertex buffer only at solid-air interfaces. You then get one vertex buffer per chunk and can render it in a single draw call.
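The face selection itself is simple. Here is a minimal sketch in Python (so it runs standalone, not XNA code), assuming the chunk is stored as a set of integer `(x, y, z)` positions of solid blocks and that anything outside the set, including beyond the chunk border, counts as air. The names `FACE_DIRECTIONS` and `visible_faces` are just illustrative.

```python
# One neighbour offset per cube face: +x, -x, +y, -y, +z, -z.
FACE_DIRECTIONS = [
    (1, 0, 0), (-1, 0, 0),
    (0, 1, 0), (0, -1, 0),
    (0, 0, 1), (0, 0, -1),
]


def visible_faces(solid):
    """Return (block, direction) for every face where a solid block borders air.

    solid: set of (x, y, z) integer positions of solid blocks (assumed storage).
    """
    faces = []
    for (x, y, z) in solid:
        for (dx, dy, dz) in FACE_DIRECTIONS:
            if (x + dx, y + dy, z + dz) not in solid:
                faces.append(((x, y, z), (dx, dy, dz)))
    return faces


# A fully solid 3x3x3 block: only the outer shell produces faces.
solid_block = {(x, y, z) for x in range(3) for y in range(3) for z in range(3)}
shell = visible_faces(solid_block)
print(len(shell), "faces instead of", len(solid_block) * 6)
```

Each returned face then becomes one quad (4 vertices, 6 indices) appended to the chunk's vertex and index data, and the whole chunk goes out in one draw call. You rebuild that data only when a block in the chunk changes.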