SpriteBatch billboards in a 3D world slow on mobile device

6 comments, last by OpaqueEncounter 9 years, 11 months ago

I used this method http://blogs.msdn.com/b/shawnhar/archive/2011/01/12/spritebatch-billboards-in-a-3d-world.aspx to create a 3D billboard renderer using SpriteBatch. It works perfectly as described, and on a modest desktop (with Intel HD graphics) it can easily render tens of thousands of billboards or particles.

On a mobile device (Windows Phone) the framerate drops sharply past a certain, not so large, point. My test (on all devices) is this:

- Render a primitive (sphere, cube, etc.) into a render target

- Pass the render target to the method above

- Start increasing the number of billboards until the framerate drops.

On an x86 desktop or an ARM tablet (Surface) the framerate holds into the thousands. On the phone, it instantly drops from 60 to 30 (looks like disabling VSync has no effect on that device) as soon as you pass a certain point (~200?). The funny thing is that I can get the framerate to go up to 60 again by making the billboards half the size. Same goes when making the render target half the size.

Using a stopwatch, I determined that the time spent on the CPU is nowhere near the 16.67ms threshold. VS2013's frame analysis is unavailable on the Windows Phone, so that's useless.

Can anyone explain what is going on here? Is this simply a limitation of a low-power GPU (the Adreno 225 in this case)? If so, what exactly is bogging it down? The fill rate? The blending? (I tried all blend states from Opaque to NonPremultiplied, with no effect on performance.)

I would think that most likely you are fill-rate bound. In the absence of a GPU profiler, the easiest way to confirm whether or not you are fill-rate bound is to set up a scissor rectangle so that only a small area of the screen is visible. For your particular simple case, maybe just make the particles smaller instead of adding a scissor rectangle.

If it's not the fill rate, maybe it's the cost of the vertex processing.
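As a rough illustration of why billboard size matters so much for fill rate, here is a minimal Python sketch. The billboard count, billboard sizes and the 800x480 screen are hypothetical numbers chosen for illustration, not measurements from the device in this thread, and the estimate assumes every billboard pixel is actually written (nothing clipped or depth-rejected).

```python
def billboard_pixels_per_frame(count, width_px, height_px):
    """Upper bound on pixels written per frame by the billboards alone."""
    return count * width_px * height_px


def overdraw_factor(count, width_px, height_px, screen_w, screen_h):
    """How many times the whole screen is effectively repainted per frame."""
    return billboard_pixels_per_frame(count, width_px, height_px) / (screen_w * screen_h)


# Hypothetical scene: 200 billboards on an 800x480 screen.
full_size = overdraw_factor(200, 128, 128, 800, 480)
half_size = overdraw_factor(200, 64, 64, 800, 480)

print(f"128x128 billboards: screen repainted {full_size:.2f} times per frame")
print(f"64x64 billboards:   screen repainted {half_size:.2f} times per frame")
```

Halving the width and height of each billboard quarters the number of pixels written, which would be consistent with the framerate recovering when the billboards are made half the size.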

It sounds very much like you're pixel bound if reducing the size improves the performance. Phone GPUs are horribly slow compared to even a basic PC GPU. Your options are:

1. Simplify the pixel shader. Ideally it'd be a single line of code doing a texture fetch for billboards.

2. Render at a reduced screen resolution, with MSAA on.

3. Render fewer pixels. For example, use extra polys (e.g. octagons instead of quads) to render fewer transparent pixels. For circles this saves up to about 20%.

Check out http://aras-p.info/texts/files/FastMobileShaders_siggraph2011.pdf for some more info on how phone GPUs perform.
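The "about 20%" figure for circles in option 3 can be checked with a little geometry. The sketch below is only a back-of-the-envelope calculation, assuming a circular sprite of radius r drawn either on its bounding quad or on a regular polygon that just encloses the circle; the function names are made up for this example.

```python
import math


def quad_area(r):
    """Area of the square that bounds a circle of radius r."""
    return (2 * r) ** 2


def enclosing_polygon_area(r, sides):
    """Area of a regular polygon circumscribed about a circle of radius r."""
    return sides * r * r * math.tan(math.pi / sides)


def pixels_saved(sides, r=1.0):
    """Fraction of the quad's area that is not rasterized when using the polygon."""
    return 1.0 - enclosing_polygon_area(r, sides) / quad_area(r)


print(f"octagon instead of quad: {pixels_saved(8):.1%} fewer pixels")
print(f"best case (perfect circle): {1.0 - math.pi / 4:.1%} fewer pixels")
```

Under these assumptions an octagon saves roughly 17% of the rasterized area, and no polygon can save more than about 21.5%, the fraction of the quad that lies outside the circle.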

It's not vertex processing for sure, since the aforementioned method does all of that on the CPU, and I measured that to make sure it's not a bottleneck. And yes, reducing what is being drawn on screen increases the framerate.

The shader used in that method is BasicEffect, in which I disabled absolutely everything (even vertex color). I am already running at the lowest resolution feasible.

To render fewer pixels, I also tried replacing BasicEffect with AlphaTestEffect.

It seems that if this is a fill-rate issue, the only thing really left is to skip drawing some of those billboards. Luckily, quite a few of them are blocked most of the time. I am not really sure where to start here if this is a solution. Frustum culling is not really the answer here, and occlusion querying is unavailable on GPUs like the Adreno 225 and below, which I plan on targeting.

Any suggestions?

Other than Adam's suggestions #2 and #3 there isn't really anywhere else to go other than trying to achieve the same effect with fewer, more opaque particles.

Unlikely, but is there scope for improving your texture at all? e.g. If it's a large 8888 non-mipmapped texture, then you would see gains from switching to a smaller mipmapped compressed texture.
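To put rough numbers on the texture suggestion, here is a small Python sketch comparing memory footprints. The bits-per-pixel values are the standard ones for these formats (32 for 8888, 16 for 565, 4 for DXT1, 8 for DXT5); the texture sizes are hypothetical, and the helper names are invented for this example.

```python
# format name -> (bits per pixel, block size in pixels)
FORMATS = {
    "RGBA8888": (32, 1),
    "RGB565": (16, 1),
    "DXT1": (4, 4),
    "DXT5": (8, 4),
}


def level_bytes(width, height, fmt):
    """Bytes for one mip level; block-compressed levels are padded to whole blocks."""
    bits_per_pixel, block = FORMATS[fmt]
    padded_w = -(-width // block) * block   # round up to a multiple of the block size
    padded_h = -(-height // block) * block
    return padded_w * padded_h * bits_per_pixel // 8


def texture_bytes(width, height, fmt, mipmapped=False):
    """Total bytes for the texture, optionally including the full mip chain."""
    total = level_bytes(width, height, fmt)
    while mipmapped and (width > 1 or height > 1):
        width, height = max(1, width // 2), max(1, height // 2)
        total += level_bytes(width, height, fmt)
    return total


print("512x512 RGBA8888, no mips:", texture_bytes(512, 512, "RGBA8888"), "bytes")
print("256x256 DXT1, mipmapped:  ", texture_bytes(256, 256, "DXT1", mipmapped=True), "bytes")
```

In this hypothetical comparison the smaller compressed texture is more than twenty times lighter than the large 8888 one, even with its mip chain included. A full mip chain only adds about a third on top of the base level.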

I actually generate the texture like I described above (render models into a render target). I'll play around with a lower-quality pixel format, but I guess if there are no other suggestions then I'm stuck with it.

The only thing I don't understand is why reducing the render target size helps if this is a fill-rate issue. Or is fill rate broader than I assume it to be? (Does sampling a larger image contribute as well?)

The limitation could be memory bandwidth, which is also really low on mobile platforms.

You should also try generating mip maps for your texture; not having them can hurt performance significantly.
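A simplified way to see why both mip maps and a smaller render target can help: when a large texture is drawn onto a small billboard, neighbouring screen pixels read texels that are far apart in memory, which wastes bandwidth and texture cache. The Python sketch below uses a deliberately simplified model (real GPUs derive the level of detail from per-pixel screen-space derivatives), and the sizes are hypothetical.

```python
import math


def texel_stride(texture_size, on_screen_size):
    """Texels stepped over between adjacent screen pixels (1.0 means a 1:1 mapping)."""
    return texture_size / on_screen_size


def approximate_mip_level(texture_size, on_screen_size):
    """Mip level that brings the mapping back to about one texel per pixel."""
    stride = texel_stride(texture_size, on_screen_size)
    if stride <= 1.0:
        return 0  # magnified or 1:1, the base level is used
    return math.floor(math.log2(stride))


# Hypothetical case: a 512x512 render target drawn as a 64x64 billboard.
print("stride without mips:", texel_stride(512, 64), "texels per pixel")
print("mip level with mips:", approximate_mip_level(512, 64))
print("size of that level: ", 512 >> approximate_mip_level(512, 64), "texels across")
```

In this model the unmipped texture is sampled with a stride of 8 texels per pixel, while the mipped version reads from a 64-texel-wide level that matches the billboard. Halving the render target halves the stride, which may be why the smaller render target also improved the framerate.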

EDIT: Well, I actually did try running GenerateMipMaps every frame and the framerate did go up, so that's that. :)
