Kalith

GUI rendering in software using threads

Recommended Posts

Hi,

I've developed (yet another) GUI library that I use for my own game projects. Most of the time, the GUIs I make are quite complex: they use many different textures and render targets.
Up until now, I've always relied on the render system of the graphics library I use to draw the game world (Ogre3D, SFML, ...) to draw my GUI on the screen, so the rendering is done in hardware.
Doing it in real time is a real performance hit, which I've minimized by using cached render targets that are only updated when their content is modified.

Yet I feel it's not enough. Plus, using screen-wide render targets for caching has its own cost (both in memory and in render time, at large resolutions).

So I had another idea. Seeing how powerful CPUs can get nowadays, and how badly they are often used in game programming, I was wondering if I could use that spare computing power to relieve the strain on the GPU and render the whole GUI in software, maybe in another thread.

Several other considerations tend to make this choice a credible alternative:
[list][*]most of the time, GUI rendering consists of 1:1 texture drawing, which can be implemented as a simple copy (or blending) of pixels[*]proper 2D alpha blending is not feasible in one pass on today's hardware ([url="http://www.ogre3d.org/forums/viewtopic.php?f=5&t=44825&p=322042"]afaik[/url])[*]current GPUs are optimized for 3D rendering, and little effort is made to improve 2D performance[*]GUIs are most of the time made of plenty of small textures, which tends to create a large number of batches (even though this problem can be partially solved with texture packing algorithms, that approach has its limitations)[*]hardware render targets can be tricky (poor support on old GPUs or with bad drivers, see Intel integrated chips)[/list]
I'm currently running some tests with my own software renderer (using SDL for texture loading and window management), and performance seems quite acceptable.
My test case consists of 9 sprites using 9 different 32x32 textures, with alpha blending, rendered into a 256x256 render target, which is then rendered into a 1024x768 render target; the whole thing takes approximately 1 ms. Transferring the result to the screen then takes 4 ms, but that depends heavily on the underlying render system (here SDL) and is a quasi-constant cost (proportional to the final render area).
There is still room for optimization: I could use SSE instructions to blend colors, for example.
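Since a 1:1 blit with blending is just per-pixel integer arithmetic, here is a rough sketch of the scalar inner loop that such SSE code would vectorize. This is my own illustration (RGBA8 layout, straight alpha, round-to-nearest), not the actual renderer described above:

```cpp
#include <cstdint>

struct Rgba { std::uint8_t r, g, b, a; };

// One channel of the straight-alpha "over" blend: (src*a + dst*(255-a)) / 255.
// The +127 makes the integer division round to nearest instead of truncating.
inline std::uint8_t blendChannel(std::uint8_t src, std::uint8_t dst, std::uint8_t a)
{
    return static_cast<std::uint8_t>((src * a + dst * (255 - a) + 127) / 255);
}

// Blend one source pixel over one destination pixel.
inline Rgba blendOver(Rgba src, Rgba dst)
{
    Rgba out;
    out.r = blendChannel(src.r, dst.r, src.a);
    out.g = blendChannel(src.g, dst.g, src.a);
    out.b = blendChannel(src.b, dst.b, src.a);
    // Destination alpha: a_out = a_src + a_dst * (1 - a_src)
    out.a = static_cast<std::uint8_t>(src.a + (dst.a * (255 - src.a) + 127) / 255);
    return out;
}
```

An SSE version would process several pixels at once with 16-bit intermediates, but the arithmetic stays the same as this scalar form.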

My questions are:
Has anybody considered this alternative?
Does it seem viable to you?
What is the fastest portable software 2D renderer out there?
Do I have a chance of doing something better if I develop my own renderer, suited especially to my GUI system?

[quote name='Kalith' timestamp='1307400722' post='4820284']
most of the time, GUI rendering consists of 1 to 1 texture drawing, which can be implemented as a simple copy (or blending) of pixel
[/quote]
I for example use shaders for my GUI.

[quote name='Kalith' timestamp='1307400722' post='4820284']
proper 2D alpha blending is not feasible in one pass using todays hardware ([url="http://www.ogre3d.org/forums/viewtopic.php?f=5&t=44825&p=322042"]afaik[/url])
[/quote]
Why would it not be?

[quote name='Kalith' timestamp='1307400722' post='4820284']
current GPUs are optimized for 3D rendering, and little effort is made to improve 2D performances
[/quote]
That is the reason you render your GUI as 3D "objects".

[quote name='Kalith' timestamp='1307400722' post='4820284']
GUIs are most of the time made of plenty of small textures, which tends to create a large number of batches (even though this problem can partially be solved with texture packing algorithms, this system has its limitations)
[/quote]
You use one or two atlases to get rid of the trouble. Texture arrays could even help you get down to a single batch if you really need very large texture atlases.
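For readers unfamiliar with atlases: the indirection is just a linear remap of each sprite's local UVs into its sub-rectangle of the shared texture, so every sprite can be drawn from the same texture in one batch. A hypothetical sketch (names and layout are mine):

```cpp
// A sprite's sub-rectangle inside the atlas, in normalized [0,1] UV space.
struct AtlasRegion { float u0, v0, u1, v1; };

// Remap a sprite-local (u,v) in [0,1] into atlas coordinates.
inline void atlasUV(const AtlasRegion& r, float u, float v,
                    float& outU, float& outV)
{
    outU = r.u0 + u * (r.u1 - r.u0);
    outV = r.v0 + v * (r.v1 - r.v0);
}
```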

[quote name='Kalith' timestamp='1307400722' post='4820284']
hardware render targets can be tricky (poor support on old GPUs or with bad drivers, see Intel integrated chips)
[/quote]
You don't need render targets for GUI rendering. I use immediate-mode OpenGL to render my GUI; if I removed the shaders, it would work on OpenGL 1.1 compatible hardware.

[quote name='Kalith' timestamp='1307400722' post='4820284']
My questions are :
Has anybody considered this alternative ?
Does it seem viable to you ?
What is the fastest, portable, software 2D renderer out there ?
Do I have a chance to do something better if I develop my own renderer, suited especially for my GUI system ?
[/quote]
I doubt that any serious GUI handling would be fast enough. Remember that you are comparing the power of one CPU core vs. hundreds of GPU "cores" doing some simple copy work. Another issue is the transfer from CPU memory to video memory, though framebuffers etc. could speed up the process.

For comparison, I render 200-400 icons (48x48) + a few hundred characters + a few background images (800x600) in immediate(!) OpenGL each frame in less than 5 ms.

Some games do that. If you look at what people do on consoles to save some ms (like doing a lot of transformation and culling on the SPU to avoid doing it on the RSX, using the PlayStation Edge library), then saving some ms with a simple software GUI renderer is a lot of bang for the buck.

Another nice thing about software rendering is that you can go oldschool and just update the regions that really change (it was common back then), since you render to an offscreen buffer anyway. Doing so, you can have very complex GUI rendering run an order of magnitude faster on the CPU than on a GPU.
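The dirty-region idea can be as simple as accumulating one bounding rectangle of everything that changed since the last frame, then redrawing and re-uploading only that area. A minimal sketch (a real implementation might track a list of rectangles instead of a single union):

```cpp
#include <algorithm>

struct Rect {
    int x0, y0, x1, y1;
    bool empty() const { return x0 >= x1 || y0 >= y1; }
};

// Accumulates the union of all rectangles marked dirty this frame.
struct DirtyRegion {
    Rect bounds{0, 0, 0, 0};

    void mark(const Rect& r) {
        if (r.empty()) return;
        if (bounds.empty()) { bounds = r; return; }
        bounds.x0 = std::min(bounds.x0, r.x0);
        bounds.y0 = std::min(bounds.y0, r.y0);
        bounds.x1 = std::max(bounds.x1, r.x1);
        bounds.y1 = std::max(bounds.y1, r.y1);
    }

    // Returns the area to redraw and resets for the next frame.
    Rect flush() {
        Rect r = bounds;
        bounds = Rect{0, 0, 0, 0};
        return r;
    }
};
```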




As always, it depends on whether you're GPU-bound or CPU-bound on your target platform.

Lots of games use Flash for GUIs these days, which it turns out is horribly slow at drawing via the CPU, so GPU-accelerated flash renderers are worth a fair bit of cash to your average game studio ;)

Also, lots of games have HTML renderers in them these days -- many of which just use WebKit/Gecko/etc, which [b]are[/b] CPU rendered. Though these usually aren't used for highly interactive or commonly visible items, like HUDs (probably because they [i]are[/i] slow).

GPUs [b]are very good[/b] at 2D copying though; even if you do perform it via triangles, they're highly optimized for the task and have specialized hardware to help. They're so good in fact, that you might not even need to bother with any cache targets...

High batch counts largely affect CPU performance, and unless something's going quite wrong, batch submission will of course take less time than the actual work done by the batch (e.g. the submission should be cheaper than the texture blitting in any case).

As above, I don't understand the "proper 2D alpha blending" complaint.
As for old hardware, you'd have to be aiming at ~pre 2006 hardware (which is pretty ancient by today's standards) in order to come across any kind of "poor support".

[quote name='Ashaman73' timestamp='1307425548' post='4820407']
I for example use shaders for my GUI.

[quote name='Kalith' timestamp='1307400722' post='4820284']
proper 2D alpha blending is not feasible in one pass using todays hardware ([url="http://www.ogre3d.org/forums/viewtopic.php?f=5&t=44825&p=322042"]afaik[/url])
[/quote]
Why would it not be?[/quote]
You don't need "shaders" in software rendering, because you already have full control of the pipeline.
As for the alpha blending, maybe I'm mistaken, but with all the graphics engines I've used (Ogre3D and HGE, for example), you couldn't get "true" (as in Photoshop) alpha blending to work out of the box [b]with render targets[/b] (I should have been more precise). In Ogre, as stated in the [url="http://www.ogre3d.org/forums/viewtopic.php?f=5&t=44825&p=322042"]post I linked[/url], I had to use a two-pass system. It could also be done in one pass with shaders, I think, but I wonder how well that would perform...

[quote name='Ashaman73' timestamp='1307425548' post='4820407']
That is the reason you render your GUI as 3D "objects".
[/quote]
Yes and that's my point. It could probably be much faster if things were optimized for 2D rendering.

[quote name='Ashaman73' timestamp='1307425548' post='4820407']
You use one or two atlases to get rid of the trouble. The use of texture arrays could even help to create one batch if you really need very large texture atlases.
[/quote]
Atlases are the concept I was referring to with my "texture packing algorithms". And as I said, they have their drawbacks: you cannot use texture tiling, for example, unless you copy the same "sprite" multiple times, which sends N times more vertices to the GPU.

[quote name='Ashaman73' timestamp='1307425548' post='4820407']
You don't need render targets for GUI rendering, I use immediate OpenGL for rendering my GUI, if I would remove the shaders it could work on OpenGL1.1 compatible hardware.
[/quote]
I do. Using render targets to cache your GUI can give a very interesting performance boost, from 2 to 3 times faster with complex GUIs.
Of course, for a basic HUD with 2 textures and 3 lines of text, that's nonsense.

[quote name='Ashaman73' timestamp='1307425548' post='4820407']
I doubt that any serious GUI handling would be fast enough. Remember that you are comparing the power of one CPU core vs. hundreds of GPU "cores" doing some simple copy work. Another issue is the transfer from CPU memory to video memory, though framebuffers etc. could speed up the process.

For comparison, I render 200-400 icons (48x48) + a few hundred characters + a few background images (800x600) in immediate(!) OpenGL each frame in less than 5 ms.
[/quote]
I know very little about how a GPU actually works, so I'll assume you're right on this.
Yet you cannot compare your timing with mine without saying what your GPU is. 5 ms is certainly a good time though, and I doubt I'll be able to beat it with my old CPU.

[quote name='Hodgman']Lots of games use Flash for GUIs these days, which it turns out is horribly slow at drawing via the CPU, so GPU-accelerated flash renderers are worth a fair bit of cash to your average game studio ;)

Also, lots of games have HTML renderers in them these days -- many of which just use WebKit/Gecko/etc, which [b]are[/b] CPU rendered. Though these usually aren't used for highly interactive or commonly visible items, like HUDs (probably because they [i]are[/i] slow).
[/quote]
Aren't Flash and HTML largely overkill for such a simple task as rendering a GUI? Apart from having a web browser ingame, I fail to see the point.
There are plenty of handy GUI libraries out there that I'm sure are more optimized, and even my naive software renderer implementation would probably be faster.

[quote name='Hodgman']GPUs [b]are very good[/b] at 2D copying though; even if you do perform it via triangles, they're highly optimized for the task and have specialized hardware to help. They're so good in fact, that you might not even need to bother with any cache targets...[/quote]
With texture atlases, maybe. In my case caching most certainly helps: my interface renders in 10 ms without cache targets, and in a little less than 3 ms with them. There is a little overhead when the GUI changes constantly, but in my case that's unlikely.
By the way, I've put an example of what I consider a complex GUI below. It's a game editor here, but you can get the same level of complexity in games like WoW, for example.
[center][url="http://darkdragon1.free.fr/interface.jpg"][img]http://darkdragon1.free.fr/interface_th.jpg[/img][/url][/center]

[quote name='Hodgman']
As for old hardware, you'd have to be aiming at ~pre 2006 hardware (which is pretty ancient by today's standards) in order to come across any kind of "poor support".
[/quote]
... or integrated Intel chips :( I have one on my laptop (I bought it this year), and SFML doesn't seem to like it.

[quote name='Krypt0n']another nice thing bout software rendering is, that you can go oldschool and just update the regions that really change (it was common back then), as you anyway render to an offscreen buffer. doing so, you can have very complex gui rendering magnitude faster on cpu than using a gpu.[/quote]
That's more or less the kind of optimization I have in mind when I think of optimizing 2D drawing.
Plus, correct me if I'm wrong, but aren't all our OSes' UIs drawn in software?

sorry to interrupt, but

[quote]you cannot use texture tiling for example[/quote]

This is a myth that is burned into many people's heads, but it's not true. Any sort of tiling can perfectly well be done with any arbitrary texture atlas, without creating extra vertices or anything.
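Presumably the trick meant here is wrapping the coordinate yourself inside the atlas sub-rectangle instead of relying on the sampler's REPEAT mode. Sketched below on the CPU for one axis; in practice the same fract() math would run per fragment in a shader (names are mine):

```cpp
#include <cmath>

// Tile a sprite 'repeats' times across a quad even though its texels live
// in the atlas sub-range [u0, u1]: wrap the local coordinate manually
// instead of letting the sampler wrap over the whole atlas.
inline float atlasTiledU(float u0, float u1, float u, float repeats)
{
    float local = u * repeats;
    float wrapped = local - std::floor(local); // fract(): back into [0,1)
    return u0 + wrapped * (u1 - u0);
}
```

One caveat worth hedging: with bilinear filtering, the texels at the seam need a small padding border in the atlas, but no extra vertices are required.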

please proceed with this good topic now :)

[quote name='Kalith' timestamp='1307469339' post='4820599']You don't need "shaders" in software rendering, because you already have full control of the pipeline.[/quote]A "shader" is just code... Software rendering uses code...
Shaders aren't some special feature -- they're just how you write software that runs on a GPU.
Even if you're not using shaders ([i]i.e. you're using the fixed-function pipeline[/i]), that just means you're using the built-in shaders behind the scenes :(

[quote]As for the alpha blending, maybe I'm mistaken, but with all graphics engine I've used (Ogre3D and HGE for example), you couldn't get "true" (as in Photoshop) alpha blending to work out of the box [b]with render targets[/b] ... It could also be done in one pass with shaders I think, but I wonder how good this would perform...[/quote]Yes, you're mistaken. As for the performance of shaders, your own shaders will probably perform [i]better[/i] than the built-in general purpose (fixed function) ones.
[quote]Yes and that's my point. It could probably be much faster if things were optimized for 2D rendering.[/quote]Aside from the overhead of setting up a triangle, the GPU launches hundreds of threads, each of which can take an unfiltered texture sample [i]per cycle[/i]. How many texture samples can you read on the CPU [i]per cycle[/i]?
In other words, it's much more optimized for 2D rendering than a CPU is, even if it seems geared towards 3D usage.
[quote]Aren't Flash and HTML largely overkill for such simple tasks as rendering a GUI ? Apart from having an internet navigator ingame, I fail to see the point.
There is plenty of handy GUI libraries out there that I'm sure are more optimized, and even using my naive implementation of the software renderer it would probably be faster.[/quote]I wasn't making a point, I was just pointing out that those technologies currently are used in games, and often are rendered on the CPU, which answers your questions "[i]Has anybody considered this alternative?[/i]" and "[i]Does it seem viable to you?[/i]" with "[i]yes[/i]", because it's been done.

As for [i]why[/i] Flash and HTML are used in games -- it's because there are tools for authoring that kind of content. Building an engine/renderer is only half the battle. There are hundreds of engines out there with sub-standard tools to go with them. If you choose Flash as your GUI solution, then you can use industry-standard tools from Adobe to author your game, and hire graphic designers that don't need as much training.

[quote]With texture atlases maybe. In my case it most certainly helps : my interface renders in 10ms without, and a little less than 3ms with cache targets. There is a little overhead when there is lots of constant changes in the GUI, but in my case that's unlikely.[/quote]Is that 10 ms of CPU time or GPU time? Either way, it just sounds like you're using the GPU inefficiently (sorry). If you understand CPUs better, then maybe that is a reason to build your optimised GUI renderer on that side, though.

[quote]... or integrated Intel chips :( I have one on my laptop (I bought it this year), and SFML doesn't seem to like it.[/quote]Every integrated chip since 2006 is DX9.0c compliant (apparently), so render-to-texture should be possible. Likely SFML just needs some massaging if it's not letting you.

[quote]Plus, correct me if I'm wrong, but aren't all our OSes' UIs drawn in software?[/quote]As of Vista/Win7, no -- they (can) use the GPU.

[quote name='MJP' timestamp='1307513461' post='4820813']
Yeah, Flash for your UI is more popular than you think... Scaleform has a [i]lot[/i] of licensees. :P
[/quote]and it makes developers' eyes bleed when you watch its CPU+GPU occupation, compared to all the other stuff that is going on.








[quote name='Hodgman' timestamp='1307493340' post='4820747']
Shaders aren't some special feature -- they're just how you write software that runs on a GPU.
Even if you're not using shaders ([i]i.e. you're using the fixed-function pipeline[/i]), that just means you're using the built-in shaders behind the scenes :([/quote]
I have very little knowledge of GPUs, but I do know what a shader is :) Maybe I understood Ashaman wrong, but the way I read it, he seemed to be pointing out that shaders have no equivalent in software rendering, which, as you said, is wrong. Again, maybe I misunderstood his point (I'm not a native English speaker, so please forgive me if that's so).

[quote]Yes, you're mistaken. As for the performance of shaders, your own shaders will probably perform [i]better[/i] than the built-in general purpose (fixed function) ones.[/quote]
With the fixed function pipeline I'm quite sure you can't do it.

[quote]Aside from overhead of setting up a triangle, the GPU launches hundreds of threads, each of which can take an unfiltered texture sample [i]per cycle[/i]. How many texture samples can you read on the CPU [i]per cycle[/i]?
In other words, it's much more optimized for 2D rendering than a CPU is, even if it seems geared towards 3D usage.[/quote]
That's a fact I wasn't aware of. But couldn't we all benefit from 2D-optimized GPU instructions? I know GPU manufacturers are fighting a "big numbers" war though, and that 2D rendering performance is much less "sexy" than 3D performance, from a customer's point of view...

[quote]I wasn't making a point, I was just pointing out that those technologies currently are used in games, and often are rendered on the CPU, which answers your questions "[i]Has anybody considered this alternative?[/i]" and "[i]Does it seem viable to you?[/i]" with "[i]yes[/i]", because it's been done.[/quote]
I was just being curious, and you answered my questions !

[quote]Is that 10ms of CPU time or GPU time? Either way, it just sounds like you're using the GPU inefficiently (sorry). If you understand CPUs better, then maybe that is a reason to make your optimised GUI renderer on that side though.[/quote]
These timings were obtained with hardware rendering. They represent the average elapsed time between two frames being rendered. It's inefficient because very little optimization is done, except for the cache targets. I don't use texture atlases, I don't keep track of batch counts, and I only have a lazy batch-reduction policy (if two sprites with the same texture are rendered one after the other, the renderer can create a single batch, but if another sprite with a different texture is rendered between the two, the batch count goes up to 3). On top of that, I use Ogre here, which is clearly not "2D friendly" (at least in v1.6.4; I don't know how the latest releases perform).

[quote]Every integrated chip since 2006 is DX9.0c compliant (apparently), so render-to-texture should be possible. Likely that SFML needs some massaging if it's not letting you.[/quote]
My whole point lies in that "should" :) It is indeed possible (it works with Ogre), but these chips apparently are, most of the time, a pain in the ... for graphics library programmers.

[quote name='Kalith' timestamp='1307569976' post='4821102']With the fixed function pipeline I'm quite sure you can't do it.[/quote]Then use a shader ;)
You're using them already ([i]as of DX9c, the fixed-function pipeline is just an emulation layer, and as of DX10, it doesn't exist at all[/i]) so there's no harm in making your own simple "2d compositing" shader ([i]well, except for the work of shoe-horning shaders into a fixed-function engine, I guess :()[/i][quote]That's a fact I wasn't aware of. But couldn't we all benefit from 2D-optimized GPU instructions ? I know that GPU manufacturers are fighting a "big numbers" war though, and that 2D rendering performances are much less "sexier" than 3D ones, from a customer point a view...[/quote]It's not uncommon for games to spend ~1/3rd of their rendering time on post-processing effects, which are usually entirely 2D/image based. Also, these days deferred rendering techniques are becoming more and more popular, which are often largely 2D as well ([i]so 2D performance is pretty important to 3D game devs too[/i]).

As of DX11, we've got "compute shaders", which are decoupled from the traditional 3D pipeline, and just let you use the GPU's computing hardware for any kind of task that you like -- many new games/engines are starting to use these to implement a lot of their (2D) post-processing stuff, so perhaps your wish is coming true ;)
[quote][quote]Is that 10ms of CPU time or GPU time?[/quote]These timings were obtained with hardware rendering. They represent the average elapsed time between two frames being rendered.[/quote]This is telling you a mixture of CPU and GPU usage, including any stalls. If you measure the elapsed time from the start of your draw functions to their end ([i][b]not[/b] including the Present/SwapBuffers call![/i]) you can tell the CPU usage of the draw-submission part. Using a tool like PIX you can measure the GPU's actual render time as well.
Also, the GPU usually lags behind the CPU with a decent amount of latency -- any draw commands are written to a buffer that is executed at a much later time. Operations that modify GPU-owned resources, such as locking vertex buffers, can force a synchronisation between the GPU and CPU ([i]breaking this natural latency[/i]) and create a large stall on one or both of the processors.
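A minimal way to act on that advice: bracket only the draw-submission code with a CPU timer and keep the buffer swap outside the bracket, so GPU stalls don't pollute the number. A hypothetical sketch; the callable stands in for whatever the engine's real submission entry point is:

```cpp
#include <chrono>

// Time a callable in milliseconds on the CPU. Pass only the draw-call
// submission; do NOT include Present/SwapBuffers, which can block on the GPU.
template <typename Fn>
double cpuMillis(Fn&& submit)
{
    auto t0 = std::chrono::steady_clock::now();
    submit();
    auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}
```

For the GPU side, a tool like PIX (or GPU timestamp queries) measures the actual render time separately.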

[quote name='Hodgman' timestamp='1307607916' post='4821240']
You're using them already ([i]as of DX9c, the fixed-function pipeline is just an emulation layer, and as of DX10, it doesn't exist at all[/i]) so there's no harm in making your own simple "2d compositing" shader ([i]well, except for the work of shoe-horning shaders into a fixed-function engine, I guess :()[/i][/quote]
Really? I knew it had disappeared in DX10, but I thought DX9 was still using it. Well, thank you, I'm learning many things here :)

[quote]As of DX11, we've got "compute shaders", which are decoupled from the traditional 3D pipeline, and just let you use the GPU's computing hardware for any kind of task that you like -- many new games/engines are starting to use these to implement a lot of their (2D) post-processing stuff, so perhaps your wish is coming true ;)[/quote]
Indeed, that's great! Unfortunately I prefer not to rely on an OS-dependent renderer, so I'll probably wait for OpenGL to offer the feature as well (and buy myself a compatible graphics card :().

[quote]This is telling you a mixture of CPU and GPU usage, including any stalls. If you measure the elapsed time from the start of your draw functions to their end ([i][b]not[/b] including the Present/SwapBuffers call![/i]) you can tell the CPU usage of the draw-submission part. Using a tool like PIX you can measure the GPU's actual render time as well.
Also, the GPU usually lags behind the CPU with a decent amount of latency -- any draw commands are written to a buffer that is executed at a much later time. Operations that modify GPU-owned resources, such as locking vertex buffers, can force a synchronisation between the GPU and CPU ([i]breaking this natural latency[/i]) and create a large stall on one or both of the processors.[/quote]
True, but what matters to the end user, and by extension to me, is the resulting frame rate, isn't it?

Anyway, I think you answered most of my initial questions. I'd really like to try implementing a full software renderer and plugging it into my GUI system to see how (badly) it performs, but I'm starting to realize that rendering a 1:1 texture on the screen is [i]a bit[/i] simpler than rendering an arbitrary colored/textured triangle. As my GUI was made with hardware rendering in mind, I'd have to adapt it to explicitly avoid those kinds of situations. Maybe that's just too much trouble...
Writing your own renderer is very instructive though. Only then do you realize how much work is actually done by the GPU.

While DX9 supports the fixed-function pipeline, it hasn't existed in hardware since the R300/Radeon 9700 and GF6 days. The GeForce FX was the last (utterly rubbish) card to have any form of fixed-function pipeline in it, and that was part of a hybrid design, as a stepping stone.

As for compute shaders and OpenGL: that isn't going to happen. OpenGL interfaces with OpenCL for that, so it will never be built into the API; the good news is that OpenCL is widely supported, so you can just use that.

