# Optimizing GUI bitmap rendering (screenshot included)

I have been working on a GUI framework for my game engine for quite a while now (at least 2 months). I was getting extremely frustrated with the standard method of using small triangles for all the pieces of the GUI. I found it very inflexible, and things like hover states were hard to add. (edit: Hard because it required the use of an image editor, then importing the result.) About a week ago I started totally from scratch with a new idea: use a single texture for each window, and draw each window as 2 triangles. It works great and it's really easy to change the look of things (unlike before). Here is what it looks like:

Now, don't let that framerate fool you, it didn't have to do any redrawing in that frame. When something causes a large window to be redrawn constantly (resizing a window and scrolling large areas are the only things that do it right now), it drops to 80-90 FPS.

So basically, I am looking for ways to squeeze time out of my bitmap operations, because I know they are the bottleneck. The FPS used to be 40-60 when resizing large windows, and all I did to bring it up was optimize my font rendering in various ways (trim the edges of the characters, clip the text more efficiently, and use the fact that a font is monospace to speed up some text dimension stuff), and change my fill rectangle routine to this:
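```cpp
uint32 rPitch = (x2 - x1) * sizeof(Color);  // width of the rectangle in bytes
uint32 dy = m_Pitch - rPitch;               // from the end of one scanline to the start of the next
uint32 dx = sizeof(Color);                  // bytes per pixel
uint8 * Cursor = m_Buffer + y1 * m_Pitch + x1 * sizeof(Color);
// Start of scanline y2, one row past the rectangle. Stopping at the address of
// the last pixel instead would skip the final row of a 1-pixel-wide rectangle.
uint8 * End = m_Buffer + y2 * m_Pitch;
while(Cursor < End)
{
    uint8 * ScanlineEnd = Cursor + rPitch;
    while(Cursor < ScanlineEnd)
    {
        *(uint32 *)Cursor = Color;
        Cursor += dx;
    }
    Cursor += dy;
}
```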
That gave me an increase of 12 or so FPS over two nested for loops + array indexing. I searched around Google for a bit for resources on fast bitmap operations and didn't find much. Any resources on it would be great, or if you've got a faster fill rectangle you'd like to share, that would be cool too :) The FillRectangle function is taking the bulk of the time, I'm sure.

edit: Trying imageshack again, last one crapped out...

[Edited by - Kibble on February 7, 2005 8:51:41 PM]
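One portable variation on the routine above, as a minimal sketch rather than a measured improvement: treat each scanline as a run of 32-bit pixels and hand the run to `std::fill_n`, which gives the compiler a chance to emit wide stores on its own. The `Surface` struct is a hypothetical stand-in for the poster's `m_Buffer`/`m_Pitch` members, and the sketch assumes 4-byte pixels, a 4-byte-aligned buffer, a pitch that is a multiple of 4, and a rectangle that has already been clipped.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for the poster's surface (m_Buffer / m_Pitch).
struct Surface {
    std::uint8_t* buffer;  // first byte of the top-left pixel
    std::size_t   pitch;   // bytes per scanline, may exceed width * 4
};

// Fill the half-open rectangle [x1, x2) x [y1, y2) with one 32-bit color.
// The rectangle is assumed to be clipped to the surface already.
inline void FillRectangle(const Surface& s, int x1, int y1, int x2, int y2,
                          std::uint32_t color)
{
    if (x2 <= x1 || y2 <= y1) return;  // nothing to fill
    assert(s.pitch % sizeof(std::uint32_t) == 0);

    const std::size_t width = static_cast<std::size_t>(x2 - x1);
    std::uint8_t* row = s.buffer
                      + static_cast<std::size_t>(y1) * s.pitch
                      + static_cast<std::size_t>(x1) * sizeof(std::uint32_t);

    // One fill per scanline; stepping by the pitch skips any row padding.
    for (int y = y1; y < y2; ++y, row += s.pitch) {
        std::fill_n(reinterpret_cast<std::uint32_t*>(row), width, color);
    }
}
```

Because the loop counts rows instead of comparing against an end address, 1-pixel-wide and empty rectangles need no special handling.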

##### Share on other sites
If it's an option, I would recommend using hardware acceleration (OpenGL). This will definitely give you a huge speed boost. My GUI is rendered with OpenGL and it gets a smooth 60 FPS constantly (didn't bother turning vsync off to check), even with alpha blending and tons of windows open.

##### Share on other sites
Like I said, I tried doing it that way before, and it is a pain in the ass. You have to edit the images and such in Photoshop or whatever, then they have to be imported in my case. I already started going that route and didn't like the inflexibility. This way it is super easy to change the look of a control for whatever reason. If a text box needs a thicker border, for example, that would take 1 minute or less, as opposed to much more work mucking around with Photoshop.

It is fast enough for my needs, especially considering there will never be large windows, I would just like it to be as fast as possible, and I know some of my raster operations are not anywhere near optimal.

##### Share on other sites
Ok now I understand that you were implying hardware acceleration when you talked about triangles (correct?)

You do know that you don't need to use textures to draw things with hardware, right? (In other words, no Photoshop required.) Colored polygons will work nicely, and they also make gradients extremely easy and fast to render.

You asked about a FillRectangle function. Well, with opengl it doesn't get any easier:

```cpp
glColor3ub(red, green, blue);
glRecti(x, y, x+width, y+height);
```

I'm pretty sure that most of your other software 2D functions like BlitImage, DrawBox, etc. can all be replaced with the hardware equivalents very easily. Basically, if you design it right, you shouldn't need different code for the software version or the hardware accelerated version. Only the rendering functions need to be different.

Otherwise, if you are totally set on going the software route, you should probably look into assembler. There's no doubt about it, if you want the fastest pixel routines you're going to have to optimize them by hand at the lowest level. Look into MMX/SSE/SSE2 if you're on AMD/Intel, or AltiVec if you have a G4 or G5 processor. Combine loop unrolling with those powerful instruction sets and you can get a significant speedup, but hardware accelerated graphics will still be a lot faster.

[Edited by - Melekor on February 8, 2005 12:31:53 AM]

##### Share on other sites
Quote:
 Original post by Melekor
 Ok now I understand that you were implying hardware acceleration when you talked about triangles (correct?)
Yes, I use two triangles and a texture for each window. When something within the window changes (sets a dirty flag on itself or another element), it instantiates a rendering class I have made (this is where all my bitmap operations are), sets the clipping rectangle and origin up for the dirty controls, and renders them into the texture.
Quote:
 You do know that you don't need to use textures to draw things with hardware, right? (In other words, no Photoshop required.) Colored polygons will work nicely, and they also make gradients extremely easy and fast to render.
 You asked about a FillRectangle function. Well, with OpenGL it doesn't get any easier:
 glColor3ub(red, green, blue);
 glRecti(x, y, x+width, y+height);
 I'm pretty sure that most of your other software 2D functions like BlitImage, DrawBox, etc. can all be replaced with the hardware equivalents very easily. Basically, if you design it right, you shouldn't need different code for the software version or the hardware accelerated version. Only the rendering functions need to be different.

I use an abstracted rendering class, that can use either D3D, openGL, or a software renderer I'm working on. The interface does not support an immediate mode like you are giving an example of.

I don't need any more arguments for the polygonal method. I've tried it, there are plenty of things that are much harder to do with polygonal methods. An example is radio buttons, here is my code for rendering them:
```cpp
Render.CircleFrame(8, 8, 6, Highlight, Shadow, Background);
if(IsCheck())
    Render.FillCircle(8, 8, 3);
```

This is something that would require a texture to be created because of the 1 pixel outer circle. Likewise, drawing any line not parallel to an axis would be very difficult with polygons. Keep in mind that the look of that GUI is going to change for the game I'm working on; it's not all going to be convenient rectangles and such. I am going to use some images for it, but not nearly as many as I would if it was all polygonal.

edit:
Quote:
 Otherwise, if you are totally set on going the software route, you should probably look into assembler. There's no doubt about it, if you want the fastest pixel routines you're going to have to optimize them by hand at the lowest level. Look into MMX/SSE/SSE2 if you're on AMD/Intel, or AltiVec if you have a G4 or G5 processor. Combine loop unrolling with those powerful instruction sets and you can get a significant speedup, but hardware accelerated graphics will still be a lot faster.
I would, but I can't guarantee the data I'm writing to is aligned properly for any of the SIMD instructions. I could write to a separate aligned buffer first though, that didn't occur to me before. Lastly, it will not be that much faster. For small changes, locking only the necessary portion of the texture and rendering the changes is extremely fast. I get 300-400 FPS during normal usage; I just want to improve the worst case of a big window being resized (edit: or large areas being scrolled, things of that sort).
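A common way around the alignment worry, sketched here with SSE2 intrinsics instead of hand-written assembly: write single pixels until the destination reaches a 16-byte boundary, use aligned 16-byte stores for the middle of the span, and finish the leftovers one pixel at a time. `FillSpan32` is a hypothetical helper name, the pixels are assumed to be 32 bits wide and 4-byte aligned, and the SSE2 path is compiled only when the compiler reports SSE2 support (otherwise the function falls back to the scalar loop). Whether it beats the plain loop is something to measure, not assume.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

#if defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#include <emmintrin.h>
#define FILL_HAS_SSE2 1
#else
#define FILL_HAS_SSE2 0
#endif

// Fill `count` 32-bit pixels starting at `dst` with `color`.
inline void FillSpan32(std::uint32_t* dst, std::size_t count, std::uint32_t color)
{
    // Assumption: pixels are at least 4-byte aligned.
    assert((reinterpret_cast<std::uintptr_t>(dst) & 3u) == 0);

#if FILL_HAS_SSE2
    // Head: scalar stores until dst sits on a 16-byte boundary (at most 3 pixels).
    while (count > 0 && (reinterpret_cast<std::uintptr_t>(dst) & 15u) != 0) {
        *dst++ = color;
        --count;
    }
    // Body: aligned 16-byte stores, four pixels per store.
    const __m128i v = _mm_set1_epi32(static_cast<int>(color));
    for (; count >= 4; count -= 4, dst += 4) {
        _mm_store_si128(reinterpret_cast<__m128i*>(dst), v);
    }
#endif
    // Tail: remaining pixels (or the whole span when SSE2 is unavailable).
    while (count > 0) {
        *dst++ = color;
        --count;
    }
}
```

A rectangle fill built on this would call `FillSpan32` once per scanline, so each row gets its own head and tail handling and no separate aligned buffer is needed.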

edit 2: Also, scrolling big areas isn't even that bad. It's mainly things that cause the entire window to be redrawn, because there is a lot of overdraw (each pixel drawn 3-4 times for large controls, whereas scrolling would only cause 1 or 2). I have thought about ways to help with this, such as a list of dirty rectangles instead of only being able to mark an entire element as dirty, but that is a lot more complicated. I want to see if I can speed it up enough by brute force first.
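For reference, a dirty-rectangle list does not have to be elaborate. The sketch below is one simple scheme, not the poster's design: `Rect` and `DirtyList` are hypothetical names, rectangles are half-open, and any rectangles that overlap are merged into their bounding box so that no pixel is queued twice.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

struct Rect { int x1, y1, x2, y2; };  // half-open: [x1, x2) x [y1, y2)

inline bool Overlaps(const Rect& a, const Rect& b)
{
    return a.x1 < b.x2 && b.x1 < a.x2 && a.y1 < b.y2 && b.y1 < a.y2;
}

inline Rect BoundingBox(const Rect& a, const Rect& b)
{
    return { std::min(a.x1, b.x1), std::min(a.y1, b.y1),
             std::max(a.x2, b.x2), std::max(a.y2, b.y2) };
}

class DirtyList {
public:
    // Queue a region for redraw, merging it with anything it overlaps.
    void Add(Rect r)
    {
        if (r.x2 <= r.x1 || r.y2 <= r.y1) return;  // ignore empty regions
        for (std::size_t i = 0; i < rects_.size(); ) {
            if (Overlaps(rects_[i], r)) {
                // Absorb the stored rectangle, then rescan from the start,
                // because the grown box may now touch earlier entries.
                r = BoundingBox(rects_[i], r);
                rects_[i] = rects_.back();
                rects_.pop_back();
                i = 0;
            } else {
                ++i;
            }
        }
        rects_.push_back(r);
    }

    const std::vector<Rect>& Rects() const { return rects_; }
    void Clear() { rects_.clear(); }

private:
    std::vector<Rect> rects_;
};
```

The trade-off is that a bounding box can cover pixels that were never dirty, so two small changes in opposite corners of a control stay cheap only as long as they do not overlap.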

[Edited by - Kibble on February 8, 2005 1:11:39 AM]

##### Share on other sites
Just a thought, maybe you can get the best of both worlds by using the polygonal method, but instead of generating the textures (e.g RadioButton) in photoshop, use your software rendering functions to generate them. That way you get the flexibility of software rendering where you need it and the speed of hardware rendering everywhere else.

##### Share on other sites
Quote:
 Original post by Melekor
 Just a thought, maybe you can get the best of both worlds by using the polygonal method, but instead of generating the textures (e.g. RadioButton) in Photoshop, use your software rendering functions to generate them. That way you get the flexibility of software rendering where you need it and the speed of hardware rendering everywhere else.

OK, I don't think you realize this way is as fast or faster (How can it not be when they are essentially the same thing except this uses less geometry?) than the polygon method in all but a few (rare) cases. This is what this thread is about, improving those cases.

I am comparing to Crazy Eddie's GUI, my experience with a few particular games' GUIs, and mainly the way I was developing my GUI before, but I admit I had not started optimizing it when I started over with this idea. There was a lot less room for optimization compared with this way though. I had it very near to the functionality of this, but a little bit buggier because of a few clipping errors (things that were outside of a scrolling area would occasionally pop from below the parent window; not a Z issue, I had the Z buffer off for all GUI rendering, then and now).

All I want to do is improve the rare but slow cases of redrawing entire windows, and the best way to do that is to write a faster FillRectangle. I've written it in assembly, but it's not that much faster (100 FPS; edit: more like 95 with spikes at 100 now that I look closely). I am not that good at assembly though, so it's probably not optimal. My only previous experience with assembly is writing some of my vector and matrix operations with SSE.

##### Share on other sites
Quote:
 Original post by Kibble
 OK, I don't think you realize this way is as fast or faster (How can it not be when they are essentially the same thing except this uses less geometry?) than the polygon method in all but a few (rare) cases. This is what this thread is about, improving those cases.

If you're filling tons of pixels in memory and then uploading them to the video card, of course it's going to be slower. Drawing polygons is a fast operation for the video card, much faster than transferring texture memory from system memory to video memory. Sorry if I'm not telling you what you want to hear but I've gone through this process myself and that is my finding.

As far as I can see you have 3 alternatives:
1) Accept that it's fast enough.
2) Switch back to the polygonal method(I believe you can still retain the flexibility you want)
3) Use assembler as I've said before. If you decide to go this route, here is an excellent article that describes how to do a super fast memcpy with MMX. The techniques are also applicable to other things, of course.

##### Share on other sites
Quote:
 Original post by Melekor
 If you're filling tons of pixels in memory and then uploading them to the video card, of course it's going to be slower. Drawing polygons is a fast operation for the video card, much faster than transferring texture memory from system memory to video memory. Sorry if I'm not telling you what you want to hear but I've gone through this process myself and that is my finding.

No, in general it is faster for almost everything. It rarely ever has to fill tons of pixels, if any pixels at all. More than 99% of frames involve absolutely no texture or vertex buffer manipulation, which is what I've been attempting to say for a few posts now.

Even if resizing windows was 10 FPS, I would just disable resizing windows and continue to use this system because it is faster in virtually every other case. It has to modify a few hundred pixels MAYBE one in 50 frames on average, except in very few situations where it must do large areas.

Here is a table of what I'm trying to say, this way compared with the polygon method:

| Usage | Texture manipulation vs. polygons |
|---|---|
| Idle | Faster, it does absolutely nothing to the texture or vertex buffer, 2 triangles replace hundreds or thousands (vast majority of frames) |
| Moving windows | Faster, has to update vertices (very few frames) |
| Small controls | Hard to say, polygons is probably faster but only in the single frame where it has to make the changes, has to upload small areas of pixels (few frames) |
| Large controls | Slower, has to upload large areas of pixels (very few frames) |
| Resizing windows | Slower, has to redraw the entire window into the texture (very very few frames) |

Keep in mind that even while interacting with the GUI, a huge chunk of the frames will fall under the 'idle' category there. Hundreds of frames probably pass between using any two controls.

All I want to do is speed up those last two rows of that table, the ONLY time that it gets slow is when resizing large windows.

Quote:
 As far as I can see you have 3 alternatives:
 1) Accept that it's fast enough.

I am going to do this as soon as I determine that there aren't other ways to speed it up more, regardless of whether I get it any faster or not.
Quote:
 3) Use assembler as I've said before. If you decide to go this route, here is an excellent article that describes how to do a super fast memcpy with SSE2. The techniques are also applicable to other things, of course.

Now this is the kind of thing I was looking for by starting this thread. Thanks!

##### Share on other sites
Glad I could finally be of some help!

Someone should do some real benchmarks to see which method is better overall for speed(polygonal wins for ram & texture memory usage) but for now I guess we have to agree to disagree.

BTW, good luck with your project. The screenshot is looking pretty slick. I can just imagine some game/editor alpha blended behind those windows :)

##### Share on other sites
Quote:
 Original post by Melekor
 Someone should do some real benchmarks to see which method is better overall for speed (polygonal wins for RAM & texture memory usage) but for now I guess we have to agree to disagree.
I think we are just disagreeing about the situations we are comparing. I'm talking average FPS over all frames, when in 99% of those it's doing nothing more than rendering 2 triangles (no texture manipulation at all). I don't understand how it gets any faster than that.
Quote:
 BTW, good luck with your project. The screenshot is looking pretty slick. I can just imagine some game/editor alpha blended behind those windows :)

Thanks, the editor is actually done already, just done with MFC:

The game will look almost the same, just more art and less green lines ;)

edit: and better lighting, the lighting in the map of that screenshot is crap.

/shameless plug

##### Share on other sites
Wonder why Apple uses hardware acceleration and MS will in Longhorn? Because it is faster and looks better too.

With today's graphics hardware (or even 2000's graphics hardware), hardware accelerated graphics would do wonders for the FPS in your GUI. Do some benchmarking and compare before you start making assumptions.

##### Share on other sites
Why are you rendering to a software texture at all? If you are using OpenGL, render the window to a texture. If it's D3D, use its equivalent functionality. Reimplementing your primitive drawing functions in either API should be trivial, and you keep all your current benefits. The actual window is still just two triangles most frames, and you will cut out most of your overhead by skipping the image upload step.

##### Share on other sites
AP - It is not an assumption, I already pointed out that it is faster than a popular GUI library that does it this way. I only wanted to improve the rare case of redrawing the entire window, and I can live with stutter while this happens.
Quote:
 Original post by Deyja
 Why are you rendering to a software texture at all? If you are using OpenGL, render the window to a texture. If it's D3D, use its equivalent functionality. Reimplementing your primitive drawing functions in either API should be trivial, and you keep all your current benefits. The actual window is still just two triangles most frames, and you will cut out most of your overhead by skipping the image upload step.

Actually when I started doing this I was expecting to do it this way in the end, this software method was going to be temporary because I expected it to be far too slow. Now I'm not so sure.

Don't get me wrong here guys, this is definitely not the best solution for all GUIs. If your GUI elements are going to be bilinear filtered images with lots of animation and such, gradients, textured backgrounds, etc., this will definitely be too slow. For what I want to do though, just a clean, flat, crisp GUI like you see above, it's plenty fast and very easy to do.

Here are some exact numbers in case anyone was wondering:

| Usage | Avg. FPS |
|---|---|
| Resizing large (640x480 pixel) windows | 95 |
| Large controls: List boxes, tabs, multiline text boxes | 220 |
| Small controls: Buttons, check boxes, scroll bars, etc. | 450 |
| Moving window (i.e. updating vertices only) | 700 |
| Idle | 950 |

Keep in mind that for things like tab controls, changing list box selection, clicking buttons, etc., that is only the framerate for the one frame in which the item actually changes its look. It's only when scrolling or otherwise forcing the element to be dirty every frame that the framerate will stay at that rate.

Likewise for resizing the windows, that is only the framerate in a single frame in which the window actually changed size.

In other words, the huge majority of frames get the 'Idle' framerate.

The framerates were obtained by forcing the control or window in question to be dirty, so it would be updated every frame.

edit: I'm not sure what I'm still doing here, I came for a faster FillRectangle routine, not to defend something which I'm absolutely sure is going to fulfill my needs nicely.

edit 2: Defend was the wrong word, I realize you are all just trying to help answer my question by suggesting to use hardware for rasterizing. Honestly, I am 100% sure what I have now will work, it would just be nice if it was a little bit faster in some cases, and a fast FillRectangle will do that.

##### Share on other sites
When working at such large framerates (yes, 950 is freakishly huge) a tiny slow down per frame equals a huge drop in framerate. Your best case renders a frame in 0.0010526315789473684210526315789474 seconds. Your worst case renders a frame in 0.010526315789473684210526315789474 seconds. That's a difference of 0.0094736842105263157894736842105263 seconds. And 95 FPS isn't even bad.
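The arithmetic above is easier to reuse as a pair of helpers. This is only an illustration of the point about high frame rates; `FrameTimeMs` and `FpsAfterAddingMs` are hypothetical names.

```cpp
#include <cassert>

// Milliseconds spent on one frame at a given frame rate.
inline double FrameTimeMs(double fps)
{
    assert(fps > 0.0);
    return 1000.0 / fps;
}

// Frame rate that results from adding `extraMs` of work to every frame.
inline double FpsAfterAddingMs(double fps, double extraMs)
{
    return 1000.0 / (FrameTimeMs(fps) + extraMs);
}
```

With the numbers from the thread, `FrameTimeMs(95) - FrameTimeMs(950)` is about 9.47 ms of redraw work per frame. Adding that cost to a 950 FPS idle loop gives 95 FPS, while adding the same cost to a game already running at 60 FPS would leave roughly 38 FPS, which is why the cost in milliseconds is the more useful figure.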

Regardless; without proper testing; your bottleneck is either in the drawing-to-software-texture part or the upload-texture-to-card part. You can replace both parts with GPU operations. Why -wouldn't- you?

##### Share on other sites
Quote:
 Original post by Kibble
 I don't need any more arguments for the polygonal method. I've tried it, there are plenty of things that are much harder to do with polygonal methods. An example is radio buttons, here is my code for rendering them:
 Render.CircleFrame(8, 8, 6, Highlight, Shadow, Background);
 if(IsCheck()) Render.FillCircle(8, 8, 3);
 This is something that would require a texture to be created because of the 1 pixel outer circle.
Just out of curiosity, why not render these to some textures in all white on load, modulate by vertex colour, and layer several polys on top of each other?
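A rough per-pixel sketch of why the all-white texture works, assuming 8-bit channels packed as 0xAARRGGBB: modulation multiplies each texel channel by the matching vertex colour channel, so white texels take on the vertex colour and black or transparent texels stay as they are. `MulChannel` and `Modulate` are hypothetical names used only for illustration; on real hardware the texture stage or shader performs this multiply.

```cpp
#include <cstdint>

// Multiply two 8-bit channel values as fractions of 255, rounded to nearest,
// so that 255 * c gives c and 0 * c gives 0.
inline std::uint32_t MulChannel(std::uint32_t a, std::uint32_t b)
{
    return (a * b + 127u) / 255u;
}

// Modulate a texel by a vertex colour, channel by channel (both 0xAARRGGBB).
inline std::uint32_t Modulate(std::uint32_t texel, std::uint32_t colour)
{
    std::uint32_t out = 0;
    for (int shift = 0; shift < 32; shift += 8) {
        const std::uint32_t t = (texel  >> shift) & 0xFFu;
        const std::uint32_t c = (colour >> shift) & 0xFFu;
        out |= MulChannel(t, c) << shift;
    }
    return out;
}
```

One white circle texture could then serve as highlight, shadow, and background ring just by changing the vertex colour of the quad it is drawn on.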

Quote:
| Usage | Avg. FPS |
|---|---|
| Resizing large (640x480 pixel) windows | 95 |
| Large controls: List boxes, tabs, multiline text boxes | 220 |
| Small controls: Buttons, check boxes, scroll bars, etc. | 450 |
| Moving window (i.e. updating vertices only) | 700 |
| Idle | 950 |

If rendered in hardware, the drop for resize would be negligible.

##### Share on other sites
Hey, it looks great! But anyways, out of curiosity... what font is that? :-)

thanks
-Dan

##### Share on other sites
Quote:
 Original post by Deyja
 Regardless; without proper testing; your bottleneck is either in the drawing-to-software-texture part or the upload-texture-to-card part. You can replace both parts with GPU operations. Why -wouldn't- you?

Because as you said, 95 FPS isn't that bad, I figured it wouldn't take me very long to write a faster FillRectangle (SSE or MMX, or something).
Quote:
 Just out of curiosity, why not render these to some textures in all white on load, modulate by vertex colour, and layer several polys on top of each other?
No reason I can't, I just don't think it's worth the effort. It's fast *enough*, I just wanted to see if I could make it a little faster. Plus, as stated many many many times by now, redrawing things like radio buttons is extremely fast, faster than GUIs that do it with polygons.
Quote:
 If rendered in hardware, the drop for resize would be negligible.
Yes, theoretically the drop should be zero if the vertex buffers or whatever are rebuilt each frame; however, it's a lot more triangles. This screenshot of CEGUI has 1934 triangles, in my system it would be 6. I realize 2000 triangles is nothing for modern cards though.

I think this whole thread is a huge misunderstanding. I'm not looking for vast improvements in performance, it's fast enough already. I just want to make it as fast as possible the way it currently is. Right now most of the time is spent in FillRectangle. I also want to learn optimization techniques for things like this, when you don't have the GPU to fall back on.

Here is my inline assembly version:
```cpp
// clipping stuff omitted
if(x2 - x1 > 0 && y2 - y1 > 0)
{
    uint8 * Cursor = m_Buffer + y1 * m_Pitch + x1 * sizeof(Color);
    uint8 * End = m_Buffer + y2 * m_Pitch;            // start of scanline y2, one row past the rectangle
    uint32 dy = m_Pitch - (x2 - x1) * sizeof(Color);  // how much to add to cursor when advancing to the next scanline
    uint32 rPitch = (x2 - x1) * sizeof(Color);        // width of the rectangle in bytes
    _asm
    {
        mov edi, Cursor
        mov eax, Color
    scanline:
        mov ecx, edi
        add ecx, rPitch         // find the address just past the last pixel on this scanline
    pixel:
        mov [edi], eax
        add edi, 4
        cmp edi, ecx            // see if the cursor is at the end of this scanline
        jb pixel                // ...
        add edi, dy             // advance to the beginning of the next scanline
        cmp edi, End            // see if it has reached scanline y2
        jb scanline             // ...
    }
}
```

I am very new to assembly and haven't written anything longer than that, so I know there must be something that can be done to speed it up. That assembly code averages around 103 FPS when a 640x480 pixel window is dirty every frame (the window is being entirely redrawn every frame). Am I wrong in thinking that putting stuff in registers before the loop will help? It didn't seem to affect performance at all. BTW, I am compiling the DLL that contains this UI code in release mode. The other DLLs and the app are still in debug mode. That assembly code gets almost exactly the same speed as the plain old while loop that I posted in the beginning in release mode. (edit: I was mistaken, the while loop I compared with is a bit different now.) Is this simply as fast as it gets without SIMD instruction sets?

I attempted to switch that to SSE, but it crashed with movaps, I assumed because of alignment. I'm guessing that was a correct assumption because movups didn't crash, but it was exactly the same speed.

I also tried using Duff's Device, but that actually slowed it down...

Ademan555: The font used for most of the stuff is Verdana. I can't remember what the font used in the console is, I made the image for that one a very long time ago, it might be Courier New.

##### Share on other sites
As pointed out earlier the large drop in frame rate should be of no concern. It is natural that doing something will be much slower than doing nothing. Look at it this way, your eye only starts noticing problems below about 25 fps and cannot tell any change above about 55-60 fps, so don't worry about it. Also, the change you are seeing (from 900 to 90) is drastic but you have to understand that this trend is not linear, you will have to do quite a lot more to get it down much more than that.

One other point, fps is a pretty terrible way of gauging performance in cases like this. You need to be profiling and checking the execution time of specific functions or function groups. If the window scrolling is slowing the fps down to 90 then this won't be a problem as it is likely that the user cannot scroll on more than one window at once.
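Short of setting up a full profiler, timing the suspect function directly already says more than a frame counter. A minimal sketch using `std::chrono`; `AverageMs` is a hypothetical helper, and the callable passed in stands for whatever drawing call is being measured.

```cpp
#include <cassert>
#include <chrono>

// Run `fn` `iterations` times and return the average wall-clock
// milliseconds per call.
template <typename Fn>
double AverageMs(Fn&& fn, int iterations)
{
    assert(iterations > 0);
    using Clock = std::chrono::steady_clock;

    const auto start = Clock::now();
    for (int i = 0; i < iterations; ++i) {
        fn();
    }
    const std::chrono::duration<double, std::milli> elapsed = Clock::now() - start;
    return elapsed.count() / iterations;
}
```

For example, wrapping a full 640x480 fill in a lambda and averaging a few hundred calls gives the cost of that one routine in milliseconds, independent of texture upload and everything else in the frame. The measured work should write to memory that is actually used afterwards, otherwise an optimizing compiler may remove it.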

Hope this helps. Take it easy, and good luck with the program, it's looking real good.

Mark Coleman

##### Share on other sites
This may not be helpful, but why are you so bothered with the FPS of your GUI? It's not a part of the system that should need optimising. When nothing is changing 0fps will suffice. When dragging windows think how slow MSWindows is at doing it smoothly - and that's what everyone will be used to so they won't care if your windows 'judder' at 10fps when moved/resized.
Personally I'd rather work on the game... I can understand the perfectionist approach to optimising everything to the max but that's not good software engineering IMO.

##### Share on other sites
Your idea for your GUI sounds really good. When it comes to resizing, do you need to redraw the window every frame while it's resizing? How often does someone need to see the data in it while it is resizing?

Why not (as an optional performance enhancement) put the window data into a texture, whack that onto the window vertices and just scale it out?

##### Share on other sites
Quote:
 As pointed out earlier the large drop in frame rate should be of no concern. It is natural that doing something will be much slower than doing nothing. Look at it this way, your eye only starts noticing problems below about 25 fps and cannot tell any change above about 55-60 fps, so don't worry about it. Also, the change you are seeing (from 900 to 90) is drastic but you have to understand that this trend is not linear, you will have to do quite a lot more to get it down much more than that.

Quote:
 This may not be helpful, but why are you so bothered with the FPS of your GUI? It's not a part of the system that should need optimising. When nothing is changing 0fps will suffice. When dragging windows think how slow MSWindows is at doing it smoothly - and that's what everyone will be used to so they won't care if your windows 'judder' at 10fps when moved/resized.
 Personally I'd rather work on the game... I can understand the perfectionist approach to optimising everything to the max but that's not good software engineering IMO.

I'm developing an RTS so a lot of the UI will be visible during the game, and I want it to take as little time as possible away from the rest of the frame. This is another reason I'm doing it like this, so when the user is doing stuff with the units and such it takes less time to render the UI.

I'm not worried about the drop in framerate, simply how long it's taking to render large dirty areas. Yes, a drop to 100 FPS isn't visible, but that means it's taking 10 milliseconds out of a target 30 millisecond frame time, and on a relatively powerful machine at that. As long as the idle rendering time is less than 5% of a 30 ms frame I don't care about it.

I can live with a little stutter while resizing stuff, I just thought it might be a quick thread to get some help with a faster FillRectangle, so why not?
Quote:
 One other point, fps is a pretty terrible way of gauging performance in cases like this. You need to be profiling and checking the execution time of specific functions or function groups. If the window scrolling is slowing the fps down to 90 then this won't be a problem as it is likely that the user cannot scroll on more than one window at once.

I understand this, I have no excuse, just too lazy to get started with a profiler (very recently switched to VC 2003 from VC 6). I am 100% sure the slowest function is FillRectangle however, if I double up filling the window background call (ie do it twice), it nearly doubles the rendering time.
Quote:
 Hope this helps. Take it easy, and good luck with the program, it's looking real good.
Thanks!
Quote:
 Why not (as an optional performance enhancement) put the window data into a texture, whack that onto the window vertices and just scale it out?
Interesting idea. I suppose I could also render the new size to a separate texture and interpolate switching over a few frames so it doesn't snap between resolutions.

I give up on this for now. After the GUI is in use in game, if I find redrawing large windows to be a problem, I'll come back to it.

Thanks for all the help.

##### Share on other sites
Hi Kibble,

perhaps this link is useful if you are interested in a fast memcpy-routine:

http://www.amd.com/us-en/Processors/DevelopWithAMD/0,,30_2252_2272_2274,00.html

It's AMD's optimized memcpy for Athlons and Durons.
