
Optimizing GUI bitmap rendering (screenshot included)


Started by Kibble February 07, 2005 07:36 PM

21 comments, last by h3idi 19 years, 2 months ago

Kibble

504

Author

February 07, 2005 07:36 PM

I have been working on a GUI framework for my game engine for quite a while now (at least 2 months). I was getting extremely frustrated with the standard method of using small triangles for all the pieces of the GUI. I found it very inflexible, and it was hard to add things like hover states. (edit: Hard because it required the use of an image editor, then importing the result.) About a week ago I started totally from scratch with a new idea: use a single texture for each window, with each window drawn as 2 triangles. It works great and it's really easy to change the look of things (unlike before). Here is what it looks like:

Now, don't let that framerate fool you, it didn't have to do any redrawing in that frame. When something causes a large window to be redrawn constantly (resizing a window and scrolling large areas are the only things that do it right now), it drops to 80-90 FPS. So basically, I am looking for ways to squeeze time out of my bitmap operations, because I know they are the bottleneck. The FPS used to be 40-60 when resizing large windows, and all I did to bring it up was optimize my font rendering in various ways (trim the edges of the characters, clip the text more efficiently, and use the fact that a font is monospace to speed up some text dimension stuff), and change my fill rectangle routine to this:


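// Fills the half-open rectangle [x1, x2) x [y1, y2) with 32-bit pixels.
uint32 rPitch = (x2 - x1) * sizeof(Color);	// bytes written per scanline
uint32 dy = m_Pitch - rPitch;	// bytes to skip to reach x1 on the next scanline
uint32 dx = sizeof(Color);	// bytes per pixel
uint8 * Cursor = m_Buffer + y1 * m_Pitch + x1 * sizeof(Color);
// One past the last pixel of the last scanline. Using (x2 - 1) here would
// skip the last scanline of a one-pixel-wide rectangle.
uint8 * End = m_Buffer + (y2 - 1) * m_Pitch + x2 * sizeof(Color);
while(Cursor < End)
{
	uint8 * ScanlineEnd = Cursor + rPitch;
	while(Cursor < ScanlineEnd)
	{
		*(uint32 *)Cursor = Color;
		Cursor += dx;
	}
	Cursor += dy;
}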

That gave me an increase of 12 or so FPS over two nested for loops + array indexing. I searched around Google for a bit for resources on fast bitmap operations and didn't find much. Any resources on it would be great, or if you've got a faster fill rectangle you'd like to share, that would be cool too :) I'm sure the FillRectangle function is taking the bulk of the time. edit: Trying imageshack again, last one crapped out...

[Edited by - Kibble on February 7, 2005 8:51:41 PM]
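One common alternative to a hand-written pixel loop is to fill each scanline with std::fill_n and leave the inner loop to the compiler and standard library. The following is only a minimal sketch under assumptions: the Bitmap struct is a hypothetical stand-in for the poster's class, pixels are 32 bits, and the pitch is counted in pixels rather than bytes. Whether it beats the loop above would need to be measured.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the bitmap class in the post (not the real one).
struct Bitmap {
    int width;
    int height;
    std::size_t pitch;                  // pixels per scanline (assumed >= width)
    std::vector<std::uint32_t> pixels;  // pitch * height pixels

    Bitmap(int w, int h)
        : width(w), height(h), pitch(static_cast<std::size_t>(w)),
          pixels(pitch * static_cast<std::size_t>(h), 0u) {}
};

// Fills the half-open rectangle [x1, x2) x [y1, y2) with color.
void FillRectangle(Bitmap& bmp, int x1, int y1, int x2, int y2,
                   std::uint32_t color) {
    assert(bmp.pitch >= static_cast<std::size_t>(bmp.width));

    // Clip to the bitmap so the fill can never write out of bounds.
    x1 = std::max(x1, 0);
    y1 = std::max(y1, 0);
    x2 = std::min(x2, bmp.width);
    y2 = std::min(y2, bmp.height);
    if (x1 >= x2 || y1 >= y2) return;  // nothing left to draw

    const std::size_t count = static_cast<std::size_t>(x2 - x1);
    for (int y = y1; y < y2; ++y) {
        // Start of this scanline's span inside the rectangle.
        std::uint32_t* row = bmp.pixels.data() +
                             static_cast<std::size_t>(y) * bmp.pitch +
                             static_cast<std::size_t>(x1);
        std::fill_n(row, count, color);
    }
}
```

Calling FillRectangle(bmp, 2, 1, 5, 3, color) touches columns 2 to 4 of rows 1 and 2 only. Clipping up front keeps the inner loop free of bounds checks.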

Melekor

379

February 07, 2005 09:42 PM

If it's an option, I would recommend using hardware acceleration (OpenGL). This will definitely give you a huge speed boost. My GUI is rendered with OpenGL and it gets a smooth 60 FPS constantly (didn't bother turning vsync off to check), even with alpha blending and tons of windows open.

Kibble

504

Author

February 07, 2005 09:56 PM

Like I said, I tried doing it that way before, and it is a pain in the ass that way. You have to edit the images and such in Photoshop or whatever, then they have to be imported in my case. I already started going that route and didn't like the inflexibility. This way it is super easy to change the look of a control for whatever reason. If a text box needs a thicker border, for example, that would take 1 minute or less, as opposed to much more work mucking around with Photoshop.

It is fast enough for my needs, especially considering there will never be large windows. I would just like it to be as fast as possible, and I know some of my raster operations are nowhere near optimal.

Melekor

379

February 08, 2005 12:31 AM

OK, now I understand that you were implying hardware acceleration when you talked about triangles (correct?)

You do know that you don't need to use textures to draw things with hardware, right? (In other words, no Photoshop required.) Colored polygons will work nicely, and they also make gradients extremely easy and fast to render.

You asked about a FillRectangle function. Well, with OpenGL it doesn't get any easier:

glColor3ub(red, green, blue);
glRecti(x, y, x+width, y+height);

I'm pretty sure that most of your other software 2D functions like BlitImage, DrawBox, etc. can all be replaced with the hardware equivalents very easily. Basically, if you design it right, you shouldn't need different code for the software version or the hardware accelerated version. Only the rendering functions need to be different.

Otherwise, if you are totally set on going the software route, you should probably look into assembler. There's no doubt about it, if you want the fastest pixel routines you're going to have to optimize them by hand at the lowest level. Look into MMX/SSE/SSE2 if you're on AMD/Intel, or AltiVec if you have a G4 or G5 processor. Combine loop unrolling with those powerful instruction sets and you can get a significant speedup, but hardware accelerated graphics will still be a lot faster.

[Edited by - Melekor on February 8, 2005 12:31:53 AM]

Kibble

504

Author

February 08, 2005 01:11 AM

Quote:Original post by Melekor
OK, now I understand that you were implying hardware acceleration when you talked about triangles (correct?)

Yes, I use two triangles and a texture for each window. When something within the window changes (sets a dirty flag on itself or another element), it instantiates a rendering class I have made (this is where all my bitmap operations are), sets the clipping rectangle and origin up for the dirty controls, and renders them into the texture.
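The dirty-flag pass described above could look roughly like the following. This is a minimal sketch, not the poster's code: Rect, Element, Intersect, ClearDirty and CollectDirty are all hypothetical names, element bounds are assumed to be relative to the parent, and a real version would call the rendering class where the comment indicates instead of just recording the clip rectangle.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct Rect { int x1, y1, x2, y2; };  // half-open: [x1, x2) x [y1, y2)

// Intersection of two rectangles; an empty result collapses to zero size.
Rect Intersect(const Rect& a, const Rect& b) {
    Rect r{std::max(a.x1, b.x1), std::max(a.y1, b.y1),
           std::min(a.x2, b.x2), std::min(a.y2, b.y2)};
    if (r.x2 < r.x1) r.x2 = r.x1;
    if (r.y2 < r.y1) r.y2 = r.y1;
    return r;
}

// Hypothetical GUI element: bounds are relative to the parent's origin.
struct Element {
    Rect bounds{0, 0, 0, 0};
    bool dirty = false;
    std::vector<Element> children;
};

// Clears the flag on an element and everything below it.
void ClearDirty(Element& e) {
    e.dirty = false;
    for (Element& c : e.children) ClearDirty(c);
}

// Walks the tree. For each dirty element it records the clip rectangle
// (in window coordinates) that a redraw of that element would use.
void CollectDirty(Element& e, int originX, int originY,
                  const Rect& parentClip, std::vector<Rect>& out) {
    const Rect abs{originX + e.bounds.x1, originY + e.bounds.y1,
                   originX + e.bounds.x2, originY + e.bounds.y2};
    const Rect clip = Intersect(abs, parentClip);
    if (e.dirty) {
        // A real renderer would set clip and origin (abs.x1, abs.y1) here
        // and draw the element plus its children into the window texture.
        out.push_back(clip);
        ClearDirty(e);
        return;
    }
    for (Element& c : e.children)
        CollectDirty(c, abs.x1, abs.y1, clip, out);
}
```

Clipping each child against its parent's clip rectangle is what keeps a control that hangs outside a scrolling area from drawing over the rest of the window.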

Quote:You do know that you don't need to use textures to draw things with hardware, right? (In other words, no Photoshop required.) Colored polygons will work nicely, and they also make gradients extremely easy and fast to render.

You asked about a FillRectangle function. Well, with OpenGL it doesn't get any easier:

glColor3ub(red, green, blue);
glRecti(x, y, x+width, y+height);

I'm pretty sure that most of your other software 2D functions like BlitImage, DrawBox, etc. can all be replaced with the hardware equivalents very easily. Basically, if you design it right, you shouldn't need different code for the software version or the hardware accelerated version. Only the rendering functions need to be different.

I use an abstracted rendering class that can use either D3D, OpenGL, or a software renderer I'm working on. The interface does not support an immediate mode like the one in your example.

I don't need any more arguments for the polygonal method. I've tried it, and there are plenty of things that are much harder to do with polygonal methods. An example is radio buttons. Here is my code for rendering them:

Render.CircleFrame(8, 8, 6, Highlight, Shadow, Background);
if(IsCheck())
	Render.FillCircle(8, 8, 3);

This is something that would require a texture to be created because of the 1 pixel outer circle. Another example: drawing any line not parallel to an axis would be very difficult with polygons. Keep in mind that the look of that GUI is going to change for the game I'm working on; it's not all going to be convenient rectangles and such. I am going to use some images for it, but not nearly as many as I would if it were all polygonal.

edit:

Quote:Otherwise, if you are totally set on going the software route, you should probably look into assembler. There's no doubt about it, if you want the fastest pixel routines you're going to have to optimize them by hand at the lowest level. Look into MMX/SSE/SSE2 if you're on AMD/Intel, or AltiVec if you have a G4 or G5 processor. Combine loop unrolling with those powerful instruction sets and you can get a significant speedup, but hardware accelerated graphics will still be a lot faster.

I would, but I can't guarantee the data I'm writing to is aligned properly for any of the SIMD instructions. I could write to a separate aligned buffer first though; that didn't occur to me before. Lastly, it will not be that much faster. For small changes, locking only the necessary portion of the texture and rendering the changes is extremely fast. I get 300-400 FPS during normal usage, I just want to improve the worst case of a big window being resized edit: or large areas being scrolled, things of that sort.
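The alignment concern does not necessarily require a separate aligned buffer. A common pattern is to write a few scalar pixels until the pointer reaches a 16-byte boundary, use aligned SSE2 stores for the bulk of the span, and finish the remainder with scalar writes. The sketch below is an illustration only: FillSpan is a hypothetical helper, it assumes 32-bit pixels at 4-byte aligned addresses, and it falls back to a plain loop when SSE2 is not available at compile time. SSE2 also offers an unaligned store (_mm_storeu_si128), which is simpler but was typically slower on processors of that era.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>  // SSE2 intrinsics
#define FILLSPAN_HAVE_SSE2 1
#endif

// Hypothetical helper: fills count 32-bit pixels starting at dst.
// dst is assumed to be 4-byte aligned, which holds for 32-bit pixel data.
void FillSpan(std::uint32_t* dst, std::size_t count, std::uint32_t color) {
    assert(reinterpret_cast<std::uintptr_t>(dst) % 4 == 0);
#ifdef FILLSPAN_HAVE_SSE2
    // Head: scalar writes until dst sits on a 16-byte boundary.
    while (count > 0 && reinterpret_cast<std::uintptr_t>(dst) % 16 != 0) {
        *dst++ = color;
        --count;
    }
    // Body: four pixels per aligned 128-bit store.
    const __m128i packed = _mm_set1_epi32(static_cast<int>(color));
    while (count >= 4) {
        _mm_store_si128(reinterpret_cast<__m128i*>(dst), packed);
        dst += 4;
        count -= 4;
    }
#endif
    // Tail: leftover pixels (or the whole span when SSE2 is unavailable).
    while (count > 0) {
        *dst++ = color;
        --count;
    }
}
```

A rectangle fill would call FillSpan once per scanline. For narrow spans the head and tail loops dominate, so the gain should only show up on wide fills such as a full window redraw.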

edit 2: Also, scrolling big areas isn't even that bad. It's mainly the things that cause the entire window to be redrawn, because there is a lot of overdraw (each pixel drawn 3-4 times for large controls, scrolling would only cause 1 or 2). I have thought about ways to help with this, such as a list of dirty rectangles instead of only being able to specify an entire element as dirty, but that is a lot more complicated. I want to see if I can speed it up enough by brute force first.
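For reference, the simplest form of a dirty rectangle list is not much code. The sketch below is one possible design, not something from this thread's codebase: Rect and DirtyList are hypothetical names, and overlapping rectangles are merged into their bounding box, which can redraw a little more than strictly necessary but guarantees no pixel is covered by two entries.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct Rect { int x1, y1, x2, y2; };  // half-open: [x1, x2) x [y1, y2)

bool Overlaps(const Rect& a, const Rect& b) {
    return a.x1 < b.x2 && b.x1 < a.x2 && a.y1 < b.y2 && b.y1 < a.y2;
}

// Smallest rectangle containing both a and b.
Rect Union(const Rect& a, const Rect& b) {
    return Rect{std::min(a.x1, b.x1), std::min(a.y1, b.y1),
                std::max(a.x2, b.x2), std::max(a.y2, b.y2)};
}

// Hypothetical dirty-rectangle list for one window.
class DirtyList {
public:
    // Adds r, merging it with every stored rectangle it overlaps.
    void Add(Rect r) {
        if (r.x1 >= r.x2 || r.y1 >= r.y2) return;  // ignore empty rects
        bool merged = true;
        while (merged) {
            merged = false;
            for (std::size_t i = 0; i < rects_.size(); ++i) {
                if (Overlaps(rects_[i], r)) {
                    // The grown rectangle may now touch others, so restart.
                    r = Union(rects_[i], r);
                    rects_.erase(rects_.begin() +
                                 static_cast<std::ptrdiff_t>(i));
                    merged = true;
                    break;
                }
            }
        }
        rects_.push_back(r);
    }

    const std::vector<Rect>& Rects() const { return rects_; }
    void Clear() { rects_.clear(); }

private:
    std::vector<Rect> rects_;  // invariant: no two entries overlap
};
```

Each frame, the window would redraw only the rectangles in Rects() and then call Clear(). Because no two entries overlap, a full redraw triggered by several controls touches each pixel region once per rectangle instead of once per dirty control.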

[Edited by - Kibble on February 8, 2005 1:11:39 AM]

Melekor

379

February 08, 2005 06:11 PM

Just a thought, maybe you can get the best of both worlds by using the polygonal method, but instead of generating the textures (e.g. RadioButton) in Photoshop, use your software rendering functions to generate them. That way you get the flexibility of software rendering where you need it and the speed of hardware rendering everywhere else.

Kibble

504

Author

February 08, 2005 07:21 PM

Quote:Original post by Melekor
Just a thought, maybe you can get the best of both worlds by using the polygonal method, but instead of generating the textures (e.g. RadioButton) in Photoshop, use your software rendering functions to generate them. That way you get the flexibility of software rendering where you need it and the speed of hardware rendering everywhere else.

OK, I don't think you realize this way is as fast or faster (How can it not be when they are essentially the same thing except this uses less geometry?) than the polygon method in all but a few (rare) cases. This is what this thread is about, improving those cases.

I am comparing to Crazy Eddie's GUI, my experience with a few particular games' GUIs, and mainly the way I was developing my GUI before, but I admit I had not started optimizing it when I started over with this idea. There was a lot less room for optimization compared with this way though. I had it very near to the functionality of this, but a little bit buggier because of a few clipping errors (things that were outside of a scrolling area would occasionally pop from below the parent window. Not a Z issue, I had the Z buffer off for all GUI rendering, then and now.)

All I want to do is improve the rare but slow cases of redrawing entire windows, and the best way to do that is to write a faster FillRectangle. I've written it in assembly, but it's not that much faster (100 fps edit: more like 95 with spikes at 100 now that I look closely). I am not that good at assembly though, so it's probably not optimal. My only previous experience with assembly is writing some of my vector and matrix operations with SSE.

Melekor

379

February 08, 2005 07:49 PM

Quote:Original post by Kibble
OK, I don't think you realize this way is as fast or faster (How can it not be when they are essentially the same thing except this uses less geometry?) than the polygon method in all but a few (rare) cases. This is what this thread is about, improving those cases.

If you're filling tons of pixels in memory and then uploading them to the video card, of course it's going to be slower. Drawing polygons is a fast operation for the video card, much faster than transferring texture memory from system memory to video memory. Sorry if I'm not telling you what you want to hear but I've gone through this process myself and that is my finding.

As far as I can see you have 3 alternatives:
1) Accept that it's fast enough.
2) Switch back to the polygonal method (I believe you can still retain the flexibility you want).
3) Use assembler as I've said before. If you decide to go this route, here is an excellent article that describes how to do a super fast memcpy with MMX. The techniques are also applicable to other things, of course.

Kibble

504

Author

February 08, 2005 08:17 PM

Quote:Original post by Melekor
If you're filling tons of pixels in memory and then uploading them to the video card, of course it's going to be slower. Drawing polygons is a fast operation for the video card, much faster than transferring texture memory from system memory to video memory. Sorry if I'm not telling you what you want to hear but I've gone through this process myself and that is my finding.

No, in general it is faster for almost everything. It rarely ever has to fill tons of pixels, if any pixels at all. More than 99% of frames involve absolutely no texture or vertex buffer manipulation, which is what I've been attempting to say for a few posts now.

Even if resizing windows was 10 FPS, I would just disable resizing windows and continue to use this system because it is faster in virtually every other case. It has to modify a few hundred pixels MAYBE one in 50 frames on average, except in very few situations where it must do large areas.

Here is a table of what I'm trying to say, this way compared with the polygon method:

 Usage             | Texture manipulation vs. polygons
-------------------+-----------------------------------------
 Idle              | Faster, it does absolutely nothing to the texture or vertex buffer, 2 triangles
                   | replace hundreds or thousands. (vast majority of frames)
 Moving windows    | Faster, has to update vertices (very few frames)
 Small controls    | Hard to say, polygons is probably faster
                   | but only in the single frame where it has to make the
                   | changes, has to upload small areas of pixels (few frames)
 Large controls    | Slower, has to upload large areas of pixels (very few frames)
 Resizing windows  | Slower, has to redraw the entire window into the texture (very very few frames)

Keep in mind that even while interacting with the GUI, a huge chunk of the frames will fall under the 'idle' category there. Hundreds of frames probably pass between using any two controls.

All I want to do is speed up those last two rows of that table. The ONLY time that it gets slow is when resizing large windows.

Quote:As far as I can see you have 3 alternatives:
1) Accept that it's fast enough.

I am going to do this as soon as I determine that there aren't other ways to speed it up more, regardless of whether I get it any faster or not.

Quote:3) Use assembler as I've said before. If you decide to go this route, here is an excellent article that describes how to do a super fast memcpy with SSE2. The techniques are also applicable to other things, of course.

Now this is the kind of thing I was looking for by starting this thread. Thanks!

Melekor

379

February 08, 2005 08:45 PM

Glad I could finally be of some help!

Someone should do some real benchmarks to see which method is better overall for speed (polygonal wins for RAM & texture memory usage), but for now I guess we have to agree to disagree.

BTW, good luck with your project. The screenshot is looking pretty slick. I can just imagine some game/editor alpha blended behind those windows :)
