[.net] Maximizing GDI+ Speed

Started by
19 comments, last by Nasenbaer 12 years, 5 months ago
Oh, and my numbers.

First, the highlights (all at 484KB):

DrawImageUnscaled():             Size: 484KB 698 ms = 35.5025787965616 Mpixels/s
Blitting:                        Size: 484KB 247 ms = 100.327125506073 Mpixels/s
Blitting, one line at a time:    Size: 484KB 71 ms = 349.025352112676 Mpixels/s
BitmapHelper.Copy:               Size: 484KB 51 ms = 485.898039215686 Mpixels/s

Benchmarking blitting times, using 200 repetitions per size
(times in ms; throughput in Mpixels/s, rounded to one decimal place)

Size     DrawImageUnscaled()   Blitting         Blitting, per line   BitmapHelper.Copy
         ms      Mpix/s        ms    Mpix/s     ms    Mpix/s         ms    Mpix/s
16KB     70      11.7          6     136.5      8     102.4          6     136.5
36KB     76      24.3          7     263.3      14    131.7          7     263.3
64KB     103     31.8          15    218.5      15    218.5          9     364.1
100KB    157     32.6          43    119.1      22    232.7          13    393.8
144KB    209     35.3          64    115.2      28    263.3          18    409.6
196KB    294     34.1          87    115.3      36    278.8          22    456.1
256KB    400     32.8          115   114.0      41    319.7          28    468.1
324KB    478     34.7          149   111.3      52    319.0          35    474.0
400KB    574     35.7          191   107.2      59    347.1          43    476.3
484KB    698     35.5          247   100.3      71    349.0          51    485.9
576KB    852     34.6          314   93.9       85    347.0          65    453.7
676KB    974     35.5          411   84.2       101   342.7          77    449.5
784KB    1121    35.8          496   80.9       111   361.6          89    451.0
900KB    1279    36.0          581   79.3       130   354.5          107   430.7
1024KB   1442    36.4          668   78.5       164   319.7          146   359.1
1156KB   1629    36.3          784   75.5       285   207.7          271   218.4
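For anyone who wants to reproduce the Mpixels/s column: the printed figures are consistent with reading "Size: NKB" as the buffer size in bytes at 4 bytes per pixel (32bpp) and multiplying by the 200 repetitions. That reading is my inference from the numbers, not something the benchmark output states. A small sketch (the BenchmarkMath name is made up):

```csharp
using System;

// Sketch: recompute the "Mpixels/s" figures from the benchmark output.
// Assumption (inferred from the numbers, not stated by the benchmark):
// "Size: NKB" is the buffer size in bytes at 32 bits (4 bytes) per pixel.
public static class BenchmarkMath
{
    const int BytesPerPixel = 4;   // assumed 32bpp
    const int Repetitions = 200;   // "Using 200 repetitions per size"

    // Pixels in one frame for a given "Size: NKB" value.
    public static long PixelsPerFrame(int sizeKB) => sizeKB * 1024L / BytesPerPixel;

    // Throughput in millions of pixels per second over all repetitions.
    public static double MpixelsPerSecond(int sizeKB, long elapsedMs)
    {
        double totalPixels = PixelsPerFrame(sizeKB) * (double)Repetitions;
        double seconds = elapsedMs / 1000.0;
        return totalPixels / seconds / 1_000_000.0;
    }
}
```

For example, MpixelsPerSecond(484, 698) comes out at about 35.50, matching the DrawImageUnscaled() row above.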

If that's not the help you're after then you're going to have to explain the problem better than what you have. - joanusdmentia

My Page davepermen.net | My Music on Bandcamp and on Soundcloud

More should be possible with multithreading: running two apps in a tight loop gave 2 x 300 Mpixels/s, compared with 480 Mpixels/s for a single one. So copying could reach about 600 Mpixels/s, which is about 17x the DrawImageUnscaled() speed.

interesting for sure...
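A rough sketch of what a multithreaded copy might look like, assuming the pixels are already in plain int[] buffers (for example, copied out of LockBits) with the stride given in pixels. ParallelCopy and CopyRows are hypothetical names, and whether this really approaches 600 Mpixels/s depends on memory bandwidth; the two-app test mentioned above is the only measurement here.

```csharp
using System;
using System.Threading.Tasks;

// Sketch (assumption): split a 32bpp pixel copy into horizontal bands and
// copy the bands in parallel. Buffers are plain int[] arrays; strides are in pixels.
public static class ParallelCopy
{
    public static void CopyRows(int[] src, int srcStride, int[] dst, int dstStride,
                                int width, int height, int bandCount)
    {
        if (bandCount < 1) throw new ArgumentOutOfRangeException(nameof(bandCount));
        // Divide the rows into bandCount contiguous bands (the last may be shorter).
        int rowsPerBand = (height + bandCount - 1) / bandCount;
        // Parallel.For decides how many threads actually run the bands.
        Parallel.For(0, bandCount, band =>
        {
            int firstRow = band * rowsPerBand;
            int endRow = Math.Min(firstRow + rowsPerBand, height);
            for (int y = firstRow; y < endRow; y++)
            {
                Array.Copy(src, y * srcStride, dst, y * dstStride, width);
            }
        });
    }
}
```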

omegagames:
There's no call to DrawImage when you use BufferedGraphics. Instead, the class supplies you with a Graphics object to draw to (similar to the one you'd get by creating your own back buffer image). Once you've finished drawing, you call BufferedGraphics.Render(), which calls BitBlt to copy the data from the Graphics object it supplied to the Graphics object you gave it (usually the one supplied by PaintEventArgs). This is very close to the way you'd do double buffering in GDI. The function to look at is BufferedGraphics.Render; IIRC all its overloads call BufferedGraphics.RenderInternal, which looks like this:
private void RenderInternal(HandleRef refTargetDC, BufferedGraphics buffer)
{
    IntPtr hdc = buffer.Graphics.GetHdc();
    try
    {
        SafeNativeMethods.BitBlt(refTargetDC, this.targetLoc.X, this.targetLoc.Y,
                                 this.virtualSize.Width, this.virtualSize.Height,
                                 new HandleRef(buffer.Graphics, hdc), 0, 0, rop);
    }
    finally
    {
        buffer.Graphics.ReleaseHdcInternal(hdc);
    }
}

SafeNativeMethods.BitBlt is DllImported from Gdi32.dll.

Interestingly enough, the Graphics object you get from PaintEventArgs is created directly using GDI function calls. Either you're handed an HDC from WndProc (in wParam), or the Control.WmPaint function calls BeginPaint to get one. Check out System.Windows.Forms.Control.WmPaint for the details.

All this goes to show that using BufferedGraphics is basically the same as importing the GDI functions yourself. It just happens to be easier.

I should see what Mono does for its BufferedGraphics... hopefully it's implemented.
(Every time I contest this, I feel I've missed something awfully obvious)

From what you've said, I want to compare two options here. Please tell me where I'm wrong, because otherwise I see no speed difference being possible.

Problem (in simplified form): Copy a bitmap object to a control (form background, as the obvious example). Let's call this Bitmap 'bmp'.

1) Bitmap backbuffer:
- Create a Bitmap of the same dimensions as your screen (call it 'buf')
- When rendering:
* Copy bmp to the buf using LockBits and Marshal.Copy (very fast)
* Copy buf to screen by obtaining the screen's graphics object (via PaintEventArgs) with something like "e.Graphics.DrawImage(buf, 0, 0);" (very slow, because of DrawImage)

2) BufferedGraphics backbuffer:
- Obtain the BufferedGraphics object (call it 'bufgfx')
- When rendering:
* Copy bmp to bufgfx by calling bufgfx.Graphics.DrawImage(bmp, 0, 0) (this is slow because of DrawImage [?])
* Copy bufgfx to screen by calling bufgfx.Render() (very fast)


So the way I see it, it's the difference between a fast operation followed by a slow one, and a slow operation followed by a fast one. I don't see where the speed increase would come from.
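For option 1's first step, here is a minimal sketch of copying a managed 32bpp pixel array into locked bitmap memory. scan0 and strideBytes stand in for BitmapData.Scan0 and BitmapData.Stride from Bitmap.LockBits; the LockedCopy helper itself is hypothetical. When the stride equals width * 4, a single Marshal.Copy moves the whole frame; otherwise it copies row by row, which also copes with a negative (bottom-up) stride.

```csharp
using System;
using System.Runtime.InteropServices;

// Sketch: copy a managed 32bpp frame into locked bitmap memory.
// scan0/strideBytes play the role of BitmapData.Scan0 / BitmapData.Stride;
// LockedCopy is a hypothetical helper, not part of System.Drawing.
public static class LockedCopy
{
    public static void CopyToScan0(int[] pixels, int width, int height, IntPtr scan0, int strideBytes)
    {
        if (pixels.Length < width * height) throw new ArgumentException("pixels is too small");
        if (strideBytes == width * 4)
        {
            // Rows are contiguous: one Marshal.Copy moves the whole frame.
            Marshal.Copy(pixels, 0, scan0, width * height);
            return;
        }
        // Padded or bottom-up (negative) stride: copy one row at a time.
        for (int y = 0; y < height; y++)
        {
            IntPtr row = IntPtr.Add(scan0, y * strideBytes);
            Marshal.Copy(pixels, y * width, row, width);
        }
    }
}
```

With a real Bitmap you would pass data.Scan0 and data.Stride from LockBits(rect, ImageLockMode.WriteOnly, PixelFormat.Format32bppArgb) and call UnlockBits afterwards.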
Good example. In the worst-case scenario (bmp is the same size as your form), yes, these will probably take about the same amount of time. The smaller bmp is, however, the bigger the advantage of BufferedGraphics over manual back-buffering: it makes copying the bitmaps slower, but makes swapping to the screen much faster.

If you're drawing smaller bitmaps onto the screen, you should see speed increases for a couple of reasons. First, DrawImage is more efficient when called on smaller bitmaps. Second, if you were to write a function that emulates DrawImage using LockBits, it wouldn't be nearly as fast as in the example, because of the extra calculations that need to be performed.
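The post doesn't list those extra calculations. As one plausible example (an assumption, not how GDI+ actually implements DrawImage), a DrawImage-style blit of 32bpp ARGB images with transparency has to blend every pixel instead of just copying it. This sketch blends non-premultiplied source pixels over an opaque destination; AlphaBlit is a made-up name.

```csharp
using System;

// Illustrative sketch (assumption): per-pixel "source over" blending of
// non-premultiplied 32bpp ARGB pixels onto an opaque destination.
// Not GDI+'s actual DrawImage implementation.
public static class AlphaBlit
{
    const int OpaqueAlpha = unchecked((int)0xFF000000);

    public static int BlendOverOpaque(int src, int dst)
    {
        int a = (src >> 24) & 0xFF;
        if (a == 255) return src;   // opaque source: same result as a plain copy
        if (a == 0) return dst;     // fully transparent source: destination unchanged
        int inv = 255 - a;
        // Weighted average per channel; +127 rounds the division by 255.
        int r = (((src >> 16) & 0xFF) * a + ((dst >> 16) & 0xFF) * inv + 127) / 255;
        int g = (((src >> 8) & 0xFF) * a + ((dst >> 8) & 0xFF) * inv + 127) / 255;
        int b = ((src & 0xFF) * a + (dst & 0xFF) * inv + 127) / 255;
        return OpaqueAlpha | (r << 16) | (g << 8) | b;
    }

    // Blend one row of count pixels: the extra work compared with a plain Array.Copy.
    public static void BlendRow(int[] src, int srcOffset, int[] dst, int dstOffset, int count)
    {
        for (int i = 0; i < count; i++)
            dst[dstOffset + i] = BlendOverOpaque(src[srcOffset + i], dst[dstOffset + i]);
    }
}
```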

For another example, my RTS draws terrain in tiles. During rendering it calculates which tiles need to be drawn, draws them to my back buffer bitmap, then draws that bitmap to e.Graphics using DrawImage(). To give you a feel for the speed of the different methods: drawing the tiles to the back buffer using DrawImage() runs at around 22 FPS in fullscreen, splitting them into scanlines and calling DrawImage() runs at about 32 FPS, and a custom blitting function using LockBits ran at about 37 FPS (unoptimized, however, and slightly broken). If using BufferedGraphics gives me a 2x speed increase, it'll do more than all my previous efforts.
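The "calculates which tiles need to be drawn" step could look something like the following sketch. The parameters (camera offset in world pixels, square tiles, map size in tiles) and the TileCulling name are assumptions, not the actual RTS code.

```csharp
using System;

// Sketch: find the inclusive range of tiles that overlap the viewport.
// Hypothetical API; assumes square tiles of tileSize pixels, a camera offset
// in world pixels, and a map of mapWidth x mapHeight tiles.
public static class TileCulling
{
    public static (int FirstCol, int FirstRow, int LastCol, int LastRow) Visible(
        int cameraX, int cameraY, int viewWidth, int viewHeight,
        int tileSize, int mapWidth, int mapHeight)
    {
        int firstCol = Math.Max(0, FloorDiv(cameraX, tileSize));
        int firstRow = Math.Max(0, FloorDiv(cameraY, tileSize));
        int lastCol = Math.Min(mapWidth - 1, FloorDiv(cameraX + viewWidth - 1, tileSize));
        int lastRow = Math.Min(mapHeight - 1, FloorDiv(cameraY + viewHeight - 1, tileSize));
        // If the view lies entirely off the map, Last < First and the range is empty.
        return (firstCol, firstRow, lastCol, lastRow);
    }

    // Integer division rounding toward negative infinity (the camera may sit at negative coordinates).
    static int FloorDiv(int a, int b)
    {
        int q = a / b;
        if (a % b != 0 && (a < 0) != (b < 0)) q--;
        return q;
    }
}
```

Each tile in the returned range would then be drawn at (col * tileSize - cameraX, row * tileSize - cameraY) on the back buffer.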

Just a thought: it might be possible to trick BufferedGraphics into using BitBlt to copy a Bitmap directly to a Graphics object by using Graphics.FromImage(bitmapToCopy). I'll look into it.

In any case, I plan to benchmark these thoroughly, so stay tuned.
No benchmark yet, but I did discover that .NET 2.0's built-in double buffering is actually better than manual double buffering (using a Bitmap, at least). I tried using BufferedGraphics, but something strange started happening with my program and I didn't have time to work it out. You can enable automatic double buffering with the following:
this.SetStyle(ControlStyles.AllPaintingInWmPaint, true);
this.SetStyle(ControlStyles.UserPaint, true);
this.SetStyle(ControlStyles.OptimizedDoubleBuffer, true);

After setting those in your initialization function, just use the Graphics supplied by PaintEventArgs in OnPaint(), and you're set. I got about 10FPS improvement on my 1400x900 monitor - not too shabby for the amount of work it involves.

Unfortunately, it looks like Microsoft did an excellent job of ensuring that BufferedGraphics can't be used to draw from a Bitmap to a Graphics object. I spent about two hours digging through Reflector looking for a way, but I've got nothing: everything's sealed, certain constructors are internal, and I don't see any other loopholes.

When I finally do get around to posting another benchmark, I hope to be able to post a nice BitBlt-style function for copying from one Bitmap to another.
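No such function was posted in the thread, so this is only a minimal sketch of a BitBlt-style copy between two 32bpp pixel buffers, assuming both images have been copied out of LockBits into row-major int[] arrays with no row padding. SoftBlit.Blit is a hypothetical name; it clips the rectangle against both buffers and then copies one row at a time.

```csharp
using System;

// Sketch of a BitBlt-style copy between two 32bpp pixel buffers
// (row-major int[], stride == width). Hypothetical helper, not an existing API.
public static class SoftBlit
{
    public static void Blit(int[] src, int srcWidth, int srcHeight, int srcX, int srcY,
                            int[] dst, int dstWidth, int dstHeight, int dstX, int dstY,
                            int width, int height)
    {
        // Clip against the left/top edges of both buffers, keeping src and dst aligned.
        if (srcX < 0) { dstX -= srcX; width += srcX; srcX = 0; }
        if (srcY < 0) { dstY -= srcY; height += srcY; srcY = 0; }
        if (dstX < 0) { srcX -= dstX; width += dstX; dstX = 0; }
        if (dstY < 0) { srcY -= dstY; height += dstY; dstY = 0; }
        // Clip against the right/bottom edges of both buffers.
        width = Math.Min(width, Math.Min(srcWidth - srcX, dstWidth - dstX));
        height = Math.Min(height, Math.Min(srcHeight - srcY, dstHeight - dstY));
        if (width <= 0 || height <= 0) return;   // nothing left to copy
        for (int row = 0; row < height; row++)
        {
            Array.Copy(src, (srcY + row) * srcWidth + srcX,
                       dst, (dstY + row) * dstWidth + dstX, width);
        }
    }
}
```

For real Bitmaps you would lock both images and step through rows using each BitmapData.Stride instead of assuming stride == width.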
Is this what you're after?

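// Win32 is a P/Invoke wrapper class for gdi32.dll (assumed; not shown in the post).
public static void BitBlt(int x, int y, int width, int height, Bitmap bmpSrc, Graphics gDest)
{
    IntPtr hDCDest = gDest.GetHdc();
    IntPtr memDC = Win32.CreateCompatibleDC(hDCDest);
    IntPtr memBmp = Win32.CreateCompatibleBitmap(hDCDest, bmpSrc.Width, bmpSrc.Height);
    IntPtr oldBmp = Win32.SelectObject(memDC, memBmp);
    try
    {
        // Draw the managed Bitmap into the memory DC, releasing the GDI+ wrapper before blitting.
        using (Graphics gMem = Graphics.FromHdc(memDC))
        {
            gMem.DrawImage(bmpSrc, 0, 0);
        }
        // Plain GDI copy from the memory DC to the destination DC.
        Win32.BitBlt(hDCDest, x, y, width, height, memDC, 0, 0, Win32.TernaryRasterOperations.SRCCOPY);
    }
    finally
    {
        // Deselect our bitmap before deleting it, then free the DC and release the destination HDC.
        Win32.SelectObject(memDC, oldBmp);
        Win32.DeleteObject(memBmp);
        Win32.DeleteDC(memDC);
        gDest.ReleaseHdc(hDCDest);
    }
}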
Yeah, but in a cross-platform way :P
Quote (original post by shiz98):
Yeah, but in a cross-platform way :P

Doh, and I was just going to suggest seeing how simply setting the WS_EX_COMPOSITED flag worked for doing double-buffering. Never mind then.
"In order to understand recursion, you must first understand recursion."
My website dedicated to sorting algorithms
Thanks for the great topic. Could you also test the Bitmap.Clone method?
Also, I wonder whether you set
InterpolationMode = System.Drawing.Drawing2D.InterpolationMode.NearestNeighbor
and
SmoothingMode = System.Drawing.Drawing2D.SmoothingMode.None
or whether you tested with interpolation.

