shiz98

[.net] Maximizing GDI+ Speed


shiz98    139
Hello everyone,

I've recently been working on a 2D RTS in C#, using nothing but GDI+ for rendering. This worked fairly well for me, but after upgrading monitors I noticed that my game's performance had sunk incredibly low. After a quick trip to Google I found out that GDI+ doesn't use any hardware acceleration, so as my resolution increased, my computer just couldn't keep up with the pixel throughput I needed.

So I went to work researching techniques to speed up GDI+. The following is both a collection of tips I've found on the web and the results of my own ideas. I hope you find them useful.

Tip #1 - PixelFormat.Format32bppPArgb

When creating Bitmap objects, use PixelFormat.Format32bppPArgb (32 bits per pixel with premultiplied alpha). This provides a considerable speed increase - something to do with the internal format used by the Bitmap object. It is extremely simple to implement.

Tip #2 - Split large Bitmaps into multiple, smaller Bitmaps

Instead of having a massive bitmap with a resolution of, say, 1024x1024, break it into smaller chunks of something like 64x64. This will give you a minor increase in speed, due to the way DrawImageUnscaled() scales with size (I have benchmarks I'll post further on). I find that the easiest way to do this is to simply break images up into scan lines and put them in an array. For the example I gave, I'd have an array of Bitmaps with 1024 entries, each 1024x1 pixels. You can split the images nicely using Bitmap.LockBits() and System.Runtime.InteropServices.Marshal. If there's a demand for them, I'll post examples. It's not too difficult to implement and use if you encapsulate it in its own class.

How to draw Bitmaps

This is the big one that I have been researching. As far as I know, I'm the only one who's investigated these different techniques and posted an analysis on the internet. If you find any other articles about this on the web, please let me know.

Note: The numbers posted here are for copying an image 100 times. They give you an indication of relative speed, but don't use them for any calculations. If you really want to, you'll have to extrapolate the speed for one iteration. I did this so that the numbers wouldn't get ridiculously huge, and so that I wouldn't have output like "Infinity pixels/ms" :P

DrawImageUnscaled

The traditional method of drawing an image onto a backbuffer is to use DrawImageUnscaled(). This works pretty well, and its throughput holds up well as images get larger. The sweet spot for this technique is images about 384 pixels in width and height, which gives 915 pixels/ms. Unfortunately, it's still ridiculously slow. If your backbuffer and image use the same pixel format, there's little reason to prefer it over blitting...

Blit the entire image to the backbuffer using Marshal.Copy()

The process here is simple. Call LockBits() on both the backbuffer and the image, create an array of bytes to act as a buffer, copy the image to that byte array in its entirety using Marshal.Copy(), then copy the byte array to the backbuffer, once again using Marshal.Copy(). This is somewhere around twice as fast as DrawImageUnscaled(), but it scales horribly, probably due to the memory usage skyrocketing as images get bigger. Interestingly, Marshal doesn't allow you to copy from IntPtr to IntPtr (that I've noticed...). Using unsafe code to copy byte-by-byte directly from the image to the backbuffer is far too slow, so I'm content to leave this method as it is. The sweet spot for this method is about 256x256 pixels, which gives 3855 pixels/ms. Wait, it gets better...

Blit the image to the backbuffer in small chunks using Marshal.Copy()

Allocating and filling chunks of memory several megabytes in size definitely can't be helping speed, so this technique simply uses a smaller buffer and fills it chunk by chunk. In my test case, the buffer size was BitmapData.Stride bytes for each resolution: I copied each scan line from the Bitmap into my array, and then copied that array to the backbuffer. The speed is amazing, and the scaling seems to be a little better than blitting the entire image at once (I haven't checked this with real maths though). The sweet spot for this technique is about 576x576 pixels, which gives a throughput of 9479 pixels/ms.

With proper tweaking, you could probably get a little more out of this method, but I can't think of any way to go a great deal faster. Getting more speed means making the copy more complex. As long as everything has the same format, however, you should be able to write a simple function to do your blitting for you. For the cases where the formats are different, you'll lose a bit of speed, but it is probably still worthwhile to blit.

Conclusion

I hope someone finds this useful. If you have any ideas on how to speed things up further, please let me know! The final technique is more than fast enough for my needs at the moment. Also, if anyone wants source code for the different methods, just let me know. I can also post the code I used for my benchmark program, if there's a demand for it.

Benchmark Results

These should help you optimize things, and give you a feel for the speed of the different methods. I've yet to do a detailed analysis of them - when/if I do, I'll post it here as well.
Quote:
Using 100 repetitions per size

Rendering using DrawImageUnscaled()

Size: 128 18 ms = 910.222222222222 pixels/ms
Size: 192 37 ms = 996.324324324324 pixels/ms
Size: 256 66 ms = 992.969696969697 pixels/ms
Size: 320 109 ms = 939.449541284404 pixels/ms
Size: 384 154 ms = 957.506493506493 pixels/ms
Size: 448 206 ms = 974.291262135922 pixels/ms
Size: 512 302 ms = 868.026490066225 pixels/ms
Size: 576 366 ms = 906.491803278689 pixels/ms
Size: 640 446 ms = 918.385650224215 pixels/ms
Size: 704 555 ms = 893.001801801802 pixels/ms
Size: 768 651 ms = 906.027649769585 pixels/ms
Size: 832 765 ms = 904.867973856209 pixels/ms
Size: 896 896 ms = 896 pixels/ms
Size: 960 1048 ms = 879.389312977099 pixels/ms
Size: 1024 1194 ms = 878.204355108878 pixels/ms
Size: 1088 1364 ms = 867.847507331378 pixels/ms
Size: 1152 1510 ms = 878.876821192053 pixels/ms
Size: 1216 1687 ms = 876.500296384114 pixels/ms
Size: 1280 1852 ms = 884.665226781857 pixels/ms
Size: 1344 2049 ms = 881.569546120059 pixels/ms
Size: 1408 2249 ms = 881.486883059137 pixels/ms
Size: 1472 2396 ms = 904.333889816361 pixels/ms
Size: 1536 2686 ms = 878.367833209233 pixels/ms
Size: 1600 2851 ms = 897.930550683971 pixels/ms
Size: 1664 3161 ms = 875.955710218285 pixels/ms

Rendering using blitting

Size: 128 8 ms = 2048 pixels/ms
Size: 192 22 ms = 1675.63636363636 pixels/ms
Size: 256 28 ms = 2340.57142857143 pixels/ms
Size: 320 31 ms = 3303.22580645161 pixels/ms
Size: 384 53 ms = 2782.18867924528 pixels/ms
Size: 448 69 ms = 2908.75362318841 pixels/ms
Size: 512 126 ms = 2080.50793650794 pixels/ms
Size: 576 169 ms = 1963.17159763314 pixels/ms
Size: 640 232 ms = 1765.51724137931 pixels/ms
Size: 704 298 ms = 1663.14093959732 pixels/ms
Size: 768 457 ms = 1290.64332603939 pixels/ms
Size: 832 425 ms = 1628.76235294118 pixels/ms
Size: 896 509 ms = 1577.2416502947 pixels/ms
Size: 960 590 ms = 1562.03389830508 pixels/ms
Size: 1024 657 ms = 1596.00608828006 pixels/ms
Size: 1088 687 ms = 1723.06259097525 pixels/ms
Size: 1152 784 ms = 1692.73469387755 pixels/ms
Size: 1216 911 ms = 1623.11306256861 pixels/ms
Size: 1280 1026 ms = 1596.88109161793 pixels/ms
Size: 1344 1120 ms = 1612.8 pixels/ms
Size: 1408 1222 ms = 1622.31096563011 pixels/ms
Size: 1472 1332 ms = 1626.71471471471 pixels/ms
Size: 1536 1482 ms = 1591.96761133603 pixels/ms
Size: 1600 1590 ms = 1610.06289308176 pixels/ms
Size: 1664 1730 ms = 1600.51791907514 pixels/ms

Rendering using blitting - one line at a time

Size: 128 3 ms = 5461.33333333333 pixels/ms
Size: 192 6 ms = 6144 pixels/ms
Size: 256 9 ms = 7281.77777777778 pixels/ms
Size: 320 12 ms = 8533.33333333333 pixels/ms
Size: 384 17 ms = 8673.88235294118 pixels/ms
Size: 448 22 ms = 9122.90909090909 pixels/ms
Size: 512 29 ms = 9039.44827586207 pixels/ms
Size: 576 34 ms = 9758.11764705882 pixels/ms
Size: 640 44 ms = 9309.09090909091 pixels/ms
Size: 704 61 ms = 8124.85245901639 pixels/ms
Size: 768 93 ms = 6342.1935483871 pixels/ms
Size: 832 122 ms = 5673.96721311475 pixels/ms
Size: 896 146 ms = 5498.7397260274 pixels/ms
Size: 960 174 ms = 5296.55172413793 pixels/ms
Size: 1024 229 ms = 4578.93449781659 pixels/ms
Size: 1088 240 ms = 4932.26666666667 pixels/ms
Size: 1152 277 ms = 4790.98916967509 pixels/ms
Size: 1216 314 ms = 4709.09554140127 pixels/ms
Size: 1280 350 ms = 4681.14285714286 pixels/ms
Size: 1344 388 ms = 4655.50515463918 pixels/ms
Size: 1408 428 ms = 4631.92523364486 pixels/ms
Size: 1472 466 ms = 4649.75107296137 pixels/ms
Size: 1536 522 ms = 4519.72413793103 pixels/ms
Size: 1600 545 ms = 4697.24770642202 pixels/ms
Size: 1664 599 ms = 4622.53088480801 pixels/ms
[Edited by - shiz98 on October 9, 2007 3:15:46 PM]
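For readers who want to try Tip #1 and Tip #2 before any examples are posted, here is a minimal sketch of both. It is not the original poster's code: the class name BitmapSplitter and the method names ToPArgb and SplitIntoScanLines are made up for illustration, and it assumes every image is handled as Format32bppPArgb.

```csharp
using System;
using System.Drawing;
using System.Drawing.Imaging;
using System.Runtime.InteropServices;

// Hypothetical helper (names are illustrative, not from the thread).
static class BitmapSplitter
{
    // Tip #1: return a copy of 'source' stored as Format32bppPArgb.
    public static Bitmap ToPArgb(Image source)
    {
        Bitmap result = new Bitmap(source.Width, source.Height, PixelFormat.Format32bppPArgb);
        using (Graphics g = Graphics.FromImage(result))
        {
            // An explicit destination rectangle avoids DPI-based scaling.
            g.DrawImage(source, new Rectangle(0, 0, source.Width, source.Height));
        }
        return result;
    }

    // Tip #2: split 'source' into one Bitmap per scan line (each width x 1).
    public static Bitmap[] SplitIntoScanLines(Bitmap source)
    {
        int width = source.Width;
        int height = source.Height;
        Bitmap[] lines = new Bitmap[height];
        byte[] row = new byte[width * 4]; // 4 bytes per pixel in 32bpp formats

        BitmapData src = source.LockBits(new Rectangle(0, 0, width, height),
            ImageLockMode.ReadOnly, PixelFormat.Format32bppPArgb);
        try
        {
            for (int y = 0; y < height; y++)
            {
                // Address of row y; Stride is the byte distance between rows.
                IntPtr srcRow = new IntPtr(src.Scan0.ToInt64() + (long)y * src.Stride);
                Marshal.Copy(srcRow, row, 0, row.Length);

                Bitmap line = new Bitmap(width, 1, PixelFormat.Format32bppPArgb);
                BitmapData dst = line.LockBits(new Rectangle(0, 0, width, 1),
                    ImageLockMode.WriteOnly, PixelFormat.Format32bppPArgb);
                try
                {
                    Marshal.Copy(row, 0, dst.Scan0, row.Length);
                }
                finally
                {
                    line.UnlockBits(dst);
                }
                lines[y] = line;
            }
        }
        finally
        {
            source.UnlockBits(src);
        }
        return lines;
    }
}
```

Drawing the split image then becomes a loop over the array, calling g.DrawImageUnscaled(lines[y], x, top + y) for each row. Whether that actually beats a single draw call on a given machine is worth measuring rather than assuming.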

shiz98    139
Benchmark Analysis
After plugging the data into Excel, I've determined that, contrary to my initial thoughts, all three methods are linear with respect to pixel count. In the fits below, x is the number of pixels in the image (size squared) and y is the time in ms for the 100 copies. The third method is fastest both in terms of rate (the slope) and in terms of the constant term. The second method comes next for both rate and constant, and DrawImageUnscaled() comes in last. The only advantage I'd say DrawImageUnscaled() has is that it is the most consistent, with the highest R^2 of the three. Here's the pertinent info:
Quote:

DrawImageUnscaled():
y = 0.0011x - 6.7591
R^2 = 0.9996

Blitting:
y = 0.0006x - 14.9366
R^2 = 0.9973

Blitting in chunks:
y = 0.0002x - 23.6602
R^2 = 0.9952

Obviously it'd be best if I got some data for lower sizes, but the benefits don't warrant re-running the benchmark and inputting the data again.

[Edited by - shiz98 on October 9, 2007 3:00:53 PM]
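The same kind of fit can be done without Excel. The sketch below is an ordinary least-squares fit of time against pixel count; the LinearFit class is hypothetical, and the sample data is just five rows taken from the "one line at a time" results earlier in the thread, so the coefficients it prints will not match the full-data fit quoted above.

```csharp
using System;

// Hypothetical helper: ordinary least-squares fit of y = slope * x + intercept.
static class LinearFit
{
    public static void Fit(double[] x, double[] y,
                           out double slope, out double intercept, out double rSquared)
    {
        if (x.Length != y.Length || x.Length < 2)
            throw new ArgumentException("Need at least two (x, y) pairs of equal length.");

        int n = x.Length;
        double meanX = 0, meanY = 0;
        for (int i = 0; i < n; i++) { meanX += x[i]; meanY += y[i]; }
        meanX /= n;
        meanY /= n;

        // Sums of squared deviations and cross-deviations from the means.
        double sxx = 0, sxy = 0, syy = 0;
        for (int i = 0; i < n; i++)
        {
            double dx = x[i] - meanX;
            double dy = y[i] - meanY;
            sxx += dx * dx;
            sxy += dx * dy;
            syy += dy * dy;
        }
        if (sxx == 0) throw new ArgumentException("All x values are identical.");

        slope = sxy / sxx;
        intercept = meanY - slope * meanX;
        // For a straight-line fit, R^2 is the squared correlation coefficient.
        rSquared = syy == 0 ? 1.0 : (sxy * sxy) / (sxx * syy);
    }
}

static class Program
{
    static void Main()
    {
        // (size, ms) pairs from the "blitting - one line at a time" results.
        int[] sizes = { 128, 256, 512, 1024, 1664 };
        double[] ms = { 3, 9, 29, 229, 599 };

        double[] pixels = new double[sizes.Length];
        for (int i = 0; i < sizes.Length; i++)
            pixels[i] = (double)sizes[i] * sizes[i];

        double slope, intercept, r2;
        LinearFit.Fit(pixels, ms, out slope, out intercept, out r2);
        Console.WriteLine("y = " + slope + "x + " + intercept + ", R^2 = " + r2);
    }
}
```

Feeding in all 25 rows for a method, rather than the five used here, is what would be needed to compare against the Excel numbers.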

Headkaze    607
You can indeed speed things up further by adjusting the graphics quality settings. These can affect the speed of rendering considerably, especially NearestNeighbor interpolation when you use DrawImage to stretch an image.

Check out these:

Graphics.InterpolationMode
Graphics.SmoothingMode
Graphics.PixelOffsetMode
Graphics.CompositingQuality
Graphics.TextRenderingHint

You will need to set these before you call DrawImage.

Fast rendering:

g.InterpolationMode = System.Drawing.Drawing2D.InterpolationMode.Low; // or NearestNeighbor
g.SmoothingMode = System.Drawing.Drawing2D.SmoothingMode.None;
g.PixelOffsetMode = System.Drawing.Drawing2D.PixelOffsetMode.None;
g.CompositingQuality = System.Drawing.Drawing2D.CompositingQuality.HighSpeed;
g.TextRenderingHint = System.Drawing.Text.TextRenderingHint.SingleBitPerPixel;


Slow rendering:

g.InterpolationMode = System.Drawing.Drawing2D.InterpolationMode.High;
g.SmoothingMode = System.Drawing.Drawing2D.SmoothingMode.HighQuality;
g.PixelOffsetMode = System.Drawing.Drawing2D.PixelOffsetMode.HighQuality;
g.CompositingQuality = System.Drawing.Drawing2D.CompositingQuality.HighQuality;
g.TextRenderingHint = System.Drawing.Text.TextRenderingHint.AntiAlias;

shiz98    139
Here's the code. It's nothing pretty - just a quick and dirty benchmark.


using System;
using System.Diagnostics;
using System.Drawing;
using System.Drawing.Imaging;
using System.IO;
using System.Runtime.InteropServices;

namespace RenderingBenchmark
{
    class Program
    {
        static void Main(string[] args)
        {
            int origSize = 64;      // step between image sizes, in pixels
            int size = origSize;
            int runs = 100;         // copies per size; the results posted above used 100
            int reps = 25;          // number of sizes to test
            bool exponential = false;
            bool outputTxt = false;

            // Only open the log file when asked to; new StreamWriter("") throws.
            StreamWriter writer = null;
            if (outputTxt)
            {
                writer = new StreamWriter("output.txt");
                Console.SetOut(writer);
            }
            Console.WriteLine("Benchmarking blitting times");
            Console.WriteLine("Using " + runs + " repetitions per size");
            Console.WriteLine("Rendering using DrawImageUnscaled()\n");

            // Stopwatch has far better resolution than DateTime.Now.
            Stopwatch timer = new Stopwatch();

            for (int i = 0; i < reps; i++)
            {
                size = exponential ? size * 2 : size + origSize;
                Console.Write("Size: " + size + " ");
                using (Bitmap orig = new Bitmap(size, size))
                using (Bitmap from = new Bitmap(size, size))
                {
                    timer.Reset();
                    timer.Start();
                    for (int j = 0; j < runs; j++)
                    {
                        using (Graphics g = Graphics.FromImage(orig))
                        {
                            g.DrawImageUnscaled(from, new Point(0, 0));
                        }
                    }
                    timer.Stop();
                }
                Report(size, timer);
            }

            Console.WriteLine("\nRendering using blitting\n");
            size = origSize;
            for (int i = 0; i < reps; i++)
            {
                size = exponential ? size * 2 : size + origSize;
                Console.Write("Size: " + size + " ");
                using (Bitmap orig = new Bitmap(size, size, PixelFormat.Format32bppPArgb))
                using (Bitmap from = new Bitmap(size, size, PixelFormat.Format32bppPArgb))
                {
                    timer.Reset();
                    timer.Start();
                    for (int j = 0; j < runs; j++)
                    {
                        Rectangle lockRect = new Rectangle(0, 0, size, size);
                        BitmapData origData = orig.LockBits(lockRect, ImageLockMode.WriteOnly, PixelFormat.Format32bppPArgb);
                        BitmapData fromData = from.LockBits(lockRect, ImageLockMode.ReadOnly, PixelFormat.Format32bppPArgb);

                        //see if this works - technique one
                        //origData.Scan0 = fromData.Scan0;

                        // Copy the whole image through one large managed buffer.
                        byte[] data = new byte[origData.Stride * origData.Height];
                        Marshal.Copy(fromData.Scan0, data, 0, data.Length);
                        Marshal.Copy(data, 0, origData.Scan0, data.Length);

                        orig.UnlockBits(origData);
                        from.UnlockBits(fromData);
                    }
                    timer.Stop();
                }
                Report(size, timer);
            }

            Console.WriteLine("\nRendering using blitting - one line at a time\n");
            size = origSize;
            for (int i = 0; i < reps; i++)
            {
                size = exponential ? size * 2 : size + origSize;
                Console.Write("Size: " + size + " ");
                using (Bitmap orig = new Bitmap(size, size, PixelFormat.Format32bppPArgb))
                using (Bitmap from = new Bitmap(size, size, PixelFormat.Format32bppPArgb))
                {
                    timer.Reset();
                    timer.Start();
                    for (int j = 0; j < runs; j++)
                    {
                        Rectangle lockRect = new Rectangle(0, 0, size, size);
                        BitmapData origData = orig.LockBits(lockRect, ImageLockMode.WriteOnly, PixelFormat.Format32bppPArgb);
                        BitmapData fromData = from.LockBits(lockRect, ImageLockMode.ReadOnly, PixelFormat.Format32bppPArgb);

                        // Copy one scan line at a time through a Stride-sized buffer.
                        byte[] data = new byte[origData.Stride];
                        for (int row = 0; row < origData.Height; row++)
                        {
                            IntPtr fromPtr = (IntPtr)(fromData.Scan0.ToInt64() + (long)row * fromData.Stride);
                            IntPtr toPtr = (IntPtr)(origData.Scan0.ToInt64() + (long)row * origData.Stride);
                            Marshal.Copy(fromPtr, data, 0, data.Length);
                            Marshal.Copy(data, 0, toPtr, data.Length);
                        }

                        orig.UnlockBits(origData);
                        from.UnlockBits(fromData);
                    }
                    timer.Stop();
                }
                Report(size, timer);
            }

            Console.WriteLine("\nPress any key to continue...");
            Console.ReadKey(true);
            if (writer != null)
                writer.Close();
        }

        // Prints the time for the whole batch of runs. As in the posted results,
        // pixels/ms is NOT multiplied by the number of runs.
        static void Report(int size, Stopwatch timer)
        {
            double ms = timer.Elapsed.TotalMilliseconds;
            double ppms = (double)size * size / ms;
            Console.WriteLine(ms + " ms = " + ppms + " pixels/ms");
        }
    }
}


omegagames    122
Great post! First off, I was about to give up on GDI+ altogether. Now my question is: how do you do screen rendering?

I know there are different ways to do double buffering. There's the .NET-handles-everything way of "Form.DoubleBuffered = true". Then there is the BufferedGraphics class (which I haven't looked into extensively). Last, one could just manage one's own buffer through a bitmap. I don't have any preference for which method to use, but I can't see any of them helping me escape calling DrawImage. For example, if I drew everything to a bitmap first (which would be fast using the info in your post), I would still have to copy the end result to the screen via Graphics.DrawImage (which would be slow).

So I guess what I'm asking is what is the "fast" way of getting data to the screen itself? Thanks!

shiz98    139
Well, I've been unable to find a way to double buffer my forms by rendering from a bitmap to a bitmap; this doesn't make sense from a GDI perspective anyway. Unfortunately for us, .NET's built-in double buffering is pretty terrible (at least from what I've noticed). The only options left are to use BufferedGraphics, or to do our own double buffering.

I've been digging around the assemblies today using Lutz Roeder's .NET Reflector (fantastic program for this sort of stuff) to see what .NET is doing internally for its drawing methods. What I've discovered is that it does its work using DllImports to the GDI+ libraries. While it would conceivably be possible to write a new Graphics implementation using calls to these libraries (or possibly plain old GDI libraries), it wouldn't work on other platforms (e.g. Mono on Linux).

BufferedGraphics, BufferedGraphicsContext, and BufferedGraphicsManager, however, are a different story. I haven't used these, as I figured they would have no advantage over my manual double buffering. However, after a bit of snooping around in these classes to figure out what they're doing internally, I found that they are more memory efficient (no bitmap backbuffer like in the manual approach), and that they are probably much faster. While manual double buffering forces you to copy your data using DrawImage(), BufferedGraphics uses our old friend BitBlt straight from GDI. This should bring the speed on par with GDI, while still allowing applications to be truly cross-platform (assuming your target platform has implemented those three classes, that is).

I have yet to test how much faster BufferedGraphics is, but I plan to write up another benchmark within the next few days.

omegagames    122
After reading what you said, I looked at BufferedGraphics in Reflector. Like you said, it uses faster methods to render to the screen, but it still (ugh) uses a Graphics object (and by extension DrawImage) as the only way to write to it. Maybe I've missed something, but that seems like it wouldn't be any faster, though, as you said, maybe more memory efficient.

Fiddler    860
A bit offtopic:

I've been following this thread and I am curious about the advantages GDI+ has over a hardware-accelerated approach through OpenGL. Personally I use GDI+ for fonts and OpenGL for everything else (in C#), but I can't imagine any reason to do all rendering through GDI+!

On topic:

Have you tried looking at the source of the Mono project? It might give some more clues on the available fast paths.

davepermen    1047
hardware acceleration simply isn't equally performant everywhere, and it's not something one can rely on if not targeting gamers. so for casuals (notebook users, ordinary "walk-in-a-shop-and-buy-that-nice-shiny-pc" users) or even business work, using gdi+ is safer, depending on the situation.


anyways, i played around a bit with the code. i'm unsure about all the numbers, but i've implemented a BitmapHelper.Copy function which outperforms all the others shown here. I _guess_ the size of the perfect buffer is cache-dependent (for me, a size of 8 * 1024 bytes hits the optimum, on a dual-core 1.2ghz ulv). once the images no longer fit in the caches, the performance of all methods (except the first) is constant and about equal.

as I've rewritten some parts of the test (made it slower, to stabilize the numbers more), here's the full code, ready to copy-paste:


using System;
using System.Collections.Generic;
using System.Text;
using System.Drawing;
using System.Drawing.Imaging;

namespace RenderingBenchmark
{
    class Program
    {
        static void Main(string[] args)
        {
            int origSize = 32;
            int size = origSize;
            int runs = 200;
            int reps = 16;
            bool exponential = false;

            Console.WriteLine("Benchmarking blitting times");
            Console.WriteLine("Using " + runs.ToString() + " repetitions per size");
            Console.WriteLine("Rendering using DrawImageUnscaled()\n");
            DateTime start;
            DateTime finish;
            TimeSpan elapsed;


            for (int i = 0; i < reps; i++)
            {
                if (exponential)
                    size = size * 2;
                else
                    size += origSize;
                Console.Write("Size: " + (size * size * 4 / 1024).ToString() + "KB ");
                using (Bitmap orig = new Bitmap(size, size))
                using (Bitmap from = new Bitmap(size, size))
                {
                    start = DateTime.Now;
                    for (int j = 0; j < runs; j++)
                    {
                        using (Graphics g = Graphics.FromImage(orig))
                        {
                            g.DrawImageUnscaled(from, new Point(0, 0));
                        }
                    }
                    finish = DateTime.Now;
                }
                elapsed = finish - start;
                double ppms = (size * size) / elapsed.TotalMilliseconds * (double)runs;
                Console.WriteLine(elapsed.TotalMilliseconds.ToString() + " ms = " + (ppms / 1000).ToString() + " Mpixels/s");
            }

            Console.WriteLine("\nRendering using blitting\n");
            size = origSize;
            for (int i = 0; i < reps; i++)
            {
                if (exponential)
                    size = size * 2;
                else
                    size += origSize;
                Console.Write("Size: " + (size * size * 4 / 1024).ToString() + "KB ");
                using (Bitmap orig = new Bitmap(size, size, PixelFormat.Format32bppPArgb))
                using (Bitmap from = new Bitmap(size, size, PixelFormat.Format32bppPArgb))
                {
                    start = DateTime.Now;
                    for (int j = 0; j < runs; j++)
                    {
                        Rectangle lockRect = new Rectangle(0, 0, size, size);
                        BitmapData origData = orig.LockBits(lockRect, ImageLockMode.WriteOnly, PixelFormat.Format32bppPArgb);
                        BitmapData fromData = from.LockBits(lockRect, ImageLockMode.ReadOnly, PixelFormat.Format32bppPArgb);

                        //see if this works - technique one
                        //origData.Scan0 = fromData.Scan0;

                        //manually copy data over

                        byte[] data = new byte[origData.Stride * origData.Height]; //copy the whole thing at a time
                        System.Runtime.InteropServices.Marshal.Copy(fromData.Scan0, data, 0, data.Length);
                        System.Runtime.InteropServices.Marshal.Copy(data, 0, origData.Scan0, data.Length);

                        orig.UnlockBits(origData);
                        from.UnlockBits(fromData);
                    }
                    finish = DateTime.Now;
                }
                elapsed = finish - start;
                double ppms = (size * size) / elapsed.TotalMilliseconds * (double)runs;
                Console.WriteLine(elapsed.TotalMilliseconds.ToString() + " ms = " + (ppms / 1000).ToString() + " Mpixels/s");
            }

            Console.WriteLine("\nRendering using blitting - one line at a time\n");
            size = origSize;
            for (int i = 0; i < reps; i++)
            {
                if (exponential)
                    size = size * 2;
                else
                    size += origSize;
                Console.Write("Size: " + (size * size * 4 / 1024).ToString() + "KB ");
                using (Bitmap orig = new Bitmap(size, size, PixelFormat.Format32bppPArgb))
                using (Bitmap from = new Bitmap(size, size, PixelFormat.Format32bppPArgb))
                {
                    start = DateTime.Now;
                    for (int j = 0; j < runs; j++)
                    {
                        Rectangle lockRect = new Rectangle(0, 0, size, size);
                        BitmapData origData = orig.LockBits(lockRect, ImageLockMode.WriteOnly, PixelFormat.Format32bppPArgb);
                        BitmapData fromData = from.LockBits(lockRect, ImageLockMode.ReadOnly, PixelFormat.Format32bppPArgb);

                        //see if this works - technique one
                        //origData.Scan0 = fromData.Scan0;

                        //manually copy data over
                        byte[] data = new byte[origData.Stride];
                        for (int row = 0; row < origData.Height; row++)
                        {
                            IntPtr fromPtr = (IntPtr)(fromData.Scan0.ToInt64() + row * origData.Stride);
                            IntPtr toPtr = (IntPtr)(origData.Scan0.ToInt64() + row * origData.Stride);
                            System.Runtime.InteropServices.Marshal.Copy(fromPtr, data, 0, data.Length);
                            System.Runtime.InteropServices.Marshal.Copy(data, 0, toPtr, data.Length);
                        }

                        orig.UnlockBits(origData);
                        from.UnlockBits(fromData);
                    }
                    finish = DateTime.Now;
                }
                elapsed = finish - start;
                double ppms = (size * size) / elapsed.TotalMilliseconds * (double)runs;
                Console.WriteLine(elapsed.TotalMilliseconds.ToString() + " ms = " + (ppms / 1000).ToString() + " Mpixels/s");
            }

            Console.WriteLine("\nRendering using BitmapHelper.Copy\n");
            size = origSize;
            for (int i = 0; i < reps; i++)
            {
                if (exponential)
                    size = size * 2;
                else
                    size += origSize;
                Console.Write("Size: " + (size * size * 4 / 1024).ToString() + "KB ");
                using (Bitmap orig = new Bitmap(size, size, PixelFormat.Format32bppPArgb))
                using (Bitmap from = new Bitmap(size, size, PixelFormat.Format32bppPArgb))
                {
                    start = DateTime.Now;
                    for (int j = 0; j < runs; j++)
                    {
                        BitmapHelper.Copy(from, orig);
                    }
                    finish = DateTime.Now;
                }
                elapsed = finish - start;
                double ppms = (size * size) / elapsed.TotalMilliseconds * (double)runs;
                Console.WriteLine(elapsed.TotalMilliseconds.ToString() + " ms = " + (ppms / 1000).ToString() + " Mpixels/s");
            }

            Console.WriteLine("\nPress any key to continue...");
            Console.ReadKey(true);
        }
    }

    static class BitmapHelper
    {
        // Copies 'from' into 'to' through a small managed buffer, 8 KB at a time.
        // Both bitmaps must be the same size and in Format32bppPArgb, so that
        // their locked memory has the same layout and stride.
        public static void Copy(Bitmap from, Bitmap to)
        {
            if (from.Size != to.Size) throw new ArgumentException("Pictures are not equal in size");
            if (from.PixelFormat != PixelFormat.Format32bppPArgb) throw new ArgumentException("Source picture has wrong PixelFormat");
            if (to.PixelFormat != PixelFormat.Format32bppPArgb) throw new ArgumentException("Target picture has wrong PixelFormat");

            Rectangle lockRect = new Rectangle(0, 0, to.Width, to.Height);
            BitmapData toData = to.LockBits(lockRect, ImageLockMode.WriteOnly, PixelFormat.Format32bppPArgb);
            BitmapData fromData = from.LockBits(lockRect, ImageLockMode.ReadOnly, PixelFormat.Format32bppPArgb);

            byte[] data = new byte[8 * 1024];
            int total = toData.Stride * toData.Height;

            // ToInt64() rather than ToInt32(): the latter throws an
            // OverflowException when the address does not fit in 32 bits.
            long fromBase = fromData.Scan0.ToInt64();
            long toBase = toData.Scan0.ToInt64();

            // Full chunks first; the last pass copies whatever remains.
            for (int offset = 0; offset < total; offset += data.Length)
            {
                int count = Math.Min(data.Length, total - offset);
                System.Runtime.InteropServices.Marshal.Copy(new IntPtr(fromBase + offset), data, 0, count);
                System.Runtime.InteropServices.Marshal.Copy(data, 0, new IntPtr(toBase + offset), count);
            }

            to.UnlockBits(toData);
            from.UnlockBits(fromData);
        }
    }
}

davepermen    1047
oh, and my numbers:

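First: The Highlight (the 484KB row from each method, in the order DrawImageUnscaled, blitting, one line at a time, BitmapHelper.Copy):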

Size: 484KB 698 ms = 35.5025787965616 Mpixels/s
Size: 484KB 247 ms = 100.327125506073 Mpixels/s
Size: 484KB 71 ms = 349.025352112676 Mpixels/s
Size: 484KB 51 ms = 485.898039215686 Mpixels/s

Benchmarking blitting times
Using 200 repetitions per size
Rendering using DrawImageUnscaled()

Size: 16KB 70 ms = 11.7028571428571 Mpixels/s
Size: 36KB 76 ms = 24.2526315789474 Mpixels/s
Size: 64KB 103 ms = 31.8135922330097 Mpixels/s
Size: 100KB 157 ms = 32.6114649681529 Mpixels/s
Size: 144KB 209 ms = 35.2765550239234 Mpixels/s
Size: 196KB 294 ms = 34.1333333333333 Mpixels/s
Size: 256KB 400 ms = 32.768 Mpixels/s
Size: 324KB 478 ms = 34.7046025104603 Mpixels/s
Size: 400KB 574 ms = 35.6794425087108 Mpixels/s
Size: 484KB 698 ms = 35.5025787965616 Mpixels/s
Size: 576KB 852 ms = 34.6140845070423 Mpixels/s
Size: 676KB 974 ms = 35.535112936345 Mpixels/s
Size: 784KB 1121 ms = 35.8080285459411 Mpixels/s
Size: 900KB 1279 ms = 36.0281469898358 Mpixels/s
Size: 1024KB 1442 ms = 36.3583911234397 Mpixels/s
Size: 1156KB 1629 ms = 36.3334561080417 Mpixels/s

Rendering using blitting

Size: 16KB 6 ms = 136.533333333333 Mpixels/s
Size: 36KB 7 ms = 263.314285714286 Mpixels/s
Size: 64KB 15 ms = 218.453333333333 Mpixels/s
Size: 100KB 43 ms = 119.06976744186 Mpixels/s
Size: 144KB 64 ms = 115.2 Mpixels/s
Size: 196KB 87 ms = 115.347126436782 Mpixels/s
Size: 256KB 115 ms = 113.975652173913 Mpixels/s
Size: 324KB 149 ms = 111.334228187919 Mpixels/s
Size: 400KB 191 ms = 107.225130890052 Mpixels/s
Size: 484KB 247 ms = 100.327125506073 Mpixels/s
Size: 576KB 314 ms = 93.9210191082803 Mpixels/s
Size: 676KB 411 ms = 84.2121654501217 Mpixels/s
Size: 784KB 496 ms = 80.9290322580645 Mpixels/s
Size: 900KB 581 ms = 79.3115318416523 Mpixels/s
Size: 1024KB 668 ms = 78.4862275449102 Mpixels/s
Size: 1156KB 784 ms = 75.4938775510204 Mpixels/s

Rendering using blitting - one line at a time

Size: 16KB 8 ms = 102.4 Mpixels/s
Size: 36KB 14 ms = 131.657142857143 Mpixels/s
Size: 64KB 15 ms = 218.453333333333 Mpixels/s
Size: 100KB 22 ms = 232.727272727273 Mpixels/s
Size: 144KB 28 ms = 263.314285714286 Mpixels/s
Size: 196KB 36 ms = 278.755555555556 Mpixels/s
Size: 256KB 41 ms = 319.687804878049 Mpixels/s
Size: 324KB 52 ms = 319.015384615385 Mpixels/s
Size: 400KB 59 ms = 347.118644067797 Mpixels/s
Size: 484KB 71 ms = 349.025352112676 Mpixels/s
Size: 576KB 85 ms = 346.955294117647 Mpixels/s
Size: 676KB 101 ms = 342.685148514851 Mpixels/s
Size: 784KB 111 ms = 361.628828828829 Mpixels/s
Size: 900KB 130 ms = 354.461538461539 Mpixels/s
Size: 1024KB 164 ms = 319.687804878049 Mpixels/s
Size: 1156KB 285 ms = 207.674385964912 Mpixels/s

Rendering using BitmapHelper.Copy

Size: 16KB 6 ms = 136.533333333333 Mpixels/s
Size: 36KB 7 ms = 263.314285714286 Mpixels/s
Size: 64KB 9 ms = 364.088888888889 Mpixels/s
Size: 100KB 13 ms = 393.846153846154 Mpixels/s
Size: 144KB 18 ms = 409.6 Mpixels/s
Size: 196KB 22 ms = 456.145454545455 Mpixels/s
Size: 256KB 28 ms = 468.114285714286 Mpixels/s
Size: 324KB 35 ms = 473.965714285714 Mpixels/s
Size: 400KB 43 ms = 476.279069767442 Mpixels/s
Size: 484KB 51 ms = 485.898039215686 Mpixels/s
Size: 576KB 65 ms = 453.710769230769 Mpixels/s
Size: 676KB 77 ms = 449.496103896104 Mpixels/s
Size: 784KB 89 ms = 451.020224719101 Mpixels/s
Size: 900KB 107 ms = 430.654205607477 Mpixels/s
Size: 1024KB 146 ms = 359.101369863014 Mpixels/s
Size: 1156KB 271 ms = 218.40295202952 Mpixels/s

Press any key to continue...

davepermen    1047
More should be possible by moving to multithreading.. running two apps in a tight loop showed 2x 300 Mpixels/s, compared to 480 Mpixels/s for one app alone.. => it could be about 600 Mpixels/s for copying.. that's about 17x the speed of DrawImageUnscaled.

interesting for sure...
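To make the multithreading idea concrete, here is a rough sketch that splits a copy into chunks and hands the chunks to worker threads. It is an illustration under assumptions, not a measured result: plain byte arrays stand in for locked bitmap memory, the ParallelCopy class is hypothetical, and Parallel.For comes from System.Threading.Tasks, which only arrived in .NET 4. Memory copies tend to be limited by memory bandwidth, so any gain has to be benchmarked.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical helper: chunked copy with the chunks spread over threads.
static class ParallelCopy
{
    public static void Copy(byte[] source, byte[] destination, int chunkSize)
    {
        if (source.Length != destination.Length)
            throw new ArgumentException("Buffers differ in length.");
        if (chunkSize <= 0)
            throw new ArgumentOutOfRangeException("chunkSize");

        // Number of chunks, counting a final partial chunk if there is one.
        int chunks = source.Length / chunkSize + (source.Length % chunkSize != 0 ? 1 : 0);

        // Each iteration copies one non-overlapping chunk, so no locking is needed.
        Parallel.For(0, chunks, i =>
        {
            int offset = i * chunkSize;
            int count = Math.Min(chunkSize, source.Length - offset);
            Buffer.BlockCopy(source, offset, destination, offset, count);
        });
    }
}

static class Program
{
    static void Main()
    {
        // A 1024x1024 image at 4 bytes per pixel, filled with a test pattern.
        byte[] src = new byte[1024 * 1024 * 4];
        for (int i = 0; i < src.Length; i++) src[i] = (byte)(i % 251);
        byte[] dst = new byte[src.Length];

        ParallelCopy.Copy(src, dst, 8 * 1024); // 8 KB chunks, as in BitmapHelper.Copy

        bool same = true;
        for (int i = 0; i < src.Length; i++)
            if (src[i] != dst[i]) { same = false; break; }
        Console.WriteLine(same ? "copy verified" : "copy FAILED");
    }
}
```

With real bitmaps, the same loop body would use Marshal.Copy on offsets from Scan0, with one scratch buffer per thread rather than a shared one.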

shiz98    139
omegagames:
There's no call to DrawImage when you use BufferedGraphics. Instead, the class supplies you with a Graphics object to draw to (similar to the one you'd get by creating your own back buffer image). Once you're done drawing, you call BufferedGraphics.Render(), which then calls BitBlt to copy the data from the Graphics object it supplied to the Graphics object you gave it (usually the one supplied by PaintEventArgs). This is very close to the way you'd go about double buffering in GDI. The function you want to look at for this is BufferedGraphics.Render - IIRC all the overloads call BufferedGraphics.RenderInternal, which looks like this:

private void RenderInternal(HandleRef refTargetDC, BufferedGraphics buffer)
{
    IntPtr hdc = buffer.Graphics.GetHdc();
    try
    {
        SafeNativeMethods.BitBlt(refTargetDC,
            this.targetLoc.X, this.targetLoc.Y,
            this.virtualSize.Width, this.virtualSize.Height,
            new HandleRef(buffer.Graphics, hdc), 0, 0, rop);
    }
    finally
    {
        buffer.Graphics.ReleaseHdcInternal(hdc);
    }
}


SafeNativeMethods.BitBlt is DllImported from Gdi32.dll.

Interestingly enough, the Graphics object you get from PaintEventArgs is created directly using GDI function calls. You're either supplied a device context handle from WndProc (in wParam), or the Control.WmPaint function calls BeginPaint to get one. Check out System.Windows.Forms.Control.WmPaint for the details.

All this goes to show that using BufferedGraphics is basically the same as using GDI by importing the functions yourself. It just happens to be easier this way.
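As a concrete illustration of the usage pattern described above, here is a minimal Windows Forms sketch. The form class, its size, and what it draws are made up for the example; only BufferedGraphicsManager.Current.Allocate(), BufferedGraphics.Graphics, and BufferedGraphics.Render() are the framework calls under discussion.

```csharp
using System;
using System.Drawing;
using System.Windows.Forms;

// Hypothetical form that paints through a BufferedGraphics back buffer.
class BufferedForm : Form
{
    private BufferedGraphics buffer;

    public BufferedForm()
    {
        // We repaint the whole client area ourselves, so skip background erasing.
        SetStyle(ControlStyles.AllPaintingInWmPaint | ControlStyles.UserPaint |
                 ControlStyles.Opaque, true);
        ClientSize = new Size(640, 480);
    }

    protected override void OnResize(EventArgs e)
    {
        base.OnResize(e);
        // The buffer must match the client area; drop it and reallocate on paint.
        if (buffer != null)
        {
            buffer.Dispose();
            buffer = null;
        }
        Invalidate();
    }

    protected override void OnPaint(PaintEventArgs e)
    {
        if (ClientSize.Width <= 0 || ClientSize.Height <= 0) return;

        if (buffer == null)
            buffer = BufferedGraphicsManager.Current.Allocate(e.Graphics, ClientRectangle);

        // Draw the frame into the off-screen buffer...
        Graphics g = buffer.Graphics;
        g.Clear(Color.Black);
        g.FillRectangle(Brushes.OrangeRed, 50, 50, 100, 100);

        // ...then copy it to the screen in one call.
        buffer.Render(e.Graphics);
    }

    protected override void Dispose(bool disposing)
    {
        if (disposing && buffer != null) buffer.Dispose();
        base.Dispose(disposing);
    }

    [STAThread]
    static void Main()
    {
        Application.Run(new BufferedForm());
    }
}
```

Allocating the buffer once and reusing it across paints, rather than per frame, keeps the allocation cost out of the render loop.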

I should see what Mono does for its BufferedGraphics... hopefully it's implemented.

omegagames    122
(Every time I contest this, I feel I've missed something awfully obvious)

From what you've said, I want to compare two options here. Please tell me where I'm wrong, because otherwise I see no speed difference being possible.

Problem (in simplified form): Copy a bitmap object to a control (form background, as the obvious example). Let's call this Bitmap 'bmp'.

1) Bitmap backbuffer:
- Create a Bitmap of the same dimensions as your screen (call it 'buf')
- When rendering:
  * Copy bmp to buf using LockBits and Marshal.Copy (very fast)
  * Copy buf to the screen by obtaining the screen's Graphics object (via PaintEventArgs) with something like "e.Graphics.DrawImage(buf, 0, 0);" (very slow, because of DrawImage)

2) BufferedGraphics backbuffer:
- Obtain the BufferedGraphics object (call it 'bufgfx')
- When rendering:
  * Copy bmp to bufgfx by calling bufgfx.Graphics.DrawImage(bmp, 0, 0) (this is slow because of DrawImage [?])
  * Copy bufgfx to the screen by calling bufgfx.Render() (very fast)


So the way I see it, it's the difference between doing a fast operation followed by a slow one, and a slow operation followed by a fast one. I don't see where the speed increase comes from.

shiz98    139
Good example. In the worst case scenario (bmp is the same size as your form), yes, these will probably take about the same amount of time. The smaller bmp is, however, the faster BufferedGraphics will be compared with manual back buffering. Using BufferedGraphics makes copying the bitmaps slower, but makes swapping to the screen much faster.

If you're drawing smaller bitmaps onto the screen, you should see speed increases for a couple of reasons. First of all, DrawImage is more efficient when called on smaller bitmaps. Second, if you were to write a function to emulate DrawImage using LockBits, it wouldn't be nearly as fast as in the example, due to the extra calculations that need to be performed (positioning and clipping, for instance).
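
To give an idea of what those extra calculations look like, here is a rough sketch of a LockBits-based blit that places a source bitmap at an arbitrary position and clips it against the destination. The class and method names are hypothetical, it is an opaque copy only (no alpha blending or scaling, both of which DrawImage handles), and it assumes src and dst are different Bitmap objects. IntPtr.Add needs .NET 4 or later.

```csharp
using System;
using System.Drawing;
using System.Drawing.Imaging;
using System.Runtime.InteropServices;

public static class ManualBlit
{
    // Copies src onto dst with its top-left corner at (destX, destY),
    // clipping whatever falls outside dst.
    public static void Blit(Bitmap src, Bitmap dst, int destX, int destY)
    {
        // Clip the destination rectangle against the bounds of dst.
        Rectangle target = Rectangle.Intersect(
            new Rectangle(destX, destY, src.Width, src.Height),
            new Rectangle(0, 0, dst.Width, dst.Height));
        if (target.Width <= 0 || target.Height <= 0)
            return; // completely off-screen

        // The matching part of the source image.
        Rectangle source = new Rectangle(
            target.X - destX, target.Y - destY, target.Width, target.Height);

        BitmapData srcData = src.LockBits(
            source, ImageLockMode.ReadOnly, PixelFormat.Format32bppPArgb);
        try
        {
            BitmapData dstData = dst.LockBits(
                target, ImageLockMode.WriteOnly, PixelFormat.Format32bppPArgb);
            try
            {
                int rowBytes = target.Width * 4; // 4 bytes per pixel
                byte[] row = new byte[rowBytes];
                for (int y = 0; y < target.Height; y++)
                {
                    // Stride is the distance in bytes between scan lines.
                    IntPtr from = IntPtr.Add(srcData.Scan0, y * srcData.Stride);
                    IntPtr to = IntPtr.Add(dstData.Scan0, y * dstData.Stride);
                    Marshal.Copy(from, row, 0, rowBytes);
                    Marshal.Copy(row, 0, to, rowBytes);
                }
            }
            finally
            {
                dst.UnlockBits(dstData);
            }
        }
        finally
        {
            src.UnlockBits(srcData);
        }
    }
}
```

Each call locks and unlocks both bitmaps, so when drawing many small images it would probably pay to lock the back buffer once per frame instead.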

For another example, my RTS draws terrain in tiles. During the rendering function it calculates which tiles need to be drawn, draws them to my back buffer bitmap, then draws that bitmap to e.Graphics using DrawImage(). To give you a feel for the speed of the different methods, drawing the tiles to the back buffer using DrawImage() runs at around 22 FPS in fullscreen. Drawing by splitting into scanlines and calling DrawImage() runs at about 32 FPS. Using a custom blitting function with LockBits ran at about 37 FPS (unoptimized, however, and slightly broken). If using BufferedGraphics gives me a 2x speed increase, it'll do more than all my previous efforts.
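
The "calculates which tiles need to be drawn" step isn't shown in the thread. One way it could be done, assuming square tiles and a viewport rectangle given in map pixel coordinates, is sketched below; all names and parameters are invented for illustration and are not taken from the poster's game.

```csharp
using System;
using System.Drawing;

public static class TileCulling
{
    // Returns the block of tiles overlapping the viewport as a Rectangle of
    // tile indices: X/Y are the first column/row, Width/Height are counts.
    // Returns Rectangle.Empty when the viewport misses the map entirely.
    public static Rectangle VisibleTiles(
        Rectangle viewport, int tileSize, int mapWidthInTiles, int mapHeightInTiles)
    {
        // Clip the viewport to the map, both in pixels.
        Rectangle map = new Rectangle(
            0, 0, mapWidthInTiles * tileSize, mapHeightInTiles * tileSize);
        Rectangle visible = Rectangle.Intersect(viewport, map);
        if (visible.Width <= 0 || visible.Height <= 0)
            return Rectangle.Empty;

        // Convert the clipped pixel rectangle to tile indices. Right and
        // Bottom are exclusive, so subtract one pixel before dividing.
        int firstX = visible.Left / tileSize;
        int firstY = visible.Top / tileSize;
        int lastX = (visible.Right - 1) / tileSize;
        int lastY = (visible.Bottom - 1) / tileSize;

        return new Rectangle(
            firstX, firstY, lastX - firstX + 1, lastY - firstY + 1);
    }
}
```

The render loop would then iterate only over the returned columns and rows, so the number of draw calls depends on the window size and not on the size of the map.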

Just a thought: it might be possible to trick BufferedGraphics into using BitBlt to copy a Bitmap directly to a Graphics object by using Graphics.FromImage(bitmapToCopy). I'll look into it.

In any case, I plan to benchmark these thoroughly, so stay tuned.

shiz98    139
No benchmark yet, but I did discover that .NET 2.0's built-in double buffering is actually better than manual double buffering (using a Bitmap, at least). I tried using BufferedGraphics, but something strange started happening with my program and I didn't have the time to work it out. You can turn on automatic double buffering with the following:

this.SetStyle(ControlStyles.AllPaintingInWmPaint, true);
this.SetStyle(ControlStyles.UserPaint, true);
this.SetStyle(ControlStyles.OptimizedDoubleBuffer, true);


After setting those in your initialization function, just use the Graphics supplied by PaintEventArgs in OnPaint(), and you're set. I got about a 10 FPS improvement on my 1400x900 monitor - not too shabby for the amount of work it involves.
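
Put together, a form using this approach might look roughly like the sketch below. The class name and the drawing code are placeholders; only the three SetStyle calls come from the post.

```csharp
using System.Drawing;
using System.Windows.Forms;

// Hypothetical game window, for illustration only.
public class GameForm : Form
{
    public GameForm()
    {
        // Ask WinForms to paint everything in WM_PAINT, through an
        // off-screen buffer that it manages for us.
        SetStyle(ControlStyles.AllPaintingInWmPaint, true);
        SetStyle(ControlStyles.UserPaint, true);
        SetStyle(ControlStyles.OptimizedDoubleBuffer, true);
    }

    protected override void OnPaint(PaintEventArgs e)
    {
        // With the styles above, e.Graphics already targets the back
        // buffer, so drawing straight to it is enough.
        e.Graphics.Clear(Color.Black);
        e.Graphics.FillRectangle(Brushes.Green, 20, 20, 64, 64);

        base.OnPaint(e); // let any Paint event handlers run
    }
}
```

Launch it with Application.Run(new GameForm()) from a Main method, and call Invalidate() whenever a new frame should be drawn.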

Unfortunately, it looks like Microsoft did an excellent job in ensuring that BufferedGraphics couldn't be used to draw from a Bitmap to a Graphics object. I spent about two hours digging through the reflector looking for a way, but I've got nothing. Everything's sealed, certain constructors are internal, and I don't see any other loopholes.

When I finally do get around to posting another benchmark, I hope to be able to post a nice BitBlt-style function for copying from one Bitmap to another.

Headkaze    607
Is this what you're after?

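// Win32 is a helper class holding the gdi32.dll P/Invoke declarations (not shown in this post).
public static void BitBlt(int x, int y, int width, int height, Bitmap bmpSrc, Graphics gDest)
{
    IntPtr hDCDest = gDest.GetHdc();
    IntPtr memDC = Win32.CreateCompatibleDC(hDCDest);
    IntPtr memBmp = Win32.CreateCompatibleBitmap(hDCDest, bmpSrc.Width, bmpSrc.Height);

    // Remember the DC's original bitmap so it can be restored before cleanup.
    IntPtr oldBmp = Win32.SelectObject(memDC, memBmp);
    try
    {
        // Draw the managed bitmap into the memory DC with GDI+...
        using (Graphics gMem = Graphics.FromHdc(memDC))
        {
            gMem.DrawImage(bmpSrc, 0, 0);
        }

        // ...then copy it to the destination with plain GDI.
        Win32.BitBlt(hDCDest, x, y, width, height, memDC, 0, 0, Win32.TernaryRasterOperations.SRCCOPY);
    }
    finally
    {
        // A bitmap can't be deleted while it is selected into a DC.
        Win32.SelectObject(memDC, oldBmp);
        Win32.DeleteObject(memBmp);
        Win32.DeleteDC(memDC);
        gDest.ReleaseHdc(hDCDest);
    }
}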
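
The Win32 helper class used above isn't included in the post. A minimal version might look like the following; the class layout and parameter names are an assumption, while the imported functions and the SRCCOPY value are the standard gdi32.dll ones.

```csharp
using System;
using System.Runtime.InteropServices;

// Assumed shape of the helper class; the original was not posted.
public static class Win32
{
    public enum TernaryRasterOperations : uint
    {
        SRCCOPY = 0x00CC0020 // copy the source rectangle directly to the destination
    }

    [DllImport("gdi32.dll", SetLastError = true)]
    public static extern IntPtr CreateCompatibleDC(IntPtr hdc);

    [DllImport("gdi32.dll")]
    public static extern IntPtr CreateCompatibleBitmap(IntPtr hdc, int width, int height);

    [DllImport("gdi32.dll")]
    public static extern IntPtr SelectObject(IntPtr hdc, IntPtr hObject);

    [DllImport("gdi32.dll", SetLastError = true)]
    [return: MarshalAs(UnmanagedType.Bool)]
    public static extern bool BitBlt(
        IntPtr hdcDest, int xDest, int yDest, int width, int height,
        IntPtr hdcSrc, int xSrc, int ySrc, TernaryRasterOperations rop);

    [DllImport("gdi32.dll")]
    [return: MarshalAs(UnmanagedType.Bool)]
    public static extern bool DeleteObject(IntPtr hObject);

    [DllImport("gdi32.dll")]
    [return: MarshalAs(UnmanagedType.Bool)]
    public static extern bool DeleteDC(IntPtr hdc);
}
```

Since these are Windows-only imports, this route doesn't help with the cross-platform goal mentioned in the thread.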

iMalc    2466
Quote:
Original post by shiz98
Yeah, but in a cross-platform way :P

Doh, and I was just going to suggest seeing how simply setting the WS_EX_COMPOSITED flag worked for doing double-buffering. Never mind then.

Nasenbaer    122
Thanks for this great topic. Could you also test the method Bitmap.Clone?
I'd also like to know whether the settings
InterpolationMode = System.Drawing.Drawing2D.InterpolationMode.NearestNeighbor
and
SmoothingMode = System.Drawing.Drawing2D.SmoothingMode.None
were used, or whether you tested with interpolation enabled.
