Back to General and Gameplay Programming

[.net] Why is Collections.Generic.List so damn slow?

Scet · 2007-07-25T15:42:13

I was doing a lot of heavy indexer work in my program and noticed that accessing elements in an Array seemed a hell of a lot faster then accessing them in a List. Anyway I wrote this little test program to time how long they take. using System; using Collections = System.Collections.Generic; using Threading = System.Threading; namespace Bob { public static class Application { [STAThread] public static void Main( string[] ArgumentArray ) { int Ticks = 0; int[] Array = new int[320]; Collections.List<int> List = new Collections.List<int>(); for( int x = 0; x < Array.Length; x++ ) { List.Add( x ); } Threading.Thread.Sleep( 1000 ); Ticks = System.Environment.TickCount; for( int i = 0; i < 5000000; i++ ) { for( int j = 0; j < Array.Length; j++ ) { Array[j] = j; } } Console.WriteLine( "Array performance: {0}", System.Environment.TickCount - Ticks ); Threading.Thread.Sleep( 1000 ); Ticks = System.Environment.TickCount; for( int i = 0; i < 5000000; i++ ) { for( int j = 0; j < List.Count; j++ ) { List[j] = j; } } Console.WriteLine( "List performance: {0}", System.Environment.TickCount - Ticks ); Console.ReadLine(); return; } } } The results where 2620 for the Array and 13200 for the List. I was under the impression that the .Net 2.0 List class operated similar to a C++ std::vector and that there would be very little(or no) difference between accessing it and an array. I wrote a simple class to mimic the C++ std::vector to see how it could perform. public sealed class Vector<T> { private T[] Array = null; public int Size { get { return SizeProperty; } private set { SizeProperty = value; } } private int SizeProperty = 0; public T this[int Index] { get { return Array[Index]; } set { Array[Index] = value; } } public Vector() { Array = new T[8]; return; } public Vector( int Reserved ) { Array = new T[Reserved]; return; } public void Add( T Value ) { Array[Size] = Value; Size++; if( Size >= Array.Length ) { System.Array.Resize<T>( ref Array, Array.Length << 1 ); } return; } public T[] ToArray() { if( Size == Array.Length ) { return Array; } T[] NewArray = new T[Size]; System.Array.Copy( Array, NewArray, Size ); return NewArray; } } Using the same style test as the program above, the result was exactly the same as the Arrays. Can any of the .Net gurus explain how the List class works and why its indexer performance is so bad? I did run the tests multiple times to make sure some other process wasn't hogging the CPU and messing it up and I got around the same results every time. I'm on an Athlon64 at 2Ghz, maybe it was something to do with the way the JIT compiled it. Perhaps other people could run the test program and post their results. Edit: Changed loop size from 10000000 to 5000000 to match other posts. [Edited by - Scet on July 22, 2007 4:18:04 PM]

General and Gameplay Programming Programming

Started by Scet July 22, 2007 03:01 PM

28 comments, last by Niksan2 16 years, 9 months ago

remigius

1,173

July 23, 2007 01:38 AM

I can't tell you exactly why the containers generate this code, but maybe you needn't worry about the containers themselves, but your choice of container instead. As you noted yourself, the containers are optimized for typical usage patterns, such as dynamic growth and cheaper insertion/removal of objects. So you should probably use these containers if your algorithm matches this usage pattern. If you have a relatively fixed number of elements, for which fast indexed access is absolutely paramount, arrays probably are the best choice indeed.

I agree it's strange that the List doesn't work like the typical Vector and thus doesn't come closer to array performance, but if you really have as many indexed operarations as your code snippet suggests, you're probably better off using a custom container implementation which gives you more finegrained control anyway.

Quote:Original post by SiCrane
So the List generic version is actually making function calls in the generated code.

The 1st call looks like a by-product of the fancy 'syntactical sugar' that .NET indexers are. The 2nd call probably is a call to the List.Count property to do index bounds checking. This would kinda prove my point that a container tailored to the OPs needs might be a better choice. Obviously a generic lib implementation needs bounds checking on each access to provide a robust container and sticks to .NET practices like indexers, but a specialist implementation could do this more efficiently and/or forego any checks completely for the sake of speed.

Well, hope this is of some consolation at least [smile]

[Edited by - remigius on July 23, 2007 2:38:19 AM]

Rim van Wersch [ MDXInfo ] [ XNAInfo ] [ YouTube ] - Do yourself a favor and bookmark this excellent free online D3D/shader book!

Mike.Popoloski

3,259

July 23, 2007 07:52 AM

Arrays are probably special-cased into the MSIL, whereas any collection written will necessarily be slower, simply because of the need to call a function somewhere in order to get at the data. I am pretty sure that arrays are specially built into MSIL, and therefore don't need this function call, but anything that you write will be just as slow, because you have to make that function call to access the data.

Not that this is a huge speed hit we are talking about here. Besides, if you need that much speed you should probably use arrays, using the "you get what you pay for" approach. Arrays give you speed, while lists allow you to add as many elements as you want without having to recreate the array.

Mike Popoloski | Journal | SlimDX

Dark_Nebula

221

July 24, 2007 12:47 AM

Yes, MSIL has special instructions for single-dimensional, zero-based arrays. For example, the stelem and ldlen instructions seen in the code SiCrane posted.
These allowes the JIT to emit optimized code for array indexing. It will most likely move the bounds-check out of the inner loop.

My Blog

alex_myrpg

351

July 24, 2007 04:32 AM

These are some interesting comparisons, but I'm quite curious why the Vector class performs faster than a simple array in some cases. Surely it would always be slower than just an array, being built on top of one itself? Also, I'm wondering why you're doing Array.Length << 1. This ought to double the size of the array each time you add a value - I would simple do Array.Length + 1. Perhaps I'm misunderstanding here, but they were a few things that struck me as being rather strange when looking at the test results.

About Me | Blog | Website

Cygon

1,219

July 24, 2007 05:05 AM

Just checked the method in reflector, this is what happens inside List<T>.this[int index] in C# form:

if (index >= this._size) {  ThrowHelper.ThrowArgumentOutOfRangeException();}return this._items[index];

Quote:Original post by alex_myrpg
Also, I'm wondering why you're doing Array.Length << 1. This ought to double the size of the array each time you add a value - I would simple do Array.Length + 1.

That's a simple adaptive resizing strategy. When the vector runs out of space, it pre-allocates some more items than it needs to avoid having to allocate and copy its contents all over again when the next item is added.

If a fixed step size was used (eg. always add 64 to the vector's capacity) it might waste a lot of memory when the user has hundreds of vectors with an average size of only 10 items. On the other side, if you used a fixed step size of 4, it would not perform optimally for a single vector with 10,000 items.

So it just doubles the size. This wastes little memory for small vectors and allocations happen in larger chunks the larger the vector gets.

-Markus-

Professional C++ and .NET developer trying to break into indie game development.
Follow my progress: http://blog.nuclex-games.com/ or Twitter - Topics: Ogre3D, Blender, game architecture tips & code snippets.

alex_myrpg

351

July 24, 2007 05:17 AM

Ok, thanks for the explanation. I guess I wasn't quite reading it properly. I suppose the amount by which you increase the size depends how often you're going to add items, so that seems sensible.

About Me | Blog | Website

remigius

1,173

July 24, 2007 05:31 AM

Quote:These are some interesting comparisons, but I'm quite curious why the Vector class performs faster than a simple array in some cases. Surely it would always be slower than just an array, being built on top of one itself?

The differences aren't huge, so I think this only goes to show that the overhead of indexing either the Vector or the arrays is marginal, even compared to the simple operations in the provided test code.

---

Reviewing SiCrane's code snippet, I'd say the performance loss is accounted for. His code calls the get_Count property accessor for each loop iteration (j < list.Count) and the use of indexers by List<> adds another function call (set_Item). That's two virtual function calls in the inner loop you don't have to worry about when using arrays, so I think this mystery can be considered solved.

To confirm the calls are the bottleneck, one could cache the list size in a local variable (since it doesn't seem to change anyway) and check if the performance 'penalty' of using the List is roughly cut in half.

Rim van Wersch [ MDXInfo ] [ XNAInfo ] [ YouTube ] - Do yourself a favor and bookmark this excellent free online D3D/shader book!

Dark_Nebula

221

July 24, 2007 05:40 AM

Quote:That's a simple adaptive resizing strategy

Yes, and it's a good one too. Using this strategy means that n insertions will run in O(n) amortized time, giving O(1) on inserts on average. Just increasing the capacity with one on each insert would give O(n) on inserts.

My Blog

alex_myrpg

351

July 24, 2007 06:13 AM

remigius, that could definitely be a possible explanation, but a test would be nice to see (I'll do one myself if I have some time) - anyway I always thought accessing the Length property of an array shouldn't add much overhead (whereas GetUpperBound(0) might well add some), but I could be wrong. As you said, the differences aren't big, so I doubt there's much to worry about in this case (comparing Array and Vector).

About Me | Blog | Website

remigius

1,173

July 24, 2007 09:13 AM

Quote:Original post by alex_myrpg
... anyway I always thought accessing the Length property of an array shouldn't add much overhead

The MSIL of SiCrane's code shows that accessing array.Length shouldn't add much overhead actually. So it's List.Count that requires a virtual function call (which is relatively expensive), but accessing array.Length should be ok. I was wondering why they didn't make List.Count a field like array.Length is conceptually, but then again they probably figured it shouldn't be that much of an issue for typical container use.

And you're right on writing up some test, I already felt like a slacker when I dumped the suggestion without doing the test myself. Here are some results from my quick and dirty test similar to SiCrane's:

1 - Vanilla array inserts: 1.537s2 - Array inserts with bounds check ala List.set_Item: 2.854s3 - Array inserts with bounds check & version increment ala List.set_Item: 2.86s4 - List inserts with Count in loop: 5.704s5 - List inserts with cached Count: 4.547s

Going from this the costs to SiCrane's code (that's nr 4 in my results) seems to be distrbuted something like this:

- ~20% on List.Count
- ~23% on bounds checking in List.set_Item
- ~27% on the actual array inserts
- < 1% for the version increment

That leaves ~30% for the virutal function call to set_Item itself. This is a bit higher than the List.Count call, but this makes sense since get_Count takes no parameters, whereas set_Item takes two.

That's my dodgy statistical analysis for today [smile]

Rim van Wersch [ MDXInfo ] [ XNAInfo ] [ YouTube ] - Do yourself a favor and bookmark this excellent free online D3D/shader book!

[.net] Why is Collections.Generic.List so damn slow?

This topic is closed to new replies.

Popular Topics

Recommended Tutorials

[.net] Why is Collections.Generic.List so damn slow?

This topic is closed to new replies.

Popular Topics

Recommended Tutorials

Reticulating splines