Idea for a fast BinaryDictionary

29 comments, last by Narf the Mouse 13 years, 6 months ago
I'm operating on four hours' sleep, so I can't tell how good an idea this is, but: one way of storing objects (for an arbitrary example, players, enemies and bullets in an FPS - let's say it's semi-realistic, so bullets aren't instant) is a dictionary. The problem with a standard dictionary, such as the one in .Net, though, is that it's backed by arrays - and arrays are very slow when deleting objects, because everything after the removed element has to shift down. That matters for things like bullets, which are created and destroyed very fast.

So, you have a BinaryDictionary<Key, Value>. The Key is really an Int32 hash code (obtained through .Net's ubiquitous .GetHashCode() method). When entering the top of the dictionary, we do a bitwise compare: if the first bit (reading from the least significant) is 0, the entry goes left; if it's 1, it goes right. Say the first object's hash code ends in the bits 011 - the first bit is 1, so it goes right. The next object has a hash code that ends in the bits 001. It goes right, then left, because the right slot is already occupied.
If we want to look up a key, we just follow the bits. If we look up the key 001, we'll find the second object at Right-Left.
Now, add a Key that ends in the bits 101. It goes right, finds it occupied, goes left, occupied, goes right, empty. We now have objects at Right, Right-Left and Right-Left-Right.

Say we want to delete Key 001, which sits at Right-Left between the objects at Right and Right-Left-Right. The connecting link, Right-Left, is what we need to remove. Right-Left-Left is empty; if it weren't, it would become the new Right-Left. As it is, the new Right-Left is Right-Left-Right.

Do I have something here? I think I might, because it's essentially a standard binary tree that uses hash-code bits to decide left or right. And simple is good.
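
A minimal sketch of the scheme as described, with illustrative names (Node, Add, TryGetValue) and an empty sentinel node at the top - not the poster's actual code. Removal is left out, and lookups compare hash codes only, so two distinct keys with equal hashes would collide:

public class BinaryDictionary<TKey, TValue>
{
    private class Node
    {
        public int KeyHash;
        public TKey Key;
        public TValue Value;
        public Node Left, Right;   // bit == 0 goes left, bit == 1 goes right
    }

    private readonly Node root = new Node();   // empty sentinel at the top

    public void Add(TKey key, TValue value)
    {
        int hash = key.GetHashCode();
        Node current = root;
        for (int depth = 0; ; ++depth)
        {
            // Follow one hash bit per level until we find an empty slot.
            bool goRight = (hash & (1 << depth)) != 0;
            Node next = goRight ? current.Right : current.Left;
            if (next == null)
            {
                next = new Node { KeyHash = hash, Key = key, Value = value };
                if (goRight) current.Right = next; else current.Left = next;
                return;
            }
            current = next;   // slot occupied; descend and try the next bit
        }
    }

    public bool TryGetValue(TKey key, out TValue value)
    {
        int hash = key.GetHashCode();
        Node current = root;
        for (int depth = 0; ; ++depth)
        {
            current = (hash & (1 << depth)) != 0 ? current.Right : current.Left;
            if (current == null)
                break;
            if (current.KeyHash == hash)
            {
                value = current.Value;
                return true;
            }
        }
        value = default(TValue);
        return false;
    }
}

Walking through the example above: adding 011 fills Right, adding 001 then lands at Right-Left, and adding 101 lands at Right-Left-Right.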
It sounds interesting, but say you were to delete a value with key 101 and then access the data using key 101 - the data would have been replaced, so any access to this value using this key would return the wrong value.

(if I interpreted what you wrote correctly)

Also, simple IS good. :)

Saving the world, one semi-colon at a time.

Quote:Original post by Pthalicus
It sounds interesting, but say you were to delete a value with key 101 and then access the data using key 101 - the data would have been replaced, so any access to this value using this key would return the wrong value.

(if I interpreted what you wrote correctly)

Also, simple IS good. :)

That is a problem.

Alternately, the Leaf could have an IsNull boolean. Removing would simply mark "IsNull = true". This has the disadvantage that, without dedicated pruning, it'll never shrink.

Which might be solved by having parent leaves keep a count of their child leaves and a count of how many have been marked null. If a leaf is marked null and the two counts are equal, it gets removed.
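
A sketch of that tombstone-plus-prune idea follows; it collapses the two counters described above into a single count of live entries per subtree, and the names (Node, IsNull, ChildCount) are illustrative rather than the actual implementation. Add would need to increment ChildCount along its insertion path:

public class Node
{
    public int KeyHash;
    public bool IsNull;        // tombstone: removed, but still linked in
    public int ChildCount;     // live entries below this node
    public Node Left, Right;

    public bool Remove(int hash, int depth)
    {
        if (KeyHash == hash && !IsNull)
        {
            IsNull = true;     // just mark it; no relinking needed
            return true;
        }

        bool goRight = (hash & (1 << depth)) != 0;
        Node child = goRight ? Right : Left;
        if (child == null || !child.Remove(hash, depth + 1))
            return false;

        --ChildCount;
        // Prune: a tombstoned child with no live entries below it can be
        // dropped, so the tree shrinks instead of filling up with nulls.
        if (child.IsNull && child.ChildCount == 0)
        {
            if (goRight) Right = null; else Left = null;
        }
        return true;
    }
}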

But first, to test the .Add(Key, Value) speed.
A simple way to significantly speed up dictionaries or other node-based data structures is to use a pooled allocation system. You could allocate an array of 1000 nodes; all the nodes will then be nearby, so you get good cache locality, but since the elements are still nodes, you can delete one simply by setting a pointer to null and linking it into a free list. If you need more than 1000 nodes, you allocate another 1000 and keep a list of the memory blocks you've allocated.
"Walk not the trodden path, for it has borne it's burden." -John, Flying Monk
Quote:Original post by Extrarius
A simple way to significantly speed up dictionaries or other node-based data structures is to use a pooled allocation system. You could allocate an array of 1000 nodes; all the nodes will then be nearby, so you get good cache locality, but since the elements are still nodes, you can delete one simply by setting a pointer to null and linking it into a free list. If you need more than 1000 nodes, you allocate another 1000 and keep a list of the memory blocks you've allocated.

That's another thing I considered. Might ask about it in a different thread.
Console output: (Sorry, got the order wrong the first and second time)
List<int>.Add                       Loops: 1000000, Seconds: 0.0142794, Loops per Second: 70030953.6815272
Dictionary<int, int>.Add            Loops: 1000000, Seconds: 0.1295246, Loops per Second: 7720541.11728583
BinaryDictionary<int, int>.Add      Loops: 1000000, Seconds: 0.6429706, Loops per Second: 1555281.06572836
for-next List<int>                  Loops: 1000000, Seconds: 0.00275989999999993, Loops per Second: 362331968.549595
foreach Dictionary<int, int>        Loops: 1000000, Seconds: 0.0223009000000001, Loops per Second: 44841239.5912271
foreach BinaryDictionary<int, int>  Loops: 1000000, Seconds: 0.00191280000000005, Loops per Second: 522793810.121275

So roughly five times slower than a "standard" dictionary on .Add, and more than ten times faster on foreach.

Well, it was a neat theory, but it didn't pan out. Whatever the "standard" dictionary uses to store data, it isn't an array - at least, not a single array.

Dictionary<int, int>.Remove         Loops: 1000000, Seconds: 0.0928441999999998, Loops per Second: 10770732.0435741
BinaryDictionary<int, int>.Remove   Loops: 1000000, Seconds: 0.4911663, Loops per Second: 2035970.3017084


To the Profilermobile!...I would, but SlimTune is reporting nothing. A frequent problem.
To the "request help mobile"!

The remove code: as the commented-out sections show, I've pared it back to the bare minimum, getting 2.5-2.8 million removes a second, up from 2.0-2.2 million. Anyone got any ideas on where the major slowdowns are?
public bool Remove(int depth, ref TKey theKey, int theKeyHash)
{
    if (keyHash == theKeyHash)
    {
        this.isNull = true;
        return true;
    }
    else
    {
        if ((theKeyHash & (1 << depth)) == 0)
        {
            if (left == null)
                return false;
            return left.Remove(depth + 1, ref theKey, theKeyHash);
            // Pruning version, disabled while timing the bare minimum:
            // else if (left.Remove(depth + 1, ref theKey, theKeyHash))
            // {
            //     if (left.childCount == 0)
            //         left = null;
            //     --childCount;
            //     return true;
            // }
            // return false;
        }
        else
        {
            if (right == null)
                return false;
            return right.Remove(depth + 1, ref theKey, theKeyHash);
            // Pruning version, disabled while timing the bare minimum:
            // else if (right.Remove(depth + 1, ref theKey, theKeyHash))
            // {
            //     if (right.childCount == 0)
            //         right = null;
            //     --childCount;
            //     return true;
            // }
            // return false;
        }
    }
}
You do realize that you just took the ~O(1) insert and removal of the standard dictionary and replaced them with ones that have much worse asymptotic performance? Even worse, you've destroyed the O(1) look-up, defeating the purpose of using a dictionary (hash table) in the first place.

If you're worried about insert performance (which according to MSDN can be an O(n) operation when Count == Capacity), simply tell the dictionary to reserve space in the constructor.
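
For reference, presizing is just a constructor argument; the capacity figure and the name "bullets" below are arbitrary:

using System.Collections.Generic;

// Reserve space up front so Add never has to grow and rehash mid-game.
var bullets = new Dictionary<int, int>(1000000);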
Mike Popoloski | Journal | SlimDX
Quote:Original post by Mike.Popoloski
You do realize that you just took the ~O(1) insert and removal of the standard dictionary and replaced them with ones that have much worse asymptotic performance? Even worse, you've destroyed the O(1) look-up, defeating the purpose of using a dictionary (hash table) in the first place.

If you're worried about insert performance (which according to MSDN can be an O(n) operation when Count == Capacity), simply tell the dictionary to reserve space in the constructor.

It's ten-plus times faster on foreach, and I've gotten remove to a third of the standard dictionary's speed; I can probably do the same for add. Given all that, it's useful.

This topic is closed to new replies.
