
Binary heap vs list A* - my humble performance comparison [updated with JPS]


17 replies to this topic

#1 KnolanCross   Members   -  Reputation: 1168


Posted 06 January 2014 - 03:32 PM

*** UPDATED ***

Added JPS results. I also added some information about the JPS implementation in a later post, be sure to check it.

 

Quite some time ago I released a C implementation of the A* algorithm, using a grid as the graph and a linked list for the open list; the topic can be found here:

 

http://www.gamedev.net/topic/632734-a-c-implementation-code/

 

After that I got really busy with a lot of different work and couldn't touch the code for quite some time.
Recently I managed to put some work into it and implemented a version using a binary heap for the open list, as well as a version that uses memory pooling. The pool grows by 4096 * struct size each time it runs out of elements.

 

My objective with this post is to show the performance gain you can expect from optimizing an A* implementation, so that people can decide whether optimizing their own is worth it. I won't release the code for now because it is still a bit ugly and it depends on a container library of mine (it has no package yet, so people would need to compile it themselves).

 

Tests Info:

 

The tests were run on a Core 2 Duo E8400 @ 3.00 GHz; the code runs on a single thread.

The OS used was Ubuntu 12.04, 32-bit.

All times are in microseconds (10^6 µs = 1 s).

 

Test on a randomly generated 64x64 map.

 
List implementation: 235.
Heap implementation: 161.
Pool + Heap implementation: (first path) 124 | (following paths) 105.

JPS + Pool + Heap: (first path) 118 | (following paths) 117.

 

Test on a randomly generated 640 x 640 map:
 
List implementation: 322823.
Heap implementation: 2006.
Pool + Heap implementation: (first path) 1823 | (following paths) 1652.
JPS + Pool + Heap: (first path) 1724 | (following paths) 1542.
 
Test on a randomly generated 3600 x 1800 map:
 
List implementation: 8075664.
Heap implementation: 131072.
Pool + Heap implementation: (first path) 126942 | (following paths) 124395.
JPS + Pool + Heap: (first path) 7218 | (following paths) 6321.

The results are pretty much what I expected: as the path size increases, the heap implementation gets a lot faster than the list implementation. The pooling advantage shrinks as the path size increases, but it still improves performance.
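To make the comparison concrete, here is a minimal sketch of what an array-backed binary-heap open list can look like in C. This is illustrative only, not the code used for the measurements above; the `Node` type and field names are hypothetical:

```c
#include <stdlib.h>

/* Hypothetical open-list node: only the f-score matters for ordering. */
typedef struct { int x, y; double f; } Node;

typedef struct {
    Node **items;
    size_t used, cap;
} OpenList;

static void swap_nodes(Node **a, Node **b) { Node *t = *a; *a = *b; *b = t; }

/* Insert: place the node at the end and sift it up -- O(log n) worst case. */
void open_push(OpenList *h, Node *n) {
    if (h->used == h->cap) {
        h->cap = h->cap ? h->cap * 2 : 64;    /* doubling keeps growth amortized O(1) */
        h->items = realloc(h->items, h->cap * sizeof *h->items);
    }
    size_t i = h->used++;
    h->items[i] = n;
    while (i > 0 && h->items[(i - 1) / 2]->f > h->items[i]->f) {
        swap_nodes(&h->items[(i - 1) / 2], &h->items[i]);
        i = (i - 1) / 2;
    }
}

/* Remove the lowest-f node: move the last element to the root and sift down. */
Node *open_pop(OpenList *h) {
    Node *top = h->items[0];
    h->items[0] = h->items[--h->used];
    size_t i = 0;
    for (;;) {
        size_t l = 2 * i + 1, r = l + 1, m = i;
        if (l < h->used && h->items[l]->f < h->items[m]->f) m = l;
        if (r < h->used && h->items[r]->f < h->items[m]->f) m = r;
        if (m == i) break;
        swap_nodes(&h->items[i], &h->items[m]);
        i = m;
    }
    return top;
}
```

A full A* open list also has to handle updating a node that is already enqueued; that wrinkle is discussed further down the thread.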
 

My conclusion is that this optimization matters only if your paths can get long. I used the list-based version in a prototype dungeon game (something like Gauntlet) where monsters would only aggro players up to 20 squares away, and performance was not an issue at all.

 

Comments, questions and suggestions are always welcome.


Edited by KnolanCross, 20 January 2014 - 06:58 AM.

My blog on programming and games.
http://16bitsflag.blogspot.com.br/


#2 ferrous   Members   -  Reputation: 1543


Posted 06 January 2014 - 08:38 PM

Thanks for your post, it got me thinking about mine.  I was lazy and just used a built-in structure, the SortedDictionary in C#, but looking at it, it's a red-black tree under the hood, so the two main operations I do, insertion and removal of the top entry, are:

Insert O(log n)   Delete O(log n)

 

While for binary heaps, they are:

Insert O(1)   Delete O(log n)

 

Which, as you can see, will probably give me a decent speedup on maps with large numbers of nodes, though there is no built-in BinaryHeap class in C#, so I'd have to write it myself.  (And for now, I'd wager it is not the slowest part of my algorithm by a long shot -- but I'm glad this post made me think about it.)



#3 polyfrag   Crossbones+   -  Reputation: 1734


Posted 07 January 2014 - 12:33 AM

What if you combine it with jump point search?

#4 KnolanCross   Members   -  Reputation: 1168


Posted 07 January 2014 - 07:32 AM

@ ferrous:

The main problem I found is that a generic heap implementation doesn't support updating a value in place, so you need a modified one.

In my case I used function pointers to get and set each element's index, like this:

typedef struct modified_heap_t{
    void** info;                               /* array of element pointers */
    int (*compareFunction)(void*, void*);
    unsigned (*getIndexFunction)(void*);       /* element -> its heap index */
    void (*setIndexFunction)(void*, unsigned); /* store an element's heap index */
    unsigned used;
    unsigned currentSize;
} modifiedHeap;

To update a value we just swap it with the first element and call heapify:

void modifiedHeapUpdatedValue(modifiedHeap* h, void* data){

    unsigned index = h->getIndexFunction(data);

    /* Nothing to do if this is the only element. */
    if(index == 0 && h->used == 1){
        return;
    }

    /* Swap the updated element with the root... */
    void* swap = h->info[0];

    h->info[0] = data;
    h->setIndexFunction(h->info[0], 0);
    h->info[index] = swap;
    h->setIndexFunction(h->info[index], index);

    /* ...and let heapify sift it back into place. */
    heapify(h);

}

This link gives you more details:

http://www.policyalmanac.org/games/binaryHeaps.htm
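A more common alternative to the swap-with-root trick is to store each element's heap index and sift the improved element up from its current position, since in A* the key only ever decreases. A self-contained sketch; the `HNode` type here is hypothetical, standing in for the get/set-index callbacks in the struct above:

```c
#include <stddef.h>

/* Hypothetical node that remembers its own position in the heap array. */
typedef struct { double f; size_t idx; } HNode;

/* After lowering node->f, restore the min-heap property by sifting the
 * node up from its stored index -- O(log n), no full re-heapify needed. */
void heap_decrease_key(HNode **heap, HNode *node) {
    size_t i = node->idx;
    while (i > 0) {
        size_t parent = (i - 1) / 2;
        if (heap[parent]->f <= heap[i]->f)
            break;                       /* heap property restored */
        HNode *t = heap[parent];
        heap[parent] = heap[i];
        heap[i] = t;
        heap[parent]->idx = parent;      /* keep stored indices in sync */
        heap[i]->idx = i;
        i = parent;
    }
}
```

The stored index is what makes the O(log n) update possible at all: without it, finding the node in the heap array would cost O(n).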

 

@ polyfrag:

 

Had never heard of it, to be honest.

 

Right now I am working on a navmesh-like approach. I have coded the graph and the A* on the graph with a heap; both are working fine. The main problem is determining the region that each node represents. I used an ad hoc grid flood algorithm, but it is too slow for bigger maps (I ran it for 20 minutes yesterday on the 3600 x 1800 map and it didn't finish); I believe it is O(n^4). I will try to use this one instead:

http://www.math.unl.edu/~s-dstolee1/Presentations/Sto08-MinRectPart.pdf

 

But I will take a look at jump point search and check if it is easier to implement.


Edited by KnolanCross, 07 January 2014 - 07:34 AM.

My blog on programming and games.
http://16bitsflag.blogspot.com.br/

#5 snowmanZOMG   Members   -  Reputation: 807


Posted 08 January 2014 - 06:13 AM

Thanks for your post, it got me thinking about mine.  I was lazy and just used a built-in structure, the SortedDictionary in C#, but looking at it, it's a red-black tree under the hood, so the two main operations I do, insertion and removal of the top entry, are:

Insert O(log n)   Delete O(log n)

 

While for binary heaps, they are:

Insert O(1)   Delete O(log n)

 

Which, as you can see, will probably give me a decent speedup on maps with large numbers of nodes, though there is no built-in BinaryHeap class in C#, so I'd have to write it myself.  (And for now, I'd wager it is not the slowest part of my algorithm by a long shot -- but I'm glad this post made me think about it.)

 

Your running times for binary heaps look wrong.  I'm not aware of any binary heap that functions with those running times.  Fibonacci heaps may give you that insert time (if my memory serves correctly), but I don't think the delete is O(log n) in the worst case.



#6 KnolanCross   Members   -  Reputation: 1168


Posted 08 January 2014 - 07:09 AM

I found it strange too; here is Wikipedia's explanation:

 

"The number of operations required is dependent on the number of levels the new element must rise to satisfy the heap property, thus the insertion operation has a time complexity of O(log n). However, since approximately 50% of the elements are leaves and approximately 75% of the elements are in the bottom two levels, it is likely that the new element to be inserted will only move a few levels upwards to maintain the heap. Thus, binary heaps support insertion in average constant time, O(1)."

 

http://en.wikipedia.org/wiki/Binary_heap#Insert

 

Anyway, as I said, for A* he will need to implement his own or find one that supports value updates.


My blog on programming and games.
http://16bitsflag.blogspot.com.br/

#7 ferrous   Members   -  Reputation: 1543


Posted 08 January 2014 - 02:08 PM

 

Thanks for your post, it got me thinking about mine.  I was lazy and just used a built-in structure, the SortedDictionary in C#, but looking at it, it's a red-black tree under the hood, so the two main operations I do, insertion and removal of the top entry, are:

Insert O(log n)   Delete O(log n)

 

While for binary heaps, they are:

Insert O(1)   Delete O(log n)

 

Which, as you can see, will probably give me a decent speedup on maps with large numbers of nodes, though there is no built-in BinaryHeap class in C#, so I'd have to write it myself.  (And for now, I'd wager it is not the slowest part of my algorithm by a long shot -- but I'm glad this post made me think about it.)

 

Your running times for binary heaps look wrong.  I'm not aware of any binary heap that functions with those running times.  Fibonacci heaps may give you that insert time (if my memory serves correctly), but I don't think the delete is O(log n) in the worst case.

 

 

I just snagged them from Wikipedia, and I snagged the average times, not the worst case.

 

 

Binary heap (average / worst case):

Insert: O(1) / O(log n)

Delete: O(log n) / O(log n)

 

 

@KnolanCross

Why would I need a version with value updates?  In my current implementation I never update a node that is already in the open list.


Edited by ferrous, 08 January 2014 - 02:12 PM.


#8 samoth   Crossbones+   -  Reputation: 4510


Posted 08 January 2014 - 02:45 PM

Nice. If you make these numbers relative (so the highest is 100%), or if you plot them, the results become even more interesting. First, you see that for small datasets the difference between list and heap is not all that pronounced (not even a 2x gain, hardly worth mentioning), whereas for large datasets any of the "heap" results are almost invisible next to the list figure:

 

[untitled.png: chart comparing the relative running times]

 

Now that is an improvement worth showing!

 

What is further interesting is that the heap also becomes slightly slower again relative to the list as the dataset becomes huge. That strongly suggests that most (or at least a significant amount) of the performance improvement doesn't come from big-O but from the more cache-friendly access pattern of a heap. Since caches are of finite size, the effect eventually limits itself, which explains why it gets "worse" for huge datasets.



#9 KnolanCross   Members   -  Reputation: 1168


Posted 08 January 2014 - 02:56 PM

@ ferrous

 

When you are checking a node's neighbors they may already be in the open list; if so, and the new path is better, you need to update the score.

Here is the pseudocode from the Wikipedia A* article; the part I am referring to is marked.

function A*(start,goal)
    closedset := the empty set    // The set of nodes already evaluated.
    openset := {start}    // The set of tentative nodes to be evaluated, initially containing the start node
    came_from := the empty map    // The map of navigated nodes.
 
    g_score[start] := 0    // Cost from start along best known path.
    // Estimated total cost from start to goal through y.
    f_score[start] := g_score[start] + heuristic_cost_estimate(start, goal)
 
    while openset is not empty
        current := the node in openset having the lowest f_score[] value
        if current = goal
            return reconstruct_path(came_from, goal)
 
        remove current from openset
        add current to closedset
        for each neighbor in neighbor_nodes(current)
            if neighbor in closedset
                continue
            tentative_g_score := g_score[current] + dist_between(current,neighbor)

>>>>>>>>>>>>>>>>> HERE 
            if neighbor not in openset or tentative_g_score < g_score[neighbor] 
                came_from[neighbor] := current
                g_score[neighbor] := tentative_g_score
                f_score[neighbor] := g_score[neighbor] + heuristic_cost_estimate(neighbor, goal)
                if neighbor not in openset
                    add neighbor to openset
<<<<<<<<<<<<<<<<<
 
    return failure
 
function reconstruct_path(came_from, current_node)
    if current_node in came_from
        p := reconstruct_path(came_from, came_from[current_node])
        return (p + current_node)
    else
        return current_node

Edited by KnolanCross, 08 January 2014 - 02:59 PM.

My blog on programming and games.
http://16bitsflag.blogspot.com.br/

#10 polyfrag   Crossbones+   -  Reputation: 1734


Posted 10 January 2014 - 07:54 PM

On a 640x640 map your method was 195 times faster. But on a 3600x1800 it was only 65 times faster. Do you know why?



#11 samoth   Crossbones+   -  Reputation: 4510


Posted 11 January 2014 - 07:18 AM

On a 640x640 map your method was 195 times faster. But on a 3600x1800 it was only 65 times faster. Do you know why?

See my above post.

 

It is definitely related to the memory access pattern (cache effects). This is visible not only from the discrepancy at 3600x1800 but also with the tiny 64x64 map.

 

For the tiny map, everything fits into the cache, so the list approach is still very fast (only a 2x difference, almost certainly due to the algorithmic difference). The large map benefits both from the algorithmic difference and from a more cache-friendly access pattern, allowing most or all of the data to be cached when accessed in the heap implementation. For the huge map, the heap has too many "holes" in its access pattern to be very cache friendly, so it degenerates to the same kind of random access as the list implementation. There is, however, still an algorithmic advantage, so it still runs a lot faster (just not that much faster).



#12 KnolanCross   Members   -  Reputation: 1168


Posted 14 January 2014 - 11:27 AM

Just want to add this info in case anyone gets here through search in the future:

 

I said you can't use a generic container, but I was wrong. If you are using C++ you can use a vector as the open list via make_heap (and friends) from <algorithm>. It works, but I have no idea about the performance.


Edited by KnolanCross, 14 January 2014 - 11:30 AM.

My blog on programming and games.
http://16bitsflag.blogspot.com.br/

#13 wodinoneeye   Members   -  Reputation: 666


Posted 14 January 2014 - 10:52 PM

I remember helping some guy on USENET something like 10 years ago with improvements to his A*, and he came up with a customized heap queue that optimized it significantly. It was similarly used with a fixed grid network.   I recall recommending doing pointer math directly (the list 'nodes' living in a statically allocated pool) in several of the operations, and making the data as tight as possible (packing, smaller data types, combined flags, etc.) to help with the caching issues (and other things, like using a closed-marked border to simplify the neighbor-candidate if-then logic).

 

I haven't worked on this stuff for a while, but I recall that most of the Unix compilers DON'T have a data 'pack' implementation (or one that does anything), which would lose some of the significant optimizations.

 

Caches are bigger now (?), but with maps of the size mentioned (3600 x 1800), which is much larger than the ones we were playing with (~1K x 1K), you will still run into a lot of misses.

 

I wonder what use maps of that size would be without also considering a hierarchical system.

 

 


--------------------------------------------Ratings are Opinion, not Fact

#14 KnolanCross   Members   -  Reputation: 1168


Posted 20 January 2014 - 05:01 AM

I adapted the JPS from Jumper ( https://github.com/Yonaba/Jumper/blob/master/jumper/search/jps.lua ) to my C implementation. First I must say that this is not a 100% fair comparison, because Jumper's algorithm allows diagonal movement even if there is a wall next to the current node:

 

Here is a portion of the small map to illustrate the difference:

000....00000........x000000......000000000....0000000000......00
0000..000000........x00000........000000000....0000000000.....00
000000000000.........x0000.........00..00000...00000000000....00
000000000000.........x00000..............000...000000000000...00

In my original implementation the movement from the second to the third line is not allowed. A pretty small detail, considering I am more interested in the timings.
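For reference, the restriction being described, i.e. no diagonal step may "squeeze" between two blocked cells, usually amounts to one extra test when generating neighbors. A hedged sketch; the grid layout and the walkable() helper are assumptions for illustration, not Jumper's or the author's code:

```c
#include <stdbool.h>

#define W 64
#define H 64

/* Hypothetical grid: nonzero = walkable. */
unsigned char grid[H][W];

static bool walkable(int x, int y) {
    return x >= 0 && x < W && y >= 0 && y < H && grid[y][x];
}

/* A diagonal step (dx, dy) from (x, y) is allowed only if the target is
 * walkable AND both orthogonal cells it cuts between are walkable too. */
bool diagonal_allowed(int x, int y, int dx, int dy) {
    return walkable(x + dx, y + dy)
        && walkable(x + dx, y)    /* no squeezing past a wall horizontally */
        && walkable(x, y + dy);   /* ...or vertically */
}
```

Plain grid A* only needs this filter in the neighbor loop; retrofitting it into JPS is harder because the jump rules themselves assume which diagonals are legal.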

 

So here they are; all results are for JPS with memory pooling and a heap for the open list. Again, times are in microseconds (10^6 µs = 1 s).

 

Test on a randomly generated 64x64 map.

 

First path 118. Following paths: 117.

 

 

Test on a randomly generated 640 x 640 map:
 
First path 1724. Following paths: 1542.
 
Test on a randomly generated 3600 x 1800 map:
 
First path 7218. Following paths: 6321.
 

So there it is. JPS is indeed incredibly fast; I still need to figure out how to add the diagonal move restriction.


My blog on programming and games.
http://16bitsflag.blogspot.com.br/

#15 ferrous   Members   -  Reputation: 1543


Posted 31 January 2014 - 12:23 AM

Another major limitation of Jump Point Search is its inability to deal with different move costs per tile.  For some games that's perfectly acceptable (like BioWare's Infinity Engine games), but for others, like the Civ series with mountains and whatnot, it is not.



#16 KnolanCross   Members   -  Reputation: 1168


Posted 31 January 2014 - 07:44 AM

@ferrous

 

The Jumper implementation actually comes with a "clearance" value that lets you ignore nodes of a given cost. If you have little cost variation you can use different clearances to calculate a few paths and pick the cheapest one.


My blog on programming and games.
http://16bitsflag.blogspot.com.br/

#17 polyfrag   Crossbones+   -  Reputation: 1734


Posted 14 February 2014 - 11:02 PM

The only thing left now is to write it in assembler (heh), multithread it, or do it on the GPU.



#18 Pink Horror   Members   -  Reputation: 1085


Posted 15 February 2014 - 12:07 PM


I haven't worked on this stuff for a while, but I recall that most of the Unix compilers DON'T have a data 'pack' implementation (or one that does anything), which would lose some of the significant optimizations.

 

gcc has packed structs.
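For what it's worth, GCC and Clang expose this through the packed type attribute. A minimal sketch of the size difference; the exact unpacked size depends on the ABI, but the packed struct is guaranteed padding-free:

```c
/* Default layout: the compiler pads `cost` to its natural alignment,
 * so the struct is typically 8 bytes on common ABIs. */
struct node_padded {
    unsigned char flags;
    unsigned int cost;
};

/* Packed layout: no padding, 1 + 4 = 5 bytes, at the price of
 * potentially slower (unaligned) accesses on some architectures. */
struct node_packed {
    unsigned char flags;
    unsigned int cost;
} __attribute__((packed));
```

For cache-bound pathfinding data, simply reordering fields from largest to smallest often recovers most of the padding without needing the attribute at all.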





