towards a faster A*

Started by Norman Barrows · 22 comments, last by Norman Barrows 7 years, 11 months ago

so i've finally begun the A* implementation for Caveman 3.0. i first played around with A* almost a decade ago. this time around, with my typical outside-the-box approach to things, i decided to try a few things differently.

to begin with, i'm working with a 300x300 grid type search area. once the code is complete this will most likely be changed to a variable sized search area.

instead of open and closed lists, i keep an A* map: a 300x300 array of nodes, each with f, g, h, and a state: open | closed | neither.

for the moment, a separate 300x300 collision map has the obstacle data in it. but the two maps might be combined.

the only slow part is getting lowest f, which requires searching the entire grid.
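the scheme above might look something like this in C (a rough sketch - all names and sizes are mine, not from Caveman):

```c
#include <limits.h>

/* sketch of the A* map described above: one node per grid cell,
   with f/g/h and an open/closed/neither state */
#define MAP_W 300
#define MAP_H 300

enum node_state { NEITHER, OPEN, CLOSED };   /* NEITHER == 0 */

struct astar_node {
    int f, g, h;              /* ints, per the post */
    enum node_state state;
};

static struct astar_node amap[MAP_W][MAP_H];

/* the slow part: scan all 90,000 cells for the open node with lowest f */
int lowest_f_open(int *out_x, int *out_z)
{
    int best = INT_MAX;
    for (int x = 0; x < MAP_W; x++)
        for (int z = 0; z < MAP_H; z++)
            if (amap[x][z].state == OPEN && amap[x][z].f < best) {
                best = amap[x][z].f;
                *out_x = x;
                *out_z = z;
            }
    return best;              /* INT_MAX means the open set is empty */
}
```

that full-grid scan costs 90,000 node visits per expansion no matter how few nodes are actually open.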

so the idea to speed that up is to use an index system.

when a node is marked as open (added to the open list), it gets bucket-sorted on f into the index system.

with a 300x300 area, the longest straight path is only about 420 (the 300 * sqrt(2) diagonal), and a convoluted path might max out at 600 or so. so you would need perhaps 600 buckets. since multiple open nodes can have the same f value (working with ints here), each bucket would be a list.

for the lists i was thinking an unordered list with active fields. i figured insert at first inactive to add, and set active to false to remove, would be faster than insert at end and copy higher array elements down on remove.

when adding a node to the open list, you'd have to iterate thru that bucket's list to insert at first inactive.

when getting lowest f, you'd have to iterate through the buckets to the first non-empty one, then iterate through its list of nodes to the first active one.

storing the number of nodes in each bucket will speed up skipping empty buckets. the same idea might apply to tracking the first active and inactive node in a bucket, but maybe not - you have to sort or search at some point. the same might also be true for tracking the first non-empty bucket.
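a minimal sketch of that bucket index with active fields (hypothetical names; the bucket capacity is an assumption and would be tuned to the map):

```c
/* bucket index on f: one bucket per f value, unordered entry lists
   with active flags, and a per-bucket count to skip empty buckets */
#define NUM_BUCKETS 600
#define BUCKET_CAP  256        /* assumed cap; map-dependent in practice */

struct bucket_entry {
    int x, z;                  /* coordinates back into the A* map */
    int active;                /* inactive slots are reusable holes */
};

struct bucket {
    int count;                 /* active entries, for fast empty checks */
    struct bucket_entry entries[BUCKET_CAP];
};

static struct bucket index_by_f[NUM_BUCKETS];

/* add an open node: iterate to the first inactive slot in its f bucket */
void index_add(int f, int x, int z)
{
    struct bucket *b = &index_by_f[f];
    for (int i = 0; i < BUCKET_CAP; i++)
        if (!b->entries[i].active) {
            b->entries[i].x = x;
            b->entries[i].z = z;
            b->entries[i].active = 1;
            b->count++;
            return;
        }
}

/* lowest f: first non-empty bucket, then its first active entry */
int index_lowest(int *out_x, int *out_z)
{
    for (int f = 0; f < NUM_BUCKETS; f++) {
        if (index_by_f[f].count == 0)
            continue;
        for (int i = 0; i < BUCKET_CAP; i++)
            if (index_by_f[f].entries[i].active) {
                *out_x = index_by_f[f].entries[i].x;
                *out_z = index_by_f[f].entries[i].z;
                return f;
            }
    }
    return -1;                 /* open set empty */
}
```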

nodes in the A* map and the index system would be doubly linked, i.e. a map node would have a reference to its index system entry, and its index system entry would have the coordinates of the node in the A* map. this would make operations like removal very fast.

if i can track the first non-empty bucket instead of searching for it, then the search for the first active node in the bucket list is about the only slow part left.

it's my understanding that the open list is the "frontier" nodes, and the closed list is the "interior" nodes.

the max number of frontier nodes would be the max size for a bucket list (worst case).

might take a bit of ram. i'm thinking of having one A* map and index system, and just reusing them over and over for each entity. entities would store the paths that were the output of the A* system.

thoughts? comments? suggestions?

Norm Barrows

Rockland Software Productions

"Building PC games since 1989"

rocklandsoftware.net

PLAY CAVEMAN NOW!

http://rocklandsoftware.net/beta.php

The usual implementation uses a heap for the open list, which allows insertions and retrieval of the lowest f node in time O(log(number_of_open_nodes)). A node can be added to the list multiple times, which is fine as long as you check if the lowest-f open node is closed; in that case simply skip it.

I would start by implementing that. I would try the other stuff only if that's not fast enough.

Good luck!
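For reference, the heap-based open list with the skip-stale-entries trick might look like this (a minimal sketch, not anyone's production code; the capacity is an assumption):

```c
/* minimal binary min-heap open list with lazy deletion: nodes may be
   pushed multiple times; stale entries get skipped when popped */
#define HEAP_CAP 4096

struct heap_entry { int f; int x, z; };

static struct heap_entry heap[HEAP_CAP];
static int heap_size = 0;

void heap_push(int f, int x, int z)
{
    int i = heap_size++;
    heap[i].f = f;
    heap[i].x = x;
    heap[i].z = z;
    while (i > 0 && heap[(i - 1) / 2].f > heap[i].f) {   /* sift up */
        struct heap_entry t = heap[i];
        heap[i] = heap[(i - 1) / 2];
        heap[(i - 1) / 2] = t;
        i = (i - 1) / 2;
    }
}

struct heap_entry heap_pop(void)
{
    struct heap_entry top = heap[0];
    heap[0] = heap[--heap_size];
    int i = 0;
    for (;;) {                                           /* sift down */
        int l = 2 * i + 1, r = 2 * i + 2, s = i;
        if (l < heap_size && heap[l].f < heap[s].f) s = l;
        if (r < heap_size && heap[r].f < heap[s].f) s = r;
        if (s == i) break;
        struct heap_entry t = heap[i];
        heap[i] = heap[s];
        heap[s] = t;
        i = s;
    }
    return top;
}
```

In the A* loop you pop entries and simply discard any whose node is already closed - a re-push with a better f will always come out first.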
This is a great exercise in how not to write fast, clean code.


Start with the bare-bones dead-simple A* implementation. Then profile it in actual usage with real data from your game to discover where it is slow and where things could be optimized.

Optimize only those areas, then rinse/repeat until it's suitably fast.

Wielder of the Sacred Wands
[Work - ArenaNet] [Epoch Language] [Scribblings]

I second Alvaro's suggestion re: the heap. I remember reading about A* and, not knowing how a heap worked, it seemed pretty scary; but once I had a crack at implementing it, it started to make sense, and my A* implementation became several orders of magnitude faster.

Currently working on an open world survival RPG - For info check out my Development blog: ByteWrangler

my choice of a grid for the open and closed lists was inspired by Amit's A* pages where he combines the two lists onto one.

he goes into a fair amount of detail re: possible implementations, with lots of big-O notation. but he also points out that different algos are superior depending on the value of N. specifically, he mentions the O(log n) algos and their poor performance with N < 200,000.

>> Start with the bare-bones dead-simple A* implementation. Then profile it in actual usage with real data from your game to discover where it is slow and where things could be optimized.

i'm still in the implementation selection stage here. profiling prototypes of different implementations would be about the only use for profiling at this point.


>> my choice of a grid for the open and closed lists was inspired by Amit's A* pages where he combines the two lists onto one.

>> he goes into a fair amount of detail re: possible implementations, with lots of big O notation. but also points out that different algos are superior, depending on the value of N. specifically, he mentions the o log n algos and their poor performance with N < 200,000.


I have used a priority queue to compute the distance to goal in the game of Quoridor, which has a 9x9 board. The code is very simple and it was pretty fast: the alpha-beta search went through 1 to 2 million nodes per second (depending on the stage of the game) while calling A* twice per leaf.
You're not talking about vast data sets here nor are you talking about having extremely complex cost functions. A vanilla A* implementation done correctly is more than fast enough to handle your case.

Again, you're optimizing prematurely and wasting time picking algorithms without any guidance from actual data. You should just implement the simplest thing that could possibly work, and analyze it to understand how your situation behaves.

Everything else is flying blind.


You are lucky if you can fix your window overlay grid dimensions into constants.

Then you can do pointer arithmetic on a fixed-size map array (ie ptr+1, ptr-1, ptr+span, ptr+span+1, ptr-span+1, etc.). Example: to access the usual 8 neighbors when finding open list candidates, you can do it with a single offset value (an array of the 8 offsets and a for loop) if you make your map a 1-dimensional array [90000].

Fixed-size heap for the open list (also doing pointer math to walk nodes of the tree - a systematic power-of-2 progression for the binary tree). Max depth shouldn't need to be sized to worst case, as you can have entries fall off the bottom, and IF they ever become candidates again they can be re-added to the open list.

Use flags ON the window A* grid map (overlay) nodes for your closed list.

If you are still doing linked lists, then absolutely use node pools instead of costly allocate/deallocate overhead.

Crush your map/open list data down as much as possible to maximize cache hits.

(A 300 coordinate range calls for 16-bit ints.) Use bitfields for flags, and avoid floats for weight values if you can - try to use bytes/int16.

All needed A* data gets extracted from the real-world map this 300x300 window is overlaid on (yanking it from chunks at that time).

Eliminate the if-then X/Y edge-of-map tests done when getting open list candidates by oversizing the map by +2 and pre-marking the boundary lines as CLOSED (flagged) - the map edges will now be excluded as part of the normal closed-node test.

Finish the current pathfind task - don't timeslice multiple A* pathfindings.

If precalc'd edge cost values can be stored in the real-world map nodes, then do that (memory is cheap), and make them contiguous so they can be copied/extracted in a block into the A* working data.

IF additional map data needs to be extracted/examined BUT the number of nodes it's required for is largely culled down, THEN that data doesn't need to be extracted and put into the A* working data up front (less initial copying AND smaller A* working data).

Figure out how to maximize reuse of the same A* working data for multiple objects.

---

Something they rarely include in A* speed calcs is the setup time of the working data you will use (interpreting/extracting from the world map and resetting between A* runs).
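The oversized-border and 8-offset tips above combine nicely; here is a sketch, with all names and the flag layout being my own assumptions:

```c
/* 1D map oversized by a 1-cell border that is pre-marked CLOSED,
   so neighbor visits need no edge tests at all */
#define W 300
#define PW (W + 2)               /* padded width */
#define PSIZE (PW * PW)

#define F_CLOSED 0x01

static unsigned char flags[PSIZE];

/* the 8 neighbor offsets for the padded row-major array */
static const int noff[8] = { -PW - 1, -PW, -PW + 1,
                                   -1,           +1,
                              PW - 1,  PW,  PW + 1 };

/* mark the whole border CLOSED once at setup */
void init_border(void)
{
    for (int i = 0; i < PW; i++) {
        flags[i] |= F_CLOSED;                   /* top row */
        flags[PSIZE - PW + i] |= F_CLOSED;      /* bottom row */
        flags[i * PW] |= F_CLOSED;              /* left column */
        flags[i * PW + PW - 1] |= F_CLOSED;     /* right column */
    }
}

/* convert interior (x,z) in 0..299 to a padded 1D index */
int idx(int x, int z) { return (z + 1) * PW + (x + 1); }

/* count open-list candidates around node i: border cells fail the
   same CLOSED test as ordinary closed nodes, so no x/z range checks */
int count_candidates(int i)
{
    int n = 0;
    for (int k = 0; k < 8; k++)
        if (!(flags[i + noff[k]] & F_CLOSED))
            n++;
    return n;
}
```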

--------------------------------------------Ratings are Opinion, not Fact

Thanks wodinoneeye, some helpful tips in there for everyone.

One of the other ones I like is using an integer instead of a boolean for closed. Each A* search has an integer, a simple counter, and if the current search counter != the number on the tile, then it's open. It means you never have to go back and do a clear; you just increment the search counter.
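A tiny sketch of that counter trick (hypothetical names):

```c
/* generation-counter "clear": instead of resetting 90,000 flags per
   search, stamp each touched node with the current search number */
#define NODES 90000

static unsigned int stamp[NODES];     /* zero-initialized at startup */
static unsigned int search_id = 0;

/* bump the counter: this IS the entire per-search clear step */
void begin_search(void) { search_id++; }

/* a node counts as touched only if its stamp matches this search */
int  is_touched(int i) { return stamp[i] == search_id; }
void touch(int i)      { stamp[i] = search_id; }
```

The same stamp extends to the open/closed/neither state: keep the state byte next to the stamp and treat any node with a stale stamp as "neither". (Call begin_search() before the first search too, since the zeroed stamps would otherwise match search 0.)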

>> Something they rarely include for A* speed calc is the setup time of the working data you will use (interpretting/extracting from the world map and resetting between A* runs)

that's where i'm at now.

nested for loops to find the lowest open node in the grid clocked in at 11K ms or 6K ms, depending on whether x or z was the outer loop. the odd thing was, "for x, for z..." was faster than "for z, for x..." when accessing node[x][z]. i thought it was supposed to be the other way around.

needless to say, floats for f, g, and h ran much slower than ints.

implementing the bucket sort on f cut that down to about 300 ms.

but now i'm getting times of around 300 ms whether i limit the search to a range of 10 or 300 (the goal is 271 nodes away).

so i expand 100 nodes (max range 20) and it's around 300 ms, or i expand 7000 nodes (max range 300) and it's still around 300 ms.

so it's looking like the setup is the major overhead.

i made the lists in the bucket sort insert at end and swap with last on remove. so no more iterating thru lists to add and remove.

get lowest open still iterates thru the buckets to find the first bucket with num_entries > 0, then returns the first entry in that bucket's list. i'm currently using 400 buckets. f values are ranging from 0 to a little over 300 at the moment, but could theoretically go higher (400-500). the list size in each bucket is a whopping 10,000 entries! so the index is a few meg in size overall.

so add to open consists of setting the astar map node to state=open, calculating f, g, and h, and adding it to the index. adding to the index is simply: entry[num_entries] = new data; num_entries++.

add to closed: just set the state to closed, and remove from the index. remove from the index requires iterating thru the bucket list to find the entry. using an active field (insert at first inactive, no swap with last on remove), the index and astar map could be doubly linked (the map holds bucket & entry #, and the bucket entry holds the astar map coords). that would mean the iteration for remove from index could be avoided - replaced with an iteration for insert at first inactive. not sure if that would help though...

get lowest open iterates through the buckets until it finds the first non-empty one, then returns that bucket's first entry.
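the add/remove/get-lowest scheme described above, sketched in C (names and the bucket capacity are my assumptions - buckets reached 10,000 entries in my tests, so the cap is map-dependent):

```c
/* bucket lists with insert-at-end and swap-with-last removal: both O(1) */
#define BKT_COUNT 400
#define BKT_CAP   1024          /* assumed; size to worst-case frontier */

struct bkt_entry { int x, z; };

struct bkt {
    int num_entries;
    struct bkt_entry e[BKT_CAP];
};

static struct bkt bkts[BKT_COUNT];

/* add to open: append at the end of the f bucket */
void bkt_add(int f, int x, int z)
{
    struct bkt *b = &bkts[f];
    b->e[b->num_entries].x = x;
    b->e[b->num_entries].z = z;
    b->num_entries++;
}

/* remove entry i from bucket f: move the last entry into its slot */
void bkt_remove(int f, int i)
{
    struct bkt *b = &bkts[f];
    b->e[i] = b->e[--b->num_entries];
}

/* get lowest open: first bucket with num_entries > 0, first entry */
int bkt_lowest(int *x, int *z)
{
    for (int f = 0; f < BKT_COUNT; f++)
        if (bkts[f].num_entries > 0) {
            *x = bkts[f].e[0].x;
            *z = bkts[f].e[0].z;
            return f;
        }
    return -1;                  /* open set empty */
}
```

with the doubly-linked back-references mentioned earlier (the map node storing its bucket and entry #), bkt_remove needs no search - just remember to update the back-reference of whichever entry got swapped into slot i.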

so the next step is to put the timers back on it, and see where that 300 ms setup overhead is coming from.

it may turn out that while an all-in-one grid type list for open/closed/neither runs fast, it's slow to set up. you have to mark all 90,000 nodes as neither before starting.

i'm already checking min and max f values to get an idea of the number of buckets i'll need. i plan to add metrics so i can track just how big the bucket entry lists get. i sure didn't expect 10,000 nodes at the same f value.

i have a feeling it's one of those algos where one of the steps is just plain slow. you can switch around which of the half dozen or so steps that is, but in the end, one will run slow (iteration) and the others fast (simple assignment).

if, as i suspect, the grid implementation runs pretty fast but has high setup costs, i will then look for an implementation with lower overhead, such as the traditional two-list method. the grid basically keeps three lists, which requires everything to be put in the "neither open nor closed" list as part of setup.

eventually i'll get down to the implementation that has the lowest overall cost, or some number of possible implementations with relatively equal costs.

one thing i don't know is what kind of performance i should expect. right now my best performance is something like 7000+ open nodes processed in ~300+ ms (about 23 open nodes per ms). if you consider each open node processed to actually process the node plus its eight neighbors, i'm processing 63,000+ total nodes in 300+ ms (about 210 total nodes per ms).

another thing i found interesting was that the first time i run the test i'll get high times, like 11K ms, but if i then run it again, i get 300 ms. i assume this is due to L2 cache load times on the first run.

this testing has also pointed out that collision map generation could stand some optimization. i no longer include that in the astar time required. under normal circumstances, the map would already be generated and sitting in the cache, ready to be used.


This topic is closed to new replies.
