Parallel precomputations

I've got about 1056 nodes and need to precompute the costs between them. Currently, on a single-threaded precomputer, it takes about an hour to complete. I'm expecting it to complete within 15 minutes instead, so I'd have to divide the dataset into 4 intervals.

The problem is that the nodes are not independent. When set A is computing inside its territory, after it finishes it will go on to set B with some overlapping results.

How can I make this parallel process most effective?

Thanks

Can you split set A?

It would help if you defined the problem more clearly.

A network of nodes you have to compute some inter-node edges for...

First, do you have multiple cores? Multiple threads won't go any faster on a single core.

Are the 1056 nodes computing costs only to their near/adjacent nodes? (Otherwise, 1056 to every other node is fairly trivial to divide n ways.)

Does the geographic position of the nodes define their adjacency? Roughly equidistant spacing in 2D or 3D?

Yes: split into 4 processing sets to do all adjacents within each set, then stitch the border (a strip wider than any adjacency distance) between the 4 separate sections (if laid out as below, even THAT can be organized into 4 subsets):

1111111111B222222222
1111111111B222222222
1111111111B222222222
1111111111B222222222
1111111111B222222222
1111111111B222222222
BBBBBBBBBBMBBBBBBBBB
3333333333B444444444
3333333333B444444444
3333333333B444444444
3333333333B444444444
3333333333B444444444
3333333333B444444444

The M (middle) may have to be a final, separate pass to stitch in the small center set of nodes.

If you have more than 4 cores, then the map might be split like a finer pie cut.

Minimize data locks: the map is all read-only, so none are needed there; writes within the 4 sections' results are independent, so none are needed there either.

The 4 B border-stitch strips can be processed independently (so again none), and the center is done on a single core (again none).

-

If the nodes are mapped out more like a 3D blob, then the borders are basically dividing planes. It works the same, but identifying the sets is a little more complicated (exact division processing might take more work than just letting one core's workload run a little longer).
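Back in the 2D layout, a rough sketch of the classification step could look like the following. This is just an illustration under assumptions: midX/midY and borderHalfWidth are hypothetical parameters, and the half-width must be at least the maximum adjacency distance so no edge crosses from one quadrant into another without touching B.

#include <cmath>

struct NodePos { float x, y; };

// Which processing set a node belongs to: one of the 4 quadrants,
// or the border strip B that gets stitched in a later pass.
enum class Region { Q1, Q2, Q3, Q4, Border };

Region Classify(const NodePos& p, float midX, float midY,
                float borderHalfWidth)
{
    // Nodes within the half-width of either midline land in B.
    if (std::fabs(p.x - midX) <= borderHalfWidth ||
        std::fabs(p.y - midY) <= borderHalfWidth)
        return Region::Border;
    if (p.y < midY)
        return (p.x < midX) ? Region::Q1 : Region::Q2;
    return (p.x < midX) ? Region::Q3 : Region::Q4;
}

Each of the four quadrant sets can then be handed to its own core with no locks at all, followed by the border pass.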


We don't know -what- you are precomputing with the nodes.

If, at the time of computing, a single node does not write to any node but itself, then this process will be easy.

You can just use a worker-thread setup to compute all the nodes.

http://www.codeproject.com/Articles/4148/Multithreaded-Job-Queue

The way you describe the problem, you have a load of data that you need to pre-calculate. None of the calculations affects any of the others, and there is no issue with the order they are calculated in, so what is the issue?

Spin up as many calculators as you have hardware threads and send the results for re-integration when complete. When a calculation completes, spin up a new one with the next calculation to perform.

As long as you are not changing the source data at all, there are no locking issues to worry about. If you are mutating the source data as you look at it, you have a world of woe :)
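A minimal sketch of that pattern, assuming the per-node calculation is wrapped in a callable and the graph is strictly read-only during the run (PrecomputeAllCosts and computeCostsFrom are hypothetical names, not from the thread):

#include <algorithm>
#include <atomic>
#include <functional>
#include <thread>
#include <vector>

void PrecomputeAllCosts(int nodeCount,
                        const std::function<void(int)>& computeCostsFrom)
{
    // One worker per hardware thread; each grabs the next unprocessed
    // source node from an atomic counter, so no locks are needed.
    std::atomic<int> next{0};
    unsigned workerCount = std::max(1u, std::thread::hardware_concurrency());

    std::vector<std::thread> workers;
    for (unsigned w = 0; w < workerCount; ++w)
    {
        workers.emplace_back([&] {
            for (int i = next.fetch_add(1); i < nodeCount;
                 i = next.fetch_add(1))
                computeCostsFrom(i);   // writes only its own result row
        });
    }
    for (std::thread& t : workers)
        t.join();
}

With 1056 source nodes and each worker writing only its own result row, no synchronization beyond the counter is required.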

The nodes in question are just quad nodes; I've got the A* algorithm working correctly already.
I need to get all of their g-scores in order to use them in cooperative pathfinding.
In my first attempt, I got the results below.
The id is just the request id in the thread pool, and rather than paths,
I mean there were 4 costs found, so don't be misled.
And when a request is done, it is deleted.
You can see that as it runs down, the number of requests finding 0 paths increases.
How can I keep the number of these zero results low?
Thanks
Jack


Executing: 3 4 Paths found
Deleting Request.. 3
Executing: 1 15 Paths found
Deleting Request.. 1
Executing: 4 124 Paths found
Deleting Request.. 4
Executing: 5 382 Paths found
Deleting Request.. 5
Executing: 2 26 Paths found
Deleting Request.. 2
Executing: 8 0 Paths found
Deleting Request.. 8
Executing: 7 56 Paths found
Deleting Request.. 7
Executing: 6 228 Paths found
Deleting Request.. 6
Executing: 9 8 Paths found
Deleting Request.. 9
Executing: 11 0 Paths found
Deleting Request.. 11
Executing: 13 0 Paths found
Deleting Request.. 13
Executing: 14 0 Paths found
Deleting Request.. 14
Executing: 12 0 Paths found
Deleting Request.. 12
Executing: 10 978 Paths found
Deleting Request.. 10
Executing: 15 0 Paths found
Deleting Request.. 15
Executing: 16 0 Paths found
Deleting Request.. 16
Executing: 18 0 Paths found
Deleting Request.. 18
Executing: 20 0 Paths found
Deleting Request.. 20

One hour for 1000 nodes? Are you executing A* 1000 × 1000 times?

Check out standard Dijkstra, which will find the shortest path from one node to all other nodes. Yes, it will not always find the same paths that A* would, but it could be enough to estimate your costs.

To further optimize it, you can extend your waypoint data to include the Dijkstra bookkeeping needed (costs and where they come from) per worker. This way you will be able to run multiple worker threads on the same waypoint graph concurrently (as long as you do not modify the waypoint graph during processing).
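A rough sketch of that idea: keep the Dijkstra bookkeeping in per-worker scratch arrays instead of inside the shared graph, so every thread can flood the same read-only graph at once. Graph, Edge, and DijkstraScratch here are hypothetical structures, not taken from the thread.

#include <functional>
#include <limits>
#include <queue>
#include <utility>
#include <vector>

struct Edge  { int to; float cost; };
struct Graph { std::vector<std::vector<Edge>> adj; };  // read-only while running

// Each worker thread owns one scratch object, so the shared graph is
// never written and no locks are required.
struct DijkstraScratch
{
    std::vector<float> dist;   // cost from the current source node
    std::vector<int>   prev;   // predecessor ("where it comes from")
};

void SingleSourceDijkstra(const Graph& g, int source, DijkstraScratch& s)
{
    const float INF = std::numeric_limits<float>::infinity();
    s.dist.assign(g.adj.size(), INF);
    s.prev.assign(g.adj.size(), -1);
    s.dist[source] = 0.0f;

    using Item = std::pair<float, int>;   // (cost so far, node)
    std::priority_queue<Item, std::vector<Item>, std::greater<Item>> pq;
    pq.push({0.0f, source});

    // No destination check: the whole graph is flooded, so one run
    // yields the cost from `source` to every reachable node.
    while (!pq.empty())
    {
        auto [d, u] = pq.top();
        pq.pop();
        if (d > s.dist[u]) continue;      // stale queue entry
        for (const Edge& e : g.adj[u])
        {
            float nd = d + e.cost;
            if (nd < s.dist[e.to])
            {
                s.dist[e.to] = nd;
                s.prev[e.to] = u;
                pq.push({nd, e.to});
            }
        }
    }
}

Each worker loops over its share of source nodes with its own DijkstraScratch; 1056 runs like this fill the full 1056 × 1056 cost table.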

Even if you run A* a million times, you're doing something Very Wrong if it takes an hour.

Solve the algorithmic-level design first, but also make sure you do some basic optimization profiling on the code, because it seems to me (intuitively) that you have major speed-gain opportunities lurking in there.



I've got about 1056 nodes and need to precompute the costs between them. Currently, on a single-threaded precomputer, it takes about an hour to complete. I'm expecting it to complete within 15 minutes instead, so I'd have to divide the dataset into 4 intervals.

So you have a 1056-node mesh, and you are trying to compute the weighted connectivity grid for it? Unless there is something particularly slow about computing the weights, this is a solved problem with many excellent algorithms. With a thousand nodes on a reasonably modern machine, the task should take a few seconds at most. I've no idea about the connectivity structure of your nodes, but even in the worst case, with 1000 nodes you're looking at a maximum of a few billion comparisons, and on a modern computer that is a few moments of effort.
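For a sense of scale, the textbook all-pairs approach at this size is something like Floyd-Warshall over a dense cost matrix: three nested loops, roughly 1056^3 ≈ 1.2 billion relaxations. A minimal sketch, assuming dist starts as direct edge costs, infinity where there is no edge, and 0 on the diagonal:

#include <cstddef>
#include <vector>

// Floyd-Warshall all-pairs shortest paths: after the loops, dist[i][j]
// holds the cheapest cost from node i to node j.
void FloydWarshall(std::vector<std::vector<float>>& dist)
{
    const std::size_t n = dist.size();
    for (std::size_t k = 0; k < n; ++k)
        for (std::size_t i = 0; i < n; ++i)
            for (std::size_t j = 0; j < n; ++j)
                if (dist[i][k] + dist[k][j] < dist[i][j])
                    dist[i][j] = dist[i][k] + dist[k][j];
}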

[attachment: map topoloy.png]
 
The problem is this: when the precomputer calculates the path from node 1 to node 1056 with Dijkstra's algorithm, it can flood-fill about 2000 costs at a time, but the next time around it solves only one, because the 1-to-1055 search repeats all the nodes already covered along the way by the 1-to-1056 search...

The problem really exists because Dijkstra stops at the destination without flooding the rest of the fringe, so the algorithm wastes time re-finding neighbours with similar paths.

Thanks
Jack
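In code terms, the failure mode Jack describes is a destination check inside the search loop. This sketch (with a hypothetical adjacency list, neighbours[u] = list of (node, edge cost) pairs) marks the early-exit line; deleting it, and running one search per source node as in the SingleSourceDijkstra sketch above, makes each run fill that source's entire cost row instead of one cost at a time.

#include <functional>
#include <queue>
#include <utility>
#include <vector>

void DijkstraWithEarlyExit(int source, int goal,
    const std::vector<std::vector<std::pair<int, float>>>& neighbours,
    std::vector<float>& cost)   // pre-filled with +infinity
{
    using Item = std::pair<float, int>;   // (cost so far, node)
    std::priority_queue<Item, std::vector<Item>, std::greater<Item>> pq;
    cost[source] = 0.0f;
    pq.push({0.0f, source});
    while (!pq.empty())
    {
        auto [d, u] = pq.top();
        pq.pop();
        if (u == goal)                    // <-- early exit: the rest of
            return;                       //     the fringe is never flooded
        if (d > cost[u]) continue;        // stale queue entry
        for (auto [v, w] : neighbours[u])
            if (d + w < cost[v])
            {
                cost[v] = d + w;
                pq.push({d + w, v});
            }
    }
}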

This topic is closed to new replies.
