How can I save/flush the offline data to disk and start it from there next time over?

Started by
5 comments, last by Zlodo 8 years, 7 months ago

I need to generate a large amount of map data from my game editor offline.

But the generation takes more than 6 hours to complete.

When the end of the day is reached, I need to stop working and suspend the process somehow.

You need to flush the data to disk.

But how can I restart it from a checkpoint next time?

What variables do I need to save so I can resume next time?

I don't want to calculate everything from the start.

What strategy is good for this kind of data dumping?

Thanks

Jack


"What variables do I need to save so I can resume next time?"

That depends entirely on your algorithm and code.

"What strategy is good for this kind of data dumping?"
I assume that your algorithm has some kind of state. For example, you are converting a heightmap to a normal map. Your input is the heightmap, your output is the normal map, and your state is your position in the heightmap.

For example, you have a 65536x65536 sized heightmap and you are at [17, 8192] and want to continue tomorrow. So you save the normals you have calculated so far - aka. flush the offline data - and somewhere else - possibly a cache file - you write your position.

We can probably help you further if you tell us more about your code/algorithm.

EDIT: Also, one more thing to look out for. You want to be able to incrementally save your output. Here, an obvious solution would be to have your grid of normals. The ones you haven't calculated yet would be assumed to be zero, and the ones you've already calculated have their values. Here, your default value doesn't matter as you can easily decide if you have already calculated a value because you have the state I mentioned earlier.
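As a sketch of that idea - the names here (NormalGrid, runChunk) are illustrative, not the poster's actual classes - the output grid can carry a default value plus a cursor recording how far the pass got; you flush both to disk and restore both to resume:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch: output cells start at a default value, and nextIndex
// is the resume point - the first cell that has not been computed yet.
struct NormalGrid {
    std::size_t width, height;
    std::vector<float> values;   // one float per cell, for simplicity
    std::size_t nextIndex = 0;   // checkpoint state: save/restore with values

    NormalGrid(std::size_t w, std::size_t h)
        : width(w), height(h), values(w * h, 0.0f) {}

    // Process up to `budget` cells, then return so the caller can flush
    // `values` and `nextIndex` to disk and quit for the day.
    void runChunk(std::size_t budget) {
        std::size_t end = nextIndex + budget;
        if (end > values.size())
            end = values.size();
        for (; nextIndex < end; ++nextIndex)
            values[nextIndex] = 1.0f;  // stand-in for the real calculation
    }

    bool done() const { return nextIndex == values.size(); }
};
```

Next session you reload `values` and `nextIndex` and keep calling `runChunk` until `done()` is true.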

When you flush your data again, the only difference would be that you have less default values in your output.

Initializing the values to some default values is a good idea though, I think I'll look into that method.

The only problem is when I perform something like


void Grid::calculateActualCosts() {
	for (auto& walkable : m_walkables) {
		for (auto& walkables : m_walkables) {
			AStarNode* fromNode = acquireNode(...);
			AStarNode* toNode = acquireNode(...);
			AStarNodePair pair(fromNode, toNode);
			///
			float totalCost = 0.0f;
			astar(fromNode, toNode, totalCost);
			actualCosts.insert(std::make_pair(pair, totalCost));
		}
	}
}

Do I just write a totalCost like NaN into the file, and calculate only the node pairs that aren't initialized? But then tomorrow I'd have to restart the whole loop again regardless.

How can I simplify this?

Update:

Do I put a conditional branch there that reads the file at that position to see whether it has been initialized or not?

I think so.

Update2:

I think I just sort the data set, and seek to the point where the data set is last saved.

Update3:

I've got a better idea. Let's put the whole process into a VM and suspend/resume that. How easy...

Thanks

Jack

You should also consider optimizing your algorithms.

As a simple example, you could use a flood fill algorithm to divide your map into regions, and store the region within each walkable node. You can then instantly reject all paths that don't start and end within the same region, instead of waiting for A* to fully explore the whole region each time before telling you that there is no path.
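A minimal sketch of that region-labeling pass, assuming a simple 2D grid of walkable flags (the name labelRegions and the 0/1 encoding are illustrative):

```cpp
#include <cassert>
#include <cstddef>
#include <queue>
#include <vector>

// Hypothetical sketch: BFS flood fill that stamps each connected group of
// walkable cells with a region id. Two cells can only have a path between
// them if their region ids match, so unreachable pairs are rejected in O(1)
// instead of letting A* exhaust the whole region first.
std::vector<int> labelRegions(const std::vector<int>& walkable,
                              std::size_t w, std::size_t h) {
    std::vector<int> region(w * h, -1);  // -1 = blocked or not yet visited
    int next = 0;
    for (std::size_t start = 0; start < walkable.size(); ++start) {
        if (!walkable[start] || region[start] != -1)
            continue;                    // blocked, or already labeled
        std::queue<std::size_t> q;
        region[start] = next;
        q.push(start);
        while (!q.empty()) {
            std::size_t c = q.front();
            q.pop();
            std::size_t x = c % w, y = c / w;
            auto visit = [&](std::size_t nx, std::size_t ny) {
                std::size_t n = ny * w + nx;
                if (walkable[n] && region[n] == -1) {
                    region[n] = next;
                    q.push(n);
                }
            };
            if (x > 0)     visit(x - 1, y);  // 4-connected neighbors
            if (x + 1 < w) visit(x + 1, y);
            if (y > 0)     visit(x, y - 1);
            if (y + 1 < h) visit(x, y + 1);
        }
        ++next;
    }
    return region;
}
```

Before running A* on a pair, compare `region[from] == region[to]` and skip the search entirely when they differ.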

Before you make any optimizations like that you should of course use a profiler to tell you where in your code all the CPU time is going, so that you target your efforts in the right place.

Yes, the quick-and-dirty solution would be to use a conditional branch. Something like this:


void Grid::calculateActualCosts() {
	unsigned last_i = 0;
	unsigned last_j = 0;
	if(readingFromFile())
	{
		last_i = getLastI();
		last_j = getLastJ();
	}
	
	unsigned i = 0;
	for (auto& walkable : m_walkables) {
		unsigned j = 0;
		unsigned row = i++;
		if(row < last_i)
			continue; 
		
		for (auto& walkables : m_walkables) {
			unsigned col = j++;
			// Only the partially finished row needs to skip columns;
			// later rows must start again from column 0.
			if(row == last_i && col < last_j)
				continue; 
			
			AStarNode* fromNode = acquireNode(...);
			AStarNode* toNode = acquireNode(...);
			AStarNodePair pair(fromNode, toNode);
			///
			float totalCost = 0.0f;
			astar(fromNode, toNode, totalCost);
			actualCosts.insert(std::make_pair(pair, totalCost));	 
		}
	}
}

Or, you could keep a set of processed pairs - e.g. an std::set&lt;std::pair&lt;std::size_t, std::size_t&gt;&gt; over walkable indices, since raw pointers won't survive a restart. If a pair of any two walkables is in the set, it has been processed already and you can just do a continue, like this:


for (auto& walkables : m_walkables) {
	if(progressSet.count({walkable, walkables}))
		continue; 
	
	AStarNode* fromNode = acquireNode(...);
	AStarNode* toNode = acquireNode(...);
	AStarNodePair pair(fromNode, toNode);
	///
	float totalCost = 0.0f;
	astar(fromNode, toNode, totalCost);
	actualCosts.insert(std::make_pair(pair, totalCost));	
	
	progressSet.insert({walkable, walkables});
}

At the beginning of the function you'd start with an empty set, and if you are continuing, read the set from a file.
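To make the set survive a restart, it can be serialized as plain index pairs. A minimal sketch, assuming walkables are addressed by index (ProgressSet, saveProgress, loadProgress, and the one-pair-per-line file format are all hypothetical):

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <set>
#include <string>
#include <utility>

// Hypothetical sketch: persist processed (from, to) pairs as indices rather
// than pointers, so the progress file is meaningful across runs.
using ProgressSet = std::set<std::pair<std::size_t, std::size_t>>;

void saveProgress(const ProgressSet& done, const std::string& path) {
    std::ofstream out(path);
    for (const auto& p : done)
        out << p.first << ' ' << p.second << '\n';  // one "i j" pair per line
}

ProgressSet loadProgress(const std::string& path) {
    ProgressSet done;
    std::ifstream in(path);       // missing file yields an empty set
    std::size_t i = 0, j = 0;
    while (in >> i >> j)
        done.insert({i, j});
    return done;
}
```

Flush the set every few minutes (or on shutdown), and call `loadProgress` before the loop on the next run.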

Your update2 also sounds valid, if it is practical to sort your data.

At first, a VM sounds like overkill, but if it's something you only run on your dev computer ( I assume so ), it is the fastest way to having a solution.

"You should also consider optimizing your algorithms."

QFE.

I find it hard to believe that your program is actually doing 6 hours of work. Even Dwarf Fortress' notoriously involved world building process is a matter of minutes on a relatively modern machine.

It is often quite easy to find oneself having taken the straightforward solution to a particular problem, only to discover it has turned an O(N^2) problem into an O(x^N) one, or even worse.

Tristam MacDonald. Ex-BigTech Software Engineer. Future farmer. [https://trist.am]

Optimisation notwithstanding, maybe it's just a process issue. Can't you run your generation process automatically during the night?

Apart from that, you could of course simply let the computer on over night. Or you could go the same route that every IDE goes.

What happens if you build a project with 20,000 files in Visual Studio (or in Eclipse) and you abort the build after 1,500 files? Next time you tell it "build all", it will not rebuild the 1,500 files for which object files exist that are at least as new as the corresponding source file.

Surely, any task that takes several hours can be broken down into sub-tasks that take only a few minutes and that can be saved to disk as you go. Then just a final pass is needed assembling all the pieces together (just like the link stage). If the output of one step is needed for another, you can also restore to a workable state very quickly from saved intermediate results when starting the build process again the next day.

If a terrain file exists that has the same timestamp as your terrain creation parameters, you need not recreate that patch of terrain. If a "walkable" file that refers to this terrain exists, and it has the same timestamp as the terrain, you need not recalculate its paths. Etc.
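That dependency rule can be sketched with std::filesystem (C++17); needsRebuild and the file names are illustrative, not part of anyone's actual pipeline:

```cpp
#include <cassert>
#include <chrono>
#include <filesystem>
#include <fstream>

namespace fs = std::filesystem;

// Hypothetical sketch of the build-system rule: regenerate an output only
// when it is missing or older than its input.
bool needsRebuild(const fs::path& input, const fs::path& output) {
    if (!fs::exists(output))
        return true;  // never built yet
    return fs::last_write_time(output) < fs::last_write_time(input);
}
```

The generator then walks its list of terrain patches, calls `needsRebuild` on each, and skips the ones already up to date - aborting overnight costs you at most one patch of work.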

This topic is closed to new replies.
