RTS AI that does not slow the game down to unplayable points?

Started by
24 comments, last by Kylotan 1 year, 2 months ago

So I am programming a RTS with DirectX and I have managed to get all the troops, buildings, FX, and terrain on the screen and still never drop below 50 FPS. However, when I start running the AI code, the game drops to 20 FPS, and then eventually to 13 FPS.

Upon checking the CPU Usage data I have determined that my A* path function is creating an issue:

std::vector<INTPOINT> TERRAIN::GetPath(INTPOINT start, INTPOINT goal)
{
	try
	{
		//Check that the two points are within the bounds of the map
		MAPTILE *startTile = GetTile(start);
		MAPTILE *goalTile = GetTile(goal);

		if (!Within(start) || !Within(goal) || start == goal || startTile == NULL || goalTile == NULL)
			return std::vector<INTPOINT>();

		//Check if a path exists
		if (!startTile->m_walkable || !goalTile->m_walkable || startTile->m_set != goalTile->m_set)
			return std::vector<INTPOINT>();

		//Init Search
		long numTiles = m_size.x * m_size.y;
		for (long l = 0; l < numTiles; l++)
		{
			m_pMapTiles[l].f = m_pMapTiles[l].g = FLT_MAX;		    //Clear F,G
			m_pMapTiles[l].open = m_pMapTiles[l].closed = false;	//Reset Open and Closed
		}

		std::vector<MAPTILE*> open;				//Create Our Open list
		startTile->g = 0;						//Init our starting point (SP)
		startTile->f = H(start, goal);
		startTile->open = true;
		open.push_back(startTile);				//Add SP to the Open list

		bool found = false;					 // Search as long as a path hasnt been found,
		while (!found && !open.empty())		 // or there is no more tiles to search
		{
			MAPTILE * best = open[0];        // Find the best tile (i.e. the lowest F value)
			int bestPlace = 0;
			for (int i = 1; i < (int)open.size(); i++)
				if (open[i]->f < best->f)
				{
					best = open[i];
					bestPlace = i;
				}

			if (best == NULL)break;			//No path found

			open[bestPlace]->open = false;

			// Take the best node out of the Open list
			open.erase(open.begin() + bestPlace);

			if (best->m_mappos == goal)		//If the goal has been found
			{
				std::vector<INTPOINT> p, p2;
				MAPTILE *point = best;

				while (point->m_mappos != start)	// Generate path
				{
					p.push_back(point->m_mappos);
					point = point->m_pParent;
				}

				for (int i = (int)p.size() - 1; i != 0; i--)	// Reverse path
					p2.push_back(p[i]);
				p2.push_back(goal);
				return p2;
			}
			else
			{
				for (int i = 0; i < 8; i++)					// otherwise, check the neighbors of the
					if (best->m_pNeighbors[i] != NULL)	// best tile
					{
						bool inList = false;		    // Generate new G and F value
						float newG = best->g + 1.0f;
						float d = H(best->m_mappos, best->m_pNeighbors[i]->m_mappos);
						float newF = newG + H(best->m_pNeighbors[i]->m_mappos, goal) + best->m_pNeighbors[i]->m_cost * 5.0f * d;

						if (best->m_pNeighbors[i]->open || best->m_pNeighbors[i]->closed)
						{
							if (newF < best->m_pNeighbors[i]->f)	    // If the new F value is lower
							{
								best->m_pNeighbors[i]->g = newG;	// update the values of this tile
								best->m_pNeighbors[i]->f = newF;
								best->m_pNeighbors[i]->m_pParent = best;
							}
							inList = true;
						}

						if (!inList)			// If the neighbor tile isn't in the Open or Closed list
						{
							best->m_pNeighbors[i]->f = newF;		//Set the values
							best->m_pNeighbors[i]->g = newG;
							best->m_pNeighbors[i]->m_pParent = best;
							best->m_pNeighbors[i]->open = true;
							open.push_back(best->m_pNeighbors[i]);	//Add it to the open list	
						}
					}

				best->closed = true;		//The best tile has now been searched, add it to the Closed list
			}
		}

		return std::vector<INTPOINT>();		//No path found, return an empty path

	}
	catch (...)
	{
		debug.Print("Error in TERRAIN::GetPath()");
		return std::vector<INTPOINT>();
	}
}

How were games like Empire Earth and Age of Empires able to make calculations like this one without the tax on the computer that I am suffering? Only around 5 units are making this calculation, while Age and Empire Earth are capable of doing hundreds at the time without a significant drop in frame rate.

What could be the issue here?

-rydog

Edit: Sorry about the crappy title name. I did not double check it before I clicked “post," and it does not appear that I am able to change it.

Advertisement

Don't know how those games did it, but 5 units shouldn't be much of a problem.

Looking at your code, you don't seem to take speed much into account. You are making lots of result vectors, searching a linear list, and storing path-finding data in the map.

Creating and discarding many result vectors puts a lot of stress on the memory manager, try to avoid creating new vectors all the time. For example, if you let the caller supply a vector, and it saves it in its own data, you can re-use the same vector for each call instead of creating a new “p” and “p2” on each call. You can completely eliminate the final “p2” vector if the caller understands that the vector is in reverse order. This also eliminates the copying operations of course.

A second point here is, as you have a noticable impact on performance, what is the caller actually doing with the path that you compute? Does it use all information, or picks it just the first entry and does the vector get discarded then?

----

About using the map for temporary storage, modern CPUs are fast due to their CPU caches (L1 to L3 caches). They assume locality of data access. That is, they work best if the data that you access is in a small section of memory, and preferably access only forwards or backwards. In particular, jumping around in a large amount of memory, like your tile map, sort-of randomly accessing single tiles is deadly for performance.

Instead of storing path-finding data in the map, store it locally in the algorithm. open and closed lists are small compared to all tiles in the map, so it improves locality. It also somewhat avoids loading map data into the cpu caches that is not relevant to path finding (eg if you have a "height" in it, that is now loaded as part of accessing a map tile, but as you don't need it here, it's wasted time and space).

In other words, have an open and closed list here, treat the map as read-only for path-finding. “F" and “G” can easily be stored in those lists (although iirc you don't actually need to store both values, please check how these values are used). The “open” and “closed” booleans in the map can be converted to a bitset (an extremely efficient form of vector<bool>).

For the memory manager, instead of a GetPath function make it a method of a class and keep internal path-finding data data such as open and closed lists available between calls (thus avoiding constantly creating new vectors for each call). “open.clear()” is much faster than making a new vector each call.

----

			for (int i = 1; i < (int)open.size(); i++)
				if (open[i]->f < best->f)
				{
					best = open[i];
					bestPlace = i;
				}

In general, don't search in lists that may be longer than a handful of entries. Linear searching doesn't scale well as the list size grows. There are containers such as std::map that can store their data such that this is a single access to the std::map, and done. In this case sort the std::map on increasing “f”, and the best place will be the first value in the std::map.

EDIT: Clarified “map” values to “std::map” in the last sentence.

		while (!found && !open.empty())		 // or there is no more tiles to search
		{
			MAPTILE * best = open[0];        // Find the best tile (i.e. the lowest F value)
			int bestPlace = 0;
			for (int i = 1; i < (int)open.size(); i++)
				if (open[i]->f < best->f)
				{
					best = open[i];
					bestPlace = i;
				}

This looks like the main issue. The longer the algorithm runs, the longer the search it needs to do in each step.
So you want to eliminate the search. I've tried two options for this, using either std::priority_queue or std::set. Neither is optimal for the task, but the former turned out faster. (Did not try std::map as suggested by Alberth)

I use this code for preprocessing tools, and it surely is no good reference, but maybe better than nothing:

	class PathGrower
	{
		std::vector<float> mPathMinDistance;
		std::vector<int> mPathPreviousEdge;
		std::vector<bool> mExtracted;
		
		const std::vector<int> *mPrimAdjList;
		const std::vector< std::array<int,2> > *mPrimAdjacency;
		const std::vector< std::array<int,2> > *mEdgeAdjacency;
		const std::vector<float> *mEdgeWeights;


		std::priority_queue< std::pair<float,int>, std::vector< std::pair<float,int> > > queue;


	public:
		void Init (
			const int sourceIndex, 
			const std::vector<float> &edgeWeights, 
			const std::vector<int> &primAdjList,
			const std::vector< std::array<int,2> > &primAdjacency,
			const std::vector< std::array<int,2> > &edgeAdjacency,
			const int primitiveCount)
		{
			mPrimAdjList = &primAdjList;
			mPrimAdjacency = &primAdjacency;
			mEdgeAdjacency = &edgeAdjacency;
			mEdgeWeights = &edgeWeights;

			int n = primitiveCount;
		
			mPathPreviousEdge.resize(n);
			std::fill (mPathPreviousEdge.begin(), mPathPreviousEdge.end(), -1);
		
			mPathMinDistance.resize(n);
			std::fill (mPathMinDistance.begin(), mPathMinDistance.end(), FLT_MAX);
			
			mExtracted.resize(n);
			std::fill (mExtracted.begin(), mExtracted.end(), false);

			queue = {};

			if (sourceIndex >= 0)
			{
				mPathMinDistance[sourceIndex] = 0;
				queue.push (std::make_pair(0.0f, sourceIndex));
			}
		}

		void Init (const int sourceIndex, const std::vector<float> &edgeWeights, const MeshConnectivity &conn, const bool useVertices)
		{
			Init (sourceIndex, 
				edgeWeights, 
				conn.GetPrimitiveAdjListIndex(useVertices),
				conn.GetPrimitiveAdjacency(useVertices),
				conn.GetEdgeAdjacency(useVertices),
				conn.GetPrimitiveCount(useVertices));
		}

		void Reset (const std::vector<int> visitedPrimitives)
		{
			for (;;)
			{
				if (queue.empty()) break;
				int c = queue.top().second;
				queue.pop();
				mPathPreviousEdge[c] = -1;
				mExtracted[c] = false;
				mPathMinDistance[c] = FLT_MAX;
			}

			for (int i=0; i<visitedPrimitives.size(); i++)
			{
				int c = visitedPrimitives[i];
				mPathPreviousEdge[c] = -1;
				mExtracted[c] = false;
				mPathMinDistance[c] = FLT_MAX;
			}

		}

		void SetSource (const int sourceIndex)
		{
			mPathMinDistance[sourceIndex] = 0;
			queue.push (std::make_pair(0.0f, sourceIndex));
		}

		int Next () // Dijkstra
		{
			int c = -1;
			for (;;)
			{
				if (queue.empty()) return -1;
				c = queue.top().second;
				queue.pop();
				if (mExtracted[c]) continue; // prevent duplicates + uniformSpeed up
				mExtracted[c] = true;
				break;
			}
			
			int begin = (*mPrimAdjList)[c];
			int end = (*mPrimAdjList)[c+1];
			for (int i=begin; i<end; i++)
			{
				int n = (*mPrimAdjacency)[i][1];
				int e = (*mPrimAdjacency)[i][0];
				float w = (*mEdgeWeights)[e];

				if (!mExtracted[n] && mPathMinDistance[n] > mPathMinDistance[c] + w)
				{
					mPathMinDistance[n] = mPathMinDistance[c] + w;
					queue.push(std::make_pair(-mPathMinDistance[n], n));
					mPathPreviousEdge[n] = e;
				}
			}
			return c;		
		}



		int GetPrevEdge (int index) const
		{
			return mPathPreviousEdge[index];
		}

		int GetPrevPrimitive (int index) const
		{
			int eI = mPathPreviousEdge[index];
			if (eI==-1) return -1;
			return ((*mEdgeAdjacency)[eI][0] == index 
				? (*mEdgeAdjacency)[eI][1]
				: (*mEdgeAdjacency)[eI][0]);
		}

		void GetPrevEdgeAndPrimitive (int &eI, int &pI, int index) const
		{
			eI = mPathPreviousEdge[index];
			if (eI==-1) pI = -1;
			else pI = ((*mEdgeAdjacency)[eI][0] == index 
				? (*mEdgeAdjacency)[eI][1]
				: (*mEdgeAdjacency)[eI][0]);
		}

		float GetPathLength (int index) const
		{
			return mPathMinDistance[index];
		}

		void SetPrimitivesHidden (const bool hide)
		{
			std::fill(mExtracted.begin(), mExtracted.end(), hide);
		}
		void SetPrimitiveHidden (const int index, const bool hide)
		{
			mExtracted[index] = hide;
		}
		bool HasPrimitiveProcessed (const int index)
		{
			return mExtracted[index];
		}
		std::vector<bool>& GetProcessedFlags ()
		{
			return mExtracted;
		}
	};

Here some code to test it, finding the closest path between two vertices on a mesh:

			static int init = 1;
			static HEMesh hm;
			static MeshConnectivity conn;
			if (init) 
			{
				((HEMesh_Serialization&)hm).AutoLoadModel ("C:\\dev\\pengII\\mod\\bunny closed.lxo", 1, 1, 1, 10, 1);
				conn.Setup (hm, 1,1);
			}
			init = 0;

			static int source = 23;//1;
			ImGui::DragInt("source", &source, 0.01f);
			static int goal = 31;
			ImGui::DragInt("goal", &goal, 0.01f);
			
			RenderCircle(0.3f, hm.GetVertexPos(source), hm.CalcVertexNormal(source), 1,0,0);
			RenderCircle(0.3f, hm.GetVertexPos(goal), hm.CalcVertexNormal(goal), 0,1,0);
			
			std::vector<float> edgeLengths;
			edgeLengths.resize(conn.GetEdgeCount());
			for (int i=0; i<conn.GetEdgeCount(); i++)
			{
				edgeLengths[i] = vec(
					hm.GetVertexPos(conn.GetEdgeAdjacency(true)[i][0]) - 
					hm.GetVertexPos(conn.GetEdgeAdjacency(true)[i][1])).Length();
			}

			PathGrower grower;
			grower.Init (source, edgeLengths, conn, true);

			for (;;)
			{
				int curVertex = grower.Next();
				if (curVertex == -1) break;
				
				vec p0 = hm.GetVertexPos(curVertex);
				RenderPoint(p0, 0.5f,0,0);

				if (curVertex == goal) // vis path from goal and done
				{
					for (;;)
					{
						int prevVertex = grower.GetPrevPrimitive(curVertex);
						if (prevVertex == -1) break;
						
						vec p1 = hm.GetVertexPos(prevVertex);
						RenderArrow(p0, (p1-p0)*0.5f, 0.05f, 1,0.5f,0.5f);
						curVertex = prevVertex;
						p0 = p1;
					}
					break;
				}
				else // is segment we have just added to the paths
				{
					int prevVertex = grower.GetPrevPrimitive(curVertex);
					if (prevVertex != -1)
					{
						vec p1 = hm.GetVertexPos(prevVertex);
						RenderArrow(p0, (p1-p0)*0.5f, 0.05f, 0.5f,0,0);
					}
				}
			}

After the PathGrower is initialized to the mesh, i could use it to find many paths not requiring any more dynamic allocations (except the queue).
The path data remains local in the PathGrower, and i could access it and construct the path later using the helper functions like GetPrevEdge.
So there is no need to construct a std::vector for a path, but usage becomes more involved and complicated.

Notice however my code is Dijkstra, not A-Star. And it's not meant for realtime. But maybe it gives some inspiration.

EDIT:

Ofc. there are other ways of optimization, like time slicing the work. Instead computing the paths for 50 units every frame, update the path for just N units per frame.

@alberth

With all due respect, I tried the std::map with my terrain, but there are jsut too many entries for it to be something that is doable. There are like 40 different maptiles that are generated with each creation of the map. I mean, in theory, I could do it, but it would be very time-consuming, and acould also lead to errors.

Unless there is some type of way to just add in the way to treat the map like a vector. But perhaps I am misunderstanding you.

I really do appreciate the explanation you have given me, and I have implemented various parts of your reccomendations, which have, by themselves, lead to the frame rate increasing.

yaboiryan said:
With all due respect, I tried the std::map with my terrain

I think we have a misunderstanding. I was saying to use a std::map for the open list, so you can sort it on F. A list of tiles for storing your map is fine, it's a compact data structure and easy to access on tile coordinate.

What I was suggesting is something along the following.

class OpenPosition {
public:
    float F;
    int mapIndex;

    OpenPosition(float pF, int pmapIndex) : F(pF), mapIndex(pmapIndex) { }
};

New class for an open position. It may need F (I don't need it here, but I didn't write a full A* algorithm), and probably other information like the map position (I took the index, but (x, y) pair would work too of course).

int main() {
    std::multimap<float, OpenPosition> open;

My open list. It sorts on increasing “float” values by itself.

For demo, I just created 3 open positions out of the blue and stuck them into the open list. Normally this would be done in your A* code.

 // Assume these values become available in the A* computation.
    OpenPosition v1(20, 77);
    OpenPosition v2(30, 88);
    OpenPosition v3(20, 99);

    // Add them all to the open list.
    open.insert(std::pair<float, OpenPosition>(v1.F, v1));
    open.insert(std::pair<float, OpenPosition>(v2.F, v2));
    open.insert(std::pair<float, OpenPosition>(v3.F, v3));

Now get an entry with the smallest F:

// Now find an entry with the lowest F.
    std::map<float, OpenPosition>::iterator entry = open.begin();
    if (entry != open.end()) {
        printf("Entry in open list with smallest F:\n");
        // Use it.
        printf("- F=%f at map index %d\n", entry->second.F, entry->second.mapIndex);
        open.erase(entry); // And aftewards delete it from the open list.

    } else {
        // We ran out of open positions.
    }

    // end of main()
    return 0;
}

As you can see, getting an entry from the open list with the smallest F is the first value you get. So just check there actually is an entry, and done. No need to look through all open positions on every tile that you expand.

EDIT: My “aftewards” may not work, I don't know if “entry” stays valid if you insert new open positions into the open list before erasing the selected entry. For safety I recommend that you first make a copy of the found open position, erase it from the open list, and then proceed with the algorithm.

Output:

Entry in open list with smallest F:
- F=20.000000 at map index 77

@JoeJ is doing the above with std::priority_queue and std::set. I haven't run speed tests what is faster, but any of the solutions obliterates your “for-all items in the open list, examine it” loop in performance.

Always pass an std::vector<>& or std::vector<>* into a function. What your code does is locally creates a vector with a bunch of points, then when it returns, it has to copy that entire vector over to the return value. If you pass it in as a reference, or pointer, then you are just operating on one single vector and manipulating it as you go. There is no extra copy into the return.

NBA2K, Madden, Maneater, Killing Floor, Sims http://www.pawlowskipinball.com/pinballeternal

@yaboiryan As others already mentioned, the std::vector is one thing that kills your performance, passing it by value kills even more. But there are still more!
The priority queue proved to be very effective to me. I don't know the std implementation, I made my own that laid out the tree in a linear memory block. Using vector, especially the earse() part is very time consuming.

Also the part “Init Search” should be completely eliminated, even with a medium sized map (256x256) this part can be much slower than the whole search itself.

yaboiryan said:
float newG = best->g + 1.0f;

I'd use different cost for diagonal directions, 1.4 is a good guess. (Search for Manhattan Distance). The result will be nicer and consisting less result points.

You are making 2 point lists for the result. That is also not necessary. Either you can reverse it in-place, or directly fill it into the result buffer.

Open and closed tile flags can also go. As you maintain an open-list, there is no need to store this info multiple times ?

How big a map are we talking about here? In a map that doesn't look like a maze (which is typical for a game like Age of Empires), an efficient implementation of A* should be very very fast. I can provide some short program to demonstrate this, if you think it would help you.

Typical example of using OOP code, when you dont know what the miracle library does and why it takes 300 million cpu cycles instead of 300. I suggest to scrap this nonsense for normal C code.

alvaro said:
I can provide some short program to demonstrate this, if you think it would help you.

Thread is almost a year old, but in case you're bored, i would take a look at your program. I've missed out on A* so far.

Geri said:
when you dont know what the miracle library does and why it takes 300 million cpu cycles instead of 300. I suggest to scrap this nonsense for normal C code.

There is no miracle here. The search i have quoted is the problem, because it can be avoided. So the algorithm was bad, not the language.

But i agree STL causes many suboptimal implementations. Not sure, but i think we would need something like a ‘Fibonacci Heap’, if that's the proper word. STL does not have this, and i guess i'm not the only one being too lazy to make one, using workarounds instead.

This topic is closed to new replies.

Advertisement