
Member Since 12 Jan 2007
Offline Last Active Mar 01 2014 11:19 AM

Topics I've Started

Seeking Advice for Scripting Objects with Data Oriented Design

17 February 2014 - 03:36 PM

Ordinarily, when performing game logic on instances of a generic Entity class, after spatial partitioning and assembling a list of nearby, relevant entities, I would simply toss the entities to update and the relevant entities into a script (usually Lua) and do something like:

--Below: A wolf hunts for rabbits to eat

function UpdateEntities(entities_to_update,relevant_entities)
	for _,entity in ipairs(entities_to_update) do
		if GetEntityType(entity)==TYPES.WOLF then
			DoWolfAi(entity,relevant_entities)
		end
	end
end

function DoWolfAi(entity,relevant_entities)
	--Below: look for food
	local actions = {};
	for _,other in ipairs(relevant_entities) do
		if other~=entity then
			if GetEntityType(other)==TYPES.RABBIT then
				--(reconstructed) queue a hunt action for this rabbit; the
				--original body was lost to the forum's whitespace stripping
				actions[#actions+1] = {priority=1,target=other};
			end
		end
	end
	--Omitted: sort actions by member.priority
	--Omitted: pick top action (hunt) and do hunt function
end

It's pretty straightforward and works just fine for a small number of instances. In fact, it's served me well for the admittedly hackish projects I've thrown together. But I'm moving on to larger projects and running into issues of scalability. After getting a massive wake-up call about DoD (Data Oriented Design) and things like cache misses (which didn't matter so much when it was 50 or so objects with simple scripts, but really matter when it's 100k complex objects), I've come to realize that my old way of doing things is grossly insufficient.

The way this handles data seems atrocious, with what I know now. First there are all those "GetBlahBlah(entity)" calls, which usually refer back to a map (std::map or boost::container::flat_map) to fetch data from a generic Entity class. Each one carries a lot of overhead, because it has to search the map to find the Entity, then find Entity.BlahBlah, then pass it all the way back to the script. I shudder to think what's happening to the cache during all this.

That won't work for hundreds or, worse, thousands of objects. It's also hard to envision a model that works in a cache-friendly manner. Each "wolf" entity needs to fetch data from every nearby "rabbit" entity (namely its location and type), and thus it's dependent on other instances. I also won't know which wolf will need which rabbit when the wolf and rabbit are allocated, so even though flat_map has contiguous memory, I'm forced to use random access rather than sequential (read: cache-friendly) access.

My current project has a bit of a challenge mixed in: it has to run decently on a specific machine with an Intel Atom N570 (read: crap). 4 cores, 1.42 GHz, ancient technology. That means I have to dance carefully.

So, are there any examples of a DoD (Data Oriented Design) approach to the same issue? Every book, every paper, every article I've read uses something more complex but fundamentally similar to the above. From GameDev.net to Gamasutra, very little seems to use Data Oriented Design. I could really use some reference material, or just informative discussion. Thank you.

Note: this forum (and only this forum) likes to erase whitespace and newlines. Not sure why, and it seems unique to me. If it's illegible I apologize; I'll try to edit and fix it, but the edits often erase more...

[C++] Advice needed for implementing smarter and more efficient quadtrees

13 February 2014 - 09:23 PM

I'm implementing a quadtree for the spatial partitioning of a very large group of objects (target is 400k) and trying to get it as fast as I can (target is no more than 50ms for all 400k). My initial naive approach could only handle about 1,000 objects in 27 seconds, but I've since been able to get it down to 10,000 objects in 113ms, mostly by paying close attention to unnecessary move, copy, and temporary operations. I'm comfortably certain that there are no more such operations to be eliminated.

The quadtree is a structure with two std::unordered_set containers: the first contains the id#'s of all objects inside the tree node, aka "owned" objects, and the second contains the first plus any object close enough to the node's border to be included, aka "known" objects. It also has a bounding rectangle defined by its top-left and bottom-right points (it can be instantiated Rect(x1,y1,x2,y2) to avoid unnecessary Point() constructors).

At every update, every known object is inserted into the quadtree. The quadtree is updated with quadtree::insert(const Point& where,const uint64_t& which).
--The method first checks if the quadtree has no children AND its depth is < a maximum AND it has > max objects within it. If that's true, it creates 4 children and dumps both of its id containers into the children.
--The method then checks if it has any children - if so, it calls quadtree::insert for each child. If not, it checks if the bounding rect contains "where".
----If the bounding rect contains "where", insert "which" into both containers.
----If not, check if the bounding rect collides with a rect formed around "where": Rect(where.x-width/2,where.y-width/2,where.x+width/2,where.y+width/2). If so, add it to just the "known" container.

Unfortunately, after profiling 10 iterations of 10,000 randomly distributed objects, this leads to roughly 408,000 calls to quadtree::insert (~37% of execution time), 3,570,000 rect constructors (~6.6%), 7,256,000 point constructors (~10.38%), and 3,586,000 STL emplacements (~13.21%).

The only really meaningful way to further optimize this that I can see would be to find a less naive way of splitting the quadtree, so that the top node needn't dump every "known" object into every child, which is what is dramatically multiplying the number of constructors. Any advice in this regard would be seriously appreciated.

Barely Annotated Code:

bool LogicTree::insert(const QuadPoint& where,const EntityID& which){
	bool within = false;
	//check if split needed
	if(children.size()==0 && depth < MAX_DEPTH && owned.size()>MAX_ENTITIES){
		Coord width = bounds.rightBottom.x-bounds.leftTop.x,height = bounds.rightBottom.y-bounds.leftTop.y;
		QuadRect c1(bounds.leftTop.x,bounds.leftTop.y,bounds.leftTop.x+width/2,bounds.leftTop.y+height/2),
		         c2(bounds.leftTop.x+width/2,bounds.leftTop.y,bounds.rightBottom.x,bounds.leftTop.y+height/2),
		         c3(bounds.leftTop.x,bounds.leftTop.y+height/2,bounds.leftTop.x+width/2,bounds.rightBottom.y),
		         c4(bounds.leftTop.x+width/2,bounds.leftTop.y+height/2,bounds.rightBottom.x,bounds.rightBottom.y);
		children.emplace_back(new LogicTree(c1,depth+1));
		children.emplace_back(new LogicTree(c2,depth+1));
		children.emplace_back(new LogicTree(c3,depth+1));
		children.emplace_back(new LogicTree(c4,depth+1));
		//(reconstructed) dump both id containers into the children; the
		//original loop body was lost to the forum's whitespace stripping.
		//"known" is a superset of "owned", so one pass covers both;
		//positionOf() is a hypothetical id->position lookup.
		for(auto& child : children){
			for(const auto& id : known){
				child->insert(positionOf(id),id);
			}
		}
		owned.clear();
		known.clear();
	}
	//Pass to children or insert
	if(!children.empty()){
		for(auto& child : children){
			if(child->insert(where,which)){
				within = true;
			}
		}
	}else{
		if(bounds.contains(where)){
			owned.insert(which);
			known.insert(which);
			within = true;
		}else if(bounds.intersects(QuadRect(where,MAX_RANGE))){
			known.insert(which);
		}
	}
	return within;
}

P.S. If this comes out as a solid block of text, my apologies - for some reason this forum likes to delete my newline characters.

Edit: Attempt #1 to fix the whitespace nuke. Also fixed a mistake (400k not 3mil, lol).

[Lua] Seeking advice for multiple OS Threads, handling lua_State

07 February 2014 - 04:05 PM

I've hit a brick wall with an algorithm I'm working on. Data objects are distributed on a quad-tree, and each leaf node is batch-processed by an OS thread from a thread pool. The actual processing happens in Lua, however, and that's where I run into serious problems. I understand that every OS thread should have its own lua_State, but initialization of the Lua script is slow: it's a very large, very extensive script split between multiple files with plenty of C functions tied in - parsing and initializing all this again and again would be nightmarishly slow. Instead I'd rather initialize once, copy the ready-made data into each thread's lua_State, and then run the processing. Unfortunately, Lua defines "thread" as something totally different from an OS thread, and functions like lua_newthread or lua_xmove are only intended for single-threaded use. It also makes googling about this almost impossible: everything is about coroutines, which I can't see being helpful here. Any advice for moving forward would be greatly appreciated.

What algorithm would minimize repeat loading of every member of nxn grid+neighbors?

06 February 2014 - 08:27 PM


I'm loading a large batch of objects and I'd like to optimize a bit. As the load code currently works, it loads each member of an nxn grid and its 8 neighbors (if they aren't already loaded) while also offloading any loaded object that isn't a neighbor.

Rather than brute-forcing it with something like (for x in [0,n] do for y in [0,n] do "if not loaded, load index x+y*n and any unloaded neighbors"), I'd rather precompute a sequence of indices that minimizes the number of times any given tile must be loaded, then iterate across it. Since no object can be operated on without its neighbors having been loaded, I must iterate over every object.

What algorithm would produce such a sequence for any given nxn array? I suspect a quad tree would be useful, but there might be simpler ways that I haven't thought of.

n in the target case is 512. I'd prefer to assume that each object is very large, and therefore only a minimum of objects should be loaded. Objects are also slow to load/unload, which is why I'd prefer to minimize the "hits."

Thanks for any input!

[SQLite3] Major bottleneck at sqlite3_step, sqlite3_exec much faster - why?

06 February 2014 - 03:16 PM

Clearly I'm doing something incorrectly, because the overwhelming consensus is that sqlite3_prepare is considerably faster than sqlite3_exec, yet I'm experiencing an extreme bottleneck at the sqlite3_step function.
int err;
std::string sqlstr = std::string("REPLACE INTO `Entities` VALUES(")+std::to_string(id)/*...rest of the VALUES list lost to the forum's whitespace stripping...*/;
err = sqlite3_exec(SQLINTERFACE.getDatabase(),sqlstr.c_str(),nullptr,nullptr,nullptr);
//...error checking code...
Runs about 50~100x faster than
sqlite3_stmt* statement = SQLINTERFACE.getStatement(Interface::ST_ENTITY_EXPORT);
sqlite3* database = SQLINTERFACE.getDatabase();
err = sqlite3_bind_int64(statement,1,id);
genericSQLError(statement,database,err);//error checking code
err = sqlite3_bind_int64(statement,2,kind);
err = sqlite3_bind_int64(statement,3,nibble);
err = sqlite3_bind_text(statement,4,entityStr.c_str(),entityStr.size(),SQLITE_TRANSIENT);
err = sqlite3_step(statement);
I've copied it verbatim, so there are a few "blackbox" functions, but they aren't responsible for the slowdown. genericSQLError does not alter the parameters, or anything referenced by them - it's a utility function for error checking. The code bottlenecks at sqlite3_step, despite sqlite3_exec receiving a directly equivalent statement. I'd appreciate any help finding a possible cause for the bottleneck. P.S. Yes, all of this is in a transaction.