Sign in to follow this  
cmac

Implementing an internally linked list via inheritance

Recommended Posts

I'm currently building a lightweight ECS system (keeping it hard-coded to the game for now, will look into separating into an engine once it's up and running). I'm taking some inspiration from Unity, as I have a lot of experience working with it, so I decided to keep track of components in an internally linked list, where the Transform component is the head. A GameObject is an empty container with a Transform component, and since Transforms can't (shouldn't) be removed, I figured it would be efficient to use it to keep track of the component linked list head.

However, instead of following the conventional data pattern of defining a Node class that contains a pointer to its data, I decided to try and integrate the linked pointers directly into the runtime node types through inheritance. Something tells me this is a bad practice, but it seems to satisfy data-oriented design philosophies, and I can't see any explicit pitfalls yet. There is a little lack of safety, but there are a couple asserts to ensure polymorphic compatibility. Also, having the nodes delete themselves without controlling their own allocation is definitely a bad smell, but is there really an issue if it's made explicit to the programmer to always dynamically allocate nodes that get pushed into the list?

Anyway here's the code, feel free to rip it apart and expose all of my bad C++ practices:

#ifndef _LINKED_NODE_H_
#define _LINKED_NODE_H_
#pragma once

#include <vector>
#include <typeinfo>


class LinkedNode
{
private:
	LinkedNode* prev = nullptr;
	LinkedNode* next = nullptr;

protected:
	virtual ~LinkedNode() { };

	void push_back(LinkedNode* node)
	{
		LinkedNode* curr = this;

		while (curr->next != nullptr)
		{
			curr = curr->next;
		}

		curr->next = node;
		curr->next->prev = curr;
		curr->next->next = nullptr;
	}


	// returns first found node of specified runtime type
	template <class DerivedType>
	LinkedNode* get_node_by_type()
	{
		LinkedNode* curr = this;

		while (this != nullptr)
		{
			if (typeid(curr) == typeid(DerivedType))
			{
				return curr;
			}
			
			curr = curr->next;
		}

		return nullptr;
	}


	// returns all nodes of specified runtime type
	template <class DerivedType>
	std::vector<LinkedNode*> get_nodes_by_type()
	{
		std::vector<LinkedNode*> nodes;

		LinkedNode* curr = this;

		while (curr != nullptr)
		{
			if (typeid(curr) == typeid(DerivedType))
			{
				nodes.push_back(curr);
			}

			curr = curr->next;
		}

		return nodes;
	}


	// removes this item from list
	void remove()
	{
		if (prev)
			prev->next = next;

		if (next)
			next->prev = prev;

		delete this; // IT'S DEFINED BEHAVIOUR JUST ALLOCATE IT DYNAMICALLY
	}


	// delete all nodes
	void clear_list()
	{
		LinkedNode* curr = this;
		
		while (curr->next != nullptr)
		{
			curr->next->remove();
		}

		delete this; // IT'S DEFINED BEHAVIOUR JUST ALLOCATE IT DYNAMICALLY
	}
};

#endif
 

And yeah, I'm not using smart pointers, I just want to get it functioning with raw pointers first then I'll work on ownership and safety.

The component class is just a container base that inherits from LinkedNode. The rest of its information should be arbitrary to this pattern so I won't include it.

The GameObject class then implements component functions like so:

#ifndef _GAME_OBJECT_H_
#define _GAME_OBJECT_H_
#pragma once

#include "Object.h"
#include "StandardTypes.h"
#include "LinkedNode.h"

class Component;
class Transform;

class GameObject final : public Object
{
public:
	GameObject();
	~GameObject();
	
	// All GOs need a transform, so it is the head of Component LinkedList
	Transform* transform;

	// Add existing component
	void AddComponent(Component* component);
	
	// Add new component by type
	template <class ComponentType> 
	void AddComponent()
	{
		bool isComponent = std::is_base_of<Component, ComponentType>::value;
		ASSERT(isComponent, "Trying to add a non-Component to a GameObject");

                // this won't work unless GameObject knows all complete component types
		transform->push_back(new ComponentType());
	}

	// Get first component of runtime type "type"
	Component* GetComponent(std::string type); // TODO: implement (RTTI class table)

	// Get first component of specified runtime type
	template <class ComponentType>
	ComponentType* GetComponent()
	{
		bool isComponent = std::is_base_of<Component, ComponentType>::value;
		ASSERT(isComponent, "Calling GetComponent<> with a non-Component type.");

		return static_cast<ComponentType*>(transform->get_node_by_type<ComponentType>());
	}

	// Get all components of specified runtime type
	template <class ComponentType>
	std::vector<ComponentType*> GetComponents()
	{
		bool isComponent = std::is_base_of<Component, ComponentType>::value;
		ASSERT(isComponent, "Calling GetComponent<> with a non-Component type.");

		// is there a better way to do this???
		// does this even work???
		std::vector<LinkedNode*> nodes = transform->get_nodes_by_type<Component>();

		std::vector<ComponentType*> components;
		components.reserve(nodes.size());

		for (int i = 0; i < components.size(); i++)
			components[i] = static_cast<ComponentType*>(nodes[i]);

		return components;
	}

};

#endif 

But doesn't static_cast<> require a complete type? Should dynamic_cast<> be used here? I was trying to be as lean as possible with this structure so I wanted to avoid inheritance tree searching, but I think it might be the only option excluding some sort of bonkers C++ witchcraft.

And it shouldn't be too relevant, but for completeness, GameObject.cpp:

#include "GameObject.h"
#include "Component.h"
#include "Transform.h"


GameObject::GameObject()
{
	transform = new Transform();
}


GameObject::~GameObject()
{
	transform->clear_list();
}


void GameObject::AddComponent(Component* component)
{
	transform->push_back(component);
}

Component* GameObject::GetComponent(std::string type)
{
	// TODO: implement (RTTI class table)
	return nullptr;
}

 

TLDR; is this approach a terrible idea that should be scrapped and replaced with a conventional LinkedList structure, or is it acceptable assuming it's used correctly? Also please pick apart any bad C++ practices I've used (except lack of smart pointers).

Edited by cmac

Share this post


Link to post
Share on other sites

Yeah, this makes more sense. I was referencing an engine I built in college that used a LinkedList and trying to optimize it, but thanks for the reminder to review the standard library containers before reinventing the wheel :p

Share this post


Link to post
Share on other sites

Also just use #pragma once instead of the include guards AND pragma, every compiler that you'll ever use supports it at this point and it's a lot less typing leading to fewer issues.

Share this post


Link to post
Share on other sites

Also just use #pragma once instead of the include guards AND pragma, every compiler that you'll ever use supports it at this point and it's a lot less typing leading to fewer issues.

 

I looked this up a couple months ago because I was curious about the differences, and read there are still some compilers (legacy perhaps) that don't recognize #pragma once, so it's safest to use both. But if it's the industry standard to just go with pragma once these days I'll hop on the bandwagon

Share this post


Link to post
Share on other sites
If you want to find more info, this kind of list is called an intrusive linked list.
Likewise a ref counting pointer like shared_ptr that stored ref counts inside the object would be called an intrusive ref counting system.

These used to be very common back when game engines cared about optimizing everything, but didn't bother optimizing for cache coherency.

Share this post


Link to post
Share on other sites

From Wikipedia, all major compilers support it (except some long-dead compilers that don't matter here).

I have read that there are some obscure custom compilers for things like DSPs and microcontrollers that don't support it, but even that is rare these days since most of them are based on GCC and other multiplatform compilers.  I've worked on some custom hardware and FPGA boards but they used GCC. Few companies are willing to spend the time to write a custom compiler when others are freely available and easily modified.

There are active official proposals to include #pragma once or a variant #once in the C++ standard. If they are accepted they would like use the #pragma once version because of the existing universal support.

And as for linked lists, they are almost always the wrong choice.  Several people (including Bjarne Stroustrup who designed C++) have called for them to be deprecated from the language standard.  There are still some hardware systems and some extremely rare (typically contrived) scenarios where they are a good solution, but it is almost never the right solution.  Memory speed and CPU speed have been diverging for decades, and the farther apart they get the worse linked lists become.

Edited by frob
Swap HTML italics for ip.board italics.

Share this post


Link to post
Share on other sites

Unity uses a tree, not a list. Nodes can have more than one child.

The transform hierarchy in Unity effectively creates a scene graph - the each transform is added to its parents. If this is the behavior you're looking for then it would be good to focus more on how to effectively implement this behavior rather than on this specific data structure. (Please specify if this is the case.)

There's no reason to use intrusive linkage. Prefer composition over inheritance.

Share this post


Link to post
Share on other sites

There's no reason to use intrusive linkage. Prefer composition over inheritance.


While I agree in general with the principle that composition is better than inheritance for most purposes, I want to clarify one thing: you can totally build an intrusive container/refcounter/etc. using just composition.

Share this post


Link to post
Share on other sites

A linked list is the wrong data structure to use here (a linked list is rarely the right data structure to use these days). [...] You're mainly concerned with adding/retrieving components by type id, right?. So an unordered_map seems to be the obvious choice. If you need to support multiple components of the same type on a game object, then use unordered_multimap.
Allow me to add that std::unordered_map (and even more so std::map) is also rarely the right data structure.

A std::unordered_map (empty, of any type) has an overhead of 56 bytes on my platform, which is already not that neglegible for something that's added to every node just like that.

But it gets worse: As easily observed by overloading global operator new, adding one element to an unordered_map typically does two dynamic allocations (for std::unordered_map<int,int> 16 and 24 bytes, respectively, on my platform). Adding two elements does three dynamic allocations (16, 24, 16 bytes). Adding three elements does 5 allocations (16, 24, 16, 16, 56 bytes). Adding 25 kv-pairs does 29 dynamic allocations. Adding 2,500 kv-pairs does 2,510 allocations. It's asymptotic for sure, but for small N it's a bummer of overhead just for managing a few key/value pairs.

On the other hand side, for large N, the fact that std::unordered_map has a cache coherency only rivalled in its badness by std::map, the guaranteed-cache-miss-every-time factor comes into play. So... it's neither really a stellar performer for small N, nor for large N.

I don't quite get the whole ECS stuff anyway (well, I do get it insofar as I see how it's nice and flexible, I just don't get how it can actually ever, in practice, work with an acceptable performance, but let's ignore whether or not I get it... that's irrelevant) but if I was to do such a thing, my first choice, until proven wrong, would certainly be a small_vector (such as clang uses internally), the expectation being that the average node does exactly zero extra allocations and that linearly iterating over 5-8 elements is just as fast as doing one hash table lookup.

But that's just me.

Share this post


Link to post
Share on other sites

you can totally build an intrusive container/refcounter/etc. using just composition

Fair enough. What I should have said was that an unintrusive container would be less hassle to maintain and more likely to be useful for reuse in the future.

I don't quite get the whole ECS stuff anyway

Easier to give power to scripters and to maintain (you can rewrite a component as a new version and then swap it out with the old version without shutting everything down). IIRC this was popularized by MMOs, where those kinds of things outweigh everything else.

Share this post


Link to post
Share on other sites

Unity uses a tree, not a list. Nodes can have more than one child.

The transform hierarchy in Unity effectively creates a scene graph - the each transform is added to its parents. If this is the behavior you're looking for then it would be good to focus more on how to effectively implement this behavior rather than on this specific data structure. (Please specify if this is the case.)

There's no reason to use intrusive linkage. Prefer composition over inheritance.

 

A scene graph of Transforms is absolutely something I want to get to, but it'll be a little while until that's in scope. I'll probably make a new topic if/when I get stuck on it.

 

 

A linked list is the wrong data structure to use here (a linked list is rarely the right data structure to use these days). [...] You're mainly concerned with adding/retrieving components by type id, right?. So an unordered_map seems to be the obvious choice. If you need to support multiple components of the same type on a game object, then use unordered_multimap.
Allow me to add that std::unordered_map (and even more so std::map) is also rarely the right data structure.

A std::unordered_map (empty, of any type) has an overhead of 56 bytes on my platform, which is already not that neglegible for something that's added to every node just like that.

But it gets worse: As easily observed by overloading global operator new, adding one element to an unordered_map typically does two dynamic allocations (for std::unordered_map<int,int> 16 and 24 bytes, respectively, on my platform). Adding two elements does three dynamic allocations (16, 24, 16 bytes). Adding three elements does 5 allocations (16, 24, 16, 16, 56 bytes). Adding 25 kv-pairs does 29 dynamic allocations. Adding 2,500 kv-pairs does 2,510 allocations. It's asymptotic for sure, but for small N it's a bummer of overhead just for managing a few key/value pairs.

On the other hand side, for large N, the fact that std::unordered_map has a cache coherency only rivalled in its badness by std::map, the guaranteed-cache-miss-every-time factor comes into play. So... it's neither really a stellar performer for small N, nor for large N.

I don't quite get the whole ECS stuff anyway (well, I do get it insofar as I see how it's nice and flexible, I just don't get how it can actually ever, in practice, work with an acceptable performance, but let's ignore whether or not I get it... that's irrelevant) but if I was to do such a thing, my first choice, until proven wrong, would certainly be a small_vector (such as clang uses internally), the expectation being that the average node does exactly zero extra allocations and that linearly iterating over 5-8 elements is just as fast as doing one hash table lookup.

But that's just me.

 

 

Didn't expect such solid programming insight from the guitarist of Emperor.

Really good points, and yes, there should be <10 components on nearly every GameObject, but it is supposed to be a somewhat robust system so there may be cases of 100+ components in a single container. It is fair to assume that most usages will never exceed 10 components in a single container though.

The main point that stuck out to me about hash mapping was that it uses the runtime types as the lookup, which bypasses type-checking in a linear iteration (maybe there's a better way to handle linear polymorphic iteration that I don't know about). Since components are generally looked up by their type, this seemed pretty efficient.

I guess now I'll have to run some tests on the speed and storage differences between linearly iterating a vector with type-checking vs. a map. If I find anything notable I'll post it here.

Share this post


Link to post
Share on other sites
I'm not a fan of overgeneralization, so this may not work for what you want to do, but I wouldn't tie "look up component by type" in with runtime type information.

Instead, I'd store multiple vectors, one for each type of component. An empty non-allocated vector is cheap enough, and you can group them logically into a tree of components and component-groups if necessary.

Then, "give me components of type X" becomes an O(1) operation with no reflection or runtime type data needed. You just write a function that returns the appropriate vector.



Like I said, this is not purist ECS and it probably will aggravate folks that are insistent on that purity. But it's food for thought.

Share this post


Link to post
Share on other sites

I don't quite get the whole ECS stuff anyway

Easier to give power to scripters and to maintain (you can rewrite a component as a new version and then swap it out with the old version without shutting everything down). IIRC this was popularized by MMOs, where those kinds of things outweigh everything else.
You get those same ability with regular, plain, no framework required, composition.

The typical "ECS framework", which honestly is a complete and utter anti-pattern, was popularised by a programmer froma first person shooter who declared it the future of MMO design on his unfortunately influential blog series.

maybe there's a better way to handle linear polymorphic iteration that I don't know about

The answer is to unask that question and don't do that.
If you want performance, you want data locality and code locality. That means not using polymorphism, or only using it at a higher level, such as a whole system of components instead of a single component (i.e. the S in ECS framework).
To paraphrase Acton, multiple objects is the general case. When was the last time you wrote a compoment that was only instantiated once? One object is a corner case. Write code for the general case, which means writing batch processing functions.

Share this post


Link to post
Share on other sites

IIRC this was popularized by MMOs, where those kinds of things outweigh everything else.

Nope. It was someone who claimed it would be the future of MMOs. I've worked on MMOs for most of the 10 years since that blog post was written and I've not seen or heard of a single one use a component-based architecture, never mind the extreme component/entity/system separation that blog talks about.

(Oh, just seen that Hodgman mentioned this already. Oh well.)

I'd strongly advise anybody who wants to break their game entities into components to really think about what benefit they're hoping to get from it, and plan that up-front rather than trying to follow any abstract rules they may have read. The existing literature on component-based games systems is mostly evangelism and very little practicality, often with conflicting aims or goals that are not relevant to new developers.

Share this post


Link to post
Share on other sites

You get those same ability with regular, plain, no framework required, composition.

Isn't ECS just composition ad absurdum? It seems like every time I talk to people about ECS the boundaries of its definition are different. -.- My impression was that ECS is a system where new types of component can be created and injected, and entities are just clumps of various components constructed at runtime by data.
 

Nope. It was someone who claimed it would be the future of MMOs. I've worked on MMOs for most of the 10 years since that blog post was written and I've not seen or heard of a single one use a component-based architecture, never mind the extreme component/entity/system separation that blog talks about.

Nice. I'll extend my procrastination about messing with it.

Share this post


Link to post
Share on other sites

You get those same ability with regular, plain, no framework required, composition.

Isn't ECS just composition ad absurdum? It seems like every time I talk to people about ECS the boundaries of its definition are different. -.- My impression was that ECS is a system where new types of component can be created and injected, and entities are just clumps of various components constructed at runtime by data.


Sort of. As far as I can tell, at least some ECS is people looking for a silver bullet encountering some data-oriented design principles, misinterpreting them, and turning their misinterpretation into dogma. As you say, though, the definition depends on who you're talking to.

Share this post


Link to post
Share on other sites

Composition is a feature of almost every language and the core pillar of OO (and ecs, with limits...)

An "Entity/Component/System framework" (abbreviated as ECS) means:
Entities are nothing, but can compose components into themselves.
Components are data structures that are banned from using composition themselves, but they may link to entities.
Systems contain logic that operates on specific components.

Note that this is different to an "Entity/Component framework" (often called an "Entity/Component system" as system/framework are interchangeable)... So if you're building an Entity/Component system, that is a different thing to building an Entity/Component/System system!
These is a much more vague term, but is usually the same thing but with Components and Systems being rolled together, and overall less dogma. There's usually still the same restriction that entities may make use of composition while components are banned.

Often EC-frameworks are made to mesh well with OOP, while ECS-frameworks are closer to relational database systems or procedural programming.

Common features of both kinds of framework include:
* the component members of an entity being identified only by their type, instead of name as in OOP.
* often a restriction that an entity cannot contain two instances of a single component type.
* the ability to find components by type, and to query the parent entity of a component to find sibling components by type.
* a data driven template/prototype system, that allows entity "classes" to be created from data files.
 

As a fun straw-man exercise... :D
To an OO programmer, adding ECS dogma to your coding style while remaining on OO land would be similar to this:
* Classes must be commented as an Entity type or Component type.
* Classes labelled as Component can only contain primitive types, or pointers to an Entity type class.
* Classes labelled as Entity can only contain pointers to Component type classes.
* Classes labelled as Entity must name their members the same as those member types - and as a consequence, there cannot be two members of the same type.
* No class may contain member functions or any logic.
e.g.

struct Health { float hit_points; }; // component
struct PoisonSpell { Player* target; float time_left; }; // component
class Player // entity
{
  Health* _Health;
  PoisonSpell* _PoisonSpell;
};
void PoisonSystem( float dt, Player* begin, Player* end )//system
{
  for( Player* p = begin; p != end; ++p )
  {
    PoisonSpell* s = p->_PoisonSpell;
    if( !s )
      continue;
    s->time_left -= dt;
    if( s->time_left <= 0 || !s->target )
    {
      p->_PoisonSpell = 0;
      delete s;
    }
    Health* h = s->target->_Health;
    if( h )
      h->hit_points -= dt;
  }
}

Sure that's a valid style that you can choose to program in, and by default in C++ it's not at all dynamic (my entity definition above is static code)... so if you like this style you would either have to build a big ECS framework to actually allow dynamic entity descriptions, or just use another language like Lua where you could write in this style natively without the need to build a framework at all! Personally, I'm a fan of doing the latter -- write code that's idiomatic to he language and forgo large frameworks of any kind.

Edited by Hodgman

Share this post


Link to post
Share on other sites

I have read that there are some obscure custom compilers for things like DSPs and microcontrollers that don't support it

That isn't the real problem with them. The main problem is that they just don't work the way most people think they do, and it's fairly trivial to break them in larger build systems.

And part of _that_ problem is that pragma once is not well-defined. Namely, how does the compiler decide that a header has already been included? Is it the file's inode? Relative path name? Absolute path name (including symlinks? multiple mount points? etc.) ? What about a file that is accessed through two separate paths that go through two separate virtual filesystems (like an NFS mount to the localhost and the direct local path? yes, there are systems setup like this, and can be relevant to games on Linux-based CI systems). This also comes up on some weirder systems (not relevant to games) that don't have directory structures similar to the POSIX/Win32 model.

Basically, it's impossible to make pragma once work reliably the way people expect. There simple isn't a way to uniquely identify a file across all possible file systems, platforms, and weird things some platforms can do with reparse or mount points or network filesystems and so on.

The compiler differences do come up. For instance, porting code between GCC and MSVC when using precompiled headers on both can be a little tricky specifically because of the different rules each have surrounding #pragma once. Those differing rules also make it impossible to standardize, since if it were then one or the other would be required to change behavior and break existing code.

There are active official proposals to include #pragma once or a variant #once in the C++ standard. If they are accepted they would like use the #pragma once version because of the existing universal support.
 

#pragma once will never be standardized since it is inherently broken as described above.

The #once proposal is basically just a syntactic shortcut to include guards: the #once syntax requires an additional token (just like an include guard) to identify the file which is intended to be unique (though it could be used to ensure that only one of a particular set of files is included by sharing the identifier between them). "#once foo" essentially just wraps the file contents in "#ifndef FOO #define FOO #endif".

And even that's super unlikely, because Modules will kind of make traditional headers obsolete anyway.

 

... all that said, I just use #pragma once myself. :p

Share this post


Link to post
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now

Sign in to follow this