
Issue

evolutional

334 views

The current issue I'm trying to solve is whether the IDataEntitySet collections (e.g. what you get back from a query) should hold pointers to entities or just the numeric IDs of the entities. The first is simpler and faster for looking up the data (no handles to dereference), but the second allows greater flexibility and concurrency of queries.

Specifically, if I hold pointers I have to find a way to ensure that the dataset remains valid even if an update query runs or something else, such as a script, alters the data. I'm thinking of perhaps having 'connected' entity sets, wherein any changes to the underlying data are instantly reflected in the set; a disconnected set, by contrast, doesn't change (i.e. it is a snapshot of the results). The issue of pointers vs handles becomes important again here: if I hold pointers, the disconnected sets could reference invalid data [1]. However, if I provide handles, I'll need to pass an instance of my database class to each method/object so that it can resolve the handle into real data.
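
To make the trade-off concrete, here is a minimal sketch of the handle-based, disconnected variant. Every name in it (Entity, EntityId, Database::Find, SnapshotSet) is an assumption made up for illustration; it is not the real IDataEntitySet interface, just one way the idea might look.

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical types for illustration only; not the library's actual API.
using EntityId = std::uint32_t;

struct Entity {
    EntityId id;
    std::string name;
};

class Database {
public:
    // Registers a new entity and returns its numeric id (the "handle").
    EntityId Add(const std::string& name) {
        EntityId id = nextId_++;
        entities_[id] = std::make_shared<Entity>(Entity{id, name});
        return id;
    }

    // Unregisters an entity; ids held elsewhere become stale.
    void Remove(EntityId id) { entities_.erase(id); }

    // Resolves an id; returns nullptr if the entity is no longer registered.
    std::shared_ptr<Entity> Find(EntityId id) const {
        auto it = entities_.find(id);
        return it == entities_.end() ? nullptr : it->second;
    }

private:
    std::unordered_map<EntityId, std::shared_ptr<Entity>> entities_;
    EntityId nextId_ = 1;
};

// A disconnected result set: a snapshot of the ids that matched a query.
class SnapshotSet {
public:
    explicit SnapshotSet(std::vector<EntityId> ids) : ids_(std::move(ids)) {}

    std::size_t Size() const { return ids_.size(); }

    // The database has to be supplied to turn an id back into data.
    // A nullptr result means the entity was removed after the snapshot.
    std::shared_ptr<Entity> Resolve(std::size_t index, const Database& db) const {
        return db.Find(ids_.at(index));
    }

private:
    std::vector<EntityId> ids_;
};
```

The snapshot itself never changes, but a removed entity shows up as a nullptr on Resolve rather than as a pointer to data the database no longer knows about. The cost is visible too: every Resolve call needs the Database passed in.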

Does anyone have any particular comments or ideas on this?



[1] Actually, the data will remain 'valid', as the pointers are stored in a shared_ptr; however, the data itself might no longer be registered within the database and hence be considered invalid.


5 Comments



Pointers sound like they'll cause you (+users) a whole load of headaches all for a neat optimization. Doesn't sound like a good trade-off to me.

Also, you mention concurrency - it'd be nice to see this library at least be thread-safe, and ideally support some concurrent usage. Games are slowly getting used to the idea that multi-threading is the way forward - thus a *new* library should be designed to at least be safe for this sort of usage...

Cheers,
Jack

Yes, I agree with Jack on both points.

On the pointers specifically, Effective C++ makes a point of saying you shouldn't really return pointers to internal data; as such, a handle and query system would probably be better.

Also, keep in mind the 80-20 rule: this 'optimisation' probably won't save you anything in the long run.

I'll add my vote for a handle system. The KC language we built at Egosoft basically extends the handle concept very deeply - any data type is an integer. Integral data is obviously stored as integers. Booleans, enums, and other such stuff are reduced to integers at compile time and basically "exist" only by means of preprocessor-generated syntactic sugar, similar to many other languages. Strings, arrays, and hash tables (the other data storage primitives in the language) exist via compiler-level syntactic sugar but decay to integer handles easily (which makes the implementation of the hash table code very, very easy, for obvious reasons). A class is defined as an integer code which is "registered" by means of syntactic sugar (are you tired of that phrase yet? [grin] ). The static "instance" of the class is accessed explicitly by (classID).foo, or classname.foo, which reduces to the same thing after a preprocessor step. Dynamically allocated instances are referenced by integer handles, and again syntactic references are reduced to this integer system by language-level preprocessor work.

It may seem like a weird system, and it definitely has occasional drawbacks (like making it difficult to integrate with Visual Studio's IntelliSense features) but the benefits are enormous:
  • It basically allows for free pass-by-reference semantics in all language features. (This is highly desirable for us but understandably a controversial language behavior in general.)

  • Reference storage is trivial to manage in 32-bit handles, and allows for trivial implementation of tools like hash tables and associative maps. Since any object has a handle, you can "key" anything you like off that handle trivially by referring to it by its handle.

  • Things like RTTI/reflection are trivial. An object's types can succinctly be described by merely looking at the list of class ID numbers that the object derives from. KC does not have multiple inheritance, but it could very easily because of this mechanism.

  • Integral handles make the development of a bytecode compiler much simpler. This may or may not be of any interest to your project, but it makes the actual opcode generation far more manageable. Specifically, it lets us easily offload management of the free store to the engine, and compiled code only has to care about the stack and "system calls" that interact with the free store. There are no pointers or address concerns whatsoever in KC.

  • Referential integrity checks are trivial. KC has deterministic garbage collection; checking if an object has been garbage-collected is as easy as asking the engine if the object's ID is still "there." More importantly, if an invalid object handle is passed around someplace, the engine can immediately provide very helpful and precise information about the problem in debug output. For a data-heavy system like a game-logic script, this is invaluable.
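
The referential integrity point in the list above can be illustrated with a small C++ sketch. This is not how the KC engine is written; Engine, Allocate, Release, IsAlive and Describe are all hypothetical names, and the assumption that handles are never reused is mine.

```cpp
#include <cstdint>
#include <sstream>
#include <string>
#include <unordered_map>

// Hypothetical engine-side handle table; names are illustrative only.
using Handle = std::uint32_t;

class Engine {
public:
    // Hands out a fresh integer handle and records which class it refers to.
    Handle Allocate(const std::string& className) {
        Handle h = next_++;
        live_[h] = className;
        return h;
    }

    // Stands in for the object being collected or destroyed.
    void Release(Handle h) { live_.erase(h); }

    // The integrity check: is this id still "there"?
    bool IsAlive(Handle h) const { return live_.count(h) != 0; }

    // Produces a precise message for debug output, valid handle or not.
    std::string Describe(Handle h) const {
        std::ostringstream out;
        auto it = live_.find(h);
        if (it == live_.end()) {
            out << "handle " << h << " is invalid (never allocated or already released)";
        } else {
            out << "handle " << h << " -> instance of " << it->second;
        }
        return out.str();
    }

private:
    std::unordered_map<Handle, std::string> live_;
    Handle next_ = 1;  // assumption: handles are never reused in this sketch
};
```

Because a handle is just an integer, the same value can also serve directly as a key in any other map, which is the "key anything off the handle" benefit mentioned in the list.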


The added layer of abstraction of KC's handle approach makes it extremely well-suited to high-level game logic. It also combats and helps eliminate a whole category of bugs, logic errors, and general coding mistakes. Of course there are other gotchas (linked mostly with the fact that we have poor compile-time function call checking, which makes it possible to pass wrong numbers of parameters, etc.) but these are easy enough to get used to, and with some extra reinforcement in the tools, can be eliminated entirely.


There's my two bits.

Firstly, thanks to all three of you - your input is very useful.

After weighing up the options and examining the current implementation of how queries are run, I can say that I'll go with the handle-based version. I don't like the idea of passing around a gamdeb::Database & reference to everything that needs to access the data (it'll overcomplicate the API, maybe), but it will make things 'better' all-round.

Bear in mind that this prototype version won't be concurrent, but I agree with Jack and Rob in saying that future versions should allow for a multi-threaded architecture. Mainly, I can imagine something running a query in one thread and then working through the results while perhaps something else tries to alter the data. I'll need to research how best to implement the ACID features that most of us take for granted in modern databases.

There shouldn't be any need to pass around the DB reference everywhere. Just separate the handle manager system from the DB storage mechanism, and retrieve concrete data from the handle manager directly:

class HandleManager
{
public:
    HandleType AllocateFooObject(Database& container);
    FooObject* GetFooByHandle(HandleType handle);

private:
    // Have some sort of map that links handles to their container databases.
    // When retrieving something by handle, talk to the appropriate container object directly.
};


Repeat pairing per data type, or better use templates to do the grunt work for you. You can also have the manager either return pointers or references, and return NULL or throw exceptions (respectively) in case an invalid handle is passed.
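
As a rough sketch of the template route, something like the following would cover both the pointer-returning and the throwing flavour. It is deliberately simplified: this manager owns the objects itself instead of mapping handles to container Database objects, and Allocate, Release, TryGet and Get are names invented here. It assumes C++14 or later for std::make_unique.

```cpp
#include <cstdint>
#include <memory>
#include <stdexcept>
#include <unordered_map>
#include <utility>

using HandleType = std::uint32_t;

// One manager per data type, generated by the template instead of by hand.
// Simplification: objects are stored here rather than in a separate container.
template <typename T>
class HandleManager {
public:
    // Constructs a T from the given arguments and returns its handle.
    template <typename... Args>
    HandleType Allocate(Args&&... args) {
        HandleType handle = next_++;
        objects_[handle] = std::make_unique<T>(std::forward<Args>(args)...);
        return handle;
    }

    void Release(HandleType handle) { objects_.erase(handle); }

    // Pointer flavour: returns nullptr for an invalid handle.
    T* TryGet(HandleType handle) {
        auto it = objects_.find(handle);
        return it == objects_.end() ? nullptr : it->second.get();
    }

    // Reference flavour: throws for an invalid handle.
    T& Get(HandleType handle) {
        if (T* object = TryGet(handle)) {
            return *object;
        }
        throw std::out_of_range("invalid handle");
    }

private:
    std::unordered_map<HandleType, std::unique_ptr<T>> objects_;
    HandleType next_ = 1;
};

// Example payload type, standing in for the FooObject mentioned above.
struct FooObject {
    explicit FooObject(int v) : value(v) {}
    int value;
};
```

Callers then hold only a HandleManager<FooObject> and a HandleType; swapping the internal map for a lookup into real database containers would not change either signature.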


This is basically how the system is implemented in the KC runtime engine. A big advantage is that the handle system is not tightly coupled to the data storage containers; if you rebuild the data containers you only affect the implementation of the HandleManager class and nothing else. Another less dramatic advantage is that you have complete control over the allocation of handle pools; you can allocate handles globally across many containers, you can allocate handles in thread-local storage, or you can even allocate handles in short-term scopes.

This decoupling in general makes a potentially very rigid handle tracking mechanism become quite a bit more powerful than a mere reference mechanism.


(Afterthought: another incidental bonus, which really should have come to mind a lot sooner considering I benefit from it on a daily basis, is that a distinct handle manager makes it approximately one trillion times easier to bind your data containers to external/embedded languages.)
