ECS with an event messaging system

Started by
20 comments, last by DemonDar 7 years, 2 months ago

Hi everyone,

There's an issue that's been bugging me for a while now and I'd like to resolve it before I move on with other things. I've been working on my own little C++/SDL/OpenGL/ECS game framework/engine for a year or so now and it's coming along nicely (and thanks to anyone here who's helped me in the past).

BUT I'm struggling to work out the best way to implement an event messaging system correctly with ECS. The way I originally implemented it:

- The messaging system sends all messages at the beginning of the step

- Each system only receives the event types it has subscribed to

- Each system will read its mailbox immediately and then do everything (e.g. find collisions, render, etc.) it's designed to do

- Finally, each system sends messages to the messaging system at the end of the step.

So I realised that there might be some cases where an event only gets processed a frame or two later, but was happy to accept this at the beginning. Except now I'm getting quite complicated event chains. Take the following example:

- User presses key (or touch pad or other input). Key press event sent to command system (e.g. KEY_EVENT, 'UP_ARROW')

- Command system finds matching command to input and generates command (e.g. MOVE_EVENT, 'PLAYER_MOVE_UP')

- Physics system, or player script, processes command and sends command to actually move entity (e.g. MOVE_ENTITY, 'UP')

- Rendering system updates position of entity and renders entity at new position

So, in this case there are 4 frames before the player will actually see the entity move. This obviously means a noticeable delay depending on the framerate (e.g. ~0.07 sec at 60 fps, ~0.13 sec at 30 fps). Now I know about the whole 'fix-your-timestep' stuff, but that's not the issue really. Unless I use a crazy high game update rate, this can give issues with responsiveness no matter what.

So there are several ideas that come to mind :

- Have an immediate event dispatch mode, where systems can send events and even receive replies in the same timestep. This would help but would surely cause problems with concurrency of the (otherwise independent) systems. Plus I'm not sure how I would implement this anyway.

- Allow systems to ask a limited set of queries of other systems directly. For example, a touch event (on a mobile) might ask the collision system whether there is a selectable object at the touch coordinates and get back either a valid entity id or nothing. The main problem is this causes a little inter-system dependency, something the ECS with event system tries to avoid. Second, each system might be updating that data when the query is made; this can be solved by double-buffering the data but might be expensive (memory-wise) for each system. But then this would allow safe queries to be made of other systems provided the query doesn't change any data (which it shouldn't).

Anyway, I don't want to overthink it too much but I surely need some solution since my original 'one send-one receive-per step' approach is surely not good enough. If anyone has encountered similar problems, or has experience with ECS systems and realises I am doing things completely wrong, then I would love some advice :-) . Thanks!

- User presses key (or touch pad or other input). Key press event sent to command system (e.g. KEY_EVENT, 'UP_ARROW')

- Command system finds matching command to input and generates command (e.g. MOVE_EVENT, 'PLAYER_MOVE_UP')

- Physics system, or player script, processes command and sends command to actually move entity (e.g. MOVE_ENTITY, 'UP')

- Rendering system updates position of entity and renders entity at new position

The order in which your systems run is generally very important, so you should be deliberate about it.

If your InputSystem runs first, then your PhysicsSystem, then your RenderSystem, everything should happen in the same frame.
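As a rough illustration of that ordering, here is a minimal sketch in which each system consumes what the previous one produced within a single frame. The `Event`, `EventList`, `runCommandSystem`, `runMovementSystem` and `runFrame` names are made up for this example, and the event strings are borrowed from the first post; it is not the poster's actual framework.

```cpp
#include <string>
#include <vector>

// Hypothetical event type; names are illustrative only.
struct Event {
    std::string type;
    std::string payload;
};

// Frame-local list of events handed from one system to the next.
using EventList = std::vector<Event>;

struct Position { int x = 0; int y = 0; };

// Input -> command: maps raw key events to game commands.
EventList runCommandSystem(const EventList& input) {
    EventList out;
    for (const Event& e : input) {
        if (e.type == "KEY_EVENT" && e.payload == "UP_ARROW") {
            out.push_back(Event{"MOVE_EVENT", "PLAYER_MOVE_UP"});
        }
    }
    return out;
}

// Command -> movement: applies commands to the player's position.
void runMovementSystem(const EventList& commands, Position& player) {
    for (const Event& e : commands) {
        if (e.type == "MOVE_EVENT" && e.payload == "PLAYER_MOVE_UP") {
            player.y += 1;
        }
    }
}

// One frame: systems run in a fixed order, so the key press reaches the
// position (and therefore the renderer) without waiting for a later frame.
Position runFrame(const EventList& rawInput, Position player) {
    const EventList commands = runCommandSystem(rawInput);
    runMovementSystem(commands, player);
    return player;  // a render system would draw this position next
}
```

Because the order is fixed, the 4-frame cascade collapses into one frame without any change to what each system does.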

For example, a touch event (on a mobile) might enquire to the collision system if there is a selectable object at the touch coordinates and return either a valid entity id or nothing. The main problem is this causes a little inter-system dependency, something the ECS with event system tries to avoid

I don't think this is necessarily a bad thing. Even with your event system you still have that dependency - it's just a runtime dependency instead of a compile-time one (which could be legitimate if you might have multiple systems responding to the question "What is at this location?" - but do you?).

You can use dependency injection to mitigate some of the concerns you have over inter-system dependencies (e.g. pass around an IQueryEntitiesAtPosition interface so the input system does not have a strict dependency on the collision system).
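A minimal sketch of what that injection could look like, assuming C++17. Only the interface name `IQueryEntitiesAtPosition` comes from the reply; the `entityAt` method, the `Box` layout and both system classes are hypothetical.

```cpp
#include <optional>
#include <vector>

using EntityId = int;

// Interface named in the reply; the method signature is an assumption.
class IQueryEntitiesAtPosition {
public:
    virtual ~IQueryEntitiesAtPosition() = default;
    virtual std::optional<EntityId> entityAt(float x, float y) const = 0;
};

struct Box { EntityId id; float minX, minY, maxX, maxY; };

// The collision system implements the query but is never named by callers.
class CollisionSystem : public IQueryEntitiesAtPosition {
public:
    void addBox(const Box& box) { boxes_.push_back(box); }

    std::optional<EntityId> entityAt(float x, float y) const override {
        for (const Box& b : boxes_) {
            if (x >= b.minX && x <= b.maxX && y >= b.minY && y <= b.maxY) {
                return b.id;
            }
        }
        return std::nullopt;  // nothing selectable at this point
    }

private:
    std::vector<Box> boxes_;
};

// The input system only knows the interface it was handed.
class InputSystem {
public:
    explicit InputSystem(const IQueryEntitiesAtPosition& query) : query_(query) {}

    std::optional<EntityId> onTouch(float x, float y) const {
        return query_.entityAt(x, y);
    }

private:
    const IQueryEntitiesAtPosition& query_;
};
```

The input system still depends on something answering the question, but any object implementing the interface will do, including a stub in tests.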

I think the most important thing though, is that you have a clear dependency chain in your systems. You seem to imply that you have things running on multiple threads. That might (?) be necessary for time-consuming tasks like physics or AI, but most of your systems should be quick (e.g. the process of mapping raw input to a force on an entity), and the more expensive systems can be gated on those first systems.

The order in which your systems run is generally very important, so you should be deliberate about it.

Yes, that's true. I did have a think about that months ago but kind of forgot about it while writing this post. When I wrote my simple task manager, I added a dependency term, so if a dependency is added to a task (e.g. Command system depends on the input system) then it must wait for that system/task to finish first. This then brings a different way of doing things. So when the input system is done it can transmit all of its events via the messaging system and then the command system would read these immediately. I'll have to modify my messaging system a little but that should be easily doable.

I don't think this is necessarily a bad thing. Even with your event system you still have that dependency - it's just a runtime dependency instead of a compile-time one (which could be legitimate if you might have multiple systems responding to the question "What is at this location?" - but do you?).

Yes, if I can't avoid it then I'm okay with doing it (i.e. having direct access to other systems). However, if I can avoid it with the event system (as I think I can as stated above) then I'll go with that solution!

You seem to imply that you have things running on multiple threads. That might (?) be necessary for time-consuming tasks like physics or AI, but most of your systems should be quick (e.g. the process of mapping raw input to a force on an entity), and the more expensive systems can be gated on those first systems.

Actually I don't have multithreading yet but I plan on adding it once I get my head around these other issues. The task-dependency should also work fine in multithreading so in principle it should work. Then the only place I will need any locks/atomic operations is the task manager and the messenger system. But this is for the future! First just want to get it to work on simpler things (in serial) before scaling up the complexity! :-)

Thanks for your help.

I'd definitely opt for handling events on the fly; delaying them this way definitely isn't going to hold up, as you noticed yourself.

For your second solution, I foresee that 'limited' queries will likely blow up, assuming you really want to keep the event messaging system as mediator. 'Oh this system actually needs to be run before this one, and then before this one'. I doubt you're going to keep the messaging system intact for long with that, but I could be wrong here. I doubt this will make concurrency any easier compared to the first method either, to be honest. Even if your order of systems was perfectly defined, e.g. physicsSystem.simulate(); renderSystem.render(); both depend on an entity's position (/transform), so they can't be run in parallel either.

(I know that's not much more than what you have said already, so tl;dr for the above: I'd agree with your own findings)

Especially in the scenarios you have mentioned, but with ECS in general, systems will always be able to touch all the data of an entity. If you're not making the data dependencies explicit, it will be quite difficult to have systems potentially run in parallel without possibly touching the same data. I believe this ecs, which does a lot of things at compile time, also bases its concurrency of systems on data dependencies specified at compile time, but I haven't had much of a look at it. I'd say this is decently achievable, since you usually know which systems act on which data at compile time.

I still am a bigger fan of splitting up the tasks within a system, especially since ECS can lend itself well to data parallelism. Systems that just perform the same task for each '...' - component, primarily. Only having a system in its entirety as a task gets rather close to the system on a thread model. The possible speedup is constant at best and would be limited to the system taking the most time.

I haven't applied it to my own engine yet in its entirety, but I think you can already achieve quite some parallelism by bulking data more and trying to see if you can perform the same, uniform task on each of the collection's elements. For example, with your movement system, rather than processing one movement message at a time, have the sender send a single message with all its requested movements and create a task for processing each movement in the message. On a side note, taking this approach will likely generalize messages more too (not just listening for movement of entity x, but for every entity that moves, and finding those of interest within the message).
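To make the batching idea concrete, here is a small sketch in which one message carries every requested movement and the receiver applies them in a uniform loop. All names (`MoveRequest`, `MoveBatch`, `applyMoves`, `batchMentions`) are invented for illustration, and the loop is serial; each request could be handed to a worker instead, provided no two requests in a batch target the same entity.

```cpp
#include <cstddef>
#include <unordered_map>
#include <vector>

struct Position { float x = 0.0f; float y = 0.0f; };

// One requested movement; a batch message carries many of them.
struct MoveRequest { int entity; float dx; float dy; };
struct MoveBatch { std::vector<MoveRequest> moves; };

// Applies every request in the batch with the same, uniform operation.
// Returns how many requests matched a known entity.
std::size_t applyMoves(const MoveBatch& batch,
                       std::unordered_map<int, Position>& positions) {
    std::size_t applied = 0;
    for (const MoveRequest& m : batch.moves) {
        auto it = positions.find(m.entity);
        if (it == positions.end()) continue;  // unknown entity: ignore request
        it->second.x += m.dx;
        it->second.y += m.dy;
        ++applied;
    }
    return applied;
}

// A listener interested in a single entity scans the batch for it, rather
// than subscribing to a per-entity message.
bool batchMentions(const MoveBatch& batch, int entity) {
    for (const MoveRequest& m : batch.moves) {
        if (m.entity == entity) return true;
    }
    return false;
}
```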

I'm not implying that this makes concurrency easy. The optimal cases with the 'loop over components in parallel and update' or alike aren't going to be too common. Same applies to making a generic event system that has no dependencies on other systems. While you can reduce them by splitting up a game loop more, it is quite obvious that it only works for so much.

It's important to understand that your abstractions should justify themselves. Queueing up a message is in effect queueing up a function call, and storing it in memory. This is an abstraction layer, and for what purpose? What is the justification? Delaying a function call over a time delta should be a feature, not a side-effect. In your engine it should be very easy to specify a message that will be processed immediately (like a function call), or to specify a delayed "queued" message that will sit and be processed later.
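One way to make the delay an explicit choice is to give the messaging layer two entry points. The sketch below is only an assumption about how that might look; `MessageBus`, `send`, `post` and `step` are invented names, and messages are plain strings to keep it short.

```cpp
#include <functional>
#include <string>
#include <utility>
#include <vector>

class MessageBus {
public:
    using Handler = std::function<void(const std::string&)>;

    void subscribe(Handler handler) { handlers_.push_back(std::move(handler)); }

    // Immediate: behaves like a direct function call to every subscriber.
    void send(const std::string& message) const {
        for (const Handler& h : handlers_) h(message);
    }

    // Deferred: held back until delaySteps calls to step() have happened
    // (a delay of 0 or 1 both deliver on the next step()).
    void post(const std::string& message, int delaySteps) {
        pending_.push_back(Pending{message, delaySteps});
    }

    // Advances one step and delivers every message whose delay has expired.
    void step() {
        std::vector<Pending> stillWaiting;
        std::vector<std::string> due;
        for (Pending& p : pending_) {
            if (--p.stepsLeft <= 0) due.push_back(p.message);
            else stillWaiting.push_back(p);
        }
        pending_ = std::move(stillWaiting);
        // Deliver after the queue is rebuilt so handlers may safely post().
        for (const std::string& m : due) send(m);
    }

private:
    struct Pending { std::string message; int stepsLeft; };
    std::vector<Handler> handlers_;
    std::vector<Pending> pending_;
};
```

With this split, the caller decides per message whether it wants function-call semantics (`send`) or a deliberate delay (`post`), instead of inheriting a delay from the frame structure.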

I'd definitely opt for handling events on the fly; delaying them this way definitely isn't going to hold up, as you noticed yourself. For your second solution, I foresee that 'limited' queries will likely blow up, assuming you really want to keep the event messaging system as mediator. 'Oh this system actually needs to be run before this one, and then before this one'. I doubt you're going to keep the messaging system intact for long with that, but I could be wrong here. I doubt this will make concurrency any easier compared to the first method either, to be honest. Even if your order of systems was perfectly defined, e.g. physicsSystem.simulate(); renderSystem.render(); both depend on an entity's position (/transform), so they can't be run in parallel either.

Okay, I think I've got an idea now of how to deal with this. Basically, a lot of systems can be run in parallel with no problem, such as audio, rendering and some other smaller background processes. But some, as I mentioned, perhaps need to run in a defined order, e.g. User Input -> Commands -> Movement/Physics/Scripts. The way I've done this in my framework is to make all systems tasks in the TaskManager and then to define dependencies. If no dependency exists, the task can run whenever. If one exists, then the task must wait for its dependency to finish first.

When it comes to messaging, what this means is that a task is also dependent on receiving messages from the tasks it depends on. So when Task A (e.g. User Input) has finished, it will have a series of messages to be sent. Normally these would go to the Messaging System to be broadcast at the beginning of the next step. But if there is a dependent task B (e.g. Command System) then A can deliver them directly to B's message mailbox to be read and processed immediately. And then the next dependent task (Movement/Physics) will read its messages directly from Command. And this can be done quite cleanly and automatically (if I design it correctly; we'll see).
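Here is a serial sketch of that scheme, under the assumption that a task is just a function from its mailbox to its outgoing messages. `Task`, `Mailbox` and `runStep` are hypothetical names, not the actual TaskManager API, and messages are plain strings.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

using Message = std::string;
using Mailbox = std::vector<Message>;

struct Task {
    std::string name;
    std::vector<std::size_t> dependsOn;          // indices of tasks that must finish first
    std::function<Mailbox(const Mailbox&)> run;  // reads its mailbox, returns outgoing messages
};

// Runs every task once in an order that respects the dependencies, and
// forwards each finished task's messages straight into the mailboxes of the
// tasks that depend on it. Returns task names in execution order; tasks stuck
// in a dependency cycle are never run and do not appear in the result.
std::vector<std::string> runStep(const std::vector<Task>& tasks) {
    const std::size_t n = tasks.size();
    std::vector<Mailbox> mailboxes(n);
    std::vector<bool> done(n, false);
    std::vector<std::string> order;

    bool progressed = true;
    while (progressed) {
        progressed = false;
        for (std::size_t i = 0; i < n; ++i) {
            if (done[i]) continue;
            bool ready = true;
            for (std::size_t d : tasks[i].dependsOn) ready = ready && done[d];
            if (!ready) continue;

            const Mailbox outgoing = tasks[i].run(mailboxes[i]);
            done[i] = true;
            progressed = true;
            order.push_back(tasks[i].name);

            // Same-step delivery to every task that lists i as a dependency.
            for (std::size_t j = 0; j < n; ++j) {
                for (std::size_t d : tasks[j].dependsOn) {
                    if (d == i) {
                        mailboxes[j].insert(mailboxes[j].end(),
                                            outgoing.begin(), outgoing.end());
                    }
                }
            }
        }
    }
    return order;
}
```

Messages from tasks with no dependents would still go through the normal next-step broadcast, which this sketch leaves out.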

Regarding the concurrency/parallelism, all systems use a single component type and do not access other components. Some data (position being the obvious one) is duplicated and shared via messaging so there is no need for systems to access other systems' data. This is definitely one feature I would like to keep (hence why I don't really want systems to directly access other systems). I'll implement direct access if I need to, but only if these other ideas don't work.

Let's see anyway. Might take a couple of days to implement but fingers crossed! Thanks again! :-)

Delaying a function call over a time delta should be a feature, not a side-effect. In your engine it should be very easy to specify a message that will be processed immediately (like a function call), or to specify a delayed "queued" message that will sit and be processed later.

Yes, I agree that it should be a 'feature', in the sense it should be planned for and controllable (rather than something out of my control which it feels like with this 4+ frames cascade). Hopefully what I described above will remedy this.

Regarding the immediate message processing, I was reading the event chapter in 'Game Coding Complete' and they discuss this solution using the Observer pattern. I could do this definitely but I'm a bit worried about the concurrency issue. As I mentioned in the original post, double buffering might help, so the system itself will work on the current buffer but any queries (e.g. checking for ray intersection/collision for touch control) would use the 'old' buffer. I think this should solve it anyway!

Frankly trying to make everything in your engine run as an amorphous blob like a "task" is just going to end in tears, as others have stated some systems really do need to run before others. Like all your input should be processed and converted into events the second you go to process events and you should then process all those messages at once. Leaving things to happen frames later seems to me like it will just end up creating buggy, convoluted code and make it easier to make mistakes like queuing things up incorrectly.

Also rendering is not that easy to run in parallel because it depends on rendering a scene based on information about where things are in the scene, if you go and start updating ai and physics and such then things end up completely changing positions or animations. Getting around that would require copying or locking the data each time you want to render, which sounds horribly slow.

Games are not a great candidate for concurrency unfortunately, that's why only certain systems are useful to parallelize, and why the only threading is usually a system that eats up computation tasks.

I'd like to throw my pennies in the pile as well, which might echo Satharis a bit.

Your idea of making the systems run concurrently is, in my opinion, not a good idea. You should have a very clear and strictly defined sequence of operations that transform some data into other data. Your input system transforms input messages into commands. Your command system transforms commands into general actions. Your physics system transforms these actions into a position delta for your entities. As a purely output based system, the rendering system reads the positions, mesh/sprite data, and outputs them to the screen. None of these things can be run in parallel. Even audio might need information about an entity so it can play sounds at the correct 3d world position. And then you have animation systems that might have to be processed two times, once before physics, once after.

Your assumption in the first post that the systems are independent is thus not correct. All of these systems are dependent on another system, even your input system has a dependency (SDL).

What you can do is parallelize the transformations within each system, so each object is transformed independently of one another.
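A minimal sketch of that kind of per-object parallelism, using plain `std::thread` and a contiguous array of bodies split into chunks. `Body`, `integrateRange` and `integrateParallel` are made-up names; a real engine would more likely reuse a worker pool than spawn threads every step.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

struct Body { float position; float velocity; };

// Integrates bodies[begin, end). Each index is touched by exactly one thread,
// so no locking is needed.
void integrateRange(std::vector<Body>& bodies, std::size_t begin,
                    std::size_t end, float dt) {
    for (std::size_t i = begin; i < end; ++i) {
        bodies[i].position += bodies[i].velocity * dt;
    }
}

// Splits the array into roughly equal chunks, one thread per chunk, and waits
// for all of them before returning, so the next system sees finished data.
void integrateParallel(std::vector<Body>& bodies, float dt,
                       std::size_t workerCount) {
    workerCount = std::max<std::size_t>(1, workerCount);
    const std::size_t chunk = (bodies.size() + workerCount - 1) / workerCount;
    std::vector<std::thread> workers;
    for (std::size_t begin = 0; begin < bodies.size(); begin += chunk) {
        const std::size_t end = std::min(begin + chunk, bodies.size());
        workers.emplace_back(integrateRange, std::ref(bodies), begin, end, dt);
    }
    for (std::thread& w : workers) w.join();
}
```

The systems themselves still run one after another; only the work inside a system is spread across threads.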

So to get back to the question you posed: Send out each message immediately as it's ready, unless you want the message to be received at a later time, in which case you should be explicit about it and probably just process the message in the receiving system at that later time.

devstropo.blogspot.com - Random stuff about my gamedev hobby

as others have stated some systems really do need to run before others.

Well, I said this too right with only running tasks when their dependencies have finished? :-) Atm, I'm only running things in serial but enforcing the dependencies (i.e. enforcing some sort of order) should solve this problem right??

Like all your input should be processed and converted into events the second you go to process events and you should then process all those messages at once

Yeah, I'm getting the feeling I need to have some sort of capability to do this. Atm, the solution I proposed a few posts ago should improve my immediate situation but then I'll have to get onto more immediate processing. However, I still can't see how this can be done without some kind of double buffering or locking. Take the collision system for example (again). The collision system's first job should be to update all the collision boxes based on moving/new entities. If another system asks for a collision query while it is doing this, then it obviously cannot do this immediately since the data is being updated. So, I guess the possible solutions are :

- Double buffering! A second data structure with all the 'old' (i.e. previous step) collision boxes is used for queries while the new one is being updated (for use in the next step). This means collision queries are one step behind, but as long as this is for things like triggers and button input, it should be fine (N.B. Physics collisions will be handled separately)

- Locking! Only one data structure for collision boxes but if it's being updated (by the collision system) it's locked so outside systems cannot query it and must wait (probably not good!). Prevents one-frame lag but potentially can have performance problems.

- Queue event but process asap! If the collision system is updating, then query is queued and processed immediately once everything is up to date. Possibly the best solution (unless there's something I've not thought about)
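For the double-buffering option, a sketch could look like the following (C++17 assumed). `CollisionBoxes`, `back`, `entityAt` and `swap` are hypothetical names. The sketch does no synchronisation itself: it assumes `swap` is only called at a point in the step where no update or query is running.

```cpp
#include <optional>
#include <vector>

struct Box { int entity; float minX, minY, maxX, maxY; };

class CollisionBoxes {
public:
    // Writer side: the collision system rebuilds this buffer during its update.
    // It still holds the boxes from two steps ago, so clear it first.
    std::vector<Box>& back() { return buffers_[1 - front_]; }

    // Reader side: queries only see what was published by the last swap(),
    // i.e. data that is one step old.
    std::optional<int> entityAt(float x, float y) const {
        for (const Box& b : buffers_[front_]) {
            if (x >= b.minX && x <= b.maxX && y >= b.minY && y <= b.maxY) {
                return b.entity;
            }
        }
        return std::nullopt;
    }

    // Publishes the freshly written buffer; call once per step.
    void swap() { front_ = 1 - front_; }

private:
    std::vector<Box> buffers_[2];
    int front_ = 0;
};
```

The cost is the second copy of the boxes, which is the memory trade-off mentioned in the first post.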

Also rendering is not that easy to run in parallel because it depends on rendering a scene based on information about where things are in the scene, if you go and start updating ai and physics and such then things end up completely changing positions or animations. Getting around that would require copying or locking the data each time you want to render, which sounds horribly slow.

Well, I think I already mentioned this but perhaps not clearly enough. Each system (including the rendering system) only operates on its own component. So there is no need to access any transform component (which would affect performance). Instead the rendering component has its own local copy of its world transform, and changes in the positions and rotations of entities are communicated via the event system. Therefore there is no dependency anymore since the rendering system has all the information it needs and can render in peace on its own thread. It does mean of course the positions are one step old but I'm not worried about this (compared to other issues). Actually the bigger problem for me (looking to the future) is that with OpenGL you can only render on the master thread, which is where other things might happen. Vulkan might help this, but I'm not anywhere near ready to take that leap yet!! :-)
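In code, the arrangement described above might look roughly like this. `Transform`, `TransformChanged`, `RenderSystem` and its methods are invented for the sketch; the point is only that the renderer reads its own copy and never reaches into another system's data.

```cpp
#include <unordered_map>

struct Transform { float x = 0.0f; float y = 0.0f; float rotation = 0.0f; };

// Hypothetical event carrying an entity's new world transform.
struct TransformChanged { int entity; Transform transform; };

class RenderSystem {
public:
    // Called when the messaging system delivers a transform update.
    void onTransformChanged(const TransformChanged& e) {
        transforms_[e.entity] = e.transform;
    }

    // Rendering reads only the local copy. Returns nullptr for entities the
    // render system has never been told about.
    const Transform* transformOf(int entity) const {
        auto it = transforms_.find(entity);
        return it == transforms_.end() ? nullptr : &it->second;
    }

private:
    std::unordered_map<int, Transform> transforms_;  // render-side duplicate
};
```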

Games are not a great candidate for concurrency unfortunately, that's why only certain systems are useful to parallelize, and why the only threading is usually a system that eats up computation tasks.

Yeah, I'm beginning to see why >.< . There are some types of tasks which might fit into this model better, such as AI for large numbers of enemies (e.g. Total War style battles), on-the-fly procedural generation (can be quite intensive) and others.

Anyway, thanks for your comments. The more I talk about this, the clearer it's getting what I need to do! :-)

This topic is closed to new replies.
