Firstly, with DoD, keep in mind that it's not a formal term (yet). There's no Wikipedia page for it, and there's no consensus on what it means. Basically, a bunch of different devs started preaching something like "stop thinking about your current design methodologies for a moment, and just have a raw look at what you're actually doing with your data" or "at the base level, all programs are just input->process->output -- forget about your abstractions for a moment and think about what you're asking the computer to do, please!".
Because it's not formally defined, trying to practice DoD is necessarily a vague art.
However, what about other systems in games? How can data oriented design be applied to them?
You break down the operation into a flow of input->process->output nodes, and try to maximize the size and contiguity of those input/output blobs.
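As a minimal sketch of that input->process->output framing (names and the toy integration step are illustrative, not from any particular engine): one node consumes contiguous input blobs and produces a contiguous output blob in a single pass.

```cpp
#include <cstddef>
#include <vector>

struct Position { float x, y; };
struct Velocity { float x, y; };

// input: contiguous position + velocity arrays; output: a new contiguous
// position array. The whole "node" is one linear sweep over the data.
std::vector<Position> Integrate(const std::vector<Position>& pos,
                                const std::vector<Velocity>& vel,
                                float dt)
{
    std::vector<Position> out;
    out.reserve(pos.size());
    for (std::size_t i = 0; i < pos.size(); ++i)
        out.push_back({ pos[i].x + vel[i].x * dt,
                        pos[i].y + vel[i].y * dt });
    return out;
}
```

The point isn't this particular function -- it's that the inputs and outputs are big flat arrays, so the "process" part is a predictable streaming pass.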
One of the ones I can't figure out though is the event system/message pump. With message pumps you have a bunch of objects (messages) being sent around to a bunch of other polymorphic types (listeners) which seems like it's against the whole idea of data oriented design.
Sometimes the simple answer might just be: you're doing it wrong, throw it out and start again because this whole design stinks.
Is a generic/polymorphic event pump really required and/or the best solution for your problem?
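One alternative worth considering (a hypothetical sketch, not the only answer): instead of heap-allocated message objects dispatched to polymorphic listeners, give each event type its own plain struct collected into its own contiguous array, and let whichever system cares drain that array in bulk once per frame.

```cpp
#include <vector>

// One plain-data struct per event type -- no virtual dispatch needed.
struct ExplosionEvent { float x, y, radius; };

struct EventQueues {
    std::vector<ExplosionEvent> explosions; // one flat array per event type
};

// Producers just append plain data; no allocation per message beyond
// the vector's amortized growth.
void SpawnExplosion(EventQueues& q, float x, float y, float r) {
    q.explosions.push_back({ x, y, r });
}

// Consumers process the whole batch at once, then clear it.
int ConsumeExplosions(EventQueues& q) {
    int processed = 0;
    for (const ExplosionEvent& e : q.explosions) {
        (void)e; // apply damage, play a sound, etc.
        ++processed;
    }
    q.explosions.clear();
    return processed;
}
```

You lose the "anyone can listen to anything" genericity, but you gain explicit data flow and linear memory access -- which is often what DoD is asking for in the first place.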
And what about parallelism? I've heard it's a major benefit of data-oriented design, but with objects having dependencies on each other and systems being able to operate on multiple components (the rendering system would need to access the transform component) don't you just end up with a synchronization nightmare like with regular object oriented design?
The issue is that you're thinking about mutable objects. This is typical for OOP code: you make an object, then you call lots of functions on it, and its state changes over time. If that object is shared between many processes, then ordering and synchronization become extremely important.
If you'd come from a functional programming background, then this would seem very strange and alien, and you'd prefer to write systems that rely on immutable objects. When you don't have mutable objects, then this problem can't even arise! You don't have to make sure that the physics system has written to the transform array before the renderer reads from the transform array, because you don't even have a mutable transform array -- instead, the physics system returns a brand new (now immutable) array, which is consumed by the renderer (and no further syncing is required because the array is read-only). You can't get the ordering wrong either, because the renderer can't possibly begin until after the physics has actually created its input data!
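A minimal sketch of that hand-off (the integration step is a placeholder): the physics step returns a brand-new transforms array, and the renderer only ever sees that result as read-only input, so the ordering is enforced by the data dependency itself.

```cpp
#include <vector>

struct Transform { float x, y; };

// Takes the old transforms by const reference, returns a brand-new array.
// Nothing is mutated in place, so nothing needs locking.
std::vector<Transform> DoPhysics(const std::vector<Transform>& in, float dt)
{
    std::vector<Transform> out;
    out.reserve(in.size());
    for (const Transform& t : in)
        out.push_back({ t.x + 1.0f * dt, t.y }); // pretend integration
    return out;
}

// Can't run before DoPhysics -- its input doesn't exist until then.
int DoRendering(const std::vector<Transform>& transforms)
{
    return static_cast<int>(transforms.size()); // pretend draw calls
}
```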
If you can arrange your program as a DAG of processes (like in the dataflow paradigm) then a lot of these problems go away. You don't have to strictly use immutable state, but the flow-graph makes the synchronization obvious. One of the main pillars of DoD is to do bulk-processing on large amounts of data, rather than the OOP approach of working on only one object at a time.
e.g. if you write code that calls two systems like this, then you can see the bulk data coming out of one system and being sent to the other:
DoPhysics( &transforms ); // write new transforms
DoRendering( &transforms ); // draw stuff using new transforms
If you want to use many threads, then it just looks like this instead:
DoPhysics( &transforms ); // write new transforms, using many threads
DoSomethingElse();
WaitForPhysics(); // ensure all threads have finished with the physics system
DoRendering( &transforms ); // draw stuff using new transforms
What about the cache performance of more complex game systems? For example: an object with a Health component is hit by one with a Damage component. If the components are updated by type, then when we check collisions for all the Health components, we also have to check how much damage to apply (from the Damage component attached to the game object that hit the Health component's object). Is this still a predictable enough access pattern for the CPU cache? Or would jumping back and forth between accessing the Health and the Damage components cause lots of cache misses?
You can have the collision system, the spell system, the bullet system, etc, all write into an array of damage events (with a target specifying which health component will be affected, a damage type and an amount).
After all those systems have run, you've then got an array of damage events. You can then pass this to the Damage system, along with the array of Health components, the array of DamageType components, the array of Armour components, etc...
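A stripped-down sketch of that batch, with illustrative names and only the Health array (the real thing would also take DamageType, Armour, etc): the producer systems just append plain DamageEvent records, and the damage system applies the whole batch against the health array in one pass.

```cpp
#include <vector>

struct DamageEvent { int target; float amount; }; // target = index into healths
struct Health      { float value; };

// Runs once, after collision/spells/bullets have all appended their events.
// One linear sweep over the event array; targets are random-access reads.
void ApplyDamage(const std::vector<DamageEvent>& events,
                 std::vector<Health>& healths)
{
    for (const DamageEvent& e : events)
        healths[e.target].value -= e.amount;
}
```

Note that the producers never touch the Health components directly -- they only emit data, which keeps the "who mutates what, and when" question trivial to answer.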
As for performance, it depends. If all the groups of components involved in the current process are smaller than the L1 cache then you're pretty good! If they're smaller than the L2 cache, you're still pretty OK. If they're larger than that, then you'll have to actually pay attention and try to be sensible with your access patterns.
When you're working with more than one array of components, often you can iterate over one in linear order, but you have to use random-access indexing into the other array(s). Often, if the process is complex enough, you can unroll the loop slightly and use prefetching to make these random accesses perform better, but all of that is micro-optimization that can be done later.
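For the curious, a sketch of that lookahead-prefetch idea (the function, the lookahead distance, and the data layout are all illustrative; __builtin_prefetch is a GCC/Clang extension, and this is exactly the kind of later micro-optimization the above paragraph warns about):

```cpp
#include <cstddef>
#include <vector>

struct Health { float value; };

// Iterate the targets array linearly, but hint the cache about the
// random-access Health read a few iterations ahead of where we are.
float SumTargetHealth(const std::vector<int>& targets,
                      const std::vector<Health>& healths)
{
    const std::size_t lookahead = 8; // tune by measuring, not guessing
    float sum = 0.0f;
    for (std::size_t i = 0; i < targets.size(); ++i) {
        if (i + lookahead < targets.size())
            __builtin_prefetch(&healths[targets[i + lookahead]]);
        sum += healths[targets[i]].value; // data-dependent random read
    }
    return sum;
}
```

Whether the prefetch helps at all depends on the sizes involved and the amount of work per element -- profile before and after.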
IMHO the most important thing is just that you don't simply end up updating each group of components in some arbitrary order, allowing them to magically communicate with each other -- instead there should be some deliberate ordering of the update functions that's logically determined by the data flow, as in the above code snippet.