So, I didn't read all the replies properly because I am a Vasari adept, and time indeed is short. I'm mostly sure someone somewhere said that when you hit a performance bottleneck, it's likely that the game logic is the cause of the slowness, not the iteration over the components. Anyway, here are the conclusions I've arrived at while programming my components and systems, but first...
Over-engineering is hell! If you're programming alone in your free time, over-engineering can eat as much time as cleaning up after jumping on too many templates too early. Stick with the simplest approach that solves your problem, and refactor when needed. And be careful with premature optimization.
Okay, now to the stuff. My (FAKE) ECS recently went through its third refactor. One of the reasons for the last refactor was that I wanted to use the simplest component storage scheme possible (a vector of smart pointers to components), while keeping a door open to the lightning-fast option: a vector of components contiguous in memory, ordered by entity id, which increases the chances of cache hits while iterating upwards. To achieve this, I settled on some compromises:
- Components are stored in special containers, whose single responsibility is to allocate and deallocate components from whatever data structure they use.
- Entities store a kind of unique handle to their components, which bridges the communication with the container. An unsafe raw pointer can be obtained for performance when needed, but overall the impact of the indirection has been too small to measure. If the entity releases this unique handle, the component is destroyed. (See the sketch after this list.)
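To make the compromise concrete, here's a minimal sketch of the container/handle idea. All names (`ComponentPool`, `OwningHandle`, `Transform`) are hypothetical, not my actual code, and a real version would need proper deallocation and stale-handle checks:

```cpp
#include <cstdint>
#include <vector>

struct Transform { float x = 0, y = 0; };

class ComponentPool {
public:
    using Index = std::uint32_t;

    // Single responsibility: allocate and deallocate components.
    Index allocate(Transform t) {
        storage_.push_back(t);                        // contiguous storage
        return static_cast<Index>(storage_.size() - 1);
    }

    Transform* get(Index i) { return &storage_[i]; }

    // Ranged-for support so systems can walk the raw storage directly.
    auto begin() { return storage_.begin(); }
    auto end()   { return storage_.end(); }

private:
    std::vector<Transform> storage_;                  // the fast path
};

// Move-only handle owned by an entity; in a full implementation its
// destructor would tell the pool to deallocate the component.
class OwningHandle {
public:
    OwningHandle(ComponentPool& pool, ComponentPool::Index index)
        : pool_(&pool), index_(index) {}
    OwningHandle(OwningHandle&&) = default;
    OwningHandle(const OwningHandle&) = delete;

    // Unsafe raw-pointer escape hatch for hot paths.
    Transform* get() { return pool_->get(index_); }

private:
    ComponentPool* pool_;
    ComponentPool::Index index_;
};
```

The nice part of this split is that the storage strategy can later change from a plain vector to something smarter without touching the entity side, since entities only ever see the handle.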
So, the entities are the unique owners of the components (even though the components live in containers), and they can create weak references to those components for everyone else. The weak references exist because I decided that I want my components to have some logic, instead of being just data, which means components sometimes need to reference one another. All sorts of problems came with this decision, but I'm sure all sorts of other problems would come if I followed a pure ECS approach. Pick your favorite.
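One way such weak references could detect a destroyed component is a generation counter per slot; this is just an illustration of the concept, not my engine's actual scheme, and `Health`, `HealthPool` and `WeakRef` are made-up names:

```cpp
#include <cstdint>
#include <vector>

struct Health { int hp = 100; };

class HealthPool {
public:
    struct WeakRef { std::uint32_t index; std::uint32_t generation; };

    WeakRef create(Health h) {
        slots_.push_back({h, 0, true});
        return {static_cast<std::uint32_t>(slots_.size() - 1), 0};
    }

    void destroy(WeakRef r) {
        if (resolve(r) != nullptr) {
            slots_[r.index].alive = false;
            ++slots_[r.index].generation;   // invalidates outstanding refs
        }
    }

    // Returns nullptr if the component was destroyed after the
    // reference was handed out.
    Health* resolve(WeakRef r) {
        Slot& s = slots_[r.index];
        return (s.alive && s.generation == r.generation) ? &s.component
                                                         : nullptr;
    }

private:
    struct Slot {
        Health component;
        std::uint32_t generation;
        bool alive;
    };
    std::vector<Slot> slots_;
};
```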
Back to the containers: the components can be iterated with a ranged for loop, accessing each component directly through an iterator instead of a handle. It is up to the programmer to guarantee that the container won't be mutated while anything is iterating over it. The systems simply use this ranged-for feature to iterate over the elements.
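A hypothetical system built on that feature might look like the sketch below: it walks the pool's contiguous storage directly, with no handle indirection at all (again, the names are illustrative):

```cpp
#include <vector>

struct Position { float x = 0, y = 0, dx = 0, dy = 0; };

class PositionPool {
public:
    void add(Position p) { storage_.push_back(p); }
    auto begin() { return storage_.begin(); }
    auto end()   { return storage_.end(); }

private:
    std::vector<Position> storage_;
};

// The caller must guarantee the pool isn't mutated during iteration:
// an add() here could reallocate and invalidate every iterator.
void movementSystem(PositionPool& pool, float dt) {
    for (Position& p : pool) {
        p.x += p.dx * dt;
        p.y += p.dy * dt;
    }
}
```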
Now, about containers with multiple types: I'm thinking of adding a new kind of set container that references entities that have a specific set of components. When an entity receives a new component, it would fire an event consumed by the set containers, each of which would analyze the entity's components and decide whether to reference the entity or not. When a component is removed, the entity would fire an event as well, so the set containers could unregister it. But I haven't needed this yet, since even my rendering system is way too simple for now (just the bare minimum for deferred rendering).
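As a sketch of how one of these set containers might consume the events, assuming each entity keeps a bitmask of which component types it currently has (everything here is hypothetical):

```cpp
#include <cstdint>
#include <unordered_set>

using EntityId = std::uint32_t;
using ComponentMask = std::uint32_t;   // one bit per component type

class EntitySet {
public:
    explicit EntitySet(ComponentMask required) : required_(required) {}

    // Consumed from the add/remove-component events fired by entities.
    void onComponentsChanged(EntityId id, ComponentMask current) {
        if ((current & required_) == required_)
            members_.insert(id);       // entity now matches this set
        else
            members_.erase(id);        // unregister it
    }

    const std::unordered_set<EntityId>& members() const { return members_; }

private:
    ComponentMask required_;
    std::unordered_set<EntityId> members_;
};
```

A render system could then iterate only over, say, the set requiring the transform and mesh bits, instead of testing every entity each frame.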
I don't have any statistical data or citations to back up my belief, but I really do believe that even the most optimized engines out there will hit a bottleneck with a large enough number of entities and components. Usually they go for alternatives such as batching, building tree structures to organize data and reduce the number of object instances, changing the algorithms behind how things behave, or even carving out entire groups of special cases. Rendering is one of those: eventually you'll worry about overdraw, draw calls, render order, the order effects are applied in, and whole new worlds of challenges and tears. I'd advise double-checking whether your bottleneck can be solved at the architecture level before heavily optimizing the component system.
Once the optimization is no longer algorithmic or structural, code tends to get really, really hard to maintain, and a lot more bug-prone over time.
Edit: changed a few things for easier reading, and added the last paragraphs.