At the end of the day, all that any program does is transform some data into some other data, by a sequence of processes.
Well, I stand behind my claim that nowadays performance is gained by relinquishing control. At the end of the day you'll have to trust the framework to do the right thing. We take care of the edge cases for you.
Given enough time, money, talent and patience, it will always be possible to replicate the results of some framework by writing everything by hand instead, even in raw asm. The reason this usually isn't feasible is that the time/money/talent/patience costs are insanely high!
Choosing to use a framework/library/middleware, as always, should be based on a cost-benefit analysis -- e.g. can we get the same performance ourselves, given the same amount of time/money and our current staff?
Often the answer to that is "no", which is why middleware is so damn popular in the games industry (contrary to your assertion). Games can be written from scratch (and sometimes you might get better performance if they were), but it's usually cheaper/faster to just license some middleware that reduces your production time by an order of magnitude.
That's exactly my point -- writing shared-state multi-threaded code without a framework is difficult and dangerous, and hence discouraged. That's why game developers do use concurrency frameworks, and discourage the "manual authoring" of multi-threaded code outside of one (i.e. we don't like to write multi-threaded code by hand).
Except if you use a transactional framework that takes care of all that for you.
That said, transactional frameworks are largely rubbish for many kinds of games, as they often give up determinism, which is essential for some games. Typical shared-state designs revolving around mutexes often have the same problem: the relative execution speed of different threads determines the order in which simulation operations occur.
On that note, with your space-ship example, it's not obvious what kind of ordering constraints are/can-be applied to the different processes. Are all beams fired before any damage is calculated, etc.?
In the Actor Model implementation that I've used for games, these kinds of ordering issues were explicit to the programmer, allowing them to maintain a deterministic simulation, regardless of the number of cores or the scheduling algorithm used. e.g. An actor could specify that within a particular simulation frame, it would process all "FireBeam" messages before it processed any "TakeDamage" messages.
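A rough sketch of that kind of explicit ordering, to make it concrete. This is not the implementation referred to above; `ShipActor`, `Message`, `MsgKind` and the `senderId` tie-breaker are all hypothetical names made up for illustration. The idea is only that the actor sorts its inbox by a fixed key before processing, so the result doesn't depend on which thread delivered which message first.

```cpp
#include <algorithm>
#include <tuple>
#include <vector>

// Hypothetical message kinds. The enum order doubles as the processing order
// within a frame: every FireBeam is handled before any TakeDamage.
enum class MsgKind { FireBeam = 0, TakeDamage = 1 };

struct Message {
    MsgKind kind;
    int senderId;  // tie-breaker, so the order never depends on arrival time
    int amount;    // damage amount; unused for FireBeam
};

struct ShipActor {
    int hull = 100;
    int beamsFired = 0;
    std::vector<Message> inbox;  // filled during the frame, in arbitrary order

    // Process one simulation frame in a fixed, arrival-independent order.
    void runFrame() {
        std::sort(inbox.begin(), inbox.end(),
                  [](const Message& a, const Message& b) {
                      return std::tie(a.kind, a.senderId, a.amount) <
                             std::tie(b.kind, b.senderId, b.amount);
                  });
        for (const Message& m : inbox) {
            if (m.kind == MsgKind::FireBeam) {
                if (hull > 0) ++beamsFired;  // a live ship gets its shot off
            } else {
                hull -= m.amount;
            }
        }
        inbox.clear();
    }
};
```

With this, a ship that receives a fatal `TakeDamage` and a `FireBeam` in the same frame still fires, whichever of the two happened to arrive first.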
As above, I don't know of any big games using anything like STM. Instead, most use something closer to a stream-processing or flow-based programming model, which is easy to reason about (you can write the main flow in a single-threaded style and have it be automatically decomposed), has few performance pitfalls (no messy rewinding, can be well optimized for data access patterns), is 100% deterministic, and allows for huge data-parallelism and task-parallelism.
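A minimal sketch of why that style stays deterministic, under some loud assumptions: `integrate` and `step` are hypothetical names, and the "scheduler" here is faked by running the jobs on one thread in either forward or reverse order. A real job system would hand the same jobs to worker threads. The point is that each job reads only immutable input and writes only its own slice of the output, so the order the jobs run in cannot change the result.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Kernel: a pure function of its inputs, touching no shared mutable state.
float integrate(float position, float velocity, float dt) {
    return position + velocity * dt;
}

// The "main flow" is one call; the decomposition into jobs happens inside.
std::vector<float> step(const std::vector<float>& pos,
                        const std::vector<float>& vel,
                        float dt, std::size_t chunk, bool reverseJobOrder) {
    assert(pos.size() == vel.size() && chunk > 0);
    std::vector<float> out(pos.size());

    // Split the range into jobs; each job writes only out[begin, end).
    std::vector<std::function<void()>> jobs;
    for (std::size_t begin = 0; begin < pos.size(); begin += chunk) {
        std::size_t end = std::min(begin + chunk, pos.size());
        jobs.push_back([&pos, &vel, &out, dt, begin, end] {
            for (std::size_t i = begin; i < end; ++i)
                out[i] = integrate(pos[i], vel[i], dt);
        });
    }

    // Stand-in for an arbitrary scheduler: run the jobs in either order.
    if (reverseJobOrder) std::reverse(jobs.begin(), jobs.end());
    for (auto& job : jobs) job();
    return out;
}
```

Calling `step` with `reverseJobOrder` set to `true` or `false` gives the same output, and there is nothing to rewind or retry because no two jobs ever touch the same element.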
STM-like techniques are usually only an internal implementation detail, e.g. the algorithm to push an item into a lock-free queue may be transactional, repeatedly trying again and again until it succeeds without conflict. This queue would then be used to implement a more useful concurrency model.
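Here's a small sketch of that retry-until-no-conflict pattern using `std::atomic`. Note the swap: it's a lock-free stack rather than a queue, only because a stack's push is the shortest correct example of the pattern (a proper lock-free queue needs more care). `LockFreeStack` and `drain` are hypothetical names, and single-node pop is deliberately left out since it brings in the ABA problem.

```cpp
#include <atomic>
#include <vector>

// NOTE: a stack, not a queue; chosen only to keep the example short.
class LockFreeStack {
    struct Node {
        int value;
        Node* next;
    };
    std::atomic<Node*> head{nullptr};

public:
    ~LockFreeStack() { drain(); }

    // Safe to call from many threads at once.
    void push(int value) {
        // 1. Snapshot the current head and build the new node against it.
        Node* node = new Node{value, head.load(std::memory_order_relaxed)};
        // 2. Try to commit: succeeds only if head still equals node->next.
        //    On failure, compare_exchange_weak stores the current head into
        //    node->next, so the loop just tries again with fresh data.
        while (!head.compare_exchange_weak(node->next, node,
                                           std::memory_order_release,
                                           std::memory_order_relaxed)) {
            // another thread changed head first; retry
        }
    }

    // Detaches the whole list in one atomic exchange and returns the values,
    // most recently pushed first.
    std::vector<int> drain() {
        std::vector<int> out;
        Node* n = head.exchange(nullptr, std::memory_order_acquire);
        while (n != nullptr) {
            out.push_back(n->value);
            Node* next = n->next;
            delete n;
            n = next;
        }
        return out;
    }
};
```

The "transaction" is just steps 1 and 2: read, prepare, and commit only if nobody else committed in between. A structure like this would sit underneath a higher-level model (such as an actor's inbox) rather than being used directly by gameplay code.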
I wouldn't agree with that. Multiple threads sharing the same RAM without restriction gives us the "shared state" model of concurrency. When we build a message passing framework, we build it directly on top of this model.
Every shared-state concurrency/distribution (we can agree it's the same problem) is an abstraction on top of message passing, that translates contention to messaging. In the CPU, that messaging layer is called cache-coherence, and uses variants of the MESI messaging protocol.
Underneath this model, there is the hardware implementation, which in some cases, yes, uses MESI/etc, which gives us the layers:
1) Message passing between caches.
2) Shared RAM.
3) High level Messages.
The high level message passing layer is still built on top of the shared-state layer, regardless of how layer #1 operates. #1 is just a black box in order to make layer #2 function correctly.
The type of message passing going on in layer #3 is semantically different to layer #1.
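As a sketch of what "layer #3 built on layer #2" looks like in practice, here is a bare-bones mailbox. `Mailbox`, `send` and `tryReceive` are hypothetical names, and a mutex is assumed purely for brevity (a real framework might use a lock-free queue instead). The high-level "message" is nothing more than data written into ordinary shared memory under a shared-state synchronization primitive; whatever the caches do underneath never shows up in the code.

```cpp
#include <deque>
#include <mutex>
#include <string>
#include <utility>

class Mailbox {
    std::mutex lock;                // layer #2 primitive guarding shared state
    std::deque<std::string> queue;  // ordinary shared RAM, visible to all threads

public:
    // Layer #3 "send": really just a guarded write to shared memory.
    void send(std::string msg) {
        std::lock_guard<std::mutex> guard(lock);
        queue.push_back(std::move(msg));
    }

    // Layer #3 "receive": a guarded read. Returns false if nothing is waiting.
    bool tryReceive(std::string& out) {
        std::lock_guard<std::mutex> guard(lock);
        if (queue.empty()) return false;
        out = std::move(queue.front());
        queue.pop_front();
        return true;
    }
};
```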
Also, in the example of the PS3 that I was using, the SPU co-processors don't have a cache at all, so obviously don't implement MESI. They have their own local-RAM, and can initiate transfers to/from the main shared RAM and their local-RAM (and two concurrent but overlapping transfers will have indeterminable results).
You could argue that these transfers are "messages" and therefore it's built upon message passing, but if you go that far, then "shared-state" ceases to exist at all: interacting with the global state is just "sending a message"! ...and that doesn't seem like a useful thought construct.