I thought the image in the link I posted demonstrated how physical avatars (or puppets) are magnified to life size or giant size. But it doesn't explain the unaided, synchronized animated movements overlaid onto the real world.
They could advance the background projector by one frame (if a background is required), animate all the figurines by one frame, update/move the matte painting, advance the filming camera by one frame, and then operate the shutter like a regular camera to expose a single frame. That would capture the figurines, masked behind the matte painting, with the background projection behind them.
You could project the already-filmed real-life footage onto the background, which would be equivalent to compositing the matte painting and the stop-motion figurines on top of it.
If you use a black background (and possibly a black matte painting as a mask), and put the already-filmed real-life footage into the camera instead of the projector, then you can also superimpose the stop-motion figurines onto the existing footage via a second exposure over the original frames. This is an additive effect though: the light from the second exposure adds to what is already on the film rather than replacing it. You can see this in that Russian video I linked, where the tiny dancing man sometimes appears translucent.
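To make the "additive" point concrete, here is a rough numeric sketch, not a model of any real film stock. It treats one scanline as a list of brightness values from 0 to 255 and compares a second exposure against a true matte composite. The function names and pixel values are made up for illustration.

```python
def double_exposure(background, figure):
    """Second exposure: light from both passes adds up, clipped at full white."""
    return [min(255, b + f) for b, f in zip(background, figure)]


def matte_composite(background, figure, matte):
    """Matte: where matte is 1 the figure fully replaces the background."""
    return [f if m else b for b, f, m in zip(background, figure, matte)]


# Hypothetical scanline: bright sky on the left, dark wall on the right.
background = [200, 200, 50, 50]
# Figurine shot against black: black (0) contributes no light.
figure = [0, 120, 120, 0]
# 1 where the figurine is, 0 elsewhere.
matte = [0, 1, 1, 0]

exposed = double_exposure(background, figure)
matted = matte_composite(background, figure, matte)

print(exposed)  # [200, 255, 170, 50]
print(matted)   # [200, 120, 120, 50]
```

In the double-exposure result the figurine's pixels still carry the background's brightness (255 over the sky, 170 over the wall), which is why it reads as see-through. With a matte, the figurine is the same 120 everywhere regardless of what is behind it.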
It's worth noting that even though that particular video uses double exposure in the simplest way possible, it was entirely possible to do proper chroma keying in 1940 as well.