The goal is the following:
I need to shoot rays from my camera into the scene every frame. At the intersection points I create virtual cameras, which are then used to render the scene again (or only a subset of the objects). Creating a virtual camera is analogous to creating a viewProjection matrix.
The current approach is:
I render the scene from my camera and output the world positions and the normals via multiple render targets to two different textures. So I have two textures on the GPU that contain the world positions and normals of the scene as seen from the camera. To create my viewProjection matrices I only need the position and the normal from those textures (I always assume the up-vector to be vec3(0, 1, 0)). At the moment I download these two textures, create the matrices on the client side, and then send them back to the server side for further rendering.
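For reference, building one such matrix from a fetched position/normal pair could look roughly like the sketch below (using GLM; the projection parameters are placeholders, and the guard against a normal parallel to the up-vector is my own addition, not part of the original setup):

```cpp
#include <glm/glm.hpp>
#include <glm/gtc/matrix_transform.hpp>

// Build a viewProjection matrix for a virtual camera placed at `position`
// and looking along `normal`, with a fixed world up-vector of (0, 1, 0).
glm::mat4 makeVirtualViewProjection(const glm::vec3& position,
                                    const glm::vec3& normal)
{
    glm::vec3 up(0.0f, 1.0f, 0.0f);

    // lookAt degenerates when the view direction is (nearly) parallel to up,
    // so fall back to another axis in that case (assumption on my part).
    if (glm::abs(glm::dot(glm::normalize(normal), up)) > 0.999f)
        up = glm::vec3(1.0f, 0.0f, 0.0f);

    glm::mat4 view = glm::lookAt(position, position + normal, up);

    // Placeholder projection: 90 degree fov, square aspect, near/far planes.
    glm::mat4 projection =
        glm::perspective(glm::radians(90.0f), 1.0f, 0.1f, 100.0f);

    return projection * view;
}
```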
Models with 200K vertices are by no means absurd. My GPU runs fine at 40-60 fps in a scene with about 1 million vertices, and the frame rate goes even higher when not all vertices are inside the camera frustum.