It depends on many things. What's the interaction between the server and the client, for example?
From what I understand, the client sends inputs to the server, the server applies the inputs, and sends the positions back to the client. That means on the client side, the input lag would be a full round trip time (which could be huge), plus roughly a network frame delta time (which at 6 fps is quite significant, putting you somewhere in the 150-300 ms range).
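A rough back-of-the-envelope sketch of that, just to put numbers on it (the 100 ms round trip time is an assumed example value, not a measurement):

```cpp
// Back-of-the-envelope input lag for a naive "send input, wait for position" loop.
// The 100 ms round trip time is an assumed example value, not a measurement.
#include <cstdio>

int main() {
    const double rtt_ms       = 100.0;                      // assumed round trip time
    const double tick_rate_hz = 6.0;                        // server network update rate
    const double tick_interval_ms = 1000.0 / tick_rate_hz;  // ~167 ms between updates

    // Worst case: the input arrives just after a tick, so it waits a full interval
    // before the resulting position is even sent back.
    const double worst_case_lag_ms = rtt_ms + tick_interval_ms;
    std::printf("tick interval: %.0f ms, worst-case input lag: %.0f ms\n",
                tick_interval_ms, worst_case_lag_ms);
    return 0;
}
```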
OnLive kinda works like that, and relies on reducing the latency between user inputs and the streamed video of the rendered game frames as much as possible.
For a networked game context, look up client-side prediction, which allows the client to compute the movement locally in real time, and the server to correct the client if its prediction drifts too far away.
Here is an example of such a system.
http://gafferongames.com/game-physics/networked-physics/
Many games use that sort of client-side prediction and correction mechanism.
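Here is a minimal sketch of that idea (the names and structures below are mine, not taken from the article): the client applies its inputs immediately, keeps them in a buffer keyed by a sequence number, and when a server correction arrives it rewinds to the acknowledged state and replays the inputs the server has not seen yet. The server stays authoritative; the client only ever guesses ahead using inputs it has already sent.

```cpp
// Sketch of client-side prediction with server reconciliation.
// All names here are illustrative; a real game ties this into its physics step.
#include <cstdint>
#include <deque>

struct Input {
    uint32_t sequence;   // increases by one per simulated frame
    float    move_x;     // e.g. -1, 0, +1
    float    move_y;
    float    dt;         // frame delta time in seconds
};

struct State {
    float x = 0.0f;
    float y = 0.0f;
};

// The same deterministic step must run on both client and server.
State Simulate(const State& s, const Input& in) {
    const float speed = 5.0f; // units per second (arbitrary)
    State out = s;
    out.x += in.move_x * speed * in.dt;
    out.y += in.move_y * speed * in.dt;
    return out;
}

class PredictedPlayer {
public:
    // Called every client frame: predict locally and remember the input.
    void ApplyLocalInput(const Input& in) {
        state_ = Simulate(state_, in);
        pending_.push_back(in);
        // ... also send `in` to the server here ...
    }

    // Called when the server sends back its state for input `ackSequence`.
    void OnServerCorrection(const State& serverState, uint32_t ackSequence) {
        // Drop inputs the server has already consumed.
        while (!pending_.empty() && pending_.front().sequence <= ackSequence)
            pending_.pop_front();

        // Rewind to the authoritative state, then replay unacknowledged inputs.
        state_ = serverState;
        for (const Input& in : pending_)
            state_ = Simulate(state_, in);
    }

    const State& state() const { return state_; }

private:
    State state_;
    std::deque<Input> pending_; // inputs sent but not yet acknowledged
};
```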
Now, for a remote client (imagine another client spectating the game), it is slightly different. The host just sends position updates, and the client interpolates between them, as you seem to be doing at the moment. The client then has to run its clock behind the host's. This is so the client always interpolates between known positions, instead of extrapolating and 'guessing' where the next position will be. Extrapolation can introduce more jitter if the position updates are not predictable and more chaotic (for example, a player dodging left and right to avoid bullets).
'Entity interpolation':
https://developer.va...ayer_Networking
The 'Input prediction' part is a summary of what is presented in Glenn Fiedler's article.
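For the remote/spectating case above, a sketch of that interpolation (the snapshot layout and the suggested render delay of about 100 ms are assumptions on my part, not values from either article): keep a short history of timestamped snapshots and render the entity at a time slightly behind the newest one, so you always sit between two known positions.

```cpp
// Sketch of snapshot buffering and entity interpolation for a remote entity.
// The snapshot layout and the render delay are illustrative only.
#include <cstddef>
#include <deque>

struct Snapshot {
    double time;  // host timestamp, in seconds
    float  x, y;  // position received from the host
};

class InterpolatedEntity {
public:
    void OnSnapshot(const Snapshot& s) { history_.push_back(s); }

    // renderTime should run behind the newest snapshot (e.g. now - 0.1 s) so we
    // interpolate between known positions instead of extrapolating past them.
    bool Sample(double renderTime, float& outX, float& outY) const {
        for (std::size_t i = 1; i < history_.size(); ++i) {
            const Snapshot& a = history_[i - 1];
            const Snapshot& b = history_[i];
            if (a.time <= renderTime && renderTime <= b.time) {
                const double t = (renderTime - a.time) / (b.time - a.time);
                outX = static_cast<float>(a.x + (b.x - a.x) * t);
                outY = static_cast<float>(a.y + (b.y - a.y) * t);
                return true;
            }
        }
        return false; // not enough history yet; hold the last known position instead
    }

    // Drop snapshots that are too old to ever be sampled again.
    void Trim(double renderTime) {
        while (history_.size() > 2 && history_[1].time < renderTime)
            history_.pop_front();
    }

private:
    std::deque<Snapshot> history_;
};
```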
Thirdly, for the host to provide accurate hit detection, it has to 'simulate' what each client sees, know how much they are compensating for lag in their interpolation, and wind back some physics and animation state to the time of the shot. That's the 'Lag compensation' part of that article above.
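A sketch of that rewind idea (the structures, the circular hitbox, and the way the rewind time is computed are simplifications of mine, not the article's exact scheme): the host keeps a short history of where everyone was, and when a shot arrives it tests the hit against the positions the shooter was actually seeing at the time.

```cpp
// Sketch of server-side lag compensation: rewind hitboxes to the time the
// shooter was actually seeing. Structures and the circular hitbox are illustrative.
#include <cmath>
#include <deque>
#include <vector>

struct HitboxSample {
    double time;   // server time of this sample, in seconds
    float  x, y;   // hitbox centre at that time
};

struct PlayerHistory {
    int id = 0;
    std::deque<HitboxSample> samples; // short rolling window, e.g. the last second
};

// Pick the sample closest to rewindTime (interpolating between samples is also common).
static const HitboxSample* SampleAt(const PlayerHistory& p, double rewindTime) {
    const HitboxSample* best = nullptr;
    double bestDiff = 1e9;
    for (const HitboxSample& s : p.samples) {
        const double d = std::fabs(s.time - rewindTime);
        if (d < bestDiff) { bestDiff = d; best = &s; }
    }
    return best;
}

// rewindTime = time the shot was received, minus the shooter's latency and
// interpolation delay, i.e. the moment the shooter was looking at on their screen.
int ResolveShot(const std::vector<PlayerHistory>& players,
                double rewindTime, float shotX, float shotY, float hitRadius) {
    for (const PlayerHistory& p : players) {
        if (const HitboxSample* s = SampleAt(p, rewindTime)) {
            const float dx = s->x - shotX;
            const float dy = s->y - shotY;
            if (dx * dx + dy * dy <= hitRadius * hitRadius)
                return p.id; // hit, tested against the rewound position
        }
    }
    return -1; // no hit
}
```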
Note that 6 fps is kinda low for network updates. Right there, you're already introducing around 150 ms of latency just because of your network tick rate (at 6 fps the interval between updates is about 167 ms). Input packets need to be sent a lot more often, say at 20 fps.
The advantage of that method is that the client -> server bandwidth usage will be very low, since you only need to send inputs and the predicted physical state of your local player, so you can send at a higher frame rate. Secondly, the correction packets sent by the server do not need to be high frequency; that said, the higher their frequency, the less 'rubber banding' the user will experience when their prediction goes wrong (for example near another player or some server entity they collide with).
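To make the bandwidth point concrete, here is a rough sketch of what the two packet types might contain (the fields and sizes are assumptions, not a real protocol): the per-frame input packet is tiny, so 20 fps upstream stays cheap, while the server's correction packet carries full state but can be sent less often.

```cpp
// Illustrative packet layouts only; a real protocol would add compression,
// redundancy for packet loss, and more fields.
#include <cstdint>

// Sent client -> server at a high rate (e.g. 20 fps or every frame): a few bytes.
struct InputPacket {
    uint32_t sequence;   // input sequence number
    uint8_t  buttons;    // bitmask: up/down/left/right/fire...
    int8_t   aim_x;      // quantised aim direction
    int8_t   aim_y;
};

// Sent server -> client at a lower rate: the authoritative state plus the last
// input sequence it reflects, so the client can reconcile its prediction.
struct CorrectionPacket {
    uint32_t ack_sequence;
    float    x, y;
    float    vel_x, vel_y;
};

static_assert(sizeof(InputPacket) <= 8, "input packets stay very small");
static_assert(sizeof(CorrectionPacket) <= 24, "state packets are larger but less frequent");
```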
I know it sounds complicated, maybe too much for your purpose. But since I can't see the video and have to guess what you want to do precisely...