I've been developing a small networking demo for quite a while now, and I'm at the point where searching the net doesn't seem to solve my problem. I understand the techniques described in the Valve papers and in the various articles by the likes of Glenn Fiedler and Gabriel Gambetta; however, I believe I'm running into a problem that they either don't cover or that I just don't understand. Hopefully the wizards of gamedev.net can help me out.
My problem lies with how input is handled in a fast-paced, 2D, client/server networked game. Both the client and the server run at 60 fps, and the client sends an input packet to the server every frame. The server sends state updates every 50ms, and the clients keep a 100ms buffer of those updates which they smoothly interpolate between.
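For reference, a minimal sketch of a 100ms interpolation buffer of the kind described above might look like this in Python. The class name SnapshotBuffer, the (server_time_ms, x) snapshot format and the one-dimensional position are hypothetical choices for illustration, not taken from any particular engine.

```python
from bisect import bisect_right

INTERP_DELAY_MS = 100  # render this far behind the newest server time (the 100ms buffer)


class SnapshotBuffer:
    """Hypothetical helper: stores (server_time_ms, x) snapshots in arrival order."""

    def __init__(self):
        self.times = []
        self.positions = []

    def add(self, server_time_ms, x):
        # Assumes snapshots arrive in order; a real client would drop stale ones.
        self.times.append(server_time_ms)
        self.positions.append(x)

    def sample(self, now_ms):
        """Return the interpolated position at (now_ms - INTERP_DELAY_MS)."""
        if not self.times:
            return None
        render_time = now_ms - INTERP_DELAY_MS
        if render_time <= self.times[0]:
            return self.positions[0]
        if render_time >= self.times[-1]:
            return self.positions[-1]  # buffer ran dry: hold the last known position
        # Find the two snapshots that bracket render_time and blend linearly.
        i = bisect_right(self.times, render_time)
        t0, t1 = self.times[i - 1], self.times[i]
        x0, x1 = self.positions[i - 1], self.positions[i]
        alpha = (render_time - t0) / (t1 - t0)
        return x0 + (x1 - x0) * alpha


buf = SnapshotBuffer()
for t, x in [(0, 0.0), (50, 48.0), (100, 96.0)]:  # ~3 frames of 16px per 50ms snapshot
    buf.add(t, x)
print(buf.sample(175))  # render time 75ms, halfway between 48 and 96 -> 72.0
```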
On the client, the player can press the 'left' key and the square that represents them locally moves left 16 pixels per frame. This means that if they held it down for, say, 50 frames, they would expect the box to now be 50 x 16 = 800 pixels to the left of its original position. My problem is that the server may simulate more than 50 frames of movement because of the latency introduced by the internet (and, it seems, even on my LAN). If the server simulates one extra frame, the server-side player is now 16 pixels out of sync with the correctly placed client-side player. Sure, I could do some reconciliation on the client side, but not only would it be incorrect, the client would noticeably 'keep going'. I'm also wondering if my environment is too demanding: in a 1280 x 720 window, a single frame's 16 pixels of movement is very noticeable, let alone a couple of frames' worth.
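The drift can be reproduced with a small simulation, assuming the server moves the player once per server tick using whatever input state it last received (a "held key" model). The function names, the 6-tick (about 100ms) latency and the single extra tick of delay on the release packet are hypothetical values chosen for illustration.

```python
SPEED = 16  # pixels per frame, as in the post


def held_state_model(arrivals, total_ticks):
    """Server moves the player once per tick using the last input state it received.

    arrivals: list of (arrival_tick, left_pressed) tuples in client send order.
    """
    x = 0
    left = False
    for tick in range(total_ticks):
        for arrival_tick, pressed in arrivals:
            if arrival_tick == tick:
                left = pressed  # the most recently sent packet arriving this tick wins
        if left:
            x -= SPEED
    return x


def per_input_model(arrivals):
    """Server applies exactly one frame of movement per 'left' input packet."""
    return sum(-SPEED for _, pressed in arrivals if pressed)


LATENCY_TICKS = 6  # hypothetical: ~100ms at 60 fps
# Client holds 'left' for frames 0..49 and releases it on frame 50.
arrivals = [(frame + LATENCY_TICKS, True) for frame in range(50)]
# The release packet suffers one extra tick of delay (jitter).
arrivals.append((50 + LATENCY_TICKS + 1, False))

print(per_input_model(arrivals))       # -800, matches the client
print(held_state_model(arrivals, 70))  # -816, one frame too far
```

With these numbers the held-state server applies 51 frames of movement while the client applied 50, which is exactly the one-frame, 16px error described above. In this model a constant delay alone would not cause the error; it comes from the press and release packets being delayed by different amounts.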
I could simply attach a timestamp (or frame count) to each input packet and, at the server end, check whether it has simulated more frames with a particular input than it should have, and back up if needed. Since world-state updates go to clients around 20 times per second, there is a chance that the clients' 100ms buffer would hide any artifacts, but again I'm not 100% sure about that. This seems like it might add unnecessary complexity. Others hint that they timestamp their input packets, but none seem to talk in depth about how they use those timestamps.
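One way to make the frame-count idea concrete is roughly the client-side prediction and server reconciliation scheme described in Gabriel Gambetta's articles: every input carries a sequence number, the server applies each input exactly once and reports the last sequence number it processed, and the client rewinds to the server's position and replays the inputs the server has not acknowledged yet. The sketch below is a simplified, hypothetical version (Input, Server, Client, apply and SPEED are illustrative names; there is no networking, and stale or out-of-order packets are simply dropped).

```python
from dataclasses import dataclass

SPEED = 16  # pixels per frame, as in the post


@dataclass
class Input:
    seq: int  # hypothetical per-frame sequence number (the "frame count" idea)
    dx: int   # -1 = left, 0 = no movement, +1 = right


def apply(x, inp):
    """Advance one frame of movement for a single input."""
    return x + inp.dx * SPEED


class Server:
    def __init__(self):
        self.x = 0
        self.last_processed_seq = -1
        self.queue = []

    def receive(self, inp):
        self.queue.append(inp)

    def tick(self):
        # Apply every queued input exactly once; a tick with no input does not move the player.
        for inp in self.queue:
            if inp.seq <= self.last_processed_seq:
                continue  # duplicate, stale or out-of-order packet: ignore it
            self.x = apply(self.x, inp)
            self.last_processed_seq = inp.seq
        self.queue.clear()
        # State update for the client: authoritative position plus the ack.
        return self.x, self.last_processed_seq


class Client:
    def __init__(self):
        self.x = 0
        self.next_seq = 0
        self.pending = []  # inputs sent but not yet acknowledged by the server

    def frame(self, dx):
        inp = Input(self.next_seq, dx)
        self.next_seq += 1
        self.x = apply(self.x, inp)  # predict locally instead of waiting for the server
        self.pending.append(inp)
        return inp  # in a real game this would be sent to the server

    def on_server_state(self, server_x, last_processed_seq):
        # Drop acknowledged inputs, then replay the rest on top of the server position.
        self.pending = [i for i in self.pending if i.seq > last_processed_seq]
        x = server_x
        for inp in self.pending:
            x = apply(x, inp)
        self.x = x


client, server = Client(), Server()
sent = [client.frame(-1) for _ in range(50)]  # hold 'left' for 50 frames
for inp in sent[:30]:
    server.receive(inp)                        # only 30 inputs have arrived so far
client.on_server_state(*server.tick())
print(client.x)        # -800: 30 acknowledged inputs + 20 replayed ones
for inp in sent[30:]:
    server.receive(inp)
print(server.tick())   # (-800, 49)
print(server.tick())   # (-800, 49): an extra tick adds no movement
```

Because movement is tied to input packets rather than to server ticks, a tick in which no input arrives does not move the player and a tick in which several arrive applies all of them, so server and client agree on 50 x 16 = 800px. The trade-off is that a client could send inputs faster than it should, so a real server would also need to limit how many inputs it accepts per unit of time.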
I'm wondering if I'm simply asking too much. Quite a few games seem to hide these kinds of problems with animations and/or by interpolating to the position reported by the server. I get the impression that the largest distance an entity can move in one frame in these other games is very small compared to my 16px out of a total of 1280px when moving horizontally. From what I've played of Terraria, players tend to walk quite slowly, so an extra frame isn't very noticeable.
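For the "interpolating to the position reported by the server" approach, one common technique is to apply a correction to the simulation immediately but hide it visually with a render offset that decays over a few frames. A minimal sketch, assuming a hypothetical SMOOTHING factor of 0.2 per rendered frame and a hypothetical half-pixel snap threshold:

```python
SMOOTHING = 0.2       # hypothetical: fraction of the hidden error removed each rendered frame
SNAP_THRESHOLD = 0.5  # hypothetical: below half a pixel, stop smoothing and snap


class SmoothedEntity:
    def __init__(self, x):
        self.x = float(x)  # simulation position (predicted or server-reported)
        self.error = 0.0   # visual offset still being hidden from the player

    def correct(self, new_x):
        # Move the simulation immediately, but keep the on-screen position unchanged for now.
        self.error += self.x - new_x
        self.x = float(new_x)

    def update(self):
        # Called once per rendered frame: decay the hidden error toward zero.
        self.error *= 1.0 - SMOOTHING
        if abs(self.error) < SNAP_THRESHOLD:
            self.error = 0.0

    def render_x(self):
        return self.x + self.error


e = SmoothedEntity(-816)
e.correct(-800)
print(e.render_x())  # -816.0: no visible jump on the frame the correction arrives
for _ in range(20):
    e.update()
print(e.render_x())  # -800.0: the error has decayed and snapped to zero
```

The 16px correction is spread over roughly 16 frames instead of appearing as a single jump, which is one reason smaller per-frame movement makes these errors easier to hide.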
I appreciate any feedback you guys can give; hopefully I've explained the problem clearly enough!