# Multithreaded renderer and input latency


## Recommended Posts

I've implemented a multithreaded renderer and have now run into input latency problems. Basically I have two threads: a game logic thread (running at a fixed 60 fps) and a renderer thread (running as fast as possible). The game logic thread updates the logic (also handling input), creates a render buffer and sends it to the renderer. The renderer then interpolates between the two newest render buffers. This setup works perfectly when the renderer's framerate is high, but when the renderer is running at 60-90 fps, the mouse response feels too slow. This is because the renderer has time to start a new interpolation frame even though it doesn't have enough time left for a full frame render. Let me demonstrate what I mean with an example:
t = interpolation parameter

Logic (50 fps)                         Renderer (75 fps)
-------------------------------------  ----------------------------------------
20 ms: Frame0 ready, sent to renderer
40 ms: Frame1 ready, sent to renderer  Begins to render Frame0-Frame1 (t = 0)
53 ms:                                 Begins to render Frame0-Frame1 (t = 0.67)
60 ms: Frame2 ready, sent to renderer
66 ms:                                 Begins to render Frame1-Frame2 (t = 0)

There's an extra 6 ms of latency between the Frame2 update and the moment the renderer begins to render the Frame1-Frame2 interpolation. As a test, I disabled the interpolation and rendered the newest frame right after the frame update, and the lag was gone. But that limits the renderer's framerate to the logic framerate, which was one of the reasons I implemented the interpolation system in the first place. I have thought of two possible solutions: 1) Store the last frame time after each rendered frame, and check whether there's enough time to render another frame before a new logic update arrives. 2) Measure the time between the frame update's arrival and the moment the renderer begins using it; instead of setting t = 0, set t = (arrival-to-begin-rendering latency) / (logic update interval). What do you think of these? Are there better solutions to this latency problem? [Edited by - Joni-Matti on June 5, 2009 4:47:13 AM]
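Solution 2 can be sketched in a few lines. This is only a minimal illustration (the function name and millisecond units are mine, not from the engine above): instead of starting a new buffer pair at t = 0, seed t with the time the newest buffer has already spent waiting, divided by the logic interval.

```cpp
#include <cassert>
#include <cmath>

// Sketch of "solution 2" from the post above. All times in milliseconds.
// renderTime: wall-clock time at which this render frame begins.
// bufferTime: wall-clock time at which the newest buffer of the pair arrived.
// logicDt:    logic update interval (e.g. 20.0 for 50 fps logic).
//
// The original policy reset t to 0 when switching to a new buffer pair,
// discarding the time the buffer had already waited; seeding t with that
// waiting time removes the extra latency.
double t_seeded(double renderTime, double bufferTime, double logicDt) {
    return (renderTime - bufferTime) / logicDt;
}
```

With the numbers from the example (Frame2 arrives at 60 ms, the renderer begins using it at 66 ms, 20 ms logic interval), this seeds t at 0.3 instead of 0, absorbing the 6 ms the buffer spent waiting.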

##### Share on other sites
For an ideal multithreaded engine, the render thread should not do anything but sort the render graph and send commands down to the card. The interpolation is likely to be a bottleneck.

Also, if your goal is to enhance performance (frame rate) by offloading work onto a different thread, then your current model cannot achieve that. Your render thread is still waiting for your update thread to process the buffer. You need to get rid of this wait time.

Another thought: there really isn't any reason why you need any buffers for your render thread. You already have two buffers, one in your card and one in your update thread.

##### Share on other sites
I think I have to clarify how the renderer works. The renderer thread performs the interpolation between the two newest buffers, culls the objects based on their interpolated positions, sorts the render graph for the proper rendering order and sends the commands down to the graphics card.

The renderer does not wait for the logic thread at all. It selects the two newest buffers for interpolation only when it has received a new buffer update. If no new buffer update has arrived, it advances the interpolation parameter by the time that elapsed while rendering the last frame. So even if logic updates arrive at 10 fps, the renderer can draw at 700 fps and smooth out the movement. In my opinion that is an enhancement in frame rate. What part of my system made you think that my current model cannot achieve that?
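The non-blocking handoff described above can be sketched like this. The names are illustrative, not from the actual engine; the assumption is that the logic thread publishes complete snapshots and the renderer only ever reads the two newest:

```cpp
#include <cassert>
#include <deque>
#include <mutex>
#include <utility>

// Hypothetical snapshot of interpolatable object state for one logic frame.
struct RenderBuffer {
    int frameIndex = -1;
    // ... object positions, orientations, etc. would live here ...
};

class FrameQueue {
public:
    // Called by the logic thread after each fixed-step update.
    void publish(RenderBuffer buf) {
        std::lock_guard<std::mutex> lock(m_);
        buffers_.push_back(std::move(buf));
        if (buffers_.size() > 2)          // only the two newest matter
            buffers_.pop_front();
    }
    // Called by the renderer each frame; never blocks on the logic thread.
    // Returns false until two buffers exist.
    bool newestPair(RenderBuffer& from, RenderBuffer& to) {
        std::lock_guard<std::mutex> lock(m_);
        if (buffers_.size() < 2) return false;
        from = buffers_[0];
        to   = buffers_[1];
        return true;
    }
private:
    std::mutex m_;
    std::deque<RenderBuffer> buffers_;
};
```

If no new buffer has arrived since the last render frame, `newestPair` simply returns the same pair again and the renderer keeps advancing t, which is the 700 fps case described above.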

How would two buffers be enough? The graphics card itself doesn't contain any (proper) buffer at all. We're talking about CPU-side object state buffers, and not all of this data is sent to the graphics card. In the interpolation case, if the logic thread is updating one buffer, the renderer still needs two buffers so that it can interpolate between them before sending the commands to the card. In the 700 fps example, the renderer would use those two buffers for 700 interpolation cycles.

If I don't use interpolation, then two buffers would be enough: one in the logic thread and one in the renderer thread. But in that case, the renderer must wait for the buffer update, which isn't desirable.

I know the interpolation is causing part of the lag, but in my tests I have seen that the lag would be acceptable if the renderer weren't late when the next logic update arrives. Let's take another example which shows the extra lag that causes the worse input latency. The logic is updating at 60 fps, and the renderer is rendering at 62 fps. When the renderer has received 2 buffers (frames 0 and 1), it begins to render frame 0. It manages to finish rendering 1000/60 - 1000/62 = 0.54 ms before the logic update for frame 2, so it begins to render another interpolated frame between frames 0 and 1 (the interpolation parameter would be roughly 0.97). Only after this interpolated frame has been rendered does the renderer take the new buffer update into account and begin rendering between frames 1 and 2. But by then it is already late by about 15.59 ms! This is the latency I'm talking about. Any other thoughts on how to get rid of this latency?

The performance enhancement actually isn't my main goal. My main goal is to separate input handling and simulation from rendering, so that even though rendering would run at 10 fps, the simulation itself would be run at steady 60 fps. In that case the renderer is just skipping frames (no interpolation used). The input, however, would be as responsive as in the case where the renderer is running at 100 fps.

##### Share on other sites
There are two issues causing separate problems.

1. You set the interpolation factor to zero for the first render after you get a new frame. This is not correct. In the first post's example, the 66 ms render should be t = 0.33 between frames 1-2. The frame fraction should always increase by an amount that matches the time between rendered frames. Your second example of going from 0.97 back to 0.00 makes this obvious: the correct value would be 1.94, i.e. t = 0.94 into the frame 1-2 pair.

2. You may be interpolating something you shouldn't if you have perceptible lag in mouse movement. Is there a reason why you specifically need to interpolate the mouse? If you're just moving a cursor around, you don't need to interpolate it. Even an FPS style camera should probably be done without interpolating the mouse (e.g. get mouse position at render time and feed it to the next logic frame for aiming weapon fire etc). Consider the extreme situation where your logic runs at 1fps while the renderer runs at 100fps; you'll want the mouse to be polled every frame rather than once per second.
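The advice in point 2 can be sketched as a shared, non-interpolated cursor slot. This is only an illustration with invented names: whichever thread pumps OS mouse events publishes the latest position, and the render thread reads it every frame instead of pulling an interpolated copy out of the frame buffers (on Windows the renderer could equally just call GetCursorPos() itself).

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Latest cursor position, packed into one 64-bit atomic so both coordinates
// are always read consistently: x in the high 32 bits, y in the low 32 bits.
struct CursorShare {
    std::atomic<std::uint64_t> packed{0};

    // Called by the thread that receives OS mouse events.
    void store(std::int32_t x, std::int32_t y) {
        packed.store((std::uint64_t(std::uint32_t(x)) << 32) |
                         std::uint32_t(y),
                     std::memory_order_relaxed);
    }
    // Called by the render thread once per frame; no interpolation involved.
    void load(std::int32_t& x, std::int32_t& y) const {
        std::uint64_t p = packed.load(std::memory_order_relaxed);
        x = std::int32_t(std::uint32_t(p >> 32));
        y = std::int32_t(std::uint32_t(p));
    }
};
```

The point is that the cursor bypasses the logic-frame snapshot path entirely, so its latency is bounded by the render framerate, not the logic rate.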

##### Share on other sites
Even then, it's worth noting that in most games a lag of a couple of frames is not usually noticeable in actual gameplay. I've found that the only place where it becomes really noticeable is if you're using a software cursor. Usually the cursor in Windows is accelerated by hardware and updated at an obscenely high rate (hundreds of Hz), which gives us the responsiveness required for a cursor.

If you're software rendering your cursor (i.e. hiding the hardware cursor and drawing your own using a sprite), then your cursor is going to have a latency of at least one frame. This is usually far too much for comfort - the solution is to drop the software cursor and use the hardware Windows one.

##### Share on other sites
Quote:
Original post by Fingers_
The frame fraction should always increase by an amount that matches the time between rendered frames.

Of course! I totally missed that it's as simple as that. Solution 2 that I mentioned would have done exactly the same thing, but in an overly complicated way. Now the lag is smaller, but somehow it is still bigger than when I test by waiting for the frame update and drawing the first of the two frames immediately (being one frame late in any case). I need to dig more to find out what is causing this.
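For reference, the corrected frame-fraction update amounts to something like this (a minimal sketch with invented names; both times in the same units):

```cpp
#include <cassert>
#include <cmath>

// The fix: t always advances by (time between rendered frames) / (logic
// interval); switching to a newer buffer pair subtracts 1 so the overshoot
// carries over, instead of resetting t to 0 and discarding it.
struct Interpolator {
    double t = 0.0;

    // renderDt:        time elapsed since the last rendered frame.
    // logicDt:         logic update interval.
    // haveNewerBuffer: whether a newer buffer pair is available to roll onto.
    void advance(double renderDt, double logicDt, bool haveNewerBuffer) {
        t += renderDt / logicDt;
        if (t >= 1.0 && haveNewerBuffer)
            t -= 1.0;   // carry the overshoot into the new pair
    }
};
```

Running the 60 fps logic / 62 fps render example through this: the second render frame lands at t = 2 * (60/62) - 1 ≈ 0.94 into the new pair, matching the corrected value above, rather than restarting at 0.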

Quote:
Original post by Fingers_
Is there a reason why you specifically need to interpolate the mouse?

Well, the logic thread is responsible for creating the window, so mouse movement is polled in that thread (this is how I was able to separate the rendering framerate from the input framerate). The mouse cursor goes through the same rendering route as any other object (it is added to the frame update's render buffer), so it is automatically interpolated. I also wanted the logic-render data to flow in only one direction (from the logic thread to the render thread), so the renderer would not send any information to the logic thread, simplifying things.

Quote:
Original post by Fingers_
Consider the extreme situation where your logic runs at 1fps while the renderer runs at 100fps; you'll want the mouse to be polled every frame rather than once per second.

Yes, but how about a situation where rendering runs at 10 fps and simulation at 60 fps? The input would be handled at 10 fps in that case. I thought this situation (render framerate lower than logic framerate) would be more common, because nowadays games are usually GPU-bound. One design aspect I kept in mind was that even on a machine with a low-spec graphics card, the input would be handled at the same speed as on a machine with a high-spec card.

Quote:
Original post by Sc4Freak
If you're software rendering your cursor (i.e. hiding the hardware cursor and drawing your own using a sprite) --

That is exactly what I'm doing. And I measured the (noticeable) input lag by comparing the hardware cursor's position to the software cursor's position. Even at fairly low movement speed it is about 1 cm. I think I'll switch to the hardware cursor. But what if I need drag-and-drop? The dragged object would still go through the normal rendering pipeline and would lag behind the hardware cursor.

##### Share on other sites
I recommend not interpolating the mouse. It's not an entity driven by the game logic, it's an extension of the player's hand and should be made as responsive as possible.

In the scenario of 10 fps rendering / 60 fps logic, the non-interpolated mouse will still work better. When you get the mouse position from the OS at render time (zero delay), it's more responsive than with an up-to-16 ms delay between logic and render because the logic is running at 60 fps.

##### Share on other sites
You are right; it is best not to interpolate the mouse. It is easy to get the current cursor position in the renderer thread using GetCursorPos(), but what about high-definition mouse movement, which is obtained by reading WM_INPUT messages in the window message pump? In my system, the window message pump is in the logic thread, not in the renderer thread (in the 10 fps rendering / 60 fps logic case, this decreases keypress input latency). I would be using the high-definition mouse movement for a third-person style camera. Maybe interpolating would be okay in this case?
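One way to get the WM_INPUT deltas to the render-thread camera without interpolation is an atomic accumulator: the message pump adds each raw delta in, and the renderer drains whatever has piled up once per frame, so camera response is not quantized to the logic rate. A single-axis sketch with invented names (a real version would carry both x and y):

```cpp
#include <atomic>
#include <cassert>

// Accumulates raw mouse deltas across threads.
struct RawMouseAxis {
    std::atomic<int> accumulated{0};

    // Called from the window message pump (logic thread) for each
    // raw-input delta received.
    void onRawDelta(int d) {
        accumulated.fetch_add(d, std::memory_order_relaxed);
    }
    // Called by the render thread once per frame; returns everything
    // accumulated since the previous call and resets the counter.
    int drain() {
        return accumulated.exchange(0, std::memory_order_relaxed);
    }
};
```

This keeps the one-directional logic-to-render data flow intact for everything else; only the raw camera input takes this side channel.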

Btw, I had a small problem with the interpolation after correcting the timer update as Fingers_ suggested. The new update routine worked as expected, but somehow the interpolation sometimes got 5-6 ms ahead of time, causing stuttering on the screen because of over-extrapolation. This didn't happen every time I ran the program: if the timing got off at program start, it stayed that way for the program's whole lifetime; if it didn't get off, it continued to work normally. The off-timing happened in the first few frames, and I haven't found out why. I'm using QPC for the timing. I worked around the problem by measuring the lag between a frame update's arrival and the moment the renderer begins to use it with timeGetTime(), and if the timing had drifted too much, I corrected the interpolation time to match the update lag.
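The correction described above amounts to clamping the running interpolation time against the independently measured update lag. A minimal sketch (names are mine, not from the engine):

```cpp
#include <algorithm>
#include <cassert>

// runningT:            interpolation parameter accumulated from render deltas.
// elapsedSinceArrival: independently measured time since the newest buffer
//                      arrived (e.g. via timeGetTime), in milliseconds.
// logicDt:             logic update interval in milliseconds.
//
// If the accumulated render clock has drifted ahead of the measured arrival
// clock, snap t back so it never extrapolates past the newest buffer.
double correctedT(double runningT, double elapsedSinceArrival, double logicDt) {
    double measuredT = elapsedSinceArrival / logicDt;
    return std::min(runningT, measuredT);
}
```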
