Extreme Framerates

Am I correct in thinking that a framerate higher than the eye can literally discern (even if it's in the thousands) can still look smoother - because the frame you actually register in your sight is closer in time to the moment you register it than it would be if the framerate were at or below your eye's own "registration" framerate?
OK - let's "assume", even if it's not realistic, that "the eye has a fixed framerate",
because the same reasoning can be applied a different way:
Does rendering in a separate thread from event handling mean the same thing - that since the time interval between the two may be smaller than either individual interval, the latest registered event will be closer in time to the rendering?

1) Yes, rendering at extreme framerates can make an animation appear smoother (up to some really high limit, around 200fps, where you get severely diminishing returns).

You have to keep in mind that your monitor has a limited refresh rate though. Most monitors are only capable of displaying 60 frames per second anyway!

If vsync is enabled, then it's not possible to display more than 60 frames per second.

If vsync is disabled, then at higher frame-rates, one frame will be cut off at some vertical height and the next frame spliced in. This is called "tearing", and it can be very jarring to the viewer, and can actually make things appear less smooth!
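In D3D11/DXGI terms, for example, the difference comes down to the sync interval passed to Present (just a sketch - swap-chain creation is omitted, and PresentFrame is a made-up helper name):

#include <dxgi.h>

// SyncInterval = 1 waits for the next vertical blank (vsync on: no tearing,
// capped at the monitor's refresh rate); 0 presents immediately (vsync off:
// uncapped, but tearing is possible).
void PresentFrame(IDXGISwapChain* swapChain, bool vsync)
{
    swapChain->Present(vsync ? 1 : 0, 0);
}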

2) No, having a dedicated event processing thread is a complete waste of time.

In your diagram, you show rendering as a line. You need to show it as a box, as it takes time. At the start of the box, a certain set of events are consumed to determine what will be rendered. At the end of the box, this rendering is sent to the screen.

If any events arrive in the duration of that box, then they have to be queued up and consumed by the next render...

The OS already does this internally. As soon as an event arrives, the OS puts it into a queue. Right before rendering, you typically poll the OS for any events that have arrived since last time you asked, and you use these to influence what to draw next. Adding your own extra thread to reinvent this same functionality will only add overhead.
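For example, a minimal single-threaded sketch of that pattern (Win32; Update() and Draw() are just stand-ins for the game's own functions):

#include <windows.h>

void Update();  // stand-in: consumes the recorded input, advances the game state
void Draw();    // stand-in: renders the new state

void RunFrameLoop(bool& running)
{
    while (running)
    {
        // Drain everything the OS has queued since the last frame (non-blocking).
        MSG msg;
        while (running && PeekMessageW(&msg, nullptr, 0, 0, PM_REMOVE))
        {
            if (msg.message == WM_QUIT) { running = false; break; }
            TranslateMessage(&msg);
            DispatchMessageW(&msg);  // ends up in the window procedure, which
                                     // records key/mouse state for Update()
        }
        if (!running) break;
        Update();  // uses the freshest input available
        Draw();    // then renders with it
    }
}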

[Diagram: timelines comparing a single-threaded loop with separate event/render threads. Green boxes are frames being processed; red lines mark the time from an event being generated to it reaching the screen.]

The red lines show the time elapsed from when an event is generated to when it is sent to the screen -- i.e. the latency. In both systems (single threaded and event/render threads), the latency will be the same.

Reducing the size of the green boxes == increasing the frame-rate == reducing latency.

So if you want it to be more responsive and smoother, then yes, you need to increase the frame-rate. However, current displays put a hard limit on how much you can do this.

Increasing bandwidth can lead to a decrease in latency due to the exact effect you're talking about. Decreasing latency will almost always make your game feel better and more responsive, no matter how it's done. Another thing you can do, for example, is sample the player input as close to the end of the frame as possible, or just make sure that nothing in your game is causing input processing to get delayed.

Using a separate rendering thread doesn't usually decrease latency because the rendering thread usually has to run a frame behind. You can't render and update the same frame at the same time, otherwise you end up with temporal inconsistencies between frames (i.e. jitter). Rendering on a separate thread is mainly a performance benefit to take advantage of multiple cores.

Why doesn't anyone mention the GetMessageTime function in these kinds of discussions? I think the event-handling routine should always use GetMessageTime as the time when an event happened, instead of a frame-relative time. It takes a bit more work ordering the events (only if they arrive out of order from the message queue - I don't know) and processing them in that order, but the end result should be no latency at any framerate (at least no input-related latency).

I'm also sure DirectInput has something similar, and if you're coding for a platform other than Windows, there's bound to be something similar too... It's the first thing I would look for.
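For illustration, stamping events in the window procedure might look something like this (the event struct and queue are made up; GetMessageTime itself is the real Win32 call, returning the message's generation time in GetTickCount() milliseconds):

#include <windows.h>
#include <vector>

// Hypothetical game-side event record - purely for illustration.
struct InputEvent { UINT message; WPARAM key; DWORD timeMs; };
std::vector<InputEvent> g_pendingEvents;

LRESULT CALLBACK WndProc(HWND hwnd, UINT msg, WPARAM wParam, LPARAM lParam)
{
    switch (msg)
    {
    case WM_KEYDOWN:
    case WM_KEYUP:
        // Record when the event actually happened, not when the game loop
        // happens to pull it off the queue.
        g_pendingEvents.push_back({ msg, wParam, (DWORD)GetMessageTime() });
        return 0;
    }
    return DefWindowProcW(hwnd, msg, wParam, lParam);
}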

EDIT: Searched the forums for "GetMessageTime" - this is the first post mentioning it. :)

Why doesn't anyone mention the GetMessageTime function in these kinds of discussions? I think the event-handling routine should always use GetMessageTime as the time when an event happened, instead of a frame-relative time. It takes a bit more work ordering the events (only if they arrive out of order from the message queue - I don't know) and processing them in that order, but the end result should be no latency at any framerate (at least no input-related latency).

I'm also sure DirectInput has something similar, and if you're coding for a platform other than Windows, there's bound to be something similar too... It's the first thing I would look for.

I know L.Spiro has mentioned this, having gone through the effort to implement it himself.

[edit] I was mistaken, he's implemented an alternate way of time-stamping inputs [/edit]

It is tricky to implement though, as the message timer wraps around and probably isn't consistent with your game's actual timer.
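One possible way to reconcile the two clocks (an assumption on my part, not the only approach): measure how long ago the message was generated in GetTickCount() terms - unsigned 32-bit subtraction handles the wrap-around - and subtract that age from the game's own timer.

#include <windows.h>

// messageTimeMs is what GetMessageTime() returned for the event;
// gameTimeNowSeconds is the game's own clock "right now".
double MessageTimeToGameTime(DWORD messageTimeMs, double gameTimeNowSeconds)
{
    DWORD ageMs = GetTickCount() - messageTimeMs;  // well-defined across wrap
    return gameTimeNowSeconds - ageMs / 1000.0;    // when the event occurred,
                                                   // on the game's timeline
}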

However, this doesn't result in having no input latency. There's a lot of unavoidable latency:

  1. You press a key, the driver takes the input and passes it on to the OS.
  2. The game requests inputs from the OS, and uses them to update the game-state.
  3. The game renders a new frame based on the new game-state. This produces a stream of commands to the GPU.
  4. The GPU eventually starts executing these commands after having buffered them to maximize its internal bandwidth.
  5. The GPU finishes executing these commands, and queues up the final image to be displayed.
  6. The monitor decodes the signal and displays the image.

Assuming a 60Hz game and display:

Event #1 is almost instant.

Events #2 (updating a game frame) and #3 (rendering a frame) depend on the game. Let's say they're right on the 60Hz limit of 16.6ms in total, between the Update and Draw functions.

Event #4 depends on the game and the graphics driver. Typically a driver will buffer at least one whole frame's worth of GPU commands, but may be more -- say 16.6 - 50ms here.

Event #5 depends on the game, but let's say it's right on the 60Hz limit of 16.6ms again, spending this time performing all the GPU-side commands.

Event #6 depends on the monitor. On a CRT this would be 0ms, but on many modern monitors it's 1-2 frames -- another 16.6 - 33.3ms

That's a total of between 50 - 116.6ms (typically it's around 80ms) between pressing a key and seeing any changes on the screen, for a game that's running perfectly at 60Hz with "zero" internal input latency.

i.e. even in a perfect situation where you capture an event as soon as it occurs, and you immediately process a new frame using that data, you've still got a lot of unavoidable latency. The key objective is to not add any more than is necessary! Having "three frames" of input latency is a best case that many console games strive for.

Back to the message timer though:

Typically, at the start of a frame-update, you fetch all the messages from the OS and process them as if they arrived just now.

With the use of this message timer, you fetch them at the start of a frame, but process them as if they arrived a bit earlier than that.

In a 60Hz game, for example, this might mean that an event is fetched at the start of this frame, but actually occurred n% * 16.6ms ago (where n > 0% and n < 100%). If it's a movement command, etc., then you can move the player 100%+n% (e.g. 101%-199%) of the normal per-frame distance to compensate. This would have the benefit of slightly reducing the perceived latency, e.g. if your objectively measured latency is 80ms, your perceptual latency might be around 63 - 80ms.
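As a rough sketch of that compensation (the names are illustrative; eventAgeSeconds would come from the message timestamp):

float MoveDistanceThisFrame(float speedUnitsPerSec, float frameDtSeconds,
                            float eventAgeSeconds)
{
    // Move for the whole frame plus the slice of time the input spent sitting
    // in the queue, i.e. 100% + n% of the normal per-frame distance.
    return speedUnitsPerSec * (frameDtSeconds + eventAgeSeconds);
}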

Personally, I'd only bother with this technique if you'd already done everything else possible to reduce latency first, but YMMV.

EDIT: Searched the forums for "GetMessageTime" - this is the first post mentioning it. :)

http://www.gamedev.net/topic/630735-multithreading-in-games/#entry4977300

The main problem being that it is not synchronized with your in-game timer.


L. Spiro

I restore Nintendo 64 video-game OST’s into HD! https://www.youtube.com/channel/UCCtX_wedtZ5BoyQBXEhnVZw/playlists?view=1&sort=lad&flow=grid


The OS already does this internally. As soon as an event arrives, the OS puts it into a queue. Right before rendering, you typically poll the OS for any events that have arrived since last time you asked, and you use these to influence what to draw next. Adding your own extra thread to reinvent this same functionality will only add overhead.

Does what NickW said make sense?


Another thing you can do, for example, is sample the player input as close to the end of the frame as possible

Does it mean that it's possible to sample the input closer to the start of the render function with separate threads - or is this pointless since, like you mentioned, the OS queues the stuff up so sampling directly before rendering all in a single thread is just as good as it gets? Is this still valid if both these steps (processing input & rendering) are done in a framerate-restrictive loop? Like say you setFPS( 120 ) - then, should *only* the rendering be restricted to 120, with events running in a faster unrestricted parent loop - or should they both be inside the time restrictor?

  • The GPU eventually starts executing these commands after having buffered them to maximize its internal bandwidth.
  • The GPU finishes executing these commands, and queues up the final image to be displayed.
  • The monitor decodes the signal and displays the image.
AFAIK, all of these are done in a separate driver-thread, and the driver always does them as fast as possible. They don't influence the game's internal latency - which is what I was talking about. This is what I meant by "input-related latency", i.e. latency caused by delayed processing of the mouse/keyboard events. The latency introduced by GPU-processing is unavoidable and I don't think it's worth the effort going that far as to account for it in a video game?

However, if you do code your event handler to use frame-relative times instead of the actual event times, then yes - your input latency will also be affected by your framerate, which depends on how fast the GPU renders your frame, and also by your game-state updating code.

IMHO, when it comes to thinking about the GPU's role in all of this, it's not that important whether it renders your frame immediately or after some milliseconds, but it is important that the frame represents the exact game state your game was in at the time you asked it to present the frame. Anyway, the numbers you mentioned seem a bit exaggerated to me. :)

Best example would be a game that renders everything at only 1 FPS - for whatever reason (who knows, this might be a nice-looking effect :) ). If you only compute the game state every second and you don't use the event's real time, you'll notice that your objects on screen are affected too much by a keypress that only lasted a few milliseconds, because you'll be computing the objects' velocities/positions etc. at one-second intervals. So instead of moving for only those few milliseconds' worth of time, your objects will move for one whole second. It would be more accurate (and frame-rate independent) to use the event's real time.
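For example, a sketch of using the event's real time (the names here are illustrative): derive the held duration from the key-down and key-up timestamps rather than from frame counts, so a 5ms tap at 1 FPS contributes 5ms of movement instead of a full second.

#include <windows.h>

float DistanceForTap(DWORD keyDownTimeMs, DWORD keyUpTimeMs,
                     float speedUnitsPerSec)
{
    DWORD heldMs = keyUpTimeMs - keyDownTimeMs;  // unsigned math handles wrap
    return speedUnitsPerSec * (heldMs / 1000.0f);
}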

And no matter how much you time things trying to account for different hardware latencies, you will not be able to always get the constant frame rate that you plan for, so basing your event handler on frame-times is always a bad idea IMO.

AFAIK, all of these are done in a separate driver-thread, and the driver always does them as fast as possible. They don't influence the game's internal latency - which is what I was talking about.

Does what NickW said make sense?

Yes.

Does it mean that it's possible to sample the input closer to the start of the render function with separate threads

If the user-input is required before rendering can begin, it makes no difference which thread grabs it from the OS. No matter how many threads there are, the rendering can't commence until you've decided on the user-input to process for that frame.

No matter how many threads you're using, you should make sure to get the user input as close to rendering as you can. E.g. with one thread, you would want to change the left-hand order of operations below to the right-hand one (assuming that the AI doesn't depend on the user input).


GetUserInput    ProcessAI
ProcessAI    -> GetUserInput
Draw            Draw
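As a loop, the right-hand ordering is simply (same placeholder names as in the table above):

void ProcessAI();     // work that doesn't depend on this frame's input
void GetUserInput();  // samples the OS event queue
void Draw();          // renders using the sampled input

void FrameLoop(bool& running)
{
    while (running)
    {
        ProcessAI();     // input-independent work first
        GetUserInput();  // sample as late as possible...
        Draw();          // ...so the frame reflects the freshest input
    }
}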

AFAIK, all of these are done in a separate driver-thread, and the driver always does them as fast as possible. They don't influence the game's internal latency - which is what I was talking about.

The driver doesn't necessarily do them as fast (as in "soon") as possible (to optimize for latency). It may (and often will) purposely delay GPU commands in order to optimize for overall bandwidth / frame-rate. By sacrificing latency, it opens up more avenues for parallelism.

N.B. I didn't contradict you, or say that measuring input timestamps was useless. It's just a small figure compared to the overall input latency -- the time from the user pressing a key to the user seeing a result.

Anyway, the numbers you mentioned seem a bit exaggerated to me

You can test them yourself. Get a high-speed video camera, ideally twice as fast as your monitor (for a 60Hz monitor you'd want a 120Hz camera), but equal speed to the monitor will do if you don't have a high-speed one (e.g. just 60Hz), though that will be less accurate. Then film yourself pressing an input in front of the screen, and count the recorded frames from when it was pressed to when the screen changes. Ideally you'd want to use a kind of input device that has an LED on it that lights up when pressed, so it's obvious on the recording which frame to start counting from.
If you get 3 frames or less of delay, you're on par. Try it on a CRT, a cheap HDTV and an expensive LCD monitor, and you'll get different results too.

The latency introduced by GPU-processing is unavoidable and I don't think it's worth the effort going that far as to account for it in a video game?

It's unavoidable that there will be some inherent latency, but you do have an influence over how much there is. The way you submit your commands, the amount of commands, the dependencies that you create from GPU->CPU, the way you lock/map GPU resources, whether/when you flush the command stream, how you 'present'/'swap' the back-buffer, etc, all have an impact on the GPU latency.
By optimizing there, you could shave a whole frame or two off your overall input latency, which is why I'd personally make those optimizations first, before worrying about message time-stamps, which can only shave less than 1 frame off your input latency.
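For example (just a sketch, and only one of many such knobs): on Windows/D3D11, DXGI's "maximum frame latency" setting limits how many frames the driver may buffer ahead of the GPU, trading some throughput for lower input-to-display latency.

#include <d3d11.h>
#include <dxgi.h>

void LimitQueuedFrames(ID3D11Device* device, UINT maxFrames /* e.g. 1 or 2 */)
{
    IDXGIDevice1* dxgiDevice = nullptr;
    if (SUCCEEDED(device->QueryInterface(__uuidof(IDXGIDevice1),
                                         reinterpret_cast<void**>(&dxgiDevice))))
    {
        // Fewer queued frames = lower latency, but less room for the driver
        // to overlap CPU and GPU work.
        dxgiDevice->SetMaximumFrameLatency(maxFrames);
        dxgiDevice->Release();
    }
}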

If it's worth measuring input timestamps to reduce latency for a video game, why wouldn't other methods also be worth the effort?

And no matter how much you time things trying to account for different hardware latencies, you will not be able to always get the constant frame rate that you plan for, so basing your event handler on frame-times is always a bad idea IMO.

I'm not suggesting that you take GPU latency into account in your input handler.
You shouldn't code your input handler to account for this latency (unless you're making a game like Guitar Hero, where you absolutely require perfect input/display synchronisation), but if input latency is important to you, then you should code your GPU components in such a way as to reduce GPU latency as a high-priority task (n.b. Guitar Hero is also designed to run at 60fps with minimal GPU latency).

Best example would be a game that renders everything at only 1 FPS

Yeah, at that time-scale, message time-stamps have a much greater effect - especially in the case where a key is pressed for less than one frame.
Most games are designed to run at either 30fps or 60fps though, not 1fps.

At 1fps, if someone taps a key for 1ms, then the error in not using timestamps will be huge -- you'll either assume they tapped it for 0ms, or 1000ms!
At 1fps, if someone presses a key and holds it for 30 seconds, then whether you use timestamps or not makes a much smaller difference (the error of ignoring timestamps is lower) -- you'll either assume they held it for 29 or 30 seconds.
However, at 30fps, the difference is less extreme.
If someone taps a key for 1ms, and you assume they tapped it for 33ms, that's still a 33x difference, but it's imperceptible in most games.
And if they hold a key for 30 seconds, but you assume they held it for 29.967 seconds, that's almost certainly imperceptible.

Even pro-gamers struggle to provide more than 10 inputs per second (100ms per input), so a 33ms error isn't at the top of most people's priority lists. Not that it shouldn't be dealt with though!


The main problem being that it is not synchronized with your in-game timer.
Out of interest (I've never done this) how do you deal with this issue tonemgub?

Is it possible to measure how far in the past an event occurred? From what I can tell, you can only measure the elapsed time between two different inputs.

That's great for cases where a key is pressed and released on the same frame (which happens and would be very important at 1fps, but is rare at 60fps), but when rendering, there doesn't seem to be a way to determine that, e.g. "this key was pressed 10ms before the render function"...

AFAIK, L.Spiro has dealt with this by deciding to make use of a dedicated input-processing thread, which attaches its own timestamps?


Another thing you can do, for example, is sample the player input as close to the end of the frame as possible

Does it mean that it's possible to sample the input closer to the start of the render function with separate threads - or is this pointless since, like you mentioned, the OS queues the stuff up so sampling directly before rendering all in a single thread is just as good as it gets? Is this still valid if both these steps (processing input & rendering) are done in a framerate-restrictive loop? Like say you setFPS( 120 ) - then, should *only* the rendering be restricted to 120, with events running in a faster unrestricted parent loop - or should they both be inside the time restrictor?

It doesn't make sense to render at a higher frequency than your monitor can show, so that's the practical limit. NickW is right, you should focus on reducing the latency. E.g. if you could render at 1000fps but you only show 60Hz, estimate when the next 'flip' will happen, estimate how long you'll need to render the frame, and start processing the input + rendering so you'll be done with it right before your hardware allows the next flip.

(The common way is to process everything right away and then issue the flip, which then stalls for up to ~15ms.)
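Very roughly, as a sketch (every helper here is made up, just to show the shape of such a loop):

// All of these are assumed helpers, not a real API:
double EstimateNextVBlank();   // e.g. extrapolated from previous flip times
double EstimatedFrameCost();   // e.g. moving average of recent frame costs
void   SleepUntil(double timeSeconds);
void   PollInput();
void   Update();
void   RenderAndPresent();

void LowLatencyFrame()
{
    double nextVBlank   = EstimateNextVBlank();
    double renderCost   = EstimatedFrameCost();
    double safetyMargin = 0.001;  // ~1ms of slack

    // Wait so that input is sampled, and the frame rendered, as late as
    // possible while still finishing before the flip.
    SleepUntil(nextVBlank - renderCost - safetyMargin);

    PollInput();
    Update();
    RenderAndPresent();
}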

Also, a lot of people make the mistake of seeing frame processing as one big black box where you set the input at the beginning and get the output on the other side after a lot of magic happens in between. The fact is that most information processed in that time is not relevant to the subjective perception of "lagginess". Simple things you can do to shorten the subjective latency:

1. Trigger effects based on input after rendering the frame, before post-processing. E.g. a muzzle flash could be triggered with minor impact on the processing of the whole frame, yet the player would notice it. (Btw, it's not just about visuals - hearing a sound immediately is just as important.)

2. Trigger the right animation right before you render an object. From a software architecture point of view it looks like a hack to bypass all pipelines and read directly from memory whether a button is pressed in order to decide on the animation, but it gives the player good feedback, e.g. the recoil of your pistol (see the sketch after this list).

3. Decide on post effects right before you process them. If the player moved the stick to rotate the camera, feed that information into your post-process motion blur. (AFAIK OnLive is doing these kinds of tricks in their streaming boxes.)
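A sketch of the kind of late read meant in point 2 (the animation selection is made up; GetAsyncKeyState is the real Win32 call that reads the current key state, bypassing the normal event pipeline):

#include <windows.h>

enum AnimId { ANIM_IDLE, ANIM_RECOIL };  // illustrative
void SelectAnimation(AnimId id);         // illustrative

void ChooseWeaponAnimationLate()
{
    // Read the freshest button state right before drawing the weapon and
    // pick the recoil animation accordingly.
    bool firing = (GetAsyncKeyState(VK_LBUTTON) & 0x8000) != 0;
    SelectAnimation(firing ? ANIM_RECOIL : ANIM_IDLE);
}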

One of the best-known tricks of that kind was back when we had to choose between a nice, animated, colorful cursor rendered in software and the ugly black-and-white hardware cursor that was refreshed directly by the mouse driver, triggered by an interrupt every time you moved the mouse. The video hardware overlaid the cursor on every refresh of the screen. Playing an RTS at 15fps (which was not uncommon back then) with a software cursor was often very annoying.

