Distributing CPU load onto the network?

12 comments, last by slayemin 12 years, 1 month ago
On a sufficiently fast network, it might work out better to do frame rendering on multiple machines. It adds latency, but various practical experiments show that users tolerate up to ~7 frames of delay without noticing.

Regardless of how you partition the work, at a certain scale the GPU becomes the bottleneck; it is easy to saturate a modern GPU today with even a single CPU core. The only advantage such a cluster would have is the ability to spend the equivalent of multiple frames of compute producing a single frame: for example, a four-deep pipeline at 60 FPS gives each machine the per-frame budget of a 15 FPS game.
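To put numbers on that (the pipeline depth of 4 below is an assumed example, not a measured figure):

[code]
// Illustrative budget arithmetic only; the pipeline depth is an assumption.
#include <cstdio>

int main() {
    const double target_fps = 60.0;           // rate the cluster presents at
    const int pipeline_depth = 4;             // assumed number of machines
    const double frame_ms = 1000.0 / target_fps;          // ~16.7 ms per presented frame
    const double budget_ms = frame_ms * pipeline_depth;   // ~66.7 ms of compute per frame
    const double equivalent_fps = target_fps / pipeline_depth;
    std::printf("per-frame compute budget: %.1f ms (a %.0f FPS budget at %g FPS output)\n",
                budget_ms, equivalent_fps, target_fps);
}
[/code]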

All of this assumes completely identical hardware; scaling across heterogeneous hardware while keeping latency in this range is not viable.
I'm still struggling with the original question.

[quote]
Has anybody tried offloading CPU load to a networked computer within a game? Obviously the first step would be to avoid it if possible by writing efficient code or using optimal algorithms. But, assuming everything has been optimized and you've used multithreading to take advantage of all possible CPU cores, would there be an advantage in distributing the load onto a network?
[/quote]

Yes, it has been done many times. More on that below.


A game needs a minimum spec. If that minimum spec is a single machine, then "fully functional and optimized" means the game only needs a single machine to be fully functional. That is enough.

If the minimum spec is a single machine, there are really only two things distribution would help with:

* The simulated content is cool to have but not critical to gameplay, such as VFX and huge particle systems.
* OR the simulated content is critical to gameplay --- meaning the game is not a real-time simulator --- and you are trying to reduce total processing time.

In the first case, where it is non-critical to gameplay, about the only thing I can think of that would help is physics-driven particle systems.

In the second case, your scenario doesn't match your own description:

[quote]
But, if I'm going for a target framerate of 30-60 fps, and maximizing the local CPU produces 10-15 fps, offloading CPU work onto the network would free up some local CPU resources and give +X extra fps, even with the added network latency, possibly achieving the target framerate of 30-60 fps.
[/quote]

At that frame rate, no, it will not help.

That is not the point of the "OR" statement above.

That second point is a common feature for simulators of games like go and chess.


These games effectively search minimax trees, and neither game is solved. The simulators can easily consume multiple compute-days of CPU time and still be refining their choice of move. In that situation, having a large network of CPUs (aka a supercomputer) can be very helpful.
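For illustration, here is a toy negamax search (my own sketch, not anyone's engine code) over a take-1-or-2-stones game. Like a real chess or go search, the recursion grows exponentially with depth, which is exactly why such simulators can absorb an arbitrary number of networked CPUs:

[code]
// Toy stand-in for a chess/go search: N stones, each move takes 1 or 2,
// taking the last stone wins. Returns +1 if the side to move can win.
#include <algorithm>
#include <cstdio>

int negamax(int stones) {
    if (stones == 0) return -1;               // opponent just took the last stone
    int best = -1;
    for (int take : {1, 2})
        if (take <= stones)
            best = std::max(best, -negamax(stones - take)); // opponent's gain is our loss
    return best;
}

int main() {
    for (int n = 1; n <= 6; ++n)
        std::printf("%d stones: %s for the side to move\n",
                    n, negamax(n) > 0 ? "win" : "loss");
}
[/code]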
[quote]
A game needs a minimum spec. If that minimum spec is a single machine, then "fully functional and optimized" means the game only needs a single machine to be fully functional. That is enough.

If the minimum spec is a single machine, there are really only two things distribution would help with:

* The simulated content is cool to have but not critical to gameplay, such as VFX and huge particle systems.
* OR the simulated content is critical to gameplay --- meaning the game is not a real-time simulator --- and you are trying to reduce total processing time.

In the first case, where it is non-critical to gameplay, about the only thing I can think of that would help is physics-driven particle systems.
[/quote]

You'd have to build a GUI with a slider to adjust the game complexity (like how many units to allow simultaneously in a battle). By default the slider would be set to the best fit for the current machine's spec, so you could play the game on a single computer. The minimum requirement is that the game be highly playable on a single computer with average hardware. If you wanted to ramp up the complexity beyond the capacity of a single computer, you could add another computer to the cluster to offload some of the CPU workload. I suppose if you were really smart, you could do some load-balancing calculations to figure out how much horsepower the networked computer has and what it can handle (holy added complexity, batman!). It would probably be a really hard sell in a professional studio.
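For picking that best-fit default, something as crude as a one-off micro-benchmark might do (purely a sketch; the loop and scoring constants are invented for illustration):

[code]
// Hypothetical "best fit by default" probe: time a small compute loop and
// map the score to a default unit cap. All constants are made up.
#include <chrono>
#include <cmath>
#include <cstdio>

int defaultUnitCap() {
    using clock = std::chrono::steady_clock;
    const auto start = clock::now();
    volatile double acc = 0.0;                      // keep the loop from being optimized out
    for (int i = 1; i < 5000000; ++i)
        acc = acc + std::sqrt(static_cast<double>(i));
    const double ms =
        std::chrono::duration<double, std::milli>(clock::now() - start).count();
    int cap = static_cast<int>(50000.0 / ms);       // faster machine -> higher cap
    return cap < 50 ? 50 : (cap > 2000 ? 2000 : cap);
}

int main() { std::printf("default max units: %d\n", defaultUnitCap()); }
[/code]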

Physics-driven particle systems can probably be handled by the GPU these days, so no need to offload those. I was thinking more of processing AI, or collision detection, or something that doesn't necessarily have to be done locally (unlike rendering or input). I've written code for clustered computing before (30 computers computing Mandelbrot sets), but the requirements for that are a bit different from running a real-time game.

I guess the best way to find the answer is to just code it up and see how it works out.
Just an update for anyone who is interested: I've hit a major milestone and also realized that this problem is a lot more complex than I initially thought. Here's the approach I'm taking.

I decided to go with an entity-component model, using message passing as the means of communication between entities. I can then break my game up based on the different component groups. Some components obviously can't be migrated to a remote machine (like rendering and input components), so a flag needs to be set on the component pool.

So here is a brief overview of the classes and a description of each (a rough C++ skeleton follows the list):
Entity: An instanced object in the game. Its attributes and behaviors are defined by the components that compose it.
EntityTemplate: I define templates of my objects in an XML file so that I can quickly instance them.
Component: A generic class inherited by actual components.
ComponentTemplate: Much like the entity template, but a template of a component.
ComponentSystem: A generalized collection of a specific group of components.
ComponentSystemPool: A collection of ComponentSystems.
MessageRouter: There needs to be a way for component pools to communicate with other component pools, even if they are in a remote thread or on a remote host. This class is responsible for routing messages appropriately.
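In rough C++ terms, the shape I'm going for looks something like this (just a sketch; the members and method bodies are placeholders, not final code):

[code]
// Sketch of the class layout described above; all bodies are placeholders.
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

struct Message { std::uint64_t targetEntity; std::string payload; };

struct Component {                            // generic base for all components
    std::uint64_t entityId = 0;
    virtual ~Component() = default;
    virtual void handleMessage(const Message&) {}
};

struct Entity {                               // instanced game object
    std::uint64_t id = 0;
    std::vector<Component*> components;       // components define its behavior
};

class ComponentSystem {                       // holds one specific kind of component
public:
    bool migratable = true;                   // false for rendering/input systems
    virtual void advanceFrame() = 0;
    virtual ~ComponentSystem() = default;
};

class ComponentSystemPool {                   // systems co-located on one thread/host
public:
    std::vector<ComponentSystem*> systems;
    void advanceFrame() { for (auto* s : systems) s->advanceFrame(); }
};

class MessageRouter {                         // routes between pools, local or remote
public:
    void registerPool(int poolId, ComponentSystemPool* pool) { pools[poolId] = pool; }
    void route(int poolId, const Message& msg) {
        (void)poolId; (void)msg;              // placeholder: local dispatch or socket send
    }
private:
    std::unordered_map<int, ComponentSystemPool*> pools;
};
[/code]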

Example: I have a physics system, rendering system, targeting system, and input system. All the systems have a pool of their respective components. I want to divide the application in half, so I create two component system pools:
ComponentSystemPool1 contains: {PhysicsSystem, TargetingSystem}
ComponentSystemPool2 contains: {RenderingSystem, InputSystem}
Since rendering and input have to remain local, they can't be migrated. So physics and targeting get sent off to a different computer/thread.
Main computer runs: {ComponentSystemPool2}
Second computer runs: {ComponentSystemPool1}
The message router receives an application message from the main game update loop to advance to the next frame. The message router has to keep everyone synchronized, so it sends a message to all the component systems to advance to the next frame and requests a callback when they're done with it. Once a callback has been received from everyone, we're ready to go to the next frame. When the player wants to quit the game, the message router has to send a system message to quit the application to all the connected clients before letting the game end.
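The "call back when done" part boils down to a counted barrier; a minimal sketch of one possibility (not necessarily what I'll end up with):

[code]
// Minimal counted barrier sketch for the router's frame synchronization.
#include <condition_variable>
#include <mutex>

class FrameBarrier {
public:
    explicit FrameBarrier(int poolCount) : remaining(poolCount), total(poolCount) {}

    void poolDone() {                         // each pool's completion callback
        std::lock_guard<std::mutex> lock(m);
        if (--remaining == 0) cv.notify_all();
    }

    void waitForFrame() {                     // router blocks until all pools report
        std::unique_lock<std::mutex> lock(m);
        cv.wait(lock, [this] { return remaining == 0; });
        remaining = total;                    // re-arm for the next frame
    }

private:
    std::mutex m;
    std::condition_variable cv;
    int remaining;
    int total;
};
[/code]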

As a proof of concept, I have physics components and rendering components defined, contained within physics systems and rendering systems. The physics component pool is stored and run on a separate thread from the main game. The rendering components have to use the messaging system to request entity position information from the physics components. The messaging system has to know where to find all of the component pools and then know how to send a message to each. Components don't care about where other components are located; they just want to send a message and get a reply as fast as possible. So, they send a message to their component manager, which then figures out whether the message can stay local to the system pool or has to be sent up to the message router. The components run spin locks until they receive a response (I'll probably have to think of a better solution, since a thread could be waiting a while).
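One better solution might be to park the waiting thread on a condition variable instead of spinning; again just a sketch of the idea:

[code]
// Sketch of a blocking reply slot to replace the spin lock.
#include <condition_variable>
#include <mutex>
#include <optional>
#include <string>

class ReplySlot {
public:
    void deliver(std::string r) {             // called by the message router
        { std::lock_guard<std::mutex> lock(m); reply = std::move(r); }
        cv.notify_one();
    }
    std::string waitForReply() {              // called by the requesting component
        std::unique_lock<std::mutex> lock(m);
        cv.wait(lock, [this] { return reply.has_value(); });
        return *reply;                        // thread slept instead of spinning
    }
private:
    std::mutex m;
    std::condition_variable cv;
    std::optional<std::string> reply;
};
[/code]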

There's a bit of an unforeseen problem, though: what happens to the game if a remote machine gets disconnected and that remote machine was processing something vital like physics and position information? The current thought I'm entertaining is to keep a "hot spare" of a component system hosted on another machine. The hot spare mirrors the current component system each frame and waits for a failover event to happen. Since a hot spare is being maintained anyway, it would improve performance to make use of the duplicated data: "read" messages could be thrown at the hot spare if the main pool is too busy. But this creates a new problem: what if the main instance gets a write message and the mirror hasn't been updated yet when it receives a read request? The read result would be at least one frame behind, or worse, depending on latency. I guess this would ultimately need some pretty slick automated load balancing. It'd be nice to have an algorithm which can measure CPU and memory load, figure out whether migrating a component system pool would increase application performance, and then do whatever is best.
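One mitigation might be to stamp every mirrored snapshot with the frame it reflects, so a reader can at least detect how stale the spare is (a sketch; the state fields are placeholders):

[code]
// Frame-stamped snapshot so reads from the hot spare can detect staleness.
#include <cstdint>

struct StampedState {
    std::uint64_t frame = 0;                  // frame this snapshot reflects
    float x = 0.0f, y = 0.0f, z = 0.0f;       // placeholder position data
};

// True if the spare's copy is no more than maxLagFrames behind.
bool freshEnough(const StampedState& spare,
                 std::uint64_t currentFrame,
                 std::uint64_t maxLagFrames) {
    return currentFrame - spare.frame <= maxLagFrames;
}
[/code]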

Hopefully someone else gets inspired to try something like this and uses a similar approach :)

