Utilizing Multiple Core Processors


Has anyone here tried to utilize multiple cores for their games instead of the single core/processor approach? I was just curious to see if it was worth pursuing and what advantages/disadvantages it might entail other than the obvious. =)

Does it need anything other than the obvious humongous amount of extra processing power [looksaround]? Designing a good multi-threaded system is of course harder, and debugging can be a bitch, but other than that I don't see disadvantages.

Whether it's worth pursuing or not depends on your needs.

E: Oh, and of course the number of cores will just keep going up (and the speed of a single core will probably go down) in the future, so getting ready for it isn't exactly a bad choice.

[Edited by - virus- on May 27, 2008 5:32:08 PM]

It is hard to really use multiple cores persistently throughout an application, because many applications (and games) are not easily parallelizable. For example, simply using two threads to send your render commands to the graphics card not only won't run any faster, it also won't work at all (never try that!).
However, some components can be moved to another thread (core), and that is certainly done by many people these days.

Quote:
what advantages/disadvantages it might entail
Threads make your program a lot more complicated, and threads on multiple cores even more so. You should have a good understanding of how to synchronize threads and how to share/pass data between them before even thinking about using threads.
On the positive side, having two things run on two cores in parallel can obviously be quite a bit faster than running them one after the other. For example, you could calculate the physics for the next few frames while the main thread is still busy rendering the current one.
Also, threads can be wonderful for offloading things that might otherwise stall your game in an unpleasant way (network, loading data, etc.).
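
For the last case (loading data), a minimal sketch of the idea in modern C++ might look like this - LoadTextureFromDisk is just a stand-in for your own blocking loader, and the polling is deliberately simplistic:

[code]
#include <chrono>
#include <future>
#include <string>
#include <vector>

// Stand-in for a slow, blocking disk read -- substitute your own loader.
std::vector<unsigned char> LoadTextureFromDisk(const std::string& path)
{
    return std::vector<unsigned char>(); // pretend the file is read here
}

struct PendingTexture
{
    std::string path;
    std::future<std::vector<unsigned char>> data;
};

PendingTexture BeginLoad(const std::string& path)
{
    // std::async runs the load on another thread; the game loop keeps going.
    return PendingTexture{ path, std::async(std::launch::async, LoadTextureFromDisk, path) };
}

void TryFinishLoad(PendingTexture& tex)
{
    // Poll once per frame instead of blocking the main thread on the disk.
    if (tex.data.valid() &&
        tex.data.wait_for(std::chrono::seconds(0)) == std::future_status::ready)
    {
        std::vector<unsigned char> pixels = tex.data.get();
        (void)pixels; // upload to the GPU from the main/render thread here
    }
}
[/code]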

Quote:
Original post by murdock
I was thinking that multiple cores could provide advantages to improving performance in AI programming.
Short answer is yes, it can. Let's assume that most of the AI works via scripts (which is usually the case). If the virtual machine supports multi-threading, you can process X (number of cores) scripts at the same time; the amount of performance gained depends on how well the system is designed (how much synchronization you need).
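
To make that a bit more concrete, here's a minimal sketch of the batching idea in modern C++ - Script::Update stands in for whatever your VM actually exposes, and it assumes each script runs in its own isolated VM instance:

[code]
#include <algorithm>
#include <cstddef>
#include <thread>
#include <vector>

struct Script
{
    void Update(float dt) { /* run one tick of this script's VM instance */ }
};

// Split the script list into one contiguous batch per hardware thread and
// update each batch on its own thread. Only safe if Update() doesn't touch
// shared state, or does so through properly synchronized interfaces.
void UpdateScriptsParallel(std::vector<Script>& scripts, float dt)
{
    const unsigned threads = std::max(1u, std::thread::hardware_concurrency());
    const std::size_t chunk = (scripts.size() + threads - 1) / threads;

    std::vector<std::thread> workers;
    for (unsigned t = 0; t < threads; ++t)
    {
        const std::size_t begin = t * chunk;
        const std::size_t end   = std::min(scripts.size(), begin + chunk);
        if (begin >= end)
            break;
        workers.emplace_back([&scripts, begin, end, dt]
        {
            for (std::size_t i = begin; i < end; ++i)
                scripts[i].Update(dt);
        });
    }
    for (std::thread& w : workers)
        w.join(); // one synchronization point per update
}
[/code]

The more often the scripts have to talk to each other (or to the world) during Update, the more of that theoretical X-times speedup you give back to synchronization.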

Quote:
Original post by murdock
Has anyone here tried to utilize multiple cores for their games instead of the single core/processor approach?
Yes.

Quote:
I was just curious to see if it was worth pursuing ...

It can be. If you have something which is CPU bound, and if it can be suitably parallelized, then yes, it may be worth pursuing. If it isn't CPU bound, then it obviously isn't worth pursuing. If it is difficult or impossible to parallelize, then it is probably not worth pursuing.

Note that many libraries that you already use take advantage of multiprocessing if it is available. Direct3D will run many graphics operations in parallel. DirectSound will do a lot of mixing and nearly all playback asynchronously to your application. The OS will asynchronously process your network packets and prepare them for your use. Etc.

Also, there is a big difference between running different tasks in parallel (rendering thread, AI thread, networking thread, input thread) and running a single task in parallel (AI partitioning the processing amongst all available processors). The former is generally easy, but doesn't scale well to more processors. The latter generally requires more effort, but can scale nicely.
Quote:
... and what advantages/disadvantages it might entail other than the obvious. =)

I don't know what you consider "obvious".

There is an additional cost for developing, debugging, and testing multiprocessing software. With properly experienced and educated developers and appropriate tools, the cost is fairly small. Without that experience and/or education, the development cost can be quite significant.

There is a computational penalty for keeping multiple processing elements synchronized and locked as necessary. This computational cost will be known and understood by people with the experience and education noted above.




Another aspect might be whether you can keep your quad-core CPU fed with enough data to keep all four cores running efficiently.

I have a Q6600 system that I haven't yet had a chance to run tests of my expected CPU loads on (HEAVY AI, but with a large data set -- so I can't count on it staying in the 8 MB cache much). I'm not sure the memory I paid way too much for (PC9200) would even be worth the cost.

The big solution is clustering (which has even worse parallelization problems than multi-core), and I eventually aim to find out whether dual-core CPUs work almost as well as quads in clusters (and I suppose the same question will apply to the 8-core CPUs coming in the near future).





Quote:
Original post by murdock
Has anyone here tried to utilize multiple cores for their games instead of the single core/processor approach?
I was just curious to see if it was worth pursuing and what advantages/disadvantages it might entail other than the obvious. =)
In my hobby engine, I'm currently experimenting with a technique I call "deferred function calls" (there's probably an existing name for this pattern...) - where each function called on an entity is pushed into a queue (hopefully a lock-free queue) and processed on the *next* update-frame. This should allow me to update multiple entities at once (one per core) without the need to lock any of them.
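
Very roughly, the pattern looks something like this (just a sketch - I've used a mutex where the real thing would hopefully use a lock-free queue, and all the names are made up for illustration):

[code]
#include <functional>
#include <mutex>
#include <utility>
#include <vector>

class Entity
{
public:
    // Other code never calls methods on an entity directly; it queues a
    // closure that gets applied on the *next* update-frame.
    void Defer(std::function<void(Entity&)> call)
    {
        std::lock_guard<std::mutex> lock(pendingMutex); // stand-in for a lock-free queue
        pending.push_back(std::move(call));
    }

    // Called once per frame by whichever worker thread owns this entity.
    void Update()
    {
        std::vector<std::function<void(Entity&)>> calls;
        {
            std::lock_guard<std::mutex> lock(pendingMutex);
            calls.swap(pending);
        }
        for (std::function<void(Entity&)>& c : calls)
            c(*this); // apply last frame's deferred calls
        // ... then run this entity's normal per-frame simulation ...
    }

private:
    std::mutex pendingMutex;
    std::vector<std::function<void(Entity&)>> pending;
};
[/code]

So instead of calling something like enemy.TakeDamage(10) directly, you'd queue it with enemy.Defer(...) and the call lands at the start of the enemy's next Update().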
Quote:
Original post by murdock
I was thinking that multiple cores could provide advantages to improving performance in AI programming.
Yes! AI code usually involves lots of 'searching' algorithms (pathfinders, planners, etc), and as fans of Herb Sutter know, searching algorithms are one of the few bits of code that can actually go superlinear - that is, if you double the number of cores, you can actually get a performance boost of more than double!
Quote:
Original post by samoth
It is hard to really use multiple cores persistently throughout an application, because many applications (and games) are not easily parallelizable.
I'm always hearing people say that games are hard to parallelize, but the way I see it, games are usually made up of a simulation consisting of hundreds of individual entities. As long as each of those entities can be simulated in isolation, this sounds like the perfect situation for parallelization...

And anyway, even if you *can* only parallelize 10% of your application - let's say the particle simulation as an example - then as users keep doubling the number of cores in their systems, you can at least keep doubling the number of particles in your sim ;)

Quote:
Original post by Hodgman
I'm always hearing people say that games are hard to parallelize, but the way I see it, games are usually made up of a simulation consisting of hundreds of individual entities. As long as each of those entities can be simulated in isolation, this sounds like the perfect situation for parallelization...


The nature of simulation is that you tend to work with large or complete cross-sections of the data at very frequent intervals, e.g. many times a second. This is generally the worst case for parallel programming, since the typical parallel algorithm accepts some extra processing latency in return for greatly enhanced processing bandwidth. Unfortunately it's no good having your game run at a solid 75fps if you don't start turning the view until 20 frames after you started moving your mouse.

Plus, most of the time in games the only entities that can be simulated in isolation are the ones not affecting very much, and which you were already optimising away anyway.

So generally, simulation-style games are not all that easy to gain parallel benefits from. It can be done of course, but you have to change your algorithms quite a bit more than you would in many business or analytical applications.

[Edited by - Kylotan on May 28, 2008 7:59:31 AM]

What if you locked your threads at the end of each frame? I know it's not ideal because you won't utilize all the processing power available, but wouldn't you still gain an advantage from processing some tasks in parallel, such as CPU skinning, AI and rendering? Then at the end of the frame you'd wait until all threads have finished before continuing to the next. Is this going to cause problems on single-core CPUs?

I don't know that much about this, so please correct me if I'm wrong or missing something, because I was considering doing it this way.
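
To be concrete, this is roughly what I was picturing - just a sketch, with UpdateAI and SkinCharacters standing in for the real work (and I realise a real engine would reuse persistent worker threads rather than spawning new ones every frame):

[code]
#include <thread>

void UpdateAI()       { /* ... AI for this frame ... */ }
void SkinCharacters() { /* ... CPU skinning for this frame ... */ }
void Render()         { /* ... build/submit draw calls, main thread only ... */ }

void RunFrame()
{
    // Kick off the independent CPU work for this frame...
    std::thread ai(UpdateAI);
    std::thread skin(SkinCharacters);

    // ...render on the main thread in the meantime...
    Render();

    // ...then wait for everything before starting the next frame.
    ai.join();
    skin.join();
}
[/code]

As far as I can tell, on a single-core CPU the threads would just get time-sliced, so it should still run, only without the speedup.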

Parallelizing over n cores is somewhat trivial, as long as your problem can be distributed in some way.

The real challenge is using an arbitrary number of cores (1..n) effectively.

There are several currently accessible approaches to scaling applications. Each has its own characteristics, and none of them is universal.

Most of the benefits are in areas other than the simulation.

You cannot have aspects of your core gameplay change based on the available CPU power. It should be obvious, but perhaps it isn't. The exploits available there are extreme. Imagine something like "If I start running a distributed.net client at high priority before starting the game, it is very easy to win" kinds of issues. Or conversely, somebody on a dual quad-core or quad-quad machine will find the game impossible.

So what's left?

Basically you are left with the fluff processing.

For graphics, that might mean allowing a few bajillion animated particles. Most games also include options of allowing high/low resolutions, FSAA, different screen modes, and so on.

There are other quality items, maybe a few audio quality options, but you can't consume an arbitrary number of processors with that.

Maybe if you have animated spectators or something in your game, have each one run their own AI if processing space is available. But that doesn't add much to the game, and your time will be better spent adding new game modes or gameplay elements.

Quote:
Original post by jrmcv
What if you locked your threads at the end of each frame? I know it's not ideal because you won't utilize all the processing power available, but wouldn't you still gain an advantage from processing some tasks in parallel, such as CPU skinning, AI and rendering? Then at the end of the frame you'd wait until all threads have finished before continuing to the next. Is this going to cause problems on single-core CPUs?

I don't know that much about this, so please correct me if I'm wrong or missing something, because I was considering doing it this way.


This mostly talks about DX, but I reckon it applies to most graphics APIs.

It does depend on the situation, but synchronizing all threads at the end of a frame may be too tricky to be useful. Checking whether or not all threads are ready might not be trivial, and even if it is, it'll introduce at least some overhead. As I recall, this huge pack of presentations (from Gamefest) contains a nice presentation about multi-core programming and synchronization, with the priceless title Designing Multi-Core Games: How to Walk and Chew Bubblegum at the Same Time [smile]

Another thing you'll have to consider is the GPU, which is running in parallel with your CPU and thus with your threads. This excellent FAQ covers the CPU/GPU synchronization in more detail, but what I'm trying to say is that it can already be challenging to get this synchronization to work right (the typical CPU/GPU bound problem), even without also having to sync up multiple threads on the CPU side.

If you don't absolutely need to distribute your core code over multiple processors to get acceptable performance, it's probably best to use extra threads to handle stuff that doesn't need rigid synchronization. This can include tasks like pathfinding or long-term AI strategies (RTS planning). Other prime candidates to run in a separate thread are fire-and-forget things, like audio decompression/playback.
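
For the pathfinding case, a rough sketch of that kind of loose coupling is a request/result queue serviced by a background thread - all names made up for illustration, and the result handling is left out:

[code]
#include <condition_variable>
#include <mutex>
#include <queue>
#include <utility>
#include <vector>

struct PathRequest { int agentId; float goalX, goalY; };
struct PathResult  { int agentId; std::vector<std::pair<float, float>> waypoints; };

std::mutex              pathMutex;
std::condition_variable pathCv;
std::queue<PathRequest> pathRequests;
std::queue<PathResult>  pathResults;   // drained by the game thread whenever convenient
bool pathShutdown = false;

// Game thread: fire and (almost) forget.
void RequestPath(const PathRequest& req)
{
    { std::lock_guard<std::mutex> lock(pathMutex); pathRequests.push(req); }
    pathCv.notify_one();
}

// Background thread: grinds through requests at its own pace.
void PathfinderThread()
{
    for (;;)
    {
        std::unique_lock<std::mutex> lock(pathMutex);
        pathCv.wait(lock, [] { return !pathRequests.empty() || pathShutdown; });
        if (pathShutdown)
            return;
        PathRequest req = pathRequests.front();
        pathRequests.pop();
        lock.unlock();

        PathResult res{ req.agentId, {} };
        // ... run A* (or whatever) here, entirely off the game thread ...

        lock.lock();
        pathResults.push(std::move(res));
    }
}
[/code]

The game thread just posts requests and picks up results a frame or two later, which is exactly the kind of work that doesn't care about rigid per-frame synchronization.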


Quote:
Basically you are left with the fluff processing.


It's probably true that this fluff stuff is easiest to distribute over multiple cores, but sooner or later we'll have to cross over to multithreading (that is, if you really need the processing power). With clock speeds being pretty much stagnant and the number of cores steadily increasing, it does seem to be the next big step... Or rather, that's what them papers have been telling me [wink]

You can set a thread's processor affinity on Windows, but it's generally recommended that you let the OS do the scheduling. On the Xbox 360, however, you actually have to set manually which processor a thread runs on. This interesting presentation has more info; be sure to check the comments.

Quote:
Original post by remigius
If you don't absolutely need to distribute your core code over multiple processors to get acceptable performance, it's probably best to use extra threads to handle stuff that doesn't need rigid synchronization. This can include tasks like pathfinding or long-term AI strategies (RTS planning).
But then, as I mentioned, you are changing core gameplay based on available processing power. That is a well known way to exploit games, and all professional shops should account for it.

Imagine the situation where I install the game five years from now, and the once-enjoyable RTS now has developed unbeatable long-term AI strategies thanks to the extra processing power.

Or imagine that I'm on a "budget" computer that barely meets the specs. The AI that other people say is wonderful seems to be complete garbage on my machine.

You must be very careful when trying to give extra cycles to gameplay processing.

It gets even more complex when you have multiple players over a network. There have been many well-known exploits over the years where a player could hide inside a wall or land on errant geometry if their CPU load was just right.
Quote:

It's probably true that this fluff stuff is easiest to distribute over multiple cores, but sooner or later we'll have to cross over to multithreading (that is, if you really need the processing power). With clock speeds being pretty much stagnant and the number of cores steadily increasing, it does seem to be the next big step... Or rather, that's what them papers have been telling me [wink]
There is a difference between processing that is mandatory for gameplay, and using up all unused cycles with extra gameplay processing. As for multithreading, any seriously large game already does this.

Look at new releases like "Age of Conan". It requires 3 GHz or equivalent. For most of us, that would be either a dual or a quad processor. That value will continue to go up.

That's an example of a game just increasing the minimum bar, not of a game filling the extra cycles with bonus gameplay processing.

Thanks, I think I understand - going to look through those docs before going any further!

EDIT: sorry for the multiple posts, I had problems replying.

[Edited by - jrmcv on May 28, 2008 4:55:21 PM]

Quote:
Original post by frob
But then, as I mentioned, you are changing core gameplay based on available processing power. That is a well known way to exploit games, and all professional shops should account for it.

Imagine the situation where I install the game five years from now, and the once-enjoyable RTS now has developed unbeatable long-term AI strategies thanks to the extra processing power.


I agree with most of your post, but I think you may be reading too much into mine. If you have a 2nd core available sitting there idle (which 85% of gamer PCs are reported to have), you could offload AI stuff to it and reduce the load on the main (rendering) core. That doesn't automatically mean PCs with more cores get more AI calculations and thus more advanced AI.

Consider a simple chess example. If a user selects beginner difficulty, the game could still be limited to some max number of turns to look ahead, no matter how many cores are available, now or in 5 years. The same could be implemented for a minimum number of turns to look ahead, to ensure a consistent difficulty. This may raise the min system spec, but the point is that it can be made consistent, regardless of processing power or load.
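
In code terms the point is simply that the search budget comes from the difficulty setting, never from the hardware - a trivial sketch:

[code]
// Difficulty decides the search budget; the core count only decides how that
// fixed amount of work is split up, never how much work is done.
int LookaheadDepthFor(int difficulty)   // e.g. 1 = beginner, 3 = expert
{
    switch (difficulty)
    {
        case 1:  return 3;   // beginner always searches 3 plies, on any machine
        case 2:  return 5;
        default: return 7;
    }
}
[/code]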

Multiplayer is a different matter entirely and I think we can agree that a client, let alone its CPU load, should have no authority over the game state.

Quote:
That's an example of a game just increasing the minimum bar, not of a game filling the extra cycles with bonus gameplay processing.


Ok, so we agree that the bar has already been raised to multi-core for non-fluff stuff (?)

Quote:
Original post by remigius
I agree with most of your post, but I think you may be reading too much into mine. If you have a 2nd core available sitting there idle (which 85% of gamer PCs are reported to have), you could offload AI stuff to it and reduce the load on the main (rendering) core.
That is an implementation detail. Also, those "85% of gamers" (citation needed) should be careful to note the difference between idle and very low load.

If you throw a simple multithreaded game on an overclocked QX9770, even if the game is using all four processors, it could appear that all the processors are "idle". But if you threw that same application at a 1.2 GHz P4, it could kick the processor up to 100% as seen in Task Manager.

Quote:
That doesn't automatically mean PCs with more cores get more AI calculations and thus more advanced AI.

Consider a simple chess example. If a user selects beginner difficulty, the game could still be limited to some max number of turns to look ahead, no matter how many cores are available, now or in 5 years. The same could be implemented for a minimum number of turns to look ahead, to ensure a consistent difficulty. This may raise the min system spec, but the point is that it can be made consistent, regardless of processing power or load.
That is exactly the point.

Left uncontrolled, this is exactly what happens if you give all available processing power to the AI. Similar effects are seen for any other world processing. There are many good games that were fun to play on old hardware but are unplayable on today's faster hardware. My first experience with that was back when a 16 MHz processor broke features of games written for 12 MHz machines.

If all extra processing power is fed into gameplay processing then bad things happen. Core gameplay needs to be limited to a common rate.

Quote:
Ok, so we agree that the bar has already been raised to multi-core for non-fluff stuff (?)
That has been done for quite some time now.

Most major titles will take advantage of multiple CPUs. That isn't to say they will max out all 16 processors of a quad/quad box, but if you have a single quad-core, each core should have at least some work to do.

It is rare to find a modern AAA title which constantly pegs a single CPU at 100% and leaves the others completely idle.

That being said, some of them will fill a CPU up to 100% with extra fluff processing. In this case, though, the other processors are not completely idle. They've done the main work, and are now bored.

Quote:
Original post by Kylotan
The nature of simulation is that you tend to work with large or complete cross-sections of the data at very frequent intervals, e.g. many times a second. This is generally the worst case for parallel programming, since the typical parallel algorithm accepts some extra processing latency in return for greatly enhanced processing bandwidth. Unfortunately it's no good having your game run at a solid 75fps if you don't start turning the view until 20 frames after you started moving your mouse.
My prototype framework, and even the basic ones I've seen from MS, add in one frame of lag.
However, you can simply run the update cycle at a higher frequency than the render cycle, seeing as by developing a thread-safe framework you've already got a mechanism to keep around a read-only cache of the game-state for the renderer to use anyway.
In the end, the lag ends up pretty similar to what we're used to with our traditional game frameworks.
Quote:
Plus, most of the time in games the only entities that can be simulated in isolation are the ones not affecting very much, and which you were already optimising away anyway.

I disagree; case in point being Havok or AGEIA/NVidia's PhysX - even physics (where there can be a lot of interaction) is becoming massively parallel now.

Any game-entity can become parallel if writes are done to the next state while reads are done from the current state (this does require a new thought-process while coding). The overhead of maintaining redundant states is mitigated by the fact that, as well as enabling parallelism, it also simplifies replays, networking and dead reckoning.
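
Sketched out, the current/next split is just two copies of the state and a swap at the end of the update - minimal and hand-wavy, but it shows where the lock-freedom comes from:

[code]
#include <cstddef>
#include <vector>

struct EntityState { float x = 0, y = 0, vx = 0, vy = 0; };

struct World
{
    std::vector<EntityState> current; // read-only while updating
    std::vector<EntityState> next;    // each entity writes only its own slot

    // Safe to call for different 'i' from different threads at the same time:
    // reads only touch 'current', writes only touch next[i].
    void UpdateEntity(std::size_t i, float dt)
    {
        const EntityState& in  = current[i];
        EntityState&       out = next[i];
        out.vx = in.vx;
        out.vy = in.vy;
        out.x  = in.x + in.vx * dt;
        out.y  = in.y + in.vy * dt;
    }

    // Single-threaded moment at the end of the update: publish the new state.
    void EndFrame() { current.swap(next); }
};
[/code]

The kept-around 'current' copy is also exactly the read-only snapshot you can hand to the renderer, a replay recorder or the networking code.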

CPUs aren't getting any faster, but Moore's law still holds so instead they are getting more parallel. If we want to keep up with the performance of these new CPUs, we've got to change the way we go about writing our code.

Quote:
Original post by Hodgman
Any game-entity can become parallel if writes are done to the next state while reads are done from the current state (this does require a new thought-process while coding). The overhead of maintaining redundant states is mitigated by the fact that, as well as enabling parallelism, it also simplifies replays, networking and dead reckoning.
This has been the normal way of doing things for many years. Consider the PS2 architecture (from eight years ago): it has multiple processing units with a pipeline that is a few rendered frames long. You're always working at least one frame in the future, possibly four frames if your game engine was architected poorly, and the resulting input lag was one of the original complaints about the system.

Quote:
Original post by frob
Quote:
Original post by Hodgman
Any game-entity can become parallel if writes are done to the next state while reads are done from the current state
This has been the normal way of doing things for many years.
If it's normal now, then that means that people can stop saying games are hard to parallelize, or if it is hard, then hard is the norm ;)

From what I can tell, one frame (20ms) of input lag is hardly a problem (4 would be a problem!).
The only input lag I can't stand is when people use DirectInput for the mouse!

Quote:
Original post by frob
That is an implementation detail. Also, those "85% of gamers" (citation needed) should be careful to note the difference between idle and very low load.


To clear this up, I meant that 85% of gaming PCs are reported to be multi-core, as cited in the MS presentation I linked to. Obviously nothing can be guaranteed on the idle part, but you're really arguing semantics here. In a typical (non-exploit) scenario, a game will be the sole foreground process and will have additional cores available to it that are running at a very low load.

I'd like to discuss the rest further, but I'm unsure what you're trying to say. I was responding to your 2nd post, in which you said that we're only left with fluff processing for multi-threading, yet you're telling us now that parallelizing the main processing has been the norm for over 8 years already. I think overall we're in agreement and are happily misreading each other's points [smile]

Quote:
Original post by Hodgman
Quote:
Original post by murdock
Has anyone here tried to utilize multiple cores for their games instead of the single core/processor approach?
I was just curious to see if it was worth pursuing and what advantages/disadvantages it might entail other than the obvious. =)

In my hobby engine, I'm currently experimenting with a technique I call "deferred function calls" (there's probably an existing name for this pattern...) - where each function called on an entity is pushed into a queue (hopefully a lock-free queue) and processed on the *next* update-frame. This should allow me to update multiple entities at once (one per core) without the need to lock any of them.


You might have trouble making it 'lockless' if there are multiple generators (inserting into queue) and/or multiple consumers (removing from queue).

The lockless queue mechanism I'm familiar with only works if there is exactly one generator and one consumer.
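
The single-producer/single-consumer version is basically just a ring buffer with two indices, roughly like this (a sketch in modern C++; a production version needs more care, e.g. padding the indices apart to avoid false sharing):

[code]
#include <atomic>
#include <cstddef>

// Single-producer, single-consumer ring buffer: exactly one thread may call
// Push and exactly one other thread may call Pop. With more than one of
// either, this breaks. One slot is always left empty, so Capacity must be > 1.
template <typename T, std::size_t Capacity>
class SpscQueue
{
public:
    bool Push(const T& item)                      // producer thread only
    {
        const std::size_t head = head_.load(std::memory_order_relaxed);
        const std::size_t next = (head + 1) % Capacity;
        if (next == tail_.load(std::memory_order_acquire))
            return false;                         // full
        buffer_[head] = item;
        head_.store(next, std::memory_order_release);
        return true;
    }

    bool Pop(T& out)                              // consumer thread only
    {
        const std::size_t tail = tail_.load(std::memory_order_relaxed);
        if (tail == head_.load(std::memory_order_acquire))
            return false;                         // empty
        out = buffer_[tail];
        tail_.store((tail + 1) % Capacity, std::memory_order_release);
        return true;
    }

private:
    T buffer_[Capacity];
    std::atomic<std::size_t> head_{0};
    std::atomic<std::size_t> tail_{0};
};
[/code]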

