
Discussion about console multi-processor programming


Well, I wonder how prepared the game industry is for programming games with dual, triple, or quadruple processors/cores (i.e. the Xbox 2 and PS3, respectively). I mean, the first dual-processor console was the Saturn, I believe, and from articles I've read, programmers had a very hard time with it, which is why a lot of games were made for the PS1 instead (speculation, of course). Then the PS2 came along with its Emotion Engine CPU and dual vector units, and programmers are still bitching about programming for those three chips. I doubt (based on no real information) that the full potential of the vector units has been realized as far as PS2 graphics programming goes.

Now we're talking about 4 to 8 cores for the PS3 and 3 PowerPC processors for the Xbox 2. Will programmers be able to utilize these multi-processors/cores efficiently and effectively? What makes this time around any different than before?

Sidenote: will the PS3 have vector units as well, or were those a gimmick?

[Edited by - Alpha_ProgDes on April 27, 2005 1:29:25 AM]

You'd be right to doubt that the full potential of the VUs on the PS2 has been realized - they're a bitch to program and the architecture of the console makes it difficult to get full value out of them. Most games are a long way from getting the most out of them.

What makes this time around different from before is that processor manufacturing processes have pretty much reached the limit (at least for the time being) as far as increasing the performance of a single core goes. The gains from spending a growing transistor budget on larger caches, more complex branch prediction units, extra execution units, instruction reordering, and other single-processor tricks have been exhausted. The only real option left for increasing performance is to spend those transistors on extra cores. It's going to be difficult for programmers to make use of the extra cores, but we're not going to be able to rely on performance increases elsewhere, so we'll just have to learn to deal.

I find it amusing that things are now going back this way. Why?

The arcade machines of the early 80s often used several CPUs, because the machines were made around the games, not the other way around.

So if a game maker couldn't get enough performance out of one CPU, they'd simply get the hardware designers to add another. It's not uncommon to find machines from that vintage with 2, 3 or even 4 CPUs.

But they weren't true SMP machines. In some cases the CPUs didn't share any memory and had relatively slow communication channels. Rather, they were used for different tasks (one for graphics, one for sound, another for input).

I don't know whether this was just laziness on the designers' part; more likely it was because the tools then available for developing large assembly-language programs were a lot more primitive than they are now.

Mark

I'd be interested to hear about some of the approaches people have taken to making their own games multithreaded/multiprocessor-enhanced. I know many people ignore it and believe it's not a good thing to pursue, but with hyperthreading and dual-core CPUs knocking about, it can really give you a performance benefit.

Myself, I've tried implementing the approach used by the game Perimeter (discussed at EDF 2004). One thread is responsible for rendering and runs as fast as it can (it spends most of its time either in driver calls or blocking on the GPU). The main thread, responsible for game state updates, is kicked off by a periodic timer (around every 10ms, I think), so I get all the advantages of a fixed timestep. That main thread keeps two complete game states around: one from before the current time and one from after, separated by one timestep (in fact there are three, to avoid locks). When gathering data for draw calls, the renderer interpolates between the two depending on the current time. So several frames may elapse between game state updates.

It's worked pretty well for my spinning triangle, anyway.

Quote:
Original post by Alpha_ProgDes
Will programmers be able to utilize efficiently and effectively these multi-processors/cores?


To start with, in the majority of cases the answer is going to be no. Like any new technology, it takes a while for programming practices and techniques to catch up and then begin to push it (for example, GPUs).

Of course, some of the more clever developers and middleware developers will have known this was coming and begun adapting their code to a more multicore-friendly setup, so while there is still going to be a learning curve, they will already be slightly ahead of the game.

Quote:
Original post by superpig
Myself, I've tried implementing the approach used by the game Perimeter (discussed at EDF 2004). One thread is responsible for rendering and runs as fast as it can (it spends most of its time either in driver calls or blocking on the GPU). The main thread, responsible for game state updates, is kicked off by a periodic timer (around every 10ms, I think), so I get all the advantages of a fixed timestep. That main thread keeps two complete game states around: one from before the current time and one from after, separated by one timestep (in fact there are three, to avoid locks). When gathering data for draw calls, the renderer interpolates between the two depending on the current time. So several frames may elapse between game state updates.


Hmmmm, I like that idea. As you point out, the fixed timestep makes a lot of things easier to deal with (physics much prefers it), but on the other end you can throw out frames as fast as possible. Decouple input from that so the game feels responsive (one of the key gripes most people have with things like Quake-based games, where input is glued to framerate) and it looks like a winner. If I get bored I might have a poke at this kind of thing.

Quote:

It's worked pretty well for my spinning triangle, anyway.


triangles, pfft! try a spinning cube! [grin]

Quote:
Original post by superpig
That main thread works to keep two complete game states around; one from before the current time...


Wow, impressive, keeping a copy of the state from the future. How does it work? Does your Mac have an FPU (future prediction unit) fitted as standard? :)

Mark

I'm hoping we don't see any new systems in 2007(-8), because there's no way developers will have been able to push the current hardware by then. In any case, hopefully all this hardware will force companies to take a DS approach and actually develop GAMES, and not "this year's console that's 3 steps away from a mini-supercomputer."

The scary thing is that memory buses don't keep up with the CPU speed increases. And now there are two (or three, or four) CPUs, each of which is ravenously consuming memory bandwidth. And the frame buffer and textures share a unified memory controller with these CPUs.

The good thing is that modern CPUs have larger caches, so it's possible to write cache-local subsystems and actually get some work out of each CPU. But I predict we'll see coarse-scale multi-threading, where each thread is tied (with affinity) to a specific CPU and each thread has a specific job (physics, audio, rendering, etc).

Quote:
Original post by markr
Quote:
Original post by superpig
That main thread works to keep two complete game states around; one from before the current time...


Wow impressive, keeping a copy of the state from the future. How does it work? Does your Mac have a FPU (future prediction unit) fitted as standard? :)


One of the nicest aspects of the fixed timestep is that many things become deterministic [smile]

_the_phantom_, yeah, I think the way to go on the input front is to use a buffering system, much like the one DirectInput provides. If you're not using DirectInput, you presumably run a separate thread that records all input events (or just polls the hardware) into a buffer. Next time the fixed-timestep updater rolls around, it can just take the latest buffer and apply it to the system.

A question about your program: will multi-processors immediately pick up the individual threads, or do you have to tweak things a little to make sure that each thread goes to a separate processor?

Show some code too [smile]

Quote:
Original post by hplus0603
The scary thing is that the memory busses don't keep up with the CPU speed increase. And now, there's two (or three, or four) CPUs, each of which is ravenously consuming memory bandwidth. And the frame buffer and textures share a unified memory controller with these CPUs.

The good thing is that modern CPUs have larger caches, so it's possible to write cache-local subsystems, so you can actually get some work out of each CPU. But I predict we'll see coarse-scale multi-threading, where each thread is tied (with affinity) to a specific CPU, and each thread has a specific job (physics, audio, rendering, etc).


Isn't that why we have Rambus and DDR2 memory? Maybe I'm getting my concepts confused, but I thought those were meant to take care of memory bus speeds.

Quote:
Original post by superpig
_the_phantom_, yeah, I think the way to go on the input front is to use a buffering system, much like the one DirectInput provides. If you're not using DirectInput, you presumably run a separate thread that records all input events (or just polls the hardware) into a buffer. Next time the fixed-timestep updater rolls around, it can just take the latest buffer and apply it to the system.


Yeah, that would make sense as a way forward. Once I'd got my windowing stuff sorted, I was considering looking into building a standalone input library (something which is often brought up as lacking), and this system is certainly one to consider.

@Alpha_ProgDes
With an OS such as Windows, when you start a thread the OS automatically handles the details of when and where to run it. I believe it generally uses a load-balancing system and tries to keep things spread evenly between the CPUs; however, you can ask it to keep one thread on one CPU (the affinity that hplus0603 was talking about), which, depending on the task being carried out, can improve performance.

Quote:
Original post by hplus0603
The scary thing is that the memory busses don't keep up with the CPU speed increase. And now, there's two (or three, or four) CPUs, each of which is ravenously consuming memory bandwidth. And the frame buffer and textures share a unified memory controller with these CPUs.

At least for one of the systems the frame buffer won't be sharing that memory bus.

Maybe I'm missing something, but doesn't the interpolation-based fixed-timestep system mean that input isn't shown to the player until the second timestep after it arrives?

Say you have your two states already calculated, the present and future state. As we interpolate between these two states all of the input being recorded is not being reacted to because the future state is already computed. Once we get to the future state, it becomes the current state, and we calculate a new future state. This future state takes into account all of the buffered input, but isn't displayed right away. We interpolate to this future state, and once we reach it then the input from a timestep ago is finally displayed.

That would mean an average input lag of 1.5 time steps, or for a 10 ms timestep 15 ms. The lag would be between 10 and 20 ms. Is that not much of a worry?

Quote:
Original post by intrest86
Maybe I'm missing something, but doesn't the interpolation based fixed time step system mean that all input isn't shown to the player until the second time step after the input?

Say you have your two states already calculated, the present and future state. As we interpolate between these two states all of the input being recorded is not being reacted to because the future state is already computed. Once we get to the future state, it becomes the current state, and we calculate a new future state. This future state takes into account all of the buffered input, but isn't displayed right away. We interpolate to this future state, and once we reach it then the input from a timestep ago is finally displayed.

That would mean an average input lag of 1.5 time steps, or for a 10 ms timestep 15 ms. The lag would be between 10 and 20 ms. Is that not much of a worry?


Yes, which is why I'd probably eventually move to a smaller timestep. The alternative is to recalculate the step if you know that you received input since you previously calculated it, but there's a risk of 'jumping.' I guess it worked particularly well in Perimeter because it's an RTS: a split-second delay in selecting a unit or beginning a move order is unlikely to be noticed. An FPS might be more problematic; I've not tried it.

How about implementing a platform on the console similar to .NET/Java, where it's multithreaded behind the scenes (i.e. Java's AWT event thread and .NET's equivalent)? There'd be plenty of cycles available for an input thread, a physics thread, an AI thread, etc., and the team wouldn't have to code for them specifically; they'd just be there.

You can't just take multiprocessing and "throw it into the framework." You need to design for it - account for synchronization, effective load balancing and so on - or at least you do if you want to get decent performance out of it. The .NET attribute system makes it fairly simple to cook up a [parallelizable] attribute and apply it to a loop (so for i = 1 to 100, iterations 1 to 50 go on one processor and 51 to 100 on the other), but it still requires that you actually apply it, and that the stuff you're applying it to can operate in parallel.

Sheesh, "they'll just be there," are you lazy or what? [razz] [wink]
