
Accelerating PC-based gaming?


FPGA Jim    122
Hi - I'm a researcher investigating new ways of accelerating computation. Gaming sounds like a nice killer app for real-time computation. I know there is some published work to date on this topic of hardware game acceleration, and some commercially available devices. My question is: in the future, would PC-based gamers be willing to install a single plug-in card that would support current graphics/physics/AI etc. computing? The idea is to investigate new hardware technology which can support such demands and can be updated without replacing the plug-in card. Do you think this would be something worth pursuing? Any comments, good or bad, are welcome. Thanks.

snk_kid    1312
Quote:
Original post by FPGA Jim
My question is: in the future, would PC-based gamers be willing to install a single plug-in card that would support current graphics/physics/AI etc. computing?


This is already being done with GPUs for more general-purpose computation - check out GPGPU.

Quote:

The idea is to investigate new hardware technology which can support such demands and can be updated without replacing the plug-in card.


Not sure what you mean by that.

Quote:
Original post by FPGA Jim
Any comments, good or bad, are welcome.


Before throwing more powerful/faster hardware at the problem, start by investigating data structures & algorithms with better time/space complexity, profiling the code, parallelizing it (exploit multi-cores, GPUs, and/or SIMD instructions), making it cache friendly, utilizing memory pools, etc, etc.
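To give a concrete example of that last point, here is a minimal sketch of a fixed-size memory pool in C++. The class and method names are invented for illustration - this is not any particular library's API:

// Minimal fixed-size memory pool: one up-front allocation, O(1) obtain/release,
// no per-frame heap traffic. Sketch only; Pool/obtain/release are made-up names.
#include <cstddef>
#include <vector>

class Pool {
public:
    Pool(std::size_t blockSize, std::size_t blockCount)
        : storage(blockSize * blockCount) {
        // Thread every block onto the free list.
        for (std::size_t i = 0; i < blockCount; ++i)
            freeList.push_back(&storage[i * blockSize]);
    }

    void* obtain() {                          // O(1), no call into the heap
        if (freeList.empty()) return nullptr; // pool exhausted
        void* p = freeList.back();
        freeList.pop_back();
        return p;
    }

    void release(void* p) { freeList.push_back(p); }

private:
    std::vector<char> storage;   // one contiguous slab, which is also cache friendly
    std::vector<void*> freeList;
};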

frob    44904
Quote:
Original post by FPGA Jim
My question is: in the future, would PC-based gamers be willing to install a single plug-in card that would support current graphics/physics/AI etc. computing?

Nvidia and ATI make cards like that for graphics.
The Ageia PhysX Physics Card handles physics, but isn't very popular.
AI is generally an abstract concept that doesn't lend itself to a dedicated card. Figuring out what action to take in a given situation is often done by a few simple state machines that take relatively little processing power.
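To give a feel for how little processing that takes, here is a hedged sketch of such a state machine in C++; the guard, its states, and its inputs are all invented for illustration:

// A tiny "guard" AI as a finite state machine: one enum, one switch,
// a handful of branches per update. All names here are made up.
enum GuardState { PATROL, CHASE, ATTACK };

struct Guard {
    GuardState state = PATROL;

    void update(bool seesPlayer, float distanceToPlayer) {
        switch (state) {
        case PATROL:
            if (seesPlayer) state = CHASE;
            break;
        case CHASE:
            if (!seesPlayer) state = PATROL;
            else if (distanceToPlayer < 2.0f) state = ATTACK;
            break;
        case ATTACK:
            if (distanceToPlayer >= 2.0f) state = CHASE;
            break;
        }
    }
};

A few dozen of these per frame cost almost nothing next to rendering or physics, which is why a dedicated AI card has little to accelerate.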

Quote:
The idea is to investigate new hardware technology which can support such demands and can be updated without replacing the plug-in card.
What do you mean? You want to buy a video card along the lines of an FX5200 and have it perform like an 8800-GTX?

Hopefully you understand that this just isn't physically possible with hardware. It would be like installing "hardware accelerator" software that would make your old 486 run like a new quad-core machine, or asking if there is something you could put in your Honda Accord that would make it drive like a Lamborghini Reventón. Such a thing just isn't possible.

To a limited extent you can use FPGAs as rewritable hardware, but they are slower and more expensive than traditionally manufactured mass-produced processors.

Kylotan    9854
Quote:
Original post by FPGA Jim
My question is: in the future, would PC-based gamers be willing to install a single plug-in card that would support current graphics/physics/AI etc. computing? The idea is to investigate new hardware technology which can support such demands and can be updated without replacing the plug-in card.


As you're probably aware, the more specialised the feature, the faster it can be, and the more flexible the feature, the slower it has to be. Early graphics cards were only faster than CPU rendering because there was a small and fixed set of operations that were implemented directly in hardware and heavily pipelined, trading latency for bandwidth. So if you wanted to create hardware that actually has reasonable benefits, you'd have to keep the feature set limited to a core set of efficient primitive operations.

The problem with generalising that sort of model to other areas is that the type of computations being performed is often completely different. Physics is already being addressed, as it's quite mathematical and almost as amenable to acceleration as graphics. AI is nowhere near mature enough to be accelerated by hardware as there are no standard approaches, never mind standard computations.

In theory there could be some sort of gain from allowing the installation of extra general-purpose processors which run in parallel with the CPU. However, in practice games are not typically coded in a way that is conducive to concurrent processing (e.g. they require a consistent global state many times per second for update purposes), meaning this would probably be hard to take advantage of.

FPGA Jim    122
Thanks for feedback - some very good points.

I was thinking of using a new type of reconfigurable hardware which now has high-bandwidth access to the CPU (the lack of which was, in the past, one of the drawbacks of this type of hardware for desktop computing). The concept would look at time-multiplexing the hardware to accommodate real-time computing demands. This is speculative at the moment but may provide a platform that can support future gaming requirements. I understand that AI within the context of gaming is mostly limited to FSMs, but this will possibly change. Like any platform, I do understand that tools would need to be available to abstract such hardware away from programmers. This concept is something that I'm investigating - I'm just considering whether gaming is a good application domain.

Your feedback was most welcome.

Kylotan    9854
It's not so much that AI is limited to FSM, but that all the more complex approaches are so diverse that there seems to be little opportunity to standardise, never mind optimise. Low level mathematical concepts such as neural networks would be a lot more amenable to hardware optimisation than high level symbolic stuff like declarative languages and so on, but they're not at all equivalent. So I still feel that physics and graphics are pretty much the only things that are practical to hardware accelerate at the moment.

There's one exception that comes to mind though; it might be possible to do pathfinding/graph traversal effectively, since the inputs and outputs are generally small and the operations involved are often - but not always - trivial.

In fact, there are quite a few AI approaches that can be optimised in similar ways, but if only 0.5% - 1% of games use any given approach, is it worth the effort?
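To make the pathfinding case concrete, here is a sketch of a breadth-first search over an occupancy grid. Note how small the interface is - the inputs are a grid plus start/goal coordinates, and the output is a step count - relative to the traversal work inside, which is what would make offloading plausible. The grid representation is an assumption for illustration:

// Breadth-first search on a grid of 0 = walkable, 1 = blocked cells.
// Returns the number of steps from (sx,sy) to (gx,gy), or -1 if unreachable.
#include <queue>
#include <utility>
#include <vector>

int shortestSteps(const std::vector<std::vector<int>>& grid,
                  int sx, int sy, int gx, int gy) {
    const int h = grid.size(), w = grid[0].size();
    std::vector<std::vector<int>> dist(h, std::vector<int>(w, -1));
    std::queue<std::pair<int, int>> q;
    dist[sy][sx] = 0;
    q.push({sx, sy});
    const int dx[] = {1, -1, 0, 0}, dy[] = {0, 0, 1, -1};
    while (!q.empty()) {
        auto [x, y] = q.front();
        q.pop();
        if (x == gx && y == gy) return dist[y][x]; // reached the goal
        for (int i = 0; i < 4; ++i) {              // visit the 4 neighbours
            int nx = x + dx[i], ny = y + dy[i];
            if (nx >= 0 && nx < w && ny >= 0 && ny < h &&
                grid[ny][nx] == 0 && dist[ny][nx] == -1) {
                dist[ny][nx] = dist[y][x] + 1;
                q.push({nx, ny});
            }
        }
    }
    return -1; // goal unreachable
}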

FPGA Jim    122
Intel has recently announced an agreement with a leading FPGA manufacturer to provide access to its front-side bus (FSB). This opens new opportunities for desktop computing, as devices such as FPGAs can now be accessed at significantly greater speeds. Traditionally, the PCI/PCI-X/PCIe buses have limited the bandwidth between the CPU and reconfigurable FPGA devices. The increased CPU-FPGA bandwidth via the FSB will allow more data-oriented applications to exploit the hardware acceleration of FPGAs.

One possible application domain is gaming, where several computations could be run in parallel with the CPU. This is similar to multi-core CPU devices, although the difference is that FPGA hardware can provide a higher speedup than a software equivalent running on a core. Again, the big challenge is developing tools to support such a paradigm, as programmers do not want to design custom hardware.

Any comments on whether this type of opportunity may interest developers in the gaming community?

Kylotan    9854
I really don't think games are a great application for this. There's hardly anything you can effectively run in parallel with the CPU when your system needs to be fully synchronised 60 times a second. Our software just doesn't (currently) suit parallel models. It's all well and good having an FPGA be faster than a 2nd core, but most games don't even know how to effectively use a 2nd core yet.

trasseltass    123
How about aiming for real-time raytracing? It's not really a gaming solution though - rather a graphics/rendering solution. RTRT requires a lot of computing power and relies heavily on algebra operations: solving equations for intersections with primitive shapes, kd-tree traversal (depending on the solution, of course), etc. Operations may be split across several cores.
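As an illustration of the algebra involved, here is a sketch of the standard ray-sphere intersection test: substitute the ray o + t*d into the sphere equation |p - c|^2 = r^2 and solve the resulting quadratic in t. The Vec3 helper type is invented for the example:

// Ray-sphere intersection via the quadratic formula. Sketch only.
#include <cmath>

struct Vec3 { float x, y, z; };
static float dot(const Vec3& a, const Vec3& b) { return a.x*b.x + a.y*b.y + a.z*b.z; }
static Vec3 sub(const Vec3& a, const Vec3& b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }

// Returns the nearest positive hit distance t along the ray, or -1 on a miss.
float hitSphere(const Vec3& o, const Vec3& d, const Vec3& c, float r) {
    Vec3 oc = sub(o, c);
    float a = dot(d, d);
    float b = 2.0f * dot(oc, d);
    float k = dot(oc, oc) - r * r;
    float disc = b * b - 4.0f * a * k; // discriminant of a*t^2 + b*t + k = 0
    if (disc < 0.0f) return -1.0f;     // no real roots: the ray misses
    float t = (-b - std::sqrt(disc)) / (2.0f * a); // nearer of the two roots
    return (t > 0.0f) ? t : -1.0f;
}

Millions of independent tests like this per frame are exactly the kind of data-parallel arithmetic that maps well onto custom hardware.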

There are a number of problems, however. For one, all cores and the user application would probably require random access to the same set of primitives for non-static scenes.

I'm no HW wiz - this is just a thought I had, since I believe RT and similar rendering techniques will become widely used in the future (we have to stop with the cheating some time, eh? ;))

[Edited by - trasseltass on October 15, 2007 7:49:49 AM]

Oxyd    1157
Quote:
Original post by Kylotan
I really don't think games are a great application for this. There's hardly anything you can effectively run in parallel with the CPU when your system needs to be fully synchronised 60 times a second. Our software just doesn't (currently) suit parallel models.


Forgive my ignorance, as I've got very little experience as far as game programming goes, but why would the system need to be fully synchronised sixty times a second? I mean - one core can do all the game-related stuff like updating the world while the second core only renders the data the first core computed, sixty times per second... A certain level of atomicity would still need to be preserved, of course, but I still don't see why full synchronisation would be needed...

Is this possible but not yet done, or are there any significant reasons not to do this?

nobodynews    3126
That probably can't be done without two sets of data - something like a front and back buffer. While the renderer draws the front buffer, the physics works on the back buffer. If there were only one set of data, then you would have situations where the renderer was waiting for the physics to finish... as in, all the time, because until everything is calculated you probably won't be able to render it correctly.

That makes the most sense to me, but there could be better reasons for it.
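A hedged sketch of that front/back buffer idea in C++ - WorldState and the thread roles are assumptions for illustration:

// Two copies of the game state: the simulation thread writes a back copy
// while the render thread reads the front copy; a brief swap hands over a
// finished frame. WorldState is a placeholder, not any real engine type.
#include <mutex>
#include <utility>

struct WorldState { /* positions, animations, ... */ };

class DoubleBufferedWorld {
public:
    // Simulation thread: call once its frame is fully computed.
    void publish(const WorldState& finished) {
        std::lock_guard<std::mutex> lock(swapLock);
        back = finished;
        std::swap(front, back); // the renderer now sees the new frame
    }

    // Render thread: copies out a consistent snapshot to draw from.
    WorldState snapshot() {
        std::lock_guard<std::mutex> lock(swapLock);
        return front;
    }

private:
    std::mutex swapLock; // held only for the copy/swap, never a whole frame
    WorldState front, back;
};

The cost is the second copy of the data (and the copying itself), which is the trade-off described above.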

Promit    13246
Quote:
Original post by FPGA Jim
I was thinking of using a new type of reconfigurable hardware which now has high-bandwidth access to the CPU (the lack of which was, in the past, one of the drawbacks of this type of hardware for desktop computing).
High bandwidth isn't particularly important. Any modern hardware bus technology will supply you with enough bits. The problem is one of latency: you need to get the results back very, very quickly. At 60 frames per second, a frame lasts 1000/60 ≈ 16.7 ms, so you can afford to slide by a frame or two, but no more than that. In other words, the results need to be available within 20 ms or so of submission for computation.

Kylotan    9854
Quote:
Original post by Oxyd
Forgive my ignorance, as I've got very little experience as far as game programming goes, but why would the system need to be fully synchronised sixty times a second?


Things don't necessarily need to be, but that's pretty much the way that everything works. It's the tried and tested method that developers are familiar with, and it was optimal for pretty much every platform until relatively recently.

Quote:
I mean - one core can do all the game-related stuff like updating the world while the second core only renders the data the first core computed, sixty times per second... A certain level of atomicity would still need to be preserved, of course, but I still don't see why full synchronisation would be needed...


Where's this dividing line between 'certain level of atomicity' and 'full synchronisation'? Which part of the data can I choose not to lock when going through the world and deciding what to add into the render queue? Do I lock the whole thing, holding up the other thread? Or do I lock and unlock individual items or areas, incurring massive overheads? (Locking is not cheap.)
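As a sketch of those two extremes (the entity layout and loop bodies are invented for illustration):

// Coarse locking: consistent view, but the update thread stalls for the
// whole traversal. Entity/world/render queue are placeholders.
#include <mutex>
#include <vector>

struct Entity { std::mutex lock; /* transform, mesh id, ... */ };

std::mutex worldLock;
std::vector<Entity> world(10000);

void renderPassCoarse() {
    std::lock_guard<std::mutex> g(worldLock); // one big lock
    for (Entity& e : world) { /* push e into the render queue */ }
}

// Fine locking: the update thread can make progress, but we pay a
// lock/unlock per object and lose a consistent view across entities.
void renderPassFine() {
    for (Entity& e : world) {
        std::lock_guard<std::mutex> g(e.lock);
        /* push e into the render queue */
    }
}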

As it stands, rendering isn't the problem - most of that already occurs in parallel, on the GPU. What remains are things like AI and physics, which tend to operate on a complete game world, or a large area of it, and require consistency to operate. There are documented ways around this, especially for AI, but AI is non-trivial and non-standard, which means there is still some way to go.
