#1 Members - Reputation: 205
Posted 04 August 2005 - 12:55 PM
#3 Members - Reputation: 530
Posted 04 August 2005 - 02:48 PM
Off hand, my guess is that your first difficulty would be handling random access needed when exploring neighbors in a search algorithm. Graphics cards are more designed for algorithms which take small, predetermined data sets and work in parallel.
#4 Anonymous Poster_Anonymous Poster_* Guests - Reputation:
Posted 04 August 2005 - 02:58 PM
http://graphics.cs.uiuc.edu/~jch/papers/gpucluster.pdf
A paper where they run the k means clustering algorithm on a gpu and get 1.5x - 3x speed up. It's pretty cool but then I showed him this:
http://www.cs.cmu.edu/~dpelleg/download/kmeans.pdf
A paper where through clever data structures they speed up the same algorithm 170x
An easier way to take advantage of hardware acceleration in AI type algorithms is to use matrix math instead of loops where ever possible, then use a hardware accelerated matrix library (like a machine specific BLAS library). Matlab programmers are really good at this and call it "vectorizing" an algorithm. It also often times produces really elegant code. I'm not sure about A*, but a lot of neural network operations, for example, can be written in terms of matrix multiplications.
#5 Anonymous Poster_Anonymous Poster_* Guests - Reputation:
Posted 04 August 2005 - 03:03 PM
#6 GDNet+ - Reputation: 1412
Posted 04 August 2005 - 03:29 PM
#7 Members - Reputation: 834
Posted 04 August 2005 - 04:03 PM
Quote:
Original post by Foxostro
The problem is that its expensive to get data off the graphics card, no matter how easy it is to put it on the card or do the operations. I don't think it will be practical, but I have no facts to back this statement up.
I don't know if that will be a huge problem. An A* algorithm only needs to return the completed path to be useful, and you can probably store the path in something like 500 bytes. An enormously complicated neural network only needs to return the values of the final layer. In general, the data that represents "what action should the AI take" is going to be pretty small. So as long as it's cheap for the GPU to send us back 1k or so of data, we'll be alright.
#8 Members - Reputation: 172
Posted 04 August 2005 - 04:14 PM
Quote:
Original post by pinacolada Quote:
Original post by Foxostro
The problem is that its expensive to get data off the graphics card, no matter how easy it is to put it on the card or do the operations. I don't think it will be practical, but I have no facts to back this statement up.
I don't know if that will be a huge problem. An A* algorithm only needs to return the completed path to be useful, and you can probably store the path in something like 500 bytes. An enormously complicated neural network only needs to return the values of the final layer. In general, the data that represents "what action should the AI take" is going to be pretty small. So as long as it's cheap for the GPU to send us back 1k or so of data, we'll be alright.
Alright, nevermind then. I hope this idea pans out! :)
#9 Senior Moderators - Reputation: 1744
Posted 04 August 2005 - 04:39 PM
IMHO, the "Stupid GPU Tricks" fad is one that is fading, fast. The bottom line is that a graphics card is designed for graphics. It's more programmable than it used to be, but the transistor layout is still one that's optimized for the rasterization pipeline and not for other things. The need for generalized parallel processing hardware on consumer-level computers is clear, but the GPU is not the chip that will fill this niche. Time spent on bludgeoning a GPU into executing algorithm XYZ is time wasted.
#10 Members - Reputation: 205
Posted 04 August 2005 - 05:10 PM
Architecturally, a GPU operates on the opposite extreme of most general purpose CPU, that is if memory surves me correctly. Where standard CPUs are constantly swapping operators, the GPU is constantly swapping data (streaming). So, programming for the two require the programmers to think differently.
Currently, it seems that mainly algorithms that can be massively parallelized will get benefits of GPU speed up, since your data is actually loaded in the form of a texture and them modified by a fragment program, which runs parallel for all pixels in the texture. So, I guess A* may not be the best candidate out there, but who knows, with some twisting of things, someone might be able to think differently and solve it in some strange convoluted way.
I guess the main reason I started this thread was to see how people thought about the possibility, whether its there or not, and just gauge reaction. As for anything concrete, I really don't have anything solid myself, which is something I'll probably work on in the future.
#11 Senior Moderators - Reputation: 1744
Posted 04 August 2005 - 06:08 PM
Quote:
Original post by WeirdoFu
Well, in many aspects, if memory doesn't fail me, some modern GPUs have higher transitor counts than many CPUs currently on the market, but I guess the main issue in trying to "bludgeon" a GPU into doing what you want has to do with the fact that you need to first "bludgeon" programmers into thinking in a different way.
Not really. Perhaps game programmers, or PC application developers in general, but they're hardly the totality of computer science. People have been making algorithms specifically for SIMD architectures for decades. Simply parallelizing an algorithm, however, is not the end of the battle. You still need to deal with the fact that the GPU expects to work on 4-element floating point vectors; that it wants to perform triangle rasterization; that it expects to only perform random access in the form of "textures"; that it's unable to perform any cross-core communication. None of these limitations are inherent to SIMD architectures, and none of them needs to be present in a consumer-level SIMD chip.
If you're interested in path planning on parallel architectures in general, I think that's a very noble goal. A quick Google Scholar search turns up a few links; it seems that most researchers have approached the problem using the potential field approach, which intuitively is more parallelizable than A*.
Quote:
I guess the main reason I started this thread was to see how people thought about the possibility, whether its there or not, and just gauge reaction. As for anything concrete, I really don't have anything solid myself, which is something I'll probably work on in the future.
My reaction: I just got back from the SIGGRAPH conference. The last couple of years, there was an imperial buttload of papers and sketches and posters and courses on GPGPU. This year, there were noticeably fewer. Personally, I think researchers are thinking about the cell processor, and about how much they hate having to think about pixel shaders in order to write database software. Mark my words: within five years, we'll see multi-core processing on Intel and AMD processors. Get ready for that.
#13 Members - Reputation: 548
Posted 05 August 2005 - 03:12 AM
In the game we're currently working on, its often the GPU who's falling back... so having it doing more computations would just be silly.
With next gen consoles, I really don't see why we would want to compute things on the GPU, and as for PC, there's too many different hardware to really rely on it.
Eric
#14 Members - Reputation: 205
Posted 05 August 2005 - 06:36 AM
If we can build a complete particle system engine that is GPU based, then why not start from there and say let's improve on the particle and make "smart" particles? So, let's say we use the particle system in a slightly different way. Let's say we have a game where nanotech is used and the players can use a system called "smart lighting dust." So, basically, you can then create a particle system of light sources that swarmed around the level or around the player highlighting certain things or increasing visibility in general.
The original starting point of GPGPU was not just trying to get the GPU to do things other than graphics, but rather, can we piggy-back things onto our graphics that benefit the overall cause. So, instead of thinking that we want to translate AI stuff onto a GPU, why not think whether we can piggy-back AI stuff into graphics itself. So, maybe "smart graphics" or "smart textures" even.
#16 Members - Reputation: 798
Posted 05 August 2005 - 07:38 AM
Quote:
Original post by Sneftel
You still need to deal with the fact that the GPU expects to work on 4-element floating point vectors; that it wants to perform triangle rasterization; that it expects to only perform random access in the form of "textures"; that it's unable to perform any cross-core communication. None of these limitations are inherent to SIMD architectures, and none of them needs to be present in a consumer-level SIMD chip.
This is merely an argument to use Sh or Brook to abstract that layer though. Both of these can target GPUs and provide a generalized stream processing interface. I also wouldn't be at all surprised if both of these can target Cell, multi-core, etc. in the future as well.
Quote:
Original post by Sneftel
Mark my words: within five years, we'll see multi-core processing on Intel and AMD processors. Get ready for that.
... both companies already have multi-core processors on the market. I don't think there has been any doubt of this evolution for the past few years at least. Still, even multi-core processors are not designed as efficient stream processing engines (like the Cell's SPEs). We do need generalized stream processing/DSP-like things in modern computers, that's for sure. Whether that will end up being provided by some evolution of GPUs, or built into the hardware is really immaterial. Either GPUs will become more general and not be just "GPUs" any more, or we'll convert to something like the Cell architecture and probably won't need GPUs any more.
In either case, GPUs are here now, they're cheap and fast stream processors, and thus I see no reason not to continue research using them... If a large chunk of the research was about finding ways around GPU limitations that would not exist in generalized DSPs, then maybe there'd be something to complain about (research probably unless in the next five years). However I don't think that this is the case at all: the problems that we're hitting now with GPUs are general stream processing and parallel algorithm problems for the most part. Sh and Brook already solve 99% of the interface problems.
#17 Senior Moderators - Reputation: 1744
Posted 05 August 2005 - 06:51 PM
Quote:
Original post by AndyTX Quote:
Original post by Sneftel
You still need to deal with the fact that the GPU expects to work on 4-element floating point vectors; that it wants to perform triangle rasterization; that it expects to only perform random access in the form of "textures"; that it's unable to perform any cross-core communication. None of these limitations are inherent to SIMD architectures, and none of them needs to be present in a consumer-level SIMD chip.
This is merely an argument to use Sh or Brook to abstract that layer though. Both of these can target GPUs and provide a generalized stream processing interface. I also wouldn't be at all surprised if both of these can target Cell, multi-core, etc. in the future as well.
Brook and Sh handle the annoying programming interface issues, but not the underlying hardware design. They can't add in features the card doesn't have, such as inter-fragment communication. If you want to do something parallelizable and future-proof, you'd be better off with MPI (though of course these are less for stream processing).
Quote:
Original post by Sneftel
Mark my words: within five years, we'll see multi-core processing on Intel and AMD processors. Get ready for that.
... both companies already have multi-core processors on the market. I don't think there has been any doubt of this evolution for the past few years at least.[/quote]
I think I was unclear there... when I say "multicore", I'm not talking about hyperthreading or the AMD X2. I'm talking about the sort of stream processing that GPUs and the Cell processor can do. Still, I think we're mostly on the same page here.
#18 Anonymous Poster_Anonymous Poster_* Guests - Reputation:
Posted 06 August 2005 - 12:40 AM
Quote:
Original post by Sneftel
A* is not an algorithm that lends itself to parallelization, so it's unlikely that a GPU-based solution would be at all performant.
IMHO, the "Stupid GPU Tricks" fad is one that is fading, fast. The bottom line is that a graphics card is designed for graphics. It's more programmable than it used to be, but the transistor layout is still one that's optimized for the rasterization pipeline and not for other things. The need for generalized parallel processing hardware on consumer-level computers is clear, but the GPU is not the chip that will fill this niche. Time spent on bludgeoning a GPU into executing algorithm XYZ is time wasted.
Maybe doing multiple A* in parallel ??? Some of the cards have 60000 instruction shader scripting (Im not sure if the instruction set is adaquate tho... and the PCIX speeds up constant updates of 'map' texture data in graphics memory (so much that there was some talk of going back to 'system' memory ....).
Influence maps a good possibility ??
#19 Members - Reputation: 136
Posted 06 August 2005 - 09:29 AM
Also found another one on their first page but havent tried that one yet.
#20 Members - Reputation: 798
Posted 08 August 2005 - 04:19 AM
Quote:
Original post by Sneftel
Brook and Sh handle the annoying programming interface issues, but not the underlying hardware design. They can't add in features the card doesn't have, such as inter-fragment communication.
Well excepting perhaps "scatter" (which can still be done on the GPU with various trickery, although not altogether efficiently), all of the standard stream processing operations CAN be done on the GPU. I'm not sure what you're referencing with inter-fragment communication since to my knowledge NOT allowing that is a fundimental property that differentiates standard CPUs and DSPs (perhaps THE most important property). If you're speaking of things like "reductions", those CAN be implemented efficiently on a GPU (and are in Brook IIRC and soon Sh).
Quote:
Original post by Sneftel
I think I was unclear there... when I say "multicore", I'm not talking about hyperthreading or the AMD X2. I'm talking about the sort of stream processing that GPUs and the Cell processor can do. Still, I think we're mostly on the same page here.
I won't disagree with you here. Like I said, either the GPU will morph into a general stream processor (and then optionally be integrated into the motherboard - that detail is really insignificant to our development efforts), or something more general will come along and be able to take over the role that the GPU is playing right now.
In any case I think we all agree that stream/parallel processing is becoming more and more important. Right now, I'd still argue though that GPUs are relatively cheap, powerful and readily available stream processors that - with the aid of Sh and/or Brook - are easily used as such.






