Projecting the future

Quote:Original post by ajas95
Now we just need some really good programmers!


Famous last words.

Non-static geometry would be cool though. Like in the old all-CPU games. But then again, DX10 has been specifically designed for that, so we'll be getting it shortly (when Vista ships HAH).
Working on a fully self-funded project
Quote:Famous last words


Yeah, I hear you.

Cell does have a few things going for it. To PS2 engine programmers, it should look pretty familiar. To PC programmers, it's probably like a bad dream :) But IBM is heavily invested in Cell, and their big payback isn't the PS3 but supercomputers, servers, render farms, research programs... and they have a big interest in teaching the programming public as much as possible.

IBM has a bunch of very informative articles about Cell, along with a full system simulator, so that people who don't have the processor yet can start writing code.

Quote:Non-static geometry would be cool though. Like in the old all-CPU games. But then again, DX10 has been specifically designed for that


I'm not sure what support DX10 has, but I'm sure it's still a one-way street. With Cell, instead of the huge bulk of processing power being used to generate an image, it can be used to generate the game. Besides rigid-body physics, you could use it to do more proximity queries, line-of-sight tests, real-time terrain erosion, cloth, water, swaying trees, particles that collide and interact, better pathfinding, animating raytraced sky textures -- simply put, anything. The computation feeds back into the game.

And each of those individual things may seem trivial, but combine them... proximity queries, LoS, and pathfinding, and suddenly you have a way smarter AI that has the cycles to evaluate the terrain and run back and duck behind a tree for cover. Or twenty of them in a team. Communicating. Each aware of everything around it. Believe it or not, most AI cycles are spent just evaluating what's around rather than figuring out what to do next!
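To make that concrete, here's a toy sketch of the kind of cheap "what's around me?" query I mean: a uniform grid over the world, rebuilt each frame, so an agent only scans the few cells its query radius touches instead of every entity in the level. The names and layout are purely illustrative, not from any real engine.

#include <algorithm>
#include <vector>

struct Entity { float x, z; };

// Uniform grid over a square world [0, worldSize) x [0, worldSize).
struct ProximityGrid {
    float cellSize;
    int   cellsPerSide;
    std::vector<std::vector<int>> cells;   // entity indices stored per cell

    ProximityGrid(float worldSize, float cell)
        : cellSize(cell),
          cellsPerSide(static_cast<int>(worldSize / cell) + 1),
          cells(cellsPerSide * cellsPerSide) {}

    int clampCell(int c) const { return std::max(0, std::min(c, cellsPerSide - 1)); }

    void rebuild(const std::vector<Entity>& ents) {
        for (auto& c : cells) c.clear();
        for (int i = 0; i < static_cast<int>(ents.size()); ++i) {
            int cx = clampCell(static_cast<int>(ents[i].x / cellSize));
            int cz = clampCell(static_cast<int>(ents[i].z / cellSize));
            cells[cz * cellsPerSide + cx].push_back(i);
        }
    }

    // Everything within 'radius' of (x, z): scan only the cells the radius overlaps.
    std::vector<int> query(const std::vector<Entity>& ents,
                           float x, float z, float radius) const {
        std::vector<int> result;
        int x0 = clampCell(static_cast<int>((x - radius) / cellSize));
        int x1 = clampCell(static_cast<int>((x + radius) / cellSize));
        int z0 = clampCell(static_cast<int>((z - radius) / cellSize));
        int z1 = clampCell(static_cast<int>((z + radius) / cellSize));
        for (int cz = z0; cz <= z1; ++cz)
            for (int cx = x0; cx <= x1; ++cx)
                for (int i : cells[cz * cellsPerSide + cx]) {
                    float dx = ents[i].x - x, dz = ents[i].z - z;
                    if (dx * dx + dz * dz <= radius * radius)
                        result.push_back(i);
                }
        return result;
    }
};

Give each of those twenty squad members a query like this every frame and the cost stays roughly proportional to local density -- exactly the kind of work you'd want to farm out to the spare processing power.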

And (my original point): the only way to get all this is with lots of super-fast memory and huge bandwidth. Just adding more cores or increasing GHz is like upping the speed limit on a road that has a stop sign at every block.

Quote:
And, (my original point) the only way to get all this is with lots of super-fast memory and huge bandwidth. Just adding more cores or increasing Ghz is like upping the speed limit on a road that has a stop sign at every block.


Your point is exactly what got me thinking and started this discussion in the first place. It's true, and it led me to learn x86 assembly language so I could optimize my code to the bone and use every "prefetch", "dw" and SIMD instruction available to maximize cache coherency and code speed. I studied the Pentium and Athlon pipelines to exploit everything I could, and my final conclusion was: I could write separate, ruthlessly optimized code for every architecture out there, and it still wouldn't be enough.
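Just to show how architecture-specific that kind of squeezing gets, here's a tiny sketch of my own (x86/SSE only, and purely illustrative): the prefetch hints and intrinsics below mean nothing on a Cell SPE or a POWER/AltiVec chip, where the equivalent code would look completely different.

#include <xmmintrin.h>   // SSE intrinsics -- x86 only

// Sum two float arrays four lanes at a time, prefetching ahead of the loads.
// Assumes n is a multiple of 4 and both pointers are 16-byte aligned.
void add_arrays_sse(const float* a, const float* b, float* out, int n)
{
    for (int i = 0; i < n; i += 4) {
        _mm_prefetch(reinterpret_cast<const char*>(a + i + 64), _MM_HINT_T0);
        _mm_prefetch(reinterpret_cast<const char*>(b + i + 64), _MM_HINT_T0);
        __m128 va = _mm_load_ps(a + i);
        __m128 vb = _mm_load_ps(b + i);
        _mm_store_ps(out + i, _mm_add_ps(va, vb));
    }
}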

Maybe it's just what you have to do. Optimize the juice out. But what architecture will you target if your project will only be out in ten years? Learn Cell SPE assembly? x86-64 assembly? POWER assembly? UltraSparc T1 assembly? No matter how hard you squeeze it, you need to change the squeeze technique for every different fruit (haha lame :-| ).

That's why I look forward to something abstract and high-level (Java, C#, whatever) that also allows for future low-level optimization. (Read: prototype high-level, then rewrite the performance-critical functions in assembler specific to an architecture.) The future is unpredictable, but it can be projected, designed. That's why I am trying to understand where it is heading and what path it's taking.

Thanks for the answers and opinions,

JVFF
ThanQ, JVFF (Janito Vaqueiro Ferreira Filho)
DX10 provides pixel shaders (to shade pixels in fancy ways), vertex shaders (to modify vertices in fancy ways) and geometry shaders (to create polys in fancy ways).

Yes, I agree that APIs and languages are constantly changing (*cough* if you're looking at DX instead of GL *cough*), and planning now to write code that will still be executed 10 years from now is probably a bad idea. Even if you design an architecture and algorithms without coding them in a compilable language, they will probably perform horribly on that era's hardware.

Example: I've played Colin McRae 2 on my Radeon 9600XT. That card has hardware 3D and is DX9 compliant. Yet you can't run CMR2 at full detail. Why? Because back then there were no vertex buffers, so with a high draw distance the vertex data still overwhelms the bus. If the vertices were stored in proper hardware buffers it would run lightning fast, but back then there was no way to know hardware would work that way. The card and the computer are 10x faster than what the game was designed for... but the detail doesn't scale accordingly (barely 3x).
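For reference, this is roughly what "keeping the vertices in hardware buffers" looks like with plain OpenGL 1.5-style VBO calls (illustrative only -- CMR2 predates this API entirely, and I'm assuming a header or loader that exposes GL 1.5). The geometry crosses the bus once at load time instead of every frame.

#include <GL/gl.h>   // assumes GL 1.5 declarations (or a loader) are available

GLuint terrainVbo = 0;

void uploadTerrain(const float* positions, int vertexCount)
{
    glGenBuffers(1, &terrainVbo);
    glBindBuffer(GL_ARRAY_BUFFER, terrainVbo);
    glBufferData(GL_ARRAY_BUFFER,
                 vertexCount * 3 * sizeof(float),
                 positions, GL_STATIC_DRAW);      // lives in GPU-side memory from now on
}

void drawTerrain(int vertexCount)
{
    glBindBuffer(GL_ARRAY_BUFFER, terrainVbo);
    glEnableClientState(GL_VERTEX_ARRAY);
    glVertexPointer(3, GL_FLOAT, 0, nullptr);     // source vertices from the bound VBO
    glDrawArrays(GL_TRIANGLES, 0, vertexCount);
    glDisableClientState(GL_VERTEX_ARRAY);
}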

For another example, look at the tech behind Shiny's Messiah (the promise there wasn't speed, but detail).

What does that mean? It means that if optimized x86 code wasn't enough... the Cell probably won't be enough either.

Good programming is doing magic with what's available. My advice is just roll with what there is, and update whenever the hardware changes. Don't count on infinite speed.
Working on a fully self-funded project
Quote:Original post by Madster
What does that mean? It means that if optimized x86 code wasn't enough... the Cell probably won't be enough either.

Good programming is doing magic with what's available. My advice is just roll with what there is, and update whenever the hardware changes. Don't count on infinite speed.


Hello,

Thanks a lot! That was the answer I was looking for. So I should use everything available today in an optimal way and just update as time passes. Do you think it's a good idea to prototype in a high-level language to flush out design flaws, and then restart the project in the final language some time in the future? Thanks again

JVFF
ThanQ, JVFF (Janito Vaqueiro Ferreira Filho)
I think it's important to:

- Use what tools are currently available to minimize the amount of wheel-reinvention.

- Keep in mind that there are loads of developers hired specifically to optimize the APIs that we use. To think that you, a hobbyist programmer, need to spend more time optimizing your applications at the very lowest levels to squeeze out every last ounce of performance is just conceit.

- Concentrate on the larger picture of game development. For a project in its entirety, there are many aspects to consider; pipeline-profiling the graphics and memory-access engines will only serve to delay the actual completion date of your game. It is more important to adopt strong design concepts and adhere to them while your game is being developed.

- (For the sake of easing future worries) stick to higher-level languages than ASM. You're almost guaranteed that the C/C++ languages of today will be identical to the C/C++ language used in 5-10 years' time to code on Cell processors.

I inherently knew when I read your first post that you were an asm programmer; I have had this discussion numerous times with different asm programmer buddies :)

I should note that I am not against optimization; however, given that you are a hobbyist programmer, you are likely not part of a large team of artists, sound engineers, actors, and level designers. Hence, your Team of One can't afford to dedicate all its resources to fortune-telling :)
Yeah, guilty as charged :) . I did enjoy wandering in the asm world of optimization :D . Actually I enjoyed it a bit too much and it became sort of a programming style. So every time I code in C/C++ my code gets messed up in under a day by the amount of micro-optimization with pointers, references, parameter passing, variable use and lots of other things. That's why I'm now coding in higher-level languages (Ruby, Java, C#), where the language has so many features for putting code down cleanly that I code faster and there's less room (code) for optimization. And these languages are mostly slow by nature, so there's no point in me optimizing them.

So now I'm following rule number one of optimization: get it working first. After I've fixed the design flaws and have a good concept, I can totally rewrite the code in another language with a more mature concept in my head. That way, when I'm finished (or if I ever finish), I'll be more productive on whatever new asm architectures are available then. My only question is: is it worth it?

Thanks,

JVFF
ThanQ, JVFF (Janito Vaqueiro Ferreira Filho)
Well, two things.

First, there are two categories of tasks -- things that are processing bottlenecks and therefore worth spending time optimizing, and tasks that aren't processing bottlenecks and aren't worth optimizing in the least. Most tasks aren't worth optimizing, BUT some are. In fact, back in the days when I was doing demos, I found it was best to focus on a single hard problem... optimize the crap out of it and make it look better than even commercial games, and make that the focal point of the demo, ignoring everything else.

Second, in a multi-processing environment, there is another consideration called a "dependency chain" (this might be familiar if you're into asm). Say task C depends on task B, which depends on task A, and all three can run concurrently with task D. Even though A, B and C individually take less time than D, their combined chain can be the bottleneck; in that case optimizing D before A, B and C is pointless. This can get very complicated, and more importantly, this "dependency chain" concept won't go away in 10 years or 100 years; even given a million processors, the algorithm will never run faster than the combined duration of A, B, and C on a single one.

Only one thing is certain for the next 10 years: processors won't get faster, there will simply be more of them. So if you want to take a future-proofing approach to design, your best bet is to study concurrency within a frame. Figure out which tasks can execute independently and which tasks form dependency chains... the goal is to get many tasks executing at the same time, shortening the longest dependency chain.

Here's an example. Say you want to do stencil shadows. Well, the dependencies look something like this: first game logic has to run so all characters know which animations to play, then all skeletons are computed from the animation data, collision detection and resolution execute and IK dynamics are applied so the final skeleton is known, the shadow mesh has to be skinned, and only from THAT can the silhouette be computed and the stencil shadows rendered! Your goal (10 years from now) isn't to make each of those things run faster, it's to figure out some way to make most of them run at the same time. Much harder stuff; your brain is important here, programming language is irrelevant.
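Here's a bare-bones sketch of what I mean, using C++ futures (the task functions are just placeholders, and std::async is only standing in for whatever job system you'd really use). Work that's independent of the skeleton chain, like a particle update, gets kicked off up front so it overlaps the chain; the frame ends when the longest path is done, not when the sum of everything is done.

#include <future>

void runGameLogic()               { /* decide which animations to play */ }
void computeSkeletons()           { /* pose skeletons from animation data */ }
void resolveCollisionAndIK()      { /* final skeletons now known */ }
void skinShadowMesh()             { /* skin the shadow-casting mesh */ }
void extractSilhouetteAndRender() { /* silhouette + stencil passes */ }
void updateParticles()            { /* independent of the skeleton chain */ }

void runFrame()
{
    // Kick off the independent work first so it overlaps the chain below.
    std::future<void> particles = std::async(std::launch::async, updateParticles);

    // This chain can't be sped up by adding cores -- each step needs the
    // previous step's output.
    runGameLogic();
    computeSkeletons();
    resolveCollisionAndIK();
    skinShadowMesh();
    extractSilhouetteAndRender();

    particles.get();   // the frame is done when both paths are done
}

int main() { runFrame(); }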



The CELL processor only performs really well on pipelined SIMD data problems where it can make use of the capabilities of the SPEs (the hyped "theoretical" 256 GFLOPS). There are more than a few game problem spaces that can make use of this, but there are many that cannot.

A lot of AI processing is not pipeline oriented and faces the "dependency chain" problem mentioned above (waiting for semaphores...). As I mentioned before, real AI in games will have CPU requirements that DWARF the physics/graphics needs. Irregular access to a large data space (both world data AND script code) requires the SPEs to constantly go out over the main memory bus, which causes constant delays (each SPE has only 256KB of local store for data and instructions, so it is of limited use for anything but batched, local data processing jobs). Irregular processing suddenly drops the CELL to a tiny fraction of its special-case performance.
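As a rough illustration of the two access patterns being contrasted (generic C++, not actual SPE code): the first loop walks a flat array that could be DMA'd into a 256KB local store in big batches and streamed through; the second chases pointers, so the next address isn't known until the current node arrives from main memory and the latency can never be hidden.

#include <vector>

struct Particle { float x, y, z, vx, vy, vz; };

// Streamable: contiguous and predictable, easy to process in local-store-sized chunks.
void integrateBatch(std::vector<Particle>& ps, float dt)
{
    for (Particle& p : ps) {
        p.x += p.vx * dt;
        p.y += p.vy * dt;
        p.z += p.vz * dt;
    }
}

// Irregular: each node can live anywhere in main memory, and the address of the
// next one isn't known until the current one has been fetched.
struct ScriptNode { int opcode; ScriptNode* next; };

int runScript(const ScriptNode* n)
{
    int executed = 0;
    for (; n != nullptr; n = n->next)   // pure pointer chasing
        ++executed;
    return executed;
}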

A non-SIMD-oriented design might allow 50% more processors in the same space.
OR
A larger data store per CPU might significantly increase the versatility for more irregular processing (but how much is ever enough??).

Intel is also working on multiple CPUs on a die (supposedly 16-CPU versions are already being tested). I haven't seen the spec for that design yet, but it probably won't be any 'Solves All' solution either.


So the idea is to get as many concurrent threads running as possible. Seems logical. But should the project be designed around multi-threading, or merely use multi-threading? (i.e., in the first case all variables are thread-safe, every function separates tasks, there are no dependency chains, etc.; in the second case only the most intensive and parallelizable code is threaded.) Thanks for the reply,

JVFF
ThanQ, JVFF (Janito Vaqueiro Ferreira Filho)
