
Projecting the future



OBS: This is mainly speculation, opinions and doubts; more a rant asking for feedback and criticism.

Hello, I would like to gather some opinions about thoughts I have had lately. Since I started hobby programming I have always had an ambition to grow and exceed expectations. Maybe it was just childish dreaming about breaking new ground, but now that those days are (mostly) over I have been thinking about what I am actually going to do. My aim is something ten years out (so not all of my ambition is gone ;-) ), and I have been brainstorming about which path to follow so I can keep up when the future gets here.

Some posts (or rather some years) ago I started a thread asking whether ray tracing was the future. After some debate the most common answer was "probably yes": some said definitely, others said rasterization could eventually fake everything, and as time passes we have seen the shader revolution make rasterization more flexible, but it is still rasterization. The most used and most believable argument was that ray tracing's cost depends far less on the number of objects in the scene than rasterization's does (there is a small sketch of the per-ray loop this argument is about at the end of this post). So over the last year I have done some ray tracing research, and I think it does have potential. From there my focus has shifted from "what is the future" to "what will take me to the future".

I have studied approaches like RealStorm's aggressive optimizations and general-purpose computing on GPUs, and this is where the doubts begin. I really enjoyed optimizing code at the assembly level, but at the end of the day all I saw was the same thing: out-of-order, highly pipelined processors tuned for single-thread speed. That is definitely not the processor for ray tracing, nor for graphics applications in general. In the coming years there will be a rise of multi-core processors; we all know that, and AMD and Intel have already released dual-core parts. That is definitely good, but on the other side IBM, Sony and Toshiba have just launched the Cell BE, an FPU-heavy, throughput-oriented (and therefore ray-tracing-friendly) processor. There is even some speculation about Cell "dominating" PCs, which I doubt, but it shows there is an alternative future for processors: the throughput computer. Will that be the game machine of the next ten years? (I'm not talking about the PS3; I mean the machine of the average Joe who uses one computer as a workstation and also plays games on it.) So I don't know whether I should stay with x86 (or x86-64) and follow the multi-core growth, or jump ahead to DSPs and Cell-like architectures. This is the path of optimization, which would lead to more efficient game engines with performance the processors would easily keep up with.

The second path is the safer one that follows the game industry: the average Joe's GPU card and (if it works out) PPU card. My problem here is that while general-purpose computing already exists on this side of the technology, it is still a private, closed industry. Shaders are just abstract code that the GPU translates into some other assembly-like, system-dependent instruction set, and who has access to those instructions? How could I study the architecture of a GPU without nVidia or ATi releasing any serious datasheets? Please correct me if I'm wrong, but as far as I know there are no datasheets available to anyone outside the manufacturers' partners.

The third path is the risky one. I could program for reconfigurable computers, which could be the future of co-processors, or for something like ClearSpeed's 97-FPU coprocessor. I could even go to the next step and DESIGN a game engine for an FPGA. Some people speculate that reconfigurable computers are somewhere in the future, and the argument has some weight: why have a CPU, a GPU and a PPU in one SINGLE machine when you could have a chip that reconfigures itself depending on what you are running? That way each game could configure its own GPU, PPU, AI-PU and so on. Forget optimizations... go the reconfigurable way :-) The future seems wild, and I probably seem crazy. This post speculating about the future and seeking a path to it is just meant to open minds about how weird the information world is and how uncertain the future of programming seems. Or just to entertain, as another "see the future" sci-fi short story.

Anyway, the last path is the safe one: code everything in Java or C#, aiming for coding efficiency, safety, short coding time, fast results and clean code, and hope that in ten years the Java VM or the C# compiler does its job well enough to handle the code in real time.

Well, that's it. Sorry about the long post; I hope I could gather some minds and opinions on this. Call me stupid or crazy, but I thought it would be fun to play my game of "let's see the future". If you made it this far I can't thank you enough :-). Consider this my two cents, and please post feedback :D

JVFF

PS. Please excuse the bad English. ;-)
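To make the scaling argument above a little more concrete, here is a minimal C++ sketch of the step it is really about: for each ray, find the nearest intersection. The brute-force loop below costs O(number of objects) per ray; the usual refinement is an acceleration structure (a BVH or kd-tree) that cuts this to roughly O(log n) per ray, which is where the "less dependent on object count" claim comes from. The Sphere/Ray layout here is invented for illustration only, not taken from any particular ray tracer.

    #include <cmath>
    #include <cstdio>
    #include <limits>
    #include <vector>

    struct Vec3 { float x, y, z; };
    static Vec3  sub(Vec3 a, Vec3 b) { return { a.x - b.x, a.y - b.y, a.z - b.z }; }
    static float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

    struct Ray    { Vec3 origin, dir; };          // dir assumed normalized
    struct Sphere { Vec3 center; float radius; };

    // Distance along the ray to the sphere, or a negative value on a miss.
    static float intersect(const Ray& r, const Sphere& s) {
        Vec3 oc = sub(r.origin, s.center);
        float b = dot(oc, r.dir);
        float c = dot(oc, oc) - s.radius * s.radius;
        float disc = b * b - c;
        if (disc < 0.0f) return -1.0f;
        return -b - std::sqrt(disc);
    }

    // Brute-force nearest hit: cost grows linearly with the number of objects.
    static int nearestHit(const Ray& r, const std::vector<Sphere>& scene, float& tOut) {
        int hit = -1;
        tOut = std::numeric_limits<float>::max();
        for (int i = 0; i < (int)scene.size(); ++i) {
            float t = intersect(r, scene[i]);
            if (t > 0.0f && t < tOut) { tOut = t; hit = i; }
        }
        return hit;
    }

    int main() {
        std::vector<Sphere> scene = { { { 0, 0, 5 }, 1.0f } };
        Ray r = { { 0, 0, 0 }, { 0, 0, 1 } };
        float t;
        int hit = nearestHit(r, scene, t);
        std::printf("hit sphere %d at t = %.2f\n", hit, t);   // expect: sphere 0 at t = 4.00
    }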

Quote:
Original post by jvff
The second path is the safer one that follows the game industry: the average Joe's GPU card and (if it works out) PPU card. My problem here is that while general-purpose computing already exists on this side of the technology, it is still a private, closed industry.


If you have any interest in finding employment in the present, this is probably the path to follow. [wink]

Guest Anonymous Poster

"That way each game could configure it's own GPU, PPU, AI-PU, etc"


We already have that in the form of the generic general-purpose CPU.
Unfortunately, just compare how fast graphics run on a CPU versus even a lame (specialized) GPU to see the difference. Reconfigurable logic at the scale needed to build one of today's GPU/CPU/PPU/WPUs (a whatever-processing-unit) is decades away.

Possibly sooner, we will see architectures that let the user build up a machine to meet their needs by adding specialized processors, much like we add SIMMs (hmm, how long before a current high-end GPU fits on a SIMM-sized daughterboard without requiring liquid-nitrogen cooling?).

Either way, programming will have to be done with a multiprocessor orientation. I don't think the hardware will be able to handle all the data-protection interlocks automatically, so that will still be the programmer's job.

Neh, forget employment, at least for the next 10 years. Just hobby. Or, as I like to think of it, just indie planning and thinking :D

Quote:
Original post by Anonymous Poster

"That way each game could configure it's own GPU, PPU, AI-PU, etc"


We already have that in the form of the generic general-purpose CPU.
Unfortunately, just compare how fast graphics run on a CPU versus even a lame (specialized) GPU to see the difference. Reconfigurable logic at the scale needed to build one of today's GPU/CPU/PPU/WPUs (a whatever-processing-unit) is decades away.

Possibly sooner, we will see architectures that let the user build up a machine to meet their needs by adding specialized processors, much like we add SIMMs (hmm, how long before a current high-end GPU fits on a SIMM-sized daughterboard without requiring liquid-nitrogen cooling?).

Either way, programming will have to be done with a multiprocessor orientation. I don't think the hardware will be able to handle all the data-protection interlocks automatically, so that will still be the programmer's job.


Agreed. But remember that this is speculation about the future, so we have to make some wild guesses and, well, dream a little.

I have been hobby-researching reconfigurable computers over the past weeks (I even designed one to put on an FPGA some day) and they really do have a future. Reconfigurable logic isn't a physical chip with its own built-in fab so it can rebuild itself; it is more like a virtual processor layer. There are several kinds of reconfigurable chips. One is a mesh-like architecture of ALUs (I can't remember the website) where each ALU processes 8 bits and routes to a neighbouring ALU, so you essentially code the path of the data (there is a toy sketch of this idea at the end of this post). Another is many mini-cores on one chip: basically a multi-core processor, except that you context-switch data and instructions between the mini-cores to reconfigure and route everything. And the most common type is the FPGA.

The downsides are the configuration time (in most FPGAs you have to erase the entire chip, and it takes several milliseconds to reconfigure it) and the area cost: the LUTs and CLBs occupy a big array just to process a single bit (fine-grained), which is a waste of space for operations on multi-bit data, mainly because FPGAs aren't designed to be reconfigurable computers but to implement real chips virtually. There is a lot of research in the area, including commercial examples from Cray, from Xilinx (FPGAs with built-in hard cores for controlling the FPGA), from a few other companies, and from universities. Anyway, it might just be the future of coprocessors (as was proposed back in the 1960s), or just a way to prototype chips :)

Thank you for your answers,

JVFF
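To make the "code the path of the data" idea above a little more concrete, here is a toy C++ model of the ALU-mesh style of fabric described in the post. Everything in it (the cell operations, the one-dimensional routing) is invented purely for illustration; it does not model any real device.

    #include <cstdint>
    #include <functional>
    #include <iostream>
    #include <vector>

    // One "cell" of the fabric: a configured 8-bit operation plus a route to a neighbour.
    struct AluCell {
        std::function<uint8_t(uint8_t)> op;
        int next;                            // index of the next cell, -1 = fabric output
    };

    // Data flows through whatever path the configuration describes.
    static uint8_t run(const std::vector<AluCell>& fabric, int entry, uint8_t value) {
        for (int i = entry; i != -1; i = fabric[i].next)
            value = fabric[i].op(value);
        return value;
    }

    int main() {
        // "Configuring" the fabric is the program: add 3, xor with 0x55, shift left by 1.
        std::vector<AluCell> fabric = {
            { [](uint8_t v) { return uint8_t(v + 3);    },  1 },
            { [](uint8_t v) { return uint8_t(v ^ 0x55); },  2 },
            { [](uint8_t v) { return uint8_t(v << 1);   }, -1 },
        };
        std::cout << int(run(fabric, 0, 10)) << "\n";   // prints 176; reconfiguring = swapping ops/routes
    }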

I think the last path is the most promising. Hone your skills at creating a framework that allows rapid prototyping and development of games.

Sure, there are all kinds of neat technologies that come into existence over the years, but look at games. What has really changed? Not much. If you look at all the games across the different years, there is a handful of genres with the same old gameplay every time. What usually changes over time is that the graphics get flashier, "fancy" gimmicks like ragdoll get thrown in, and so on. But the games are about the same. Sadly, I feel like a lot of the actual gameplay has declined over the years as more focus is put on flashy graphics than on how the game actually plays.

My prediction is that at some point in the future people will realize that you don't NEED to squeeze every last drop of performance out of a system to make a great game. Think about it: if one game draws 2 million triangles per frame and another draws 1.6 million, do you think anyone will actually notice? Probably not. And more importantly, the people who strive hardest to eke out that extra performance usually end up putting awkward restrictions on their art and design staff.

Sure, you can always use more processing power. You can start to model true 3D volumetric effects, fluids, and all kinds of stuff that aren't easily done today. But at the end of the day, you're making a game. It should be fun. So, I'd focus my efforts on creating a framework that makes it easy to create and iterate through design changes on your game. To me, that is what will make or break games in the future.

-John

Guest Anonymous Poster


I've been looking at programmable logic arrays for 25+ years.

Even with high-level functional blocks to help, a complex web of logic interlinks between the blocks is still required for something as complex as a GPU pipeline.

Try mapping out what is required for even a single GPU pipeline if your building blocks are only 8-bit ALUs. Suddenly you find the blocks are better at 32+ bits for floating-point operations, and for this 'configurability' trait you find that significant sets of your various blocks sit unused, or that too much of the chip is taken up by mostly idle communication bus channels.

Once you start going to these large blocks it becomes more and more like a network of general-purpose CPUs (soon you add local memory to the register stores, and then you share data channels to increase utilization/versatility).

Multi-core, except now there might be hundreds of them. Intel already has plans for 16 cores on one CPU (maybe on multiple sub-dies to increase yield) in the near future.

Guest Anonymous Poster
Quote:
Original post by Teknofreek
I think the last path is the most promising. Hone your skills at creating a framework that allows rapid prototyping and development of games.

Sure, there are all kinds of neat technologies that come into existence over the years, but look at games. What has really changed? Not much. If you look at all the games across the different years, there is a handful of genres with the same old gameplay every time. What usually changes over time is that the graphics get flashier, "fancy" gimmicks like ragdoll get thrown in, and so on. But the games are about the same. Sadly, I feel like a lot of the actual gameplay has declined over the years as more focus is put on flashy graphics than on how the game actually plays.

My prediction is that at some point in the future people will realize that you don't NEED to squeeze every last drop of performance out of a system to make a great game. Think about it: if one game draws 2 million triangles per frame and another draws 1.6 million, do you think anyone will actually notice? Probably not. And more importantly, the people who strive hardest to eke out that extra performance usually end up putting awkward restrictions on their art and design staff.

Sure, you can always use more processing power. You can start to model true 3D volumetric effects, fluids, and all kinds of stuff that aren't easily done today. But at the end of the day, you're making a game. It should be fun. So, I'd focus my efforts on creating a framework that makes it easy to create and iterate through design changes on your game. To me, that is what will make or break games in the future.

-John



The flashiness may be reaching the point of diminishing returns, but look at what is (hopefully) the next aspect that may bring major game improvements: AI.

AI can and will use huge amounts of memory and even more CPU resources, which will make what is spent on graphics now look like a drop in the bucket. Even general physics for the terrain is an order of magnitude more CPU than what's currently available.

Most games these days are pretty to look at, but they are deserts. There is very little interaction and there are few 'smart' objects. NPCs are lifeless mannequins, and the monsters might as well run on little tracks.

We will need vast improvements in CPU power for quite a while yet.

Yeah. The last path is not only safe but flexible. If you think about it, architectures have come and gone, assembly language has changed a lot, and x86 has gained a lot of extensions, but in the end C rules programming. From its origin in Unix until today, C and its derivatives still prevail in most programs and frameworks. So whether you're coding for Sun's UltraSparc T1, IBM's Cell BE, or just good old x86, you can use the same C code.

About reconfigurable computers: indeed they are no substitute for ASIC design, but I am impressed by the opportunity they offer the end user. In a few years die shrinks will no longer sustain the transistor growth needed for the current path consumer electronics is on, putting everything onto one big multifunctional chip. Cell phones will soon be what notebooks are today. So I think it is more worthwhile to have hot-swappable "virtual chips" that allow easier management of the chip's resources. I agree with the 8-bit ALU example; you just can't make a reliable GPU out of that. The point isn't to turn reconfigurable machines into giant multi-core MIMD machines, but to have something that can be one thing at one moment and another thing at the next. The difference from a normal general-purpose processor is that you can arrange the way instruction flow changes data flow: some things can run at low speed while spawning multiple threads doing the same work (parallel), while others just need a single fast thread (pipelined).

This is getting interesting. Meeting people, gaining knowledge :D. Thanks everyone,

JVFF

Cell's design is the future. One important point that is rarely mentioned is that in real-life algorithms, memory access is almost always the bottleneck. Many programmers have a mental model of a computer where you can just access any piece of memory and it's ready in one cycle. That is wildly false, and becoming more false all the time.

Here's an example of how pointless more CPU power can be. In most PS2 games, normal game routines run at about 20-25% CPU efficiency, meaning they spend 75-80% of their cycles stalled, mainly on memory access (I-cache or D-cache misses). The PS2 has a 300 MHz processor and roughly a 40-cycle cache-miss penalty. The PS3 has a 3.2 GHz processor (10x faster) and around a 450-cycle cache-miss penalty (10x worse), so the penalty in actual time is the same. Given that the memory penalty hasn't changed, such a routine should expect only about a 25% speed increase from a 1000% increase in CPU power! And the caches are still 32 KB each; the presence of an L2 will give a slight improvement, maybe another 10%, but still not much considering the extra clock speed.
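For what it's worth, the arithmetic above is just Amdahl's law with the memory stalls as the part that doesn't speed up. A quick back-of-the-envelope check in C++, using the figures from the post:

    #include <cstdio>

    int main() {
        // 75-80% of cycles stalled on memory (from the post); the compute part gets ~10x faster.
        const double computeSpeedup = 10.0;
        const double stallFractions[] = { 0.75, 0.80 };
        for (double stall : stallFractions) {
            double newTime = stall + (1.0 - stall) / computeSpeedup;   // old frame time normalized to 1.0
            std::printf("stall %.0f%% -> overall speedup %.2fx\n", stall * 100.0, 1.0 / newTime);
        }
        // Prints roughly 1.29x and 1.22x: the "only ~25% faster" figure quoted above.
    }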

BUT, that only applies to PCs and to Cell's PPE (the conventional PowerPC core). The really beautiful thing about Cell is the SPUs and the EIB. First, memory bandwidth is 25.6 GB/s, which is really unheard of, way faster than PCs. So even though latency is hundreds of cycles, there is still enough bus to feed each SPU at about 1 byte per cycle. Second, each SPU has 256 KB of super-fast local memory, so in total there is about 1.8 MB of L1-speed memory! Additionally, each SPU has 128 128-bit registers, giving the machine 16 KB of registers in total. Incredible.
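And a quick check of the bandwidth claim: 25.6 GB/s at 3.2 GHz is about 8 bytes per cycle for the whole chip, so "about 1 byte per cycle per SPU" is simply that divided across 8 SPUs (the even split is my simplification, not something stated above):

    #include <cstdio>

    int main() {
        const double bandwidth = 25.6e9;          // bytes per second, from the post
        const double clock     = 3.2e9;           // cycles per second
        const double perCycle  = bandwidth / clock;
        std::printf("chip: %.1f B/cycle, per SPU (of 8): %.1f B/cycle\n",
                    perCycle, perCycle / 8.0);    // roughly 8.0 and 1.0
    }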

The bottom line is that Cell actually provides the memory bandwidth needed to approach its theoretical processing limits on real-life problems, so there are all kinds of new possibilities now. The old rule was that you make geometry and textures static so they only need to be uploaded to the GPU once; now you will see all kinds of dynamic interactions between game elements. Shaders have carried the graphics torch for the past four years, but in the end it's all just eye candy, since none of those calculations can be read back efficiently and used by game logic. Now there is a huge amount of processing power that CAN be used by game logic, and that is super cool.

Now we just need some really good programmers!

Quote:
Original post by ajas95
Now we just need some really good programmers!


Famous last words.

Non-static geometry would be cool though, like in the old all-CPU games. But then again, DX10 has been specifically designed for that, so we'll be getting it shortly (when Vista ships, HAH).

Quote:
Famous last words


Yeah, I hear you.

It does have a few things going for it. To PS2 engine programmers it should look pretty familiar; to PC programmers it's probably like a bad dream :) But IBM is heavily invested in Cell, and their big payback isn't the PS3 but supercomputers, servers, render farms, research programs... and they have a big interest in teaching the programming public as much as possible.

IBM has a bunch of very informative articles about Cell, along with a full system simulator, so that people who don't have the processor yet can start writing code.

Quote:
Non-static geometry would be cool though. Like in the old all-CPU games. But then again, DX10 has been specifically designed for that


I'm not sure what support DX10 has, but I'm sure it's still a one-way street. With Cell, instead of the huge bulk of processing power being used to generate an image, it can be used to generate the game. Besides rigid-body physics, you could use it for more proximity queries, line-of-sight tests, real-time terrain erosion, cloth, water, swaying trees, particles that collide and interact, better pathfinding, animated ray-traced sky textures... simply put, anything. The computation feeds back into the game.

And each of those individual things may seem trivial, but combine them: proximity queries, line of sight and pathfinding, and suddenly you have a much smarter AI that has the cycles to evaluate the terrain and run back and duck behind a tree for cover. Or twenty of them, in a team, communicating, each aware of everything around it. Believe it or not, most AI cycles are spent just evaluating what's around, rather than figuring out what to do next!

And (my original point) the only way to get all this is with lots of super-fast memory and huge bandwidth. Just adding more cores or increasing GHz is like raising the speed limit on a road that has a stop sign at every block.

[Edited by - ajas95 on January 2, 2006 3:30:03 AM]

Quote:

And (my original point) the only way to get all this is with lots of super-fast memory and huge bandwidth. Just adding more cores or increasing GHz is like raising the speed limit on a road that has a stop sign at every block.


Your point is the origin of the thoughts that started this discussion. It is true, and it led me to learn x86 assembly language so I could optimize my code to the bone and use every "prefetch", "dw" and SIMD instruction available to maximize cache coherency and code speed. I studied the Pentium's and the Athlon's pipelines to exploit everything I could, and the final conclusion was: I could write separate code for every architecture available, optimized to an extremely high level, and it still would not be enough.

Maybe that's just what you have to do: optimize the juice out of it. But what architecture do you target if your project will only be out in ten years? Learn Cell SPE assembly? x86-64 assembly? POWER assembly? UltraSparc T1 assembly? No matter how hard you squeeze, you need to change the squeezing technique for every different fruit (haha, lame :-| ).
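As an illustration of the "different squeeze for every fruit" point, here is the same trivial loop written once portably and once with x86 SSE intrinsics. The intrinsic version is roughly what architecture-specific squeezing looks like, and it simply does not carry over to Cell SPEs, POWER/AltiVec or SPARC, each of which has its own intrinsics or assembly. The function names are made up for the example.

    #include <xmmintrin.h>   // SSE intrinsics, x86 only

    // Portable version: builds anywhere a C++ compiler exists.
    void addPortable(const float* a, const float* b, float* out, int n) {
        for (int i = 0; i < n; ++i) out[i] = a[i] + b[i];
    }

    // x86-specific version: four floats per instruction, n assumed to be a multiple of 4.
    void addSse(const float* a, const float* b, float* out, int n) {
        for (int i = 0; i < n; i += 4) {
            __m128 va = _mm_loadu_ps(a + i);
            __m128 vb = _mm_loadu_ps(b + i);
            _mm_storeu_ps(out + i, _mm_add_ps(va, vb));
        }
    }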

That's why I am looking for something abstract and high-level (Java, C#, whatever) that still allows for low-level optimization later (read: prototype high-level, then redesign the performance-critical functions in assembly for a specific architecture). The future is unpredictable, but it can be projected, designed. That's why I am trying to understand where it is heading and what path it is taking.

Thanks for the answers and opinions,

JVFF

DX10 provides pixel shaders (to texture polys in fancy ways), vertex shaders (to modify vertices in fancy ways) and geometry shaders (to create polys in fancy ways).

Yes, I agree that languages are constantly changing (*cough* if you're looking at DX instead of GL *cough*), and planning to write code that will only be executed 10 years from now is probably a bad idea. Even if you design an architecture and algorithms without coding them in a compilable language, they will probably perform horribly on that era's hardware.

Example: I've played Colin McRae 2 on my Radeon 9600XT. That card has hardware 3D and is DX9 compliant, yet you can't run CMR2 in full detail. Why? Because back then there weren't any vertex buffers, so with a high draw distance the data still overwhelms the bus. If the vertices were stored in proper hardware buffers it would run lightning fast, but back then there was no way to know it would work like that. The card and the computer are 10x faster than what the game was written for, yet the detail doesn't scale accordingly (barely 3x).
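For reference, "stored in proper hardware buffers" corresponds to something like the following in OpenGL terms: upload the static geometry once into a vertex buffer object instead of pushing it across the bus every frame. This is only a sketch under assumptions; context creation, error handling and the draw calls are omitted, and the use of GLEW for the buffer-object entry points is my choice, not something from the post.

    #include <GL/glew.h>   // assumes a loader such as GLEW provides the buffer-object entry points

    // Upload a static mesh once; afterwards the data lives on the card, not on the bus.
    GLuint uploadStaticMesh(const float* vertexData, int vertexCount) {
        GLuint vbo = 0;
        glGenBuffers(1, &vbo);
        glBindBuffer(GL_ARRAY_BUFFER, vbo);
        // GL_STATIC_DRAW tells the driver the data will not change, so it can keep it in VRAM.
        glBufferData(GL_ARRAY_BUFFER, vertexCount * 3 * sizeof(float),   // 3 floats per vertex assumed
                     vertexData, GL_STATIC_DRAW);
        glBindBuffer(GL_ARRAY_BUFFER, 0);
        return vbo;   // bind this id each frame and draw; no per-frame vertex traffic
    }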

For another example, look at the tech behind Shiny's Messiah (the promise there wasn't speed, but detail).

What does that mean? It means that if optimized x86 code wasn't enough... the Cell probably won't be enough either.

Good programming is doing magic with what's available. My advice is to just roll with what there is and update whenever the hardware changes. Don't count on infinite speed.

Quote:
Original post by Madster
What does that mean? It means that if optimized x86 code wasn't enough... the Cell probably won't be enough either.

Good programming is doing magic with what's available. My advice is to just roll with what there is and update whenever the hardware changes. Don't count on infinite speed.


Hello,

Thanks a lot! That was the answer I was looking for. So I should use everything available today in an optimal way and just update as time passes. Do you think it is a good idea to code a prototype in a high-level language to fix the design flaws, and then restart the project in the final language some time in the future? Thanks again

JVFF

I think it's important to:

- Use what tools are currently available to minimize the amount of wheel-reinvention.

- Keep in mind that there are loads of developers hired specifically to optimize the APIs we use. To think that you, a hobbyist programmer, need to spend more time optimizing your applications at the very lowest levels in order to squeeze every last ounce of performance out of them is just conceit.

- Concentrate on the larger picture of game development. For a project in its entirety there are many aspects to consider; pipeline-profiling the graphics and memory-access engines will only delay the actual completion of your game. It is more important to adopt strong design concepts and adhere to them while your game is being developed.

- (For the sake of easing future worries) stick to languages higher-level than ASM. You're almost guaranteed that the C/C++ of today will be essentially identical to the C/C++ used in 5-10 years' time to code on Cell processors.

I knew as soon as I read your first post that you were an asm programmer; I have had this discussion numerous times with different asm programmer buddies :)

I should note that I am not against optimization; however given that you are a hobbyist programmer, you are likely not part of a large team of artists, sound engineers, actors, and level designers. Hence, your Team of One can't afford to dedicate all its resources to fortune-telling :)

Yeah, guilty as charged :) I did enjoy wandering the asm world of optimization :D Actually, I enjoyed it a bit too much and it became a sort of programming style: every time I code in C/C++ my code gets messed up within a day by the amount of micro-optimization with pointers, references, parameter passing, variable reuse and lots of other things. That's why I'm now coding in higher-level languages (Ruby, Java, C#), where the language has so many features for putting code down cleanly that I code faster and there is less room (code) for optimization. And these languages are mostly slow by nature, so there is no temptation to optimize them.

So now I'm following rule number one of optimization: get it working first. After I've fixed the design flaws and have a good concept, I can completely rewrite the code in another language with a more mature concept in my head. That way, when I'm finished (if I ever finish), I'll be more productive on whatever asm architectures are available then. My only question is: is it worth it?

Thanks,

JVFF

Well, two things.

First, there are two categories of tasks: things that are processing bottlenecks and therefore worth spending time optimizing, and tasks that aren't bottlenecks and aren't worth optimizing in the least. Most tasks aren't worth optimizing, BUT some are. In fact, back in the days when I was doing demos, I found it was best to focus on a single hard problem, optimize the crap out of it so it looked better than even commercial games, and make that the focal point of the demo while ignoring everything else.

Second, in a multi-processing environment there is another consideration called a "dependency chain" (this might be familiar if you're into asm). Say task C depends on task B, which depends on task A, and all three can run concurrently with task D. Even though A, B and C individually take less time than D, their combination can be the bottleneck, and in that case optimizing D before A, B and C is pointless. This can get very complicated, and more importantly, the dependency-chain concept won't go away in 10 years or 100 years: even given a million processors, the algorithm will never run faster than the combined duration of A, B and C on a single one.

Only one thing is certain for the next 10 years: processors won't get faster, there will simply be more of them. So if you want to take a future-proofing approach to design, your best bet is to study concurrency within a frame. Figure out which tasks can execute independently and which tasks form dependency chains; the goal is to get many tasks executing at the same time, shortening the longest dependency chain.

Here's an example. Say you want to do stencil shadows. The dependencies look something like this: first, game logic has to run so all characters know which animations to play; then all skeletons are computed from the animation data; collision detection and resolution execute and IK dynamics are applied so the final skeleton is known; the shadow mesh has to be skinned; and only from THAT can the silhouette be computed and the stencil shadows rendered! Your goal (10 years from now) isn't to make each of those steps run faster, it's to figure out some way to make most of them run at the same time. That is much harder stuff; your brain is what matters here, the programming language is irrelevant.
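Here is a rough sketch of what "concurrency within a frame" might look like for the chain described above, using plain std::async. Every function name is a hypothetical placeholder rather than anyone's real engine API; the point is only that the dependency chain itself stays serial while independent work overlaps with it.

    #include <future>

    // Hypothetical placeholder tasks for the stencil-shadow dependency chain.
    void gameLogic() {}             // decides which animations play
    void computeSkeletons() {}      // skeletons from the animation data
    void collisionAndIK() {}        // collision resolution + IK, final skeleton known
    void skinShadowMesh() {}        // skin the shadow mesh
    void extractSilhouette() {}     // silhouette from the skinned mesh
    void renderStencilShadows() {}  // finally, the stencil pass
    void updateParticles() {}       // independent of the whole shadow chain

    void runFrame() {
        // Independent work: kick it off so it overlaps with the dependency chain.
        std::future<void> particles = std::async(std::launch::async, updateParticles);

        // The chain itself stays serial; more cores do not shorten it.
        gameLogic();
        computeSkeletons();
        collisionAndIK();
        skinShadowMesh();
        extractSilhouette();
        renderStencilShadows();

        particles.get();   // the frame ends when the longest chain AND the side work are done
    }

    int main() { runFrame(); }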

Guest Anonymous Poster


The Cell processor only performs really well on pipelined SIMD data problems, where it can make use of the capabilities of the SPEs (the hyped 'theoretical 256 gigaflops'). There are more than a few game problem spaces that can make use of this, but there are many that cannot.

A lot of AI processing is not pipeline-oriented and faces the "dependency chain" problem mentioned above (waiting on semaphores...). As I mentioned before, real AI in games will have CPU requirements that DWARF the physics and graphics needs. Irregular access to a large data space (both world data AND script code) forces the SPEs to constantly go out over the main memory bus, which causes constant delays (each SPE has only 256 KB of data/instruction store, so it is of limited use outside batched local data-processing jobs). Irregular processing suddenly makes the Cell perform at a tiny fraction of its special-case peak.

A non-SIMD-oriented design might allow 50% more processors in the same space.
OR
A larger data store per CPU might significantly increase the versatility for more irregular processing (but how much is ever enough?).

Intel is also working on multiple CPUs on one die (supposedly 16-CPU versions are already being tested). I haven't seen the specs for that design yet, but it probably won't be a 'solves everything' solution either.


So the idea is to get as many concurrent threads running as possible. Seems logical. But should the project be designed around multi-threading, or merely use multi-threading? (i.e. in the first case everything is thread-safe, every function separates tasks and there are no long dependency chains; in the second case only the most intensive and most parallelizable code is threaded.) Thanks for the reply,

JVFF
