BloodOrange1981

Why would devs be opposed to the PS4's x86 architecture?

19 posts in this topic

This is more of an open-discussion question than a straight question-and-answer; if this is inappropriate for this forum, please tell me.

In the numerous presentations and interviews regarding the genesis of the PlayStation 4, Mark Cerny said he faced some resistance to basing the PS4's architecture on x86. He had to convince a number of people, including third parties, that it was "suitable for games".

PC game developers have had no choice but to use the x86 architecture for several hardware generations, with spectacular results (despite not having the same kind of low-level access to the GPU that you get on consoles), so why would people:

a) need persuading

and

b) what kinds of objections might they have?


I don't know why anyone outside of the battery-powered device market would complain about x86 architecture.  Game developers certainly won't.

 

It's likely that the resistance came from proprietary development tool providers (compilers, debuggers, profilers) who risk losing their monopoly.

The biggest complaint I've heard from PS3 developers is that the architecture change meant that a decade of experience, code, and tools built around the Cell's architecture and SPUs is now somewhat wasted. This is an argument specific to Sony's systems and not the XBone or PC.

The very small number of CPU registers on x86 was often derided, with its paltry 8 registers, then later its 16 registers, up until the 64-bit extensions brought it up to a barely respectable 64 64-bit registers and 8 128-bit SIMD registers; competitors during the 16-register introduction often had 32 32-bit registers, and when the 64-bit extensions were introduced competitors offered 32 64-bit and 32 128-bit registers or more, in some cases offering 128+ 128-bit registers for your processing enjoyment. The x86 64-bit extensions helped ease a lot of stresses, but the x86 family at the assembly level absolutely shows its age, since the core instructions are still based around hardware concepts from the early 1970s rather than the physical realities of today's hardware. And on and on and on.

 

I think you meant to say 16x64-bit registers. I was also under the impression that there were 16x128-bit SIMD regs (http://msdn.microsoft.com/en-us/library/windows/hardware/ff561499(v=vs.85).aspx). But yeah, I generally agree that x86 is pretty terrible. I'd love something like what the Itanium promised. A rotating register file that mirrored the stack would be silly fast.


The very small number of CPU registers on x86 was often derided, with its paltry 8 registers, then later its 16 registers, up until the 64-bit extensions brought it up to a barely respectable 64 64-bit registers and 8 128-bit SIMD registers; competitors during the 16-register introduction often had 32 32-bit registers, and when the 64-bit extensions were introduced competitors offered 32 64-bit and 32 128-bit registers or more, in some cases offering 128+ 128-bit registers for your processing enjoyment. The x86 64-bit extensions helped ease a lot of stresses, but the x86 family at the assembly level absolutely shows its age, since the core instructions are still based around hardware concepts from the early 1970s rather than the physical realities of today's hardware. And on and on and on.

x86-64 chips all have 16 named 64-bit registers and at least 16 128-bit SIMD registers (in x86-64 mode; the 8-register limit only applies to 32-bit legacy mode).

 

Newer x86-64 chips like the Jaguar in the PS4 have 16 256-bit SIMD registers (AVX, named YMM0-YMM15).

 

For comparison, ARMv8-A has 31 64-bit general-purpose registers and 32 128-bit FP/NEON registers.
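To make the register talk a bit more concrete, here is a minimal sketch of what those 256-bit AVX registers look like from C++ using the standard <immintrin.h> intrinsics. The function and array names are made up for illustration, and it assumes the element count is a multiple of 8; a real loop would handle the remainder and care about alignment.

```cpp
#include <immintrin.h>  // standard AVX intrinsics (the YMM registers)
#include <cstddef>

// Adds two float arrays 8 elements at a time. Each __m256 value occupies one
// 256-bit YMM register. Assumes count is a multiple of 8 for brevity.
void add_arrays_avx(const float* a, const float* b, float* out, std::size_t count)
{
    for (std::size_t i = 0; i < count; i += 8)
    {
        __m256 va = _mm256_loadu_ps(a + i);   // 8 floats into one YMM register
        __m256 vb = _mm256_loadu_ps(b + i);
        __m256 vr = _mm256_add_ps(va, vb);    // 8 additions in a single instruction
        _mm256_storeu_ps(out + i, vr);
    }
}
```

Built with AVX enabled (e.g. -mavx on gcc/clang), each of those __m256 values lives in one of the 16 YMM registers mentioned above.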



People who have spent their whole programming lives on the x86 platform don't really notice
I have spent my life on x86, yet I feel uneasy with it. One just has to look at the details.

Example: compare the effective ALU area of a die to the control circuitry; most of the latter is for out-of-order execution. All the instruction dependencies could be accounted for at compile time, so what is OOE useful for? To hide memory latency, I suppose... a problem that arose exactly because there's no way to put the right amount of memory in the right place. OOE makes little sense for closed architectures where the software is heavily scrutinized.

 

OOE reloaded: Intel puts two cores into one, then figures out most threads are dependency-limited and provides Hyper-Threading to perform OOE across two threads. Kudos to them for the technical feat, but I wonder if that's really necessary; and, by the way, years after its introduction there's still no HT-enabled low-end processor.

 

There's no such thing as x86: x86-Intel is one thing, x86-AMD is another. The latter is at least somewhat coherent in its features. With Intel, things are a bit more complicated (although we can debate whether those advanced features are currently useful).

 

So, I can absolutely understand at least one good reason why developers didn't take this nicely: x86 is just cheap, in all senses, good and bad.


I'm guessing that the awesome PS3 CPU model (traditional CPU core paired with SPUs) was abandoned because we now have compute-shaders, which are almost the same thing.


Except for all the things you lose and the extra 'baggage' of 'must have 64 units of work in flight or you are wasting ALU', must put thousands of work units in flight to hide latency (which you can't explicitly cover with ALU ops), and all manner of other things which make me sad.

Ideal layout:
- a few 'fat' cores
- 16 or so SPUs with more on-die cache (at least double the PS3's, basically)
- a DX11-class GPU

You get things which are good at branchy code, things which are good at moderately branchy/SIMD-friendly code, and things which are good at highly coherent, large-workset code.

I'd be all smiles then.

As it is...
*sigh*

I'm guessing that the awesome PS3 CPU model (traditional CPU core paired with SPUs) was abandoned because we now have compute-shaders, which are almost the same thing.

Except for all the things you lose and the extra 'baggage' of 'must have 64 units of work in flight or you are wasting ALU', must put thousands of work units in flight to hide latency (which you can't explicitly cover with ALU ops), and all manner of other things which make me sad.

I didn't say it was the right decision, but I can kind of understand that hypothetical reasoning.

SPUs do well at "compute" workloads, but yes, SPUs also do well at other types of workloads that "compute" hardware doesn't handle too well.
When I read the Cell documentation, I was pretty convinced that it was going to be the future of CPUs... maybe next-next-gen, again?

Anyone care to explain "FLOPS"? Never heard that terminology before.

Basically, math performance, measured in floating-point operations per second: http://en.wikipedia.org/wiki/FLOPS
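For what it's worth, peak FLOPS figures like the ones quoted in this thread are usually just cores × clock × floating-point operations per cycle. A tiny worked example with hypothetical figures (not official specs for any console):

```cpp
#include <cstdio>

int main()
{
    // Hypothetical CPU: 8 cores at 1.6 GHz, each retiring 8 single-precision
    // floating-point operations per cycle (e.g. a 4-wide FMA counted as 8 FLOPs).
    const double cores           = 8.0;
    const double clock_hz        = 1.6e9;
    const double flops_per_cycle = 8.0;

    const double peak = cores * clock_hz * flops_per_cycle;
    std::printf("Theoretical peak: %.1f GFLOPS\n", peak / 1e9);  // prints 102.4
    return 0;
}
```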


X360's x86 architecture could perform 77 GFLOPS, while the PS3 could perform 230 GFLOPS.


Just to nitpick, but the Xbox 360 didn't have an x86 processor; it was a PowerPC, just like the PS3, but with a different internal architecture. Perhaps you meant that the design of the Xenon CPU is more similar to a PowerPC version of common x86 chips than it is to the distributed design of the Cell CPU?


Well, this generation of game consoles isn't about the hardware. It's all about locking players into the "cloud" and other online consumer services.

Any hardware can be used for this quite frankly.

 

I wonder if the use of slightly less exotic hardware will encourage or discourage the homebrew scene. After all, I believe the reason we still don't have a playable emulator for the first Xbox is that "it's boring".

 

I find it most interesting that both heavyweights decided to choose this type of hardware at the same time. It's almost as if, as soon as they realized it would not be "embarrassing" to do so, they jumped on the x86 bandwagon immediately.


I find it most interesting that both heavyweights decided to choose this type of hardware at the same time. It's almost as if, as soon as they realized it would not be "embarrassing" to do so, they jumped on the x86 bandwagon immediately.

Power has lagged in R&D dollars and also lacks an SoC with a good GPU, so both vendors had to rule it out. MIPS also lacked a good GPU solution, and doesn't have as large a developer pool. ARM and x86 were the remaining choices, and rumor has it both got to prototype silicon. ARM only released ARMv8-A in 2011, so it might even have been too risky, since the consoles obviously wanted 64-bit chips and that would have been a pretty tight schedule. But x86 apparently had the better performance on the synthetic tests anyway, so it probably wasn't a hard choice.



Except for all the things you lose and the extra 'baggage' of 'must have 64 units of work in flight or you are wasting ALU', must put thousands of work units in flight to hide latency (which you can't explicitly cover with ALU ops), and all manner of other things which make me sad

 

AFAIK, based on public comments, both new-generation consoles support the equivalent of AMD's HSA -- that is, a single physical address space for CPU and GPU processes. This eliminates a lot of the overhead associated with shuffling less memory- or compute-intensive work over to the GPU from CPU-land.

 

In GPGPU on the PC, before HSA, the process is like this:
1. You have some input data you want to process and a kernel to process it.
2. You make an API call that does some CPU-side security/validity checks and then forwards the request to the driver.
3. The driver makes a shadow copy of the data into a block of memory assigned to the driver, changes the data alignment to be suitable for the GPU's DMA unit, and queues the DMA request.
4. The GPU comes along and processes the DMA request, which copies the data from the shadow buffer into the GPU's physical memory over the (slow) PCIe bus, along with the compute kernel.
5. After the results are computed, if the data is needed back in CPU-land, the GPU has to DMA them back over PCIe into the aligned shadow buffer.
6. Finally, before the CPU can access the results, the driver has to copy them from the shadow copy in the driver's address space back into the process address space.

 

HSA, and the new-generation consoles, are able to skip all of the copying, shadow buffers, DMA, and PCIe bus traffic entirely. Moving the data is practically free, and IIRC about equivalent to remapping a memory page in the worst-case scenario.
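A very rough sketch of the difference in C++: the functions declared at the top (driver_alloc_shadow, dma_copy_to_gpu, launch_kernel, and friends) are hypothetical placeholders for driver/runtime calls, not any real API; the point is only how many times the data gets touched on each path.

```cpp
#include <cstring>
#include <cstddef>

// Hypothetical placeholder declarations; these are NOT a real GPU API.
void* driver_alloc_shadow(std::size_t bytes);
void  dma_copy_to_gpu(const void* src, std::size_t bytes);
void  dma_copy_from_gpu(void* dst, std::size_t bytes);
void  launch_kernel();
void  launch_kernel_shared(float* data, std::size_t count);

// Discrete-GPU path (pre-HSA): every step touches the data again.
void run_compute_discrete(float* data, std::size_t count)
{
    const std::size_t bytes = count * sizeof(float);
    void* shadow = driver_alloc_shadow(bytes);   // driver-owned, DMA-aligned buffer
    std::memcpy(shadow, data, bytes);            // CPU copy into the shadow buffer
    dma_copy_to_gpu(shadow, bytes);              // DMA over PCIe into GPU memory
    launch_kernel();                             // kernel runs against the GPU copy
    dma_copy_from_gpu(shadow, bytes);            // results come back over PCIe
    std::memcpy(data, shadow, bytes);            // and are copied into the process again
}

// Unified-address-space path (HSA-style): CPU and GPU share physical memory,
// so the kernel works on the data in place and there are no staging copies.
void run_compute_unified(float* data, std::size_t count)
{
    launch_kernel_shared(data, count);
}
```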

 

Having wide execution units running over code that diverges is still a problem (it's always more difficult to keep many threads in sync than fewer), but traditional SIMD ought to be a reasonable substitute for that.

 

I will say, though, that the trend of 8 "light" Jaguar cores was surprising to me too -- I very much expected to see 4 "fat" cores this generation instead. I worry that there will be scenarios that bottleneck on single-threaded performance, and which will be a challenge to retool in a thread-friendly manner. A trivial sketch of that kind of retooling follows.
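Purely as an illustration of the kind of retooling involved (the names here are hypothetical, and a real engine would use a job system rather than spinning up raw threads every frame), a formerly serial update loop split into chunks that 8 light cores can process in parallel might look like this:

```cpp
#include <algorithm>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical per-entity update; assumed independent per element.
void update_entity(float& e) { e *= 1.01f; }

// Splits a formerly serial update loop across 'workers' threads
// (e.g. one per light core). Assumes workers > 0.
void update_all(std::vector<float>& entities, unsigned workers)
{
    std::vector<std::thread> pool;
    const std::size_t chunk = (entities.size() + workers - 1) / workers;
    for (unsigned w = 0; w < workers; ++w)
    {
        const std::size_t begin = w * chunk;
        const std::size_t end   = std::min(entities.size(), begin + chunk);
        pool.emplace_back([&entities, begin, end] {
            for (std::size_t i = begin; i < end; ++i)
                update_entity(entities[i]);
        });
    }
    for (auto& t : pool)
        t.join();   // serial joins like this are where single-threaded bottlenecks creep back in
}
```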


AFAIK, based on public comments, both new-generation consoles support the equivalent of AMD's HSA -- that is, a single physical address space for CPU and GPU processes. This eliminates a lot of the overhead associated with shuffling less memory- or compute-intensive work over to the GPU from CPU-land.


Which isn't anything new in the console space; on the X360 you could fiddle with GPU-visible memory from the CPU, and on the PS3 the SPUs could touch both system and VRAM with ease (and the GPU could pull from system memory, although that was a bit slow). HSA might be Big News in the desktop world, but aside from a bit of fiddling on startup with memory pages/addresses it's pretty old hat on consoles.

None of which sidesteps the issues of going from SPUs to compute that I mentioned:
- a single SPU could chew through plenty of work on its own, but launch a work group on the GPU with fewer than 64 threads and whoops, there goes tons of ALU time on that CU; and unless you launch multiple groups of multiples of 64 work units (or have enough work groups in flight on the CU), you can't do latency hiding for memory access...
- which brings me to point 2: SPUs let you issue DMA requests from main to 'local' memory and then work there. The nice thing was that you could kick off a number of these requests up front, do some ALU work, and then wait until the data turned up, effectively getting free ALU cycles. I've used this to great effect doing SH on the PS3, where as soon as possible I'd issue a DMA load for data before doing non-dependent ops so that, by the time I needed the data, it was loaded (or nearly loaded); a sketch of the pattern is below.
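For anyone who never wrote SPU code, a minimal sketch of that double-buffered DMA pattern. dma_get(), dma_wait() and do_alu_work() are hypothetical placeholders standing in for the Cell SDK's MFC intrinsics (roughly mfc_get() and the tag-status waits), and the chunk size is arbitrary.

```cpp
#include <cstddef>

constexpr std::size_t CHUNK_FLOATS = 4096;   // arbitrary chunk size

// Hypothetical placeholders for the MFC DMA intrinsics; NOT a real API.
void dma_get(float* local_dst, const float* main_src, std::size_t count, unsigned tag);
void dma_wait(unsigned tag);
void do_alu_work(float* data, std::size_t count);

float local_buf[2][CHUNK_FLOATS];            // two "local store" scratch buffers

// Processes a stream from main memory, fetching chunk i+1 while working on
// chunk i, so the transfer latency hides behind the ALU work.
void process_stream(const float* main_mem, std::size_t num_chunks)
{
    dma_get(local_buf[0], main_mem, CHUNK_FLOATS, /*tag=*/0);   // prefetch the first chunk
    for (std::size_t i = 0; i < num_chunks; ++i)
    {
        const unsigned cur = i & 1u;
        const unsigned nxt = cur ^ 1u;
        if (i + 1 < num_chunks)   // kick off the next transfer before touching this chunk
            dma_get(local_buf[nxt], main_mem + (i + 1) * CHUNK_FLOATS, CHUNK_FLOATS, /*tag=*/nxt);
        dma_wait(/*tag=*/cur);                      // usually already complete by now
        do_alu_work(local_buf[cur], CHUNK_FLOATS);  // work entirely out of local memory
    }
}
```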

Which brings up a fun aspect of them: you knew you had X amount of memory to play with and could manage it how you wanted. You could take advantage of memory-space wrapping, aliasing buffers over each other (I need this 8K now, but once I'm done I'll need 4K for something else, and I've got some time in between, so I'll just reuse that chunk), and generally knowing what it is you want to do.

SPUs, while seeming complicated, were great things, and as much as I'm a GPU-compute fan for large parallel workloads, the fact is there are workloads that just 'throwing it at the GPU' doesn't suit, and something which is a halfway house between a CPU and a GPU is perfectly suited to them.

Imagine you have just spent eight years and many millions of dollars developing a library of code centered around the Cell architecture. Would you be very happy to hear you need to throw it away?

 

Why would you throw it away? Sure, some parts of it may be SPU-specific, but parallelizable code that works on small chunks of contiguous data is great on almost any architecture. Most studios aren't going to be doing it Naughty Dog style, with piles of hand-written, pipelined SPU assembly. Which of course explains why Mark said it was the first parties that were skeptical of x86.

 

 


Imagine you have spent eight years hiring people, and your focus has been to include people who deeply understand the "supercomputer on a chip" design that Cell offered, which is most powerful when developers treat the chip as a master processor with a collection of slave processors, and now find that all those employees must go back to the x86 model. Would you be happy to hear that those employees will no longer be necessary?



So what, these mighty geniuses of programming are suddenly useless when given 8 x86 cores and a GPU? GPU compute in games is a largely unexplored field, and it's going to require smart people to figure out the best way to make use of it. And engines always need people who can take a pile of spaghetti gameplay code and turn it into something that can run across multiple cores without a billion cache misses.

 

 

 


The parallel architecture is a little trickier to develop for, but considering all the supercomputers built out of old PS3s, it should make you think. Building parallel algorithms where the work is partitioned across processors takes a lot more of the science aspect of programming, but the result is that you are trading serial time for parallel time and can potentially do significantly more work in the same wall-clock time. While many companies who focused on cross-platform designs took minimal advantage of the hardware, the companies who focused specifically on that system could do far more. Consider that in terms of raw floating-point operations per second, the X360's x86 architecture could perform 77 GFLOPS, while the PS3 could perform 230 GFLOPS. It takes a bit more computer-science application and system-specific coding to take advantage of it, but offering four times the raw processing power is a notable thing.

And yet for all of that processing power, X360 games regularly outperformed their PS3 versions. What good are oodles of FLOPS if they're not accessible to the average dev team? I for one am glad that Sony decided to pull their head out of the sand on this issue and instead doubled down on making a system whose power is available to developers instead of hidden away from them.

 

 

 


People who have spent their whole programming lives on the x86 platform don't really notice, and those who stick to single-threaded high-level languages without relying on chip-specific functionality don't really notice, but the x86 family of processors really is awful compared to what else is out there. Yes, they are general purpose and can do a lot, but other designs have a lot to offer.

Consider how x86 does memory access: you request a single byte. That byte might or might not be in the cache, and might require a long time to load. There is no good way to request that a block be fetched or kept resident for frequent access. On many other processors you can map a block of memory for fast access and continuous cache use, and swap it out and replace it whenever you want. The x86 family originated in the days of much slower memory, when other systems were not powerful. On x86, if you want to copy memory you might have a way to do a DMA transfer (tell the memory to copy directly from one location to another), but in practice that rarely happens; everything goes through the CPU. Compare this with many other systems, where you can copy and transfer memory blocks in the background without having the data travel across the entire motherboard.

The very small number of CPU registers on x86 was often derided, with its paltry 8 registers, then later its 16 registers, up until the 64-bit extensions brought it up to a barely respectable 64 64-bit registers and 8 128-bit SIMD registers; competitors during the 16-register introduction often had 32 32-bit registers, and when the 64-bit extensions were introduced competitors offered 32 64-bit and 32 128-bit registers or more, in some cases offering 128+ 128-bit registers for your processing enjoyment. The x86 64-bit extensions helped ease a lot of stresses, but the x86 family at the assembly level absolutely shows its age, since the core instructions are still based around hardware concepts from the early 1970s rather than the physical realities of today's hardware. And on and on and on.


Sure, x86 is old and crusty, but that doesn't mean that AMD's x86 chips are necessarily bad because of it. In the context of the PS4's design constraints, I can hardly see how it was a bad choice. Their box would be more expensive and would use more power if they'd stuck another Cell-like monster on a separate die instead of going with their SoC solution.


And yet for all of that processing power, X360 games regularly outperformed their PS3 versions.


To be fair, the SPUs were often used to make up for the utter garbage that was the PS3's GPU... god, that thing sucked.
