Why would devs be opposed to the PS4's x86 architecture?


This is more of an open-discussion question than a straight question-and-answer; if this is inappropriate for this forum, please tell me.

In the numerous presentations and interviews regarding the genesis of the PlayStation 4, Mark Cerny said he faced some resistance to basing the PS4's architecture on x86. He had to convince a number of people, including third parties, that it was "suitable for games".

PC game developers have had no choice but to use the x86 architecture for several generations, with spectacular results (despite not having the same kind of low-level access to the GPU that you get on consoles), so why would people:

a) need persuading

and

b) what kind of objections would they have?


I don't know why anyone outside of the battery-powered device market would complain about x86 architecture. Game developers certainly won't.

It's likely that the resistance came from proprietary development tool providers (compilers, debuggers, profilers) who risk losing their monopoly.

The biggest complaint I've heard from PS3 developers is that the architecture change meant that a decade of experience, code, and tools built around the Cell architecture and its SPUs is now somewhat wasted. This is an argument specific to Sony's systems, not the XBone or PC.

Sean Middleditch – Game Systems Engineer – Join my team!


Imagine you have just spent eight years and many millions of dollars developing a library of code centered around the Cell architecture. Would you be very happy to hear you need to throw it away?

Imagine you have spent eight years hiring people, and your focus has been to include people who deeply understand the "supercomputer on a chip" design that Cell offered, which is most powerful when developers treat the chip as a master processor with a collection of slave processors, and you now find that all those employees must go back to the x86 model. Would you be happy to hear that those employees will no longer be necessary?


The parallel architecture is a little more tricky to develop for, but considering all the supercomputers built out of old PS3s, it should make you think. Building parallel algorithms where the work is partitioned across processors takes a lot more of the science aspect of programming, but the result is that you are trading serial time for parallel time and can potentially do significantly more work in the same wall-clock time. While many companies that focused on cross-platform designs took minimal advantage of the hardware, the companies that focused specifically on that system could do far more. Consider that in terms of raw floating-point operations per second, the X360's PowerPC-based CPU could perform about 77 GFLOPS, while the PS3's Cell could perform about 230 GFLOPS. It takes a bit more computer-science application and system-specific coding to take advantage of it, but offering roughly three times the raw processing power is a notable thing.
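To make that serial-versus-parallel trade concrete, here is a minimal sketch in plain standard C++ (no console SDK; the function name and chunking scheme are illustrative, not from the post): the work is partitioned into one contiguous chunk per worker, the workers run concurrently, and only the join and the final combine remain serial.

```cpp
#include <cstddef>
#include <numeric>
#include <thread>
#include <vector>

// Sum `data` by giving each worker one contiguous chunk (workers >= 1).
double parallel_sum(const std::vector<double>& data, unsigned workers)
{
    std::vector<double> partial(workers, 0.0);
    std::vector<std::thread> pool;
    const std::size_t chunk = data.size() / workers;

    for (unsigned w = 0; w < workers; ++w) {
        const std::size_t begin = w * chunk;
        const std::size_t end   = (w + 1 == workers) ? data.size()
                                                     : begin + chunk;
        pool.emplace_back([&data, &partial, w, begin, end] {
            // Each worker reads only its own slice and writes only its
            // own output slot, so the parallel phase needs no locks.
            partial[w] = std::accumulate(data.begin() + begin,
                                         data.begin() + end, 0.0);
        });
    }
    for (auto& t : pool) t.join();  // serial join

    // Serial combine of the per-worker results.
    return std::accumulate(partial.begin(), partial.end(), 0.0);
}
```

With the parallel phase dominating, wall-clock time shrinks roughly with the worker count; the serial join and combine are the part Amdahl's law says you can never get rid of.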

People who have spent their whole programming lives on the x86 platform don't really notice, and those who stick to single-threaded high-level languages without relying on chip-specific functionality don't really notice, but the x86 family of processors really is awful compared to what else is out there. Yes, they are general purpose and can do a lot, but other designs have a lot to offer.

Consider how x86 does memory access: you request a single byte. That byte might or might not be in the cache, and might require a long time to load. There is no good way to request that a block be fetched or kept resident for frequent access. On many other processors you can map a block of memory for fast access and continuous cache use, and swap it out and replace it if you want to. The x86 family originated in the days when memory was much slower and systems far less powerful. On x86, if you want to copy memory you might have a way to do a DMA transfer (tell the memory to copy directly from one location to another), but in practice that rarely happens; everything goes through the CPU. Compare this with many other systems where you can copy and transfer memory blocks in the background without the data traveling across the entire motherboard.

The very small number of CPU registers on x86 was often derided: its paltry 8 registers, then later its 16 registers, up until the 64-bit extensions brought it up to a barely respectable 64 64-bit registers and 8 128-bit SIMD registers. Competitors during the 16-register introduction often had 32 32-bit registers, and when the 64-bit extensions were introduced, competitors offered 32 64-bit and 32 128-bit registers or more, in some cases offering 128+ 128-bit registers for your processing enjoyment.

The x86 64-bit extensions helped ease a lot of stresses, but the x86 family at the assembly level absolutely shows its age, since the core instructions are still based around the hardware concepts of the early 1970s rather than the physical realities of today's hardware. And on and on and on.
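To illustrate the point about block fetching: the closest thing x86 offers is a prefetch hint. Here is a small hypothetical sketch using the standard SSE _mm_prefetch intrinsic (the function name and the 64-element lookahead are my own illustration, not from the post):

```cpp
#include <xmmintrin.h>  // _mm_prefetch / _MM_HINT_T0 (SSE)
#include <cstddef>

// Sum an array while hinting upcoming cache lines into L1.
float sum_with_prefetch(const float* data, std::size_t n)
{
    float total = 0.0f;
    for (std::size_t i = 0; i < n; ++i) {
        // A hint, not a guarantee: the core may drop it, and the line can
        // still be evicted at any time. Contrast with the Cell SPUs, which
        // DMA whole blocks into a local store that is guaranteed to stay put.
        if (i + 64 < n)
            _mm_prefetch(reinterpret_cast<const char*>(data + i + 64),
                         _MM_HINT_T0);
        total += data[i];
    }
    return total;
}
```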

"The very small number of CPU registers on x86 was often derided: its paltry 8 registers, then later its 16 registers, up until the 64-bit extensions brought it up to a barely respectable 64 64-bit registers and 8 128-bit SIMD registers..."

I think you meant to say 16x64-bit registers. I was also under the impression that there were 16x128-bit SIMD regs (http://msdn.microsoft.com/en-us/library/windows/hardware/ff561499(v=vs.85).aspx). But yeah, I generally agree that x86 is pretty terrible. I'd love something like what the Itanium promised: a rotating register file that mirrored the stack would be silly fast.

"The very small number of CPU registers on x86 was often derided: its paltry 8 registers, then later its 16 registers, up until the 64-bit extensions brought it up to a barely respectable 64 64-bit registers and 8 128-bit SIMD registers..."

x86-64 chips all have 16 named 64-bit registers and at least 16 128-bit SIMD registers (in 64-bit mode; the count of 8 applies only to 32-bit legacy mode).

Newer x86-64 chips, like the Jaguar in the PS4, have 16 256-bit SIMD registers (AVX, named YMM0-YMM15).

For comparison, ARMv8-A has 31 64-bit general-purpose registers and 32 128-bit FP/NEON registers.
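As a concrete illustration of those 256-bit YMM registers (my example, not from the thread): each __m256 value below occupies one YMM register, processing eight floats per instruction.

```cpp
#include <immintrin.h>  // AVX intrinsics
#include <cstddef>

// Adds two float arrays 8 lanes at a time. Illustrative only;
// compile with AVX enabled (e.g. -mavx).
void add_arrays_avx(const float* a, const float* b, float* out, std::size_t n)
{
    std::size_t i = 0;
    for (; i + 8 <= n; i += 8) {
        __m256 va = _mm256_loadu_ps(a + i);   // one YMM register each
        __m256 vb = _mm256_loadu_ps(b + i);
        _mm256_storeu_ps(out + i, _mm256_add_ps(va, vb));
    }
    for (; i < n; ++i)                        // scalar tail
        out[i] = a[i] + b[i];
}
```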

"Imagine you have just spent eight years and many millions of dollars developing a library of code centered around the Cell architecture. Would you be very happy to hear you need to throw it away?"

"Imagine you have spent eight years hiring people... who deeply understand the "supercomputer on a chip" design that Cell offered... Would you be happy to hear that those employees will no longer be necessary?"

"The parallel architecture is a little more tricky to develop for... you are trading serial time for parallel time and can potentially do significantly more work in the same wall-clock time."


Why would you throw away such a library? There is still good money to be made from PS3 ports of new games, just as there was from PS2/Wii ports of 360/PS3 games.

If I had a large number of employees who understand how to write PS3 games very well, the transition to x86 would not be a catastrophe. The primary change that came with the PS3 was that it forced people without multicore processing experience to learn it. The new processors are not going back to a single-core model, so the primary skills those PS3 developers learned are still relevant.

If I were smart, I wouldn't get rid of the existing engineers at such a company. I would spend some extra time and money to get them up to speed on x86-64, just as was spent getting them experienced on the PS3. Also, after eight years of working together, these people will have become a well-oiled machine of teamwork, and that is just as important as their technical abilities.


If the hypothetical library is written entirely in assembly, then yes, I would be bummed (for various other reasons as well - why is it entirely assembly in the first place?!). But if the majority of the library was written in C or C++ with some small sections in assembly, then I would not be bummed.


"People who have spent their whole programming lives on the x86 platform don't really notice"
I have spent my life on x86, yet I feel uneasy with it. One just has to look at the details.

Example: compare the effective ALU area of a die to the control circuitry; most of the latter is for out-of-order execution (OOE). All the instruction dependencies could be accounted for at compile time, so what is OOE useful for? To hide memory latency, I suppose... a problem that arose precisely because there's no way to put the right amount of memory in the right place. OOE makes no sense for closed architectures where the software is heavily scrutinized.
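A small sketch of that compile-time-dependency point (my illustration, not from the post): the first reduction is one long dependency chain, while the second statically exposes four independent chains that can be overlapped, with no OOE window needed to discover them.

```cpp
#include <cstddef>

// One long dependency chain: every add must wait for the previous one.
float sum_chained(const float* v, std::size_t n)
{
    float s = 0.0f;
    for (std::size_t i = 0; i < n; ++i)
        s += v[i];
    return s;
}

// The same reduction split into four independent chains, expressed
// statically; even an in-order core can overlap these.
float sum_unrolled(const float* v, std::size_t n)
{
    float s0 = 0.0f, s1 = 0.0f, s2 = 0.0f, s3 = 0.0f;
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += v[i];
        s1 += v[i + 1];
        s2 += v[i + 2];
        s3 += v[i + 3];
    }
    for (; i < n; ++i)
        s0 += v[i];  // scalar tail
    // Float addition is not associative, so results may differ slightly.
    return (s0 + s1) + (s2 + s3);
}
```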

OOE reloaded: Intel puts two cores into one, then figures out most threads are dependency-limited and provides HyperThreading to perform OOE across two threads. Kudos to them for the technical feat, but I wonder if it's really necessary; and by the way, there were still no HT-enabled low-end processors years after its introduction.

There's no such thing as x86: x86-Intel is one thing, x86-AMD is another. The latter is at least somewhat coherent in its features. With Intel, things are a bit more complicated (though we can debate whether those advanced features are currently useful).

So, I can absolutely understand at least one good reason why developers didn't take this nicely. x86 is just cheap, in all senses, good and bad.

Previously "Krohm"

The short answer (elaborated upon above) is that you can get more FLOPS per transistor with a custom architecture than with a complex, slowly evolved architecture like x86.

The SPUs were absolutely phenomenal for performance, especially in 2006!
If they'd followed the same path, they could've made a CPU with far more FLOPS, while keeping the same size/transistor-count/heat/voltage/production-cost.

The upside of x86 is that it's easier to develop for, because it's what we use on our workstations.

I'm guessing that the awesome PS3 CPU model (traditional CPU core paired with SPUs) was abandoned because we now have compute-shaders, which are almost the same thing.
The extra compute cores in the PS4 (over the Xbone) have about the same FLOPS output as the PS3's SPUs, which is interesting.

"I'm guessing that the awesome PS3 CPU model (traditional CPU core paired with SPUs) was abandoned because we now have compute-shaders, which are almost the same thing."


Except for all the things you lose, and the extra 'baggage': you must have 64 units of work in flight or you are wasting ALU, you must put thousands of work units in flight to hide latency (which you can't explicitly cover with ALU ops), and all manner of other things, which makes me sad.

Ideal layout:
- a few 'fat' cores
- 16 or so SPUs with more on-die cache (at least double the PS3, basically)
- a DX11-class GPU

You get things which are good at branchy code, things which are good at moderately branchy/SIMD-friendly code, and things which are good at highly coherent, large working-set code.

I'd be all smiles then.

As it is...
*sigh*
