

Why would devs be opposed to the PS4's x86 architecture?



#1 BloodOrange1981   Members   -  Reputation: 240


Posted 02 February 2014 - 07:19 PM

This is more of an open-discussion question than a straight question-and-answer; if this is inappropriate for this forum, please tell me.

In the numerous presentations and interviews about the genesis of the PlayStation 4, Mark Cerny said he faced some resistance to basing the PS4's architecture on x86. He had to convince a number of people, including third parties, that it was "suitable for games".

PC game developers have had no choice but to use the x86 architecture for several generations, with spectacular results (despite not having the same kind of low-level GPU access you get on consoles), so why would people:

a) need persuading

and

b) what kind of objections would they have?




#2 Nypyren   Crossbones+   -  Reputation: 4786


Posted 02 February 2014 - 08:11 PM

I don't know why anyone outside of the battery-powered device market would complain about x86 architecture.  Game developers certainly won't.

 

It's likely that the resistance came from proprietary development tool providers (compilers, debuggers, profilers) who risk losing their monopoly.


Edited by Nypyren, 02 February 2014 - 08:12 PM.


#3 SeanMiddleditch   Members   -  Reputation: 7129


Posted 02 February 2014 - 08:41 PM

The biggest complaint I've heard from PS3 developers is that the architecture change meant that a decade of experience, code, and tools built around the Cell's architecture and SPUs is now somewhat wasted. This is an argument specific to Sony's systems and not the XBone or PC.

#4 frob   Moderators   -  Reputation: 22692


Posted 02 February 2014 - 08:57 PM


Imagine you have just spent eight years and many million dollars developing a library of code centered around the Cell architecture. Would you be very happy to hear you need to throw it away?

Imagine you have spent eight years hiring people, and your focus has been to include people who deeply understand the "supercomputer on a chip" design that Cell offered, which is most powerful when developers focus on the chip as a master processor with a collection of slave processors, and now find that all those employees must go back to the x86 model. Would you be happy to hear that those employees will no longer be necessary?


The parallel architecture is a little more tricky to develop for, but considering all the supercomputers built out of old PS3s it should make you think. Building parallel algorithms where the work is partitioned across processors takes a lot more of the science aspect of programming, but the result is that you are trading serial time for parallel time and can potentially do significantly more work in the same wall-clock time. While many companies who focused on cross-platform designs took minimal advantage of the hardware, the companies who focused specifically on that system could do far more. Consider that in terms of raw floating point operations per second, X360's x86 architecture could perform 77 GFLOPS, while the PS3 could perform 230 GFLOPS. It takes a bit more computer science application and system-specific coding to take advantage of it, but offering four times the raw processing power is a notable thing.
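To make the "partition the work across processors" idea concrete, here is a minimal, platform-agnostic C++ sketch (nothing Cell- or SPU-specific) that splits a reduction across whatever hardware threads are available, trading serial time for parallel time:

#include <algorithm>
#include <numeric>
#include <thread>
#include <vector>

// Sum a large array by giving each hardware thread one contiguous chunk.
double parallel_sum(const std::vector<double>& data)
{
    const unsigned workers = std::max(1u, std::thread::hardware_concurrency());
    std::vector<double> partial(workers, 0.0);
    std::vector<std::thread> pool;

    const std::size_t chunk = data.size() / workers;
    for (unsigned w = 0; w < workers; ++w)
    {
        const std::size_t begin = w * chunk;
        const std::size_t end   = (w == workers - 1) ? data.size() : begin + chunk;
        pool.emplace_back([&, w, begin, end] {
            // each worker reduces its own slice into its own slot (no sharing, no locks)
            partial[w] = std::accumulate(data.begin() + begin, data.begin() + end, 0.0);
        });
    }
    for (auto& t : pool) t.join();                               // wait for every worker

    return std::accumulate(partial.begin(), partial.end(), 0.0); // combine partial results
}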

People who have spent their whole programming lives on the x86 platform don't really notice, and those who stick to single-threaded high-level languages without relying on chip-specific functionality don't really notice, but the x86 family of processors really is awful compared to what else is out there. Yes, they are general purpose and can do a lot, but other designs have a lot to offer. Consider how x86 does memory access: you request a single byte. That byte might or might not be in the cache, and might require a long time to load. There is no good way to request that a block be fetched or kept resident for frequent access. In many other processors you can map a block of memory for fast access and continuous cache use, and swap it out and replace it if you want to. The x86 family originated in the days of much slower memory, when other systems were not powerful. On the x86, if you want to copy memory you might have a way to do a DMA transfer (tell the memory to copy directly from one location to another), but in practice that rarely happens; everything goes through the CPU. Compare this with many other systems where you can copy and transfer memory blocks in the background without the data travelling across the entire motherboard. The very small number of CPU registers on x86 was often derided, with its paltry 8 registers, then later its 16 registers, up until the 64-bit extensions brought it up to a barely respectable 64 64-bit registers and 8 128-bit SIMD registers; competitors during the 16-register era often had 32 32-bit registers, and when the 64-bit extensions were introduced competitors offered 32 64-bit and 32 128-bit registers or more, in some cases offering 128+ 128-bit registers for your processing enjoyment. The x86 64-bit extensions helped ease a lot of stresses, but the x86 family at the assembly level absolutely shows its age, since the core instructions are still based around the hardware concepts of the early 1970s rather than the physical realities of today's hardware. And on and on and on.
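For contrast, the closest thing plain x86 gives you to "please have this block ready" is a software prefetch hint, which the cache is free to satisfy or evict whenever it likes. A minimal sketch using the SSE prefetch intrinsic (the prefetch distance of 16 elements is an illustrative guess, not a tuned value):

#include <cstddef>
#include <xmmintrin.h>   // _mm_prefetch, _MM_HINT_T0

// Walk a large array while hinting upcoming data into the cache.
float sum_with_prefetch(const float* data, std::size_t count)
{
    float total = 0.0f;
    for (std::size_t i = 0; i < count; ++i)
    {
        if (i + 16 < count)   // hint the element we'll need a little later (stay in bounds)
            _mm_prefetch(reinterpret_cast<const char*>(data + i + 16), _MM_HINT_T0);
        total += data[i];
    }
    return total;
}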

Check out my book, Game Development with Unity, aimed at beginners who want to build fun games fast.

Also check out my personal website at bryanwagstaff.com, where I write about assorted stuff.


#5 Ryan_001   Prime Members   -  Reputation: 1456


Posted 02 February 2014 - 10:04 PM

The very small number of CPU registers on x86 was often derided, with its paltry 8 registers, then later its 16 registers, up until the 64-bit extensions brought it up to a barely respectable 64 64-bit registers and 8 128-bit SIMD registers; competitors during the 16-register era often had 32 32-bit registers, and when the 64-bit extensions were introduced competitors offered 32 64-bit and 32 128-bit registers or more, in some cases offering 128+ 128-bit registers for your processing enjoyment. The x86 64-bit extensions helped ease a lot of stresses, but the x86 family at the assembly level absolutely shows its age, since the core instructions are still based around the hardware concepts of the early 1970s rather than the physical realities of today's hardware. And on and on and on.

 

I think you meant to say 16 64-bit registers. I was also under the impression that there were 16 128-bit SIMD regs (http://msdn.microsoft.com/en-us/library/windows/hardware/ff561499(v=vs.85).aspx). But yeah, I generally agree that x86 is pretty terrible. I'd love something like what the Itanium promised. A rotating register file that mirrored the stack would be silly fast.



#6 richardurich   Members   -  Reputation: 1187


Posted 02 February 2014 - 10:09 PM

The very small number of CPU registers on x86 was often derided, with its paltry 8 registers, then later its 16 registers, up until the 64-bit extensions brought it up to a barely respectable 64 64-bit registers and 8 128-bit SIMD registers; competitors during the 16-register era often had 32 32-bit registers, and when the 64-bit extensions were introduced competitors offered 32 64-bit and 32 128-bit registers or more, in some cases offering 128+ 128-bit registers for your processing enjoyment. The x86 64-bit extensions helped ease a lot of stresses, but the x86 family at the assembly level absolutely shows its age, since the core instructions are still based around the hardware concepts of the early 1970s rather than the physical realities of today's hardware. And on and on and on.

x86-64 chips all have 16 named 64-bit registers and at least 16 128-bit SIMD registers in x86-64 mode; the count of 8 only applies in 32-bit legacy mode.

 

Newer x86-64 chips like Jaguar in the PS4 have 16 256-bit SIMD registers (AVX, named YMM0-15).

 

For comparison, ARMv8-A has 31 64-bit and 32 128-bit FP/NEON registers.
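For anyone who hasn't used them, those 256-bit YMM registers are what the AVX intrinsics map onto. A trivial sketch (assumes AVX is enabled at compile time, e.g. -mavx, and that count is a multiple of 8; the function name is just for illustration):

#include <immintrin.h>

// Add two float arrays 8 elements at a time using 256-bit AVX registers.
void add_avx(const float* a, const float* b, float* out, int count)
{
    for (int i = 0; i < count; i += 8)
    {
        __m256 va = _mm256_loadu_ps(a + i);   // load 8 floats into a YMM register
        __m256 vb = _mm256_loadu_ps(b + i);
        _mm256_storeu_ps(out + i, _mm256_add_ps(va, vb));
    }
}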



#7 Nypyren   Crossbones+   -  Reputation: 4786


Posted 02 February 2014 - 11:30 PM

Imagine you have just spent eight years and many million dollars developing a library of code centered around the Cell architecture. Would you be very happy to hear you need to throw it away?

Imagine you have spent eight years hiring people, and your focus has been to include people who deeply understand the "supercomputer on a chip" design that Cell offered, which is most powerful when developers focus on the chip as a master processor with a collection of slave processors, and now find that all those employees must go back to the x86 model. Would you be happy to hear that those employees will no longer be necessary?


The parallel architecture is a little more tricky to develop for, but considering all the supercomputers built out of old PS3s it should make you think. Building parallel algorithms where the work is partitioned across processors takes a lot more of the science aspect of programming, but the result is that you are trading serial time for parallel time and can potentially do significantly more work in the same wall-clock time.


Why would you throw away such a library? There is still good money to be made from PS3 ports of new games, just like there was from PS2/Wii ports of 360/PS3 games.

If I had a large number of employees who understand how to write PS3 games very well, the transition to x86 is not a catastrophe. The primary change that occurred with the PS3 was that it forced people without multicore processing experience to learn it. The new processors are not going back to a single-core model, so the primary skills that those PS3 developers learned are still relevant.

If I were smart, I wouldn't get rid of the existing engineers at such a company. I would spend some extra time and money to get them up to speed on x86-64, just like the time and money spent getting them up to speed on the PS3. Also, after 8 years of working together, these people will have become a well-oiled machine of teamwork, and that is just as important as their technical abilities.


If the hypothetical library is written entirely in assembly, then yes, I would be bummed (for various other reasons as well - why is it entirely assembly in the first place?!). But if the majority of the library was written in C or C++ with some small sections in assembly, then I would not be bummed.
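To put that in code terms, if the SPU-specific pieces sit behind a small interface, the rest of the library ports untouched. A minimal sketch, assuming a hypothetical project-defined PLATFORM_SPU build flag (not a real predefined macro) and a hypothetical scale_array_spu fast path:

#include <cstddef>

// PLATFORM_SPU and scale_array_spu are hypothetical, purely for illustration.
#if defined(PLATFORM_SPU)
void scale_array_spu(float* data, std::size_t count, float factor); // isolated SPU-specific path
#endif

void scale_array(float* data, std::size_t count, float factor)
{
#if defined(PLATFORM_SPU)
    scale_array_spu(data, count, factor);       // small platform-specific island (intrinsics/asm)
#else
    for (std::size_t i = 0; i < count; ++i)     // portable C++ path used on x86-64 and elsewhere
        data[i] *= factor;
#endif
}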

Edited by Nypyren, 02 February 2014 - 11:38 PM.


#8 Krohm   Crossbones+   -  Reputation: 3245


Posted 03 February 2014 - 02:10 AM


People who have spent their whole programming lives on the x86 platform don't really notice
I have spent my life on x86, yet I feel uneasy with it. One just has to look at the details.

Example: compare the effective ALU area of a die to the control circuitry; most of the latter is for out-of-order execution. All the instruction dependencies could be accounted for at compile time, so what is OOE useful for? To hide memory latency, I suppose... a problem that arises exactly because there's no way to put the right amount of memory in the right place. OOE makes no sense for closed architectures where the software is heavily scrutinized.

 

OOE reloaded: Intel puts two cores into one, then figures out most threads are dependency-limited and provides HyperThreading to perform OOE across two threads. Kudos to them for the technical feat, but I wonder if that's really necessary, and by the way, there's still no HT-enabled low-end processor years after its introduction.

 

There's no such thing as x86: x86-Intel is a thing, x86-AMD is another. The latter is at least somewhat coherent in features. With Intel things are a bit more complicated (although we can debate whether those advanced features are currently useful).

 

So, I can absolutely understand at least one good reason why developers didn't take this nicely. x86 is just cheap, in all senses, good and bad.



#9 Hodgman   Moderators   -  Reputation: 31786


Posted 03 February 2014 - 03:19 AM

Short answer (elaborated upon above) is that you can get more FLOPS per transistor with a custom architecture than with a complex, slowly-evolved arch like x86.

The SPUs were absolutely phenomenal for performance, especially in 2006!
If they'd followed the same path, they could've made a CPU with far more FLOPS, while keeping the same size/transistor-count/heat/voltage/production-cost.

The upside of x86 is that it's easier to develop for, because it's what we use on our workstations.

I'm guessing that the awesome PS3 CPU model (traditional CPU core paired with SPUs) was abandoned because we now have compute-shaders, which are almost the same thing.
The extra compute cores in the PS4 (over the Xbone) have about the same FLOPS output as the PS3's SPUs, which is interesting.

#10 phantom   Moderators   -  Reputation: 7556


Posted 03 February 2014 - 04:35 AM

I'm guessing that the awesome PS3 CPU model (traditional CPU core paired with SPUs) was abandoned because we now have compute-shaders, which are almost the same thing.


Except for all the things you lose and the extra 'baggage' of 'must have 64 units of work in flight or you are wasting ALU', must put thousands of work units in flight to hide latency (which you can't explicitly cover with ALU ops), and all manner of other things which make me sad.

Ideal layout;
- few 'fat' cores
- 16 or so SPUs with more on-die cache (double PS3 basically at least)
- DX11 class gpu

You get things which are good at branchy code, things which are good at moderately branchy/SIMD-friendly code, and things which are good at highly coherent, large-workset code.

I'd be all smiles then.

As it is...
*sigh*

#11 KaiserJohan   Members   -  Reputation: 1233


Posted 03 February 2014 - 05:28 AM

Anyone care to explain "FLOPS"? Never heard that terminology before.



#12 Hodgman   Moderators   -  Reputation: 31786


Posted 03 February 2014 - 05:42 AM

I'm guessing that the awesome PS3 CPU model (traditional CPU core paired with SPUs) was abandoned because we now have compute-shaders, which are almost the same thing.

Except for all the things you lose and the extra 'baggage' of 'must have 64 units of work in flight or you are wasting ALU', must put thousands of work units in flight to hide latency (which you can't explicitly cover with ALU ops), and all manner of other things which make me sad.

I didn't say it was the right decision, but I can kind of understand that hypothetical reasoning.

SPUs do well at "compute" workloads, but yes, SPUs also do well at other types of workloads that "compute" hardware doesn't handle too well.
When I read the Cell documentation, I was pretty convinced that it was going to be the future of CPUs... maybe next, next-gen, again?

Anyone care to explain "FLOPS"? Never heard that terminology before.

Basically, math performance: http://en.wikipedia.org/wiki/FLOPS
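As a rough worked example of where figures like the ones quoted earlier come from (back-of-the-envelope, using the commonly cited Cell numbers): each SPU can issue a 4-wide single-precision multiply-add per cycle, which counts as 8 floating-point operations, so at 3.2 GHz that is about 8 × 3.2 ≈ 25.6 GFLOPS per SPU; seven or eight SPUs plus the PPE is how you get into the ~200+ GFLOPS range mentioned above.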



#13 TheChubu   Crossbones+   -  Reputation: 4754


Posted 03 February 2014 - 07:26 AM

Floating point operations per second. FLOPS.

 

EDIT: Ninja'd.


Edited by TheChubu, 03 February 2014 - 07:26 AM.

"I AM ZE EMPRAH OPENGL 3.3 THE CORE, I DEMAND FROM THEE ZE SHADERZ AND MATRIXEZ"

 

My journals: dustArtemis ECS framework and Making a Terrain Generator


#14 Ameise   Members   -  Reputation: 766


Posted 03 February 2014 - 11:12 AM


X360's x86 architecture could perform 77 GFLOPS, while the PS3 could perform 230 GFLOPS.


Just to nitpick, but the Xbox 360 didn't have an x86 processor - it was a PowerPC, just like the PS3 but with a different internal architecture. Perhaps you meant that the design of the Xenon CPU is more similar to a PowerPC version of common x86 chips than it is to the distributed design of the Cell CPU?


Edited by Ameise, 03 February 2014 - 11:17 AM.


#15 Karsten_   Members   -  Reputation: 1655


Posted 03 February 2014 - 02:48 PM

Well this generation of games consoles isn't about the hardware. It is all about locking players down into the "cloud" and other online consumer services.

Any hardware can be used for this quite frankly.

 

I wonder if the use of slightly less exotic hardware will encourage or discourage the homebrew scene. After all, I believe the reason we still don't have a playable emulator for the first Xbox is that "it's boring".

 

I find it most interesting that both heavyweights decided to choose this type of hardware at the same time. It is almost like as soon as they realized it would not be "embarrassing" to do so, they jumped on board the x86 bandwagon immediately.


Edited by Karsten_, 03 February 2014 - 02:50 PM.

Mutiny - Open-source C++ Unity re-implementation.
Defile of Eden 2 - FreeBSD and OpenBSD binaries of our latest game.


#16 richardurich   Members   -  Reputation: 1187


Posted 03 February 2014 - 03:27 PM

I find it most interesting that both heavyweights decided to choose this type of hardware at the same time. It is almost like as soon as they realized it would not be "embarrassing" to do so, they jumped on board the x86 bandwagon immediately.

Power has lagged in R&D dollars, and also lacks an SoC with a good GPU, so both vendors had to rule it out. MIPS also lacked a good GPU solution, and doesn't have as large a developer pool. ARM and x86 were the remaining choices, and rumor has it both got to prototype silicon. ARM only released ARMv8-A in 2011, so that might have even been too risky, since the consoles obviously wanted 64-bit chips and that's a pretty tight schedule. But x86 apparently had the better performance on the synthetic tests anyway, so it probably wasn't a hard choice.



#17 Ravyne   GDNet+   -  Reputation: 8068


Posted 03 February 2014 - 05:38 PM


Except for all the things you lose and the extra 'baggage' of 'must have 64 units of work in flight or you are wasting ALU', must put thousands of work units in flight to hide latency (which you can't explicitly cover with ALU ops), and all manner of other things which make me sad.

 

AFAIK, based on public comment, both new-generation consoles support the equivalent of AMD's HSA -- that is, a single physical address space for CPU and GPU processes. This eliminates a lot of the overhead associated with shuffling less memory- or compute-intensive work over to the GPU from CPU-land.

 

In GPGPU on the PC, before HSA, the process goes like this:

1. You have some input data you want to process and a kernel to process it.
2. You make an API call that does some CPU-side security/validity checks and then forwards the request to the driver.
3. The driver makes a shadow copy of the data into a block of memory that's assigned to the driver, changes the data alignment to be suitable for the GPU's DMA unit, then queues the DMA request.
4. The GPU comes along and processes the DMA request, which copies the data from the shadow buffer into the GPU's physical memory over the (slow) PCIe bus, along with the compute kernel.
5. After the results are computed, if the data is needed back in CPU-land, the GPU has to DMA it back over PCIe into the aligned shadow buffer.
6. Finally, before the CPU can access the results, the driver has to copy them back from the shadow copy in the driver's address space to the process address space.

 

HSA, and the new-generation consoles, are able to skip all the copying, shadow buffers, DMA and PCIe bus traffic entirely. Moving the data is practically free, and IIRC about equivalent to remapping a memory page in the worst-case scenario.

 

Having wide execution units running over code that diverges is still a problem--it's always more difficult to keep many threads in sync than fewer--but traditional SIMD ought to be a reasonable substitute for that.

 

I will say though, that the trend of 8 "light" Jaguar cores was surprising to me too -- I very much expected to see 4 "fat" cores this generation instead. I worry that there will be scenarios that will bottle-neck on single-threaded performance, and which will be a challenge to retool in a thread-friendly manner.



#18 phantom   Moderators   -  Reputation: 7556


Posted 04 February 2014 - 01:04 AM

AFAIK, based on public comment, both new-generation consoles support the equivalent of AMD's HSA -- that is, a single physical address space for CPU and GPU processes. This eliminates a lot of the overhead associated with shuffling less memory- or compute-intensive work over to the GPU from CPU-land.


Which isn't anything new in the console space; on the X360 you could fiddle with GPU-visible memory from the CPU, and on the PS3 the SPUs could touch both system and VRAM with ease (and the GPU could pull from system memory, although that was a bit slow). HSA might be Big News in the desktop world, but aside from a bit of fiddling on startup with memory pages/addresses it's pretty old hat on the consoles.

None of which sidesteps the issues I mentioned of going from SPUs to compute:
- a single SPU could chew through plenty of work on its own, but launch a work group on the GPU with fewer than 64 threads and whoops, there goes tons of ALU time on that CU, and unless you launch multiple groups of multiples of 64 work units (or have enough work groups in flight on the CU) then you can't do latency hiding for memory access...
- which brings me to point 2: SPUs let you issue DMA requests from main to 'local' memory and then work there. The nice thing was you could kick off a number of these requests up front, do some ALU work and then wait until the data turned up, effectively getting free ALU cycles. I've used this to great effect doing SH on the PS3, where as soon as possible I'd issue a DMA load for data before doing non-dependent ops so that, by the time I needed the data, it was loaded (or nearly loaded). A rough sketch of this pattern follows below.
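A rough sketch of that double-buffering pattern in C++-flavoured pseudocode; dma_get, dma_wait and do_work are hypothetical stand-ins for the real MFC/DMA intrinsics, so treat this as the shape of the idea rather than working Cell code:

#include <cstddef>
#include <cstdint>
#include <utility>

struct Chunk { float samples[1024]; };

// Hypothetical helpers, declared only so the sketch is self-contained (not real SDK calls):
void dma_get(void* dst_local, std::uint64_t src_main, std::size_t bytes, int tag); // start async copy
void dma_wait(int tag);                                                            // block until tag completes
void do_work(Chunk* buf);                                                          // ALU-heavy processing

void process_stream(Chunk* front, Chunk* back, std::uint64_t src, int chunk_count)
{
    int front_tag = 0, back_tag = 1;

    dma_get(front, src, sizeof(Chunk), front_tag);               // prefetch chunk 0
    for (int i = 0; i < chunk_count; ++i)
    {
        if (i + 1 < chunk_count)                                 // start fetching chunk i+1 early
            dma_get(back, src + std::uint64_t(i + 1) * sizeof(Chunk), sizeof(Chunk), back_tag);

        dma_wait(front_tag);    // usually returns immediately: the copy overlapped the last do_work
        do_work(front);         // free ALU cycles while the next chunk's DMA is in flight

        std::swap(front, back); // double buffer: next iteration consumes the chunk just requested
        std::swap(front_tag, back_tag);
    }
}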

Which brings up a fun aspect of them; you knew you had X amount of memory to play with and could manage it however you wanted. You could take advantage of memory space wrapping, aliasing buffers over each other (I need this 8K now, but once I'm done I'll need 4K for something else; I've some time in between so I'll just reuse that chunk) and generally knowing exactly what it is you want to do.

SPUs, while seeming complicated, were great things, and as much as I'm a GPU compute fan for large parallel workloads, the fact is there are workloads that just 'throwing it at the GPU' doesn't suit, and for those something which sits halfway between a CPU and a GPU is a perfect fit.

#19 MJP   Moderators   -  Reputation: 11736


Posted 04 February 2014 - 02:54 AM


Imagine you have just spent eight years and many million dollars developing a library of code centered around the Cell architecture. Would you be very happy to hear you need to throw it away?

 

Why would you throw it away? Sure, some parts of it may be SPU-specific, but parallelizable code that works on small chunks of contiguous data is great on almost any architecture. Most studios aren't going to be doing it Naughty Dog style with piles of hand-written pipelined SPU assembly. Which of course explains why Mark said that it was first parties that were skeptical of x86.

 

 


Imagine you have spent eight years hiring people, and your focus has been to include people who deeply understand the "supercomputer on a chip" design that Cell offered, which is most powerful when developers focus on the chip as a master processor with a collection of slave processors, and now find that all those employees must go back to the x86 model. Would you be happy to hear that those employees will no longer be necessary?



So what, these mighty geniuses of programming are suddenly useless when given 8 x86 cores and a GPU? GPU compute in games is a largely unexplored field, and it's going to require smart people to figure out the best way to make use of it. And engines always need people that can take a pile of spaghetti gameplay code and turn it into something that can run across multiple cores without a billion cache misses.

 

 

 


The parallel architecture is a little more tricky to develop for, but considering all the supercomputers built out of old PS3s it should make you think. Building parallel algorithms where the work is partitioned across processors takes a lot more of the science aspect of programming, but the result is that you are trading serial time for parallel time and can potentially do significantly more work in the same wall-clock time. While many companies who focused on cross-platform designs took minimal advantage of the hardware, the companies who focused specifically on that system could do far more. Consider that in terms of raw floating point operations per second, X360's x86 architecture could perform 77 GFLOPS, while the PS3 could perform 230 GFLOPS. It takes a bit more computer science application and system-specific coding to take advantage of it, but offering four times the raw processing power is a notable thing.

And yet for all of that processing power X360 games regularly outperformed their PS3 versions. What good are oodles of FLOPS if they're not accessible to the average dev team? I for one am glad that Sony decided to take their head out of the sand on this issue, and instead doubled down on making a system that made its power available to developers instead of hiding it away from them.

 

 

 


People who have spent their whole programming lives on the x86 platform don't really notice, and those who stick to single-threaded high-level languages without relying on chip-specific functionality don't really notice, but the x86 family of processors really is awful compared to what else is out there. Yes, they are general purpose and can do a lot, but other designs have a lot to offer. Consider how x86 does memory access: you request a single byte. That byte might or might not be in the cache, and might require a long time to load. There is no good way to request that a block be fetched or kept resident for frequent access. In many other processors you can map a block of memory for fast access and continuous cache use, and swap it out and replace it if you want to. The x86 family originated in the days of much slower memory, when other systems were not powerful. On the x86, if you want to copy memory you might have a way to do a DMA transfer (tell the memory to copy directly from one location to another), but in practice that rarely happens; everything goes through the CPU. Compare this with many other systems where you can copy and transfer memory blocks in the background without the data travelling across the entire motherboard. The very small number of CPU registers on x86 was often derided, with its paltry 8 registers, then later its 16 registers, up until the 64-bit extensions brought it up to a barely respectable 64 64-bit registers and 8 128-bit SIMD registers; competitors during the 16-register era often had 32 32-bit registers, and when the 64-bit extensions were introduced competitors offered 32 64-bit and 32 128-bit registers or more, in some cases offering 128+ 128-bit registers for your processing enjoyment. The x86 64-bit extensions helped ease a lot of stresses, but the x86 family at the assembly level absolutely shows its age, since the core instructions are still based around the hardware concepts of the early 1970s rather than the physical realities of today's hardware. And on and on and on.


Sure, x86 is old and crusty, but that doesn't mean AMD's x86 chips are necessarily bad because of it. In the context of the PS4's design constraints I can hardly see how it was a bad choice. Their box would be more expensive and would use more power if they'd stuck another Cell-like monster on a separate die instead of going with their SoC solution.


Edited by MJP, 04 February 2014 - 02:55 AM.


#20 phantom   Moderators   -  Reputation: 7556


Posted 04 February 2014 - 04:44 AM

And yet for all of that processing power X360 games regularly outperformed their PS3 versions.


To be fair the SPUs were often used to make up for the utter garbage which was the PS3's GPU... god that thing sucked.



