Believe reports about Next Gen Computer and API Performance?



What I don't agree with is the many fanboys postulating that D3D12 is going to bring similarly large performance benefits to Xbox One, which is something I see all over the places where armchair developers congregate. It'll be positive for many reasons, some of them (smaller) performance reasons even -- but D3D12 won't be the end of the PS4's performance advantage.
You'd need some awesome API design to make up for that hardware difference :D

They're not that far apart if you ignore those pesky 4 extra graphics clusters that are reserved for compute (and also the fact that if your rendering pipeline needs/can use compute up front -- IIRC, some renderers do this to compute lighting in deferred rendering -- then they're still being used).

Otherwise it's 12 clusters at 850MHz vs 14 clusters at 800MHz. Overall, the difference is less than 10% (again, ignoring the other 4 compute clusters, which is a rosy view). It's not so much the shader throughput that's holding the Xbox One back as it is poor utilization of the ESRAM. Honestly, though, when MS found out they were going to come up short, they probably should have enabled the extra two backup clusters and taken the hit on yield early on, then re-revved the silicon to add two more to keep yields up -- doing so would have closed the gap and then some. But they might have looked at that and decided it would be bottlenecked by other resources, or was impossible by design. It also might have been too hard on yield, because you'd need completely flawless chips since the ESRAM doesn't have any redundancy/mitigation in its design at all. The PS4 chip also has two extra clusters, but enabling them would put Sony in the same boat of needing flawless chips and would hurt their yields similarly -- and they wouldn't have felt any pressure to do it.

I think Sony came out with the better architecture this round: they actually have slightly lower total memory bandwidth, but putting it to use well doesn't require careful management like the Xbox One does. If a game does use the ESRAM well, it'll actually have more bandwidth available than the PS4, but we're talking about achieving near-peak efficiency. Had the Xbox One had GDDR5 *and* ESRAM, that might have been something fancy.

Next gen, we'll all be on HBM or something similar, with greater than 1TB/s of bandwidth -- probably 2TB/s.

throw table_exception("(╯°□°)╯︵ ┻━┻");


They're not that far apart if you ignore those pesky 4 extra graphics clusters that are reserved for compute

It's 12 CUs vs 18 CUs, and I've never heard of 4 of the PS4's 18 being reserved for compute-shader usage only (???).
Even naively accounting for the clock-speed difference, the PS4 has a GPU compute power advantage of ~41% :( That is a big enough gap to matter, a lot.
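Roughly where that ~41% comes from, as a sanity check -- the 64-lanes-per-CU and 2-FLOPs-per-lane-per-clock figures are the commonly cited GCN numbers and are my assumption here, not something stated in the thread:

#include <cstdio>

int main()
{
    // Commonly cited GCN figures (assumed): 64 ALU lanes per CU, FMA = 2 FLOPs/clock.
    const double lanesPerCU = 64.0, flopsPerLane = 2.0;

    const double xb1 = 12 * 850e6 * lanesPerCU * flopsPerLane;  // ~1.31 TFLOPS
    const double ps4 = 18 * 800e6 * lanesPerCU * flopsPerLane;  // ~1.84 TFLOPS

    printf("XB1: %.2f TFLOPS, PS4: %.2f TFLOPS, PS4 advantage: %.0f%%\n",
           xb1 * 1e-12, ps4 * 1e-12, (ps4 / xb1 - 1.0) * 100.0);
    // Prints roughly: XB1: 1.31 TFLOPS, PS4: 1.84 TFLOPS, PS4 advantage: 41%
}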

The XB1's higher clock rate doesn't actually help that much anyway -- shader authors need to balance ALU clocks vs memory latency. The Xbone's memory actually has longer latencies than the PS4's (despite all the fanboy DDR3>GDDR5 latency FUD)... which means you've already got more ALU time to spare, as you're spending longer waiting for memory. Slow memory == more ALU clocks you can use "for free" per memory request == already likely to have more ALU than you need, so there's little benefit in overclocking to get even more ALU cycles!! What you really want to be overclocking is the memory controller... or just using faster RAM.

The 32MiB ESRAM is a curse as well as a cure. The DDR3 is simply not fast enough... but 99% of your graphics resources have to live in that DDR3 anyway! You then have to figure out how/what/when to page data in and out of this tiny buffer so that you don't suffer as much DDR3 pain :(

Last gen, the Xb360 had a much stronger GPU than the PS3, but we still often saw Xb360 games running at sub-720p resolutions, mostly because they couldn't fit into the 360's tiny 10MiB EDRAM buffer at full 720p.
e.g. a deferred renderer with 2 GBuffer layers + depth/stencil doesn't fit at 1280*720 (~10.5MiB), but 1152*720 (~9.5MiB) does just barely fit in the 10MiB limit (ignoring HiZ allocations, alignment requirements, etc).
Likewise on the new generation with the 32MiB ESRAM --
a modern deferred renderer with 4 GBuffer layers + depth + stencil doesn't fit at 1920*1080 (~41.5MiB), but 1600*900 (~28.84MiB) does.
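Here's the arithmetic behind those footprints, assuming 32-bit (4-byte) colour/depth targets and, for the second case, a separate 8-bit stencil plane -- those per-pixel sizes are my assumption for reproducing the figures above, not anything official:

#include <cstdio>

// Footprint of a render-target set, in MiB.
static double footprintMiB(int w, int h, int bytesPerPixel)
{
    return double(w) * h * bytesPerPixel / (1024.0 * 1024.0);
}

int main()
{
    // Xb360 case: 2 GBuffer layers + D24S8 depth/stencil = 3 * 4 = 12 bytes/pixel
    printf("1280x720  @ 12 Bpp: %.2f MiB (EDRAM is 10 MiB)\n", footprintMiB(1280, 720, 12));  // ~10.55
    printf("1152x720  @ 12 Bpp: %.2f MiB\n",                   footprintMiB(1152, 720, 12));  // ~9.49

    // XB1 case: 4 GBuffer layers + 32-bit depth + 8-bit stencil = 4*4 + 4 + 1 = 21 bytes/pixel
    printf("1920x1080 @ 21 Bpp: %.2f MiB (ESRAM is 32 MiB)\n",  footprintMiB(1920, 1080, 21)); // ~41.53
    printf("1600x900  @ 21 Bpp: %.2f MiB\n",                    footprintMiB(1600, 900, 21));  // ~28.84
}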

So the trend of Xbox games running at lower res than PS games is very likely to continue this generation...
On the plus side - 1600*900 has 70% of the number of pixels of 1080p, which means the problem of only having 70% of the compute power is almost completely solved :D Genius!

So if you're using a deferred renderer where your biggest memory bottleneck is repeatedly reading the GBuffer, you might decide to move it into ESRAM for the majority of the frame -- and there's 90% of your "fast memory" budget gone immediately :/ leaving you with ~3MiB of super fast RAM to solve all your other problems...

Also, you can't get total system bandwidth by adding DDR3 bandwidth and ESRAM bandwidth -- if you're using the ESRAM buffer to help accelerate different jobs, you'll be periodically moving data into and out of ESRAM. When doing that, the bandwidth of the move is min(ddrBandwidth, esramBandwidth)!
e.g. if you're additively blending lots of lights into an ESRAM accumulation target so that you get super-fast alpha blending, and then later have to evict it into DDR3 so that you can use ESRAM for other purposes, then that eviction/copy task has the bandwidth of DDR3, not the bandwidth of ESRAM.
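To put a rough number on that eviction cost, using the peak figures quoted later in the thread (68 GB/s DDR3, 102 GB/s ESRAM) -- treat it as back-of-the-envelope only, since real copies won't hit peak:

#include <algorithm>
#include <cstdio>

int main()
{
    const double ddr3GBs = 68.0, esramGBs = 102.0;   // peak figures, not sustained
    const double gbufferMiB = 28.84;                 // the 1600x900 GBuffer from above

    // An ESRAM<->DDR3 copy can only go as fast as the slower side.
    const double effectiveGBs = std::min(ddr3GBs, esramGBs);
    const double evictMs = (gbufferMiB / 1024.0) / effectiveGBs * 1000.0;

    printf("Evicting %.2f MiB at %.0f GB/s takes ~%.2f ms\n", gbufferMiB, effectiveGBs, evictMs);
    // ~0.41 ms per full eviction -- before any copies back in, and at unrealistic peak rates.
}

Half a millisecond per eviction doesn't sound like much until you're shuffling several render targets in and out of that 32MiB every 16.6ms frame.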

If you're continually switching between render targets, but want all your render targets to be ESRAM-accelerated, then you'll waste so much time on DDR->ESRAM and ESRAM->DDR copy jobs that you'll likely negate almost all of the performance gains of the ESRAM, leaving you with something like a 3% frametime improvement :(
The alternative is using it to accelerate fewer resources, so that there's less wasteful copying... but that means you're stuck suffering with shitty DDR3 more of the time. If you use the ESRAM for too many purposes you're screwed, and if you use it for too few you're screwed :/

It's not so much the shader throughput that's holding the Xbox One back as it is poor utilization of the ESRAM.

If we're going to say that it's going to be so much faster once devs figure out how to use that super-fast 32MiB buffer, we can also make the same predictions about games getting faster once devs figure out how to port more work onto async compute pipelines.
The new GPUs can read multiple command buffers at once, and "hyperthread" workloads to make use of otherwise-would-be-idle hardware.
e.g. shadow-map rendering is extremely rasterizer- and ROP-bound, leaving the compute units mostly idle. There are already games shipping now that have ported their post-processing jobs to async compute, allowing them to overlap with the next frame's shadow-map rendering jobs, for a ~15% reduction in frame times (enough for a 50fps game to push itself up to 60Hz).
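For anyone who hasn't looked at the new APIs yet, here's a very rough D3D12-flavoured sketch of that overlap idea: a separate compute queue for post-processing, fenced against the graphics queue. The command-list names (postProcessLists, shadowMapLists, compositeLists) are hypothetical placeholders for lists you'd record elsewhere; it's an illustration of the technique, not working production code:

#include <d3d12.h>
#include <wrl/client.h>
using Microsoft::WRL::ComPtr;

// Assumes an already-created ID3D12Device* and arrays of recorded ID3D12CommandList*.
// (In real code the queues and fence would be created once at init, not per frame.)
void SubmitFrame(ID3D12Device* device,
                 ID3D12CommandList* const* postProcessLists,
                 ID3D12CommandList* const* shadowMapLists,
                 ID3D12CommandList* const* compositeLists)
{
    ComPtr<ID3D12CommandQueue> gfxQueue, computeQueue;

    D3D12_COMMAND_QUEUE_DESC gfxDesc = {};
    gfxDesc.Type = D3D12_COMMAND_LIST_TYPE_DIRECT;        // graphics queue
    device->CreateCommandQueue(&gfxDesc, IID_PPV_ARGS(&gfxQueue));

    D3D12_COMMAND_QUEUE_DESC computeDesc = {};
    computeDesc.Type = D3D12_COMMAND_LIST_TYPE_COMPUTE;   // async compute queue
    device->CreateCommandQueue(&computeDesc, IID_PPV_ARGS(&computeQueue));

    ComPtr<ID3D12Fence> fence;
    UINT64 fenceValue = 0;
    device->CreateFence(0, D3D12_FENCE_FLAG_NONE, IID_PPV_ARGS(&fence));

    // Kick off last frame's post-processing on the compute queue...
    computeQueue->ExecuteCommandLists(1, postProcessLists);
    computeQueue->Signal(fence.Get(), ++fenceValue);

    // ...while the graphics queue rasterizes this frame's shadow maps
    // (mostly raster/ROP bound, so the otherwise-idle CUs can pick up the compute work).
    gfxQueue->ExecuteCommandLists(1, shadowMapLists);

    // Anything that consumes the post-processed result waits on the fence first.
    gfxQueue->Wait(fence.Get(), fenceValue);
    gfxQueue->ExecuteCommandLists(1, compositeLists);
}

The same multi-queue idea is what Mantle exposes on PC, as mentioned below.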

For these kinds of optimizations, the PS4 will benefit 50% more (or 41% more, naively accounting for clock speed) than the Xbone will.

That's also the kind of GPU-side optimization that Mantle makes possible on PC.

(enough for a 50fps game to push itself up to 60Hz).


Now if only ANY console game actually ran anywhere near 50 or 60fps... devs seem perfectly happy to push more shiny at the screen and run at 30fps on consoles instead.

After all, gameplay and responsiveness don't market as well as screenshots and 30fps youtube videos ;)


There have been a lot of press releases and discussions about both the computer and API performance jump in the next generation. Do you believe it, and why?
Depends on which rumors.

The "100x more drawcallz meanz 100x more perf"... no, not at all. Synthetic benchmarks do not always translate to real-world usage.

The "twice as fast in the real world" yes, I believe that easily.

Previously "Krohm"


Now if only ANY console game actually ran anywhere near 50 or 60fps... devs seem perfectly happy to push more shiny at the screen and run at 30fps on consoles instead.

After all, gameplay and responsiveness don't market as well as screenshots and 30fps youtube videos ;)

Sports and racing games often run at 60Hz, as the smoothness of motion is important to the experience...

But when faced with the option of having double the smoothness, or more than double* the amount and quality of "stuff" on screen, it's very easy to choose the latter. Especially on prev-gen -- a 16.6ms frametime budget on a machine designed in 2005 is a ridiculous target to meet!!

There does seem to be quite a few current-gen games choosing 60Hz though - http://au.ign.com/wikis/xbox-one/PS4_vs._Xbox_One_Native_Resolutions_and_Framerates

I'm currently working on a 60Hz game for current-gen, and even that is a hell of a lot of work compared to a 30Hz game :( We could ship months sooner if the publisher was happy with 30Hz.

As for screenshots though -- everyone cheats for marketing. At one past job, our engine had a 16x screenshot mode, which output 20480 x 11520 images on request. You'd then downscale them to 1080p or 720p or whatever the media expects to receive (or leave them super high res for magazine prints). At a different past job, we just always output screenshots at 4K res...

These days I'm very cynical about the phrases "in engine", "in game", "captured directly from [console name]", etc... as none of those phrases state that they didn't artificially boost the rendering quality and blow out their frametime budget just to get that screenshot from the game :D

and p.s. YouTube supports 60Hz now :D

* if your 60Hz game uses 10ms of simulation on the CPU and 6ms of rendering, then changing your budget to 30Hz suddenly gives you an extra ~17ms per frame, so you could in theory revise your budgets to 10ms of simulation and 23ms of rendering -- an almost 4x increase in the CPU-side graphics budget, not a simple 2x increase.
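Spelling that footnote's arithmetic out with its own numbers (the 10ms/6ms split is the footnote's hypothetical, not measured data):

#include <cstdio>

int main()
{
    const double frame30 = 1000.0 / 30.0;       // ~33.3 ms budget at 30Hz
    const double simMs = 10.0, renderMs60 = 6.0;

    const double renderMs30 = frame30 - simMs;  // keep sim at 10ms, give the rest to rendering
    printf("60Hz: %.1f ms rendering; 30Hz: %.1f ms rendering (%.1fx)\n",
           renderMs60, renderMs30, renderMs30 / renderMs60);
    // Prints: 60Hz: 6.0 ms rendering; 30Hz: 23.3 ms rendering (3.9x)
}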


It's 12 CUs vs 18 CUs, and I've never heard of 4 of the PS4's 18 being reserved for compute-shader usage only (???).
Even naively accounting for the clock-speed difference, the PS4 has a GPU compute power advantage of ~41%. That is a big enough gap to matter, a lot.

As far as I had heard -- not first-hand, mind you -- 4 out of the 18 usable clusters (of 20 on the chip, for yield purposes) were reserved for GPGPU. Early on there was speculation that these other 4 were either a newer generation of GCN, or had their own dedicated ACE, separate from the one driving the other 14 for rendering. I forget where exactly I read about the 14/4 split, but it would have been a reputable source (like an Ars breakdown) or leaked technical docs -- not just some rumor site.

Anyways, thanks for the detailed post, nice to have some more insight. I'm aware that the ESRAM really should have been around 40MB like you say (and should have been ~18MB in the Xbox 360) -- somehow, Microsoft has twice now made the same mistake of under-sizing it for 1080p and 720p, respectively. I think in the Xbox One it was a big mistake to favor it over just using GDDR5 and calling it a day -- there's no redundancy in the ESRAM, so any lithography error or sub-par performance in any part of it means they have to trash that entire chip, which is a big deal considering that the ESRAM occupies about a third of it. Sony filled that third with more GPU clusters, and can tolerate up to two bad clusters over the entire area.

The bit about asynchronous use of the clusters is something I hadn't considered before. Perhaps making that easier or more effective is where the Xbox might pick up some ground, as it could better utilize some of those over-abundant ALU cycles while waiting around for memory.

I think there will be reasons yet to come to light for why they made the decision to go with ESRAM + DDR3, and those reasons won't be "make the Xbox One a gaming beast", which was Sony's mantra this generation.

throw table_exception("(╯°□°)╯︵ ┻━┻");

As far as I had heard -- not first-hand, mind you -- 4 out of the 18 usable clusters (of 20 on the chip, for yield purposes) were reserved for GPGPU. Early on there was speculation that these other 4 were either a newer generation of GCN, or had their own dedicated ACE, separate from the one driving the other 14 for rendering. I forget where exactly I read about the 14/4 split, but it would have been a reputable source (like an Ars breakdown) or leaked technical docs -- not just some rumor site.

I've got the actual spec documents first-hand, but I'm careful to only talk about details I can cite in public documents, such as Wikipedia :D

You might be thinking of the 8 Async Compute Engines (ACEs) on the PS4 GPU. These are front-end command-buffer readers, each of which can be reading commands from 8 buffers (meaning the CPU can be submitting a total of 64 async compute contexts at a time). These ACEs are not hard-wired to their own compute units -- all the CUs are shared by all the front-ends. An arbitrator/scheduler chooses when to pop work from each of the >64 queues (async compute + graphics queues) and which CUs to assign the work to.
i.e. having more of these ACEs, and having them be dedicated to GPGPU/compute workloads (no graphics commands), does not mean that any CUs are dedicated to only doing compute and not graphics work.

The Xbone also has some of these ACEs for dedicated compute-only command buffers... but again, it has fewer of them than the PS4 does, reducing the possibilities for CPU-side parallelism of command generation and GPU-side parallelism of "hyperthreaded" execution :(

Somehow, Microsoft has twice now made the same mistake of under-sizing it for 1080p and 720p, respectively.

The reason the XB1 has fewer CUs than the PS4 is the existence of the ESRAM. As you said, it's something like a third of the area of the die -- so to free up that huge amount of space, they needed to remove some other hardware from somewhere... and they chose to throw out a third of their GPU!
To fit a more useful amount of ESRAM, they would've had to throw out the rest of their GPU as well :D

The bit about asynchronous use of the clusters is something I hadn't considered before. Perhaps making that easier or more effective is where the Xbox might pick up some ground, as it could better utilize some of those over-abundant ALU cycles while waiting around for memory.

Yeah, but again, the PS4 has 50% more ALU units to make use of -- or, once you account for the Xbone's clock advantage, 41% more ALU clocks -- plus the PS4 has way more async queues to draw jobs from. So when games start embracing this optimization opportunity given to them by the newer APIs, the XB1 won't pick up any ground -- the PS4 will actually pull further ahead!

No matter which way you cut it, the Xbone is just super far behind HW-wise... with the only advantage being that at any point in time, you can get a massive bandwidth boost to 32MB of your dataset... plus it has Kinect (which, BTW, if enabled, reserves 10% of all GPU cycles for its own use -- Diablo 3 was only able to increase its resolution from 900p to 1080p after MS released a patch allowing devs to disable the Kinect reservation).

I forgot to mention this earlier:

I think Sony came out with the better architecture this round: they actually have slightly lower total memory bandwidth

The Xbone's DDR3 peak is 68GB/s and the ESRAM's is 102GB/s, for a naively added total of 170GB/s.
PS4's GDDR5 peak is 176GB/s.
So even naively looking at theoretical total bandwidth, the PS4 still comes out on top :/
In practice, you'll often be accessing RAM via some specific bus, which probably has a capacity of more like 20GB/s, making everything look a lot more even, though (except the ESRAM, which you'd assume must have a super-high-speed embedded bus to make use of its high speeds) ;)
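Just to make those naive totals explicit (peak figures only, which nobody sustains in practice):

#include <cstdio>

int main()
{
    const double ddr3 = 68.0, esram = 102.0, gddr5 = 176.0;  // peak GB/s, as quoted above
    printf("XB1 naive total: %.0f GB/s vs PS4: %.0f GB/s\n", ddr3 + esram, gddr5);  // 170 vs 176
}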

There does seem to be quite a few current-gen games choosing 60Hz though - http://au.ign.com/wikis/xbox-one/PS4_vs._Xbox_One_Native_Resolutions_and_Framerates

I'm currently working on a 60Hz game for current-gen, and even that is a hell of a lot of work compared to a 30Hz game :( We could ship months sooner if the publisher was happy with 30Hz.

and p.s. YouTube supports 60Hz now :D


Thanks for the list - that's more than I thought. I did skim the list (there's some weird sizing bug going on so I can't read half the names or stats), and what stood out to me is that the 60fps games were almost all previous-gen games, or games that had previous-gen ports. It's much easier to hit 60 if you must also run on old hardware, so your assets and such are already down-rezzed. Ironically, Forza 5 was 60fps on XB1, but its follow-up, Horizon 2, is only 30.

(I wish I could speak to my own experience with current gen and framerates but I'm going to avoid that due to NDA...)

And yes, YouTube has 60fps video now, but if you give a 30fps video a good motion blur pass, no one will notice. It's much harder to see 30 vs 60 when motion blur is applied - but you can certainly *feel* it when you play.

Here is an interesting article on DirectX 12:

http://www.forbes.com/sites/jasonevangelho/2015/03/26/directx-12-delivers-amd-nvidia-and-intel-hardware-tested-with-awesome-improvements/?utm_campaign=yahootix&partner=yahootix

Personal life and your private thoughts always affect your career. Research is the intellectual backbone of game development and the first order. Version control is crucial for full management of applications and software. The better the workflow pipeline, the greater the potential output for a quality game. Completing projects is the last but finest order.

by Clinton, 3Ddreamer

