Anthony Prizmich

Frame buffer speed, when does it matter?


I'd like to ask how important the speed of video memory is, and when it is actually used. When creating resources we copy from RAM to VRAM, and that process is severely limited by the PCI-E bus. But when does memory speed come into the spotlight? When does HBM shine with its insane bandwidth/interface width?

 

Thanks.

Rendering to a texture/buffer uses up bandwidth. Reading from a texture/buffer uses bandwidth. Rendering/reading more data (e.g. rendering in higher resolution) uses more bandwidth. Rendering over existing texel data (aka overdraw) wastes excess bandwidth.

More memory bandwidth/speed means that you can render more things per second.

Memory bandwidth is particularly helpful when rendering at higher resolutions. Rendering to a 4K screen - or to the dual screens needed for VR - uses significantly more bandwidth than typical 1080p gaming, simply because there are far more pixels (not just being drawn, but also being read/rewritten in all the post-processing stages).
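To put rough numbers on that, here's a back-of-envelope sketch in Python. The assumptions are illustrative, not real engine figures: a 4-byte-per-pixel colour target and three full-screen post-processing passes that each read and re-write every pixel.

```python
# Back-of-envelope estimate of how per-frame bandwidth scales with resolution.
def frame_bandwidth_bytes(width, height, bytes_per_pixel=4, post_passes=3):
    pixels = width * height
    base_write = pixels * bytes_per_pixel              # initial render-target write
    post = post_passes * 2 * pixels * bytes_per_pixel  # each pass: full read + full write
    return base_write + post

mb = 1024 * 1024
print(f"1080p: {frame_bandwidth_bytes(1920, 1080) / mb:.1f} MB/frame")
print(f"4K:    {frame_bandwidth_bytes(3840, 2160) / mb:.1f} MB/frame")  # 4x the pixels, 4x the traffic
```

Since 4K has exactly four times the pixels of 1080p, the per-frame traffic is four times larger under this model - before counting overdraw, depth, or texture reads.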


It's always in the spotlight and always in use... it's a computing system, so it is always fetching memory. Not sure what you mean by frame buffer - an FBO? If you want to see the performance impact of GPU RAM speed, download a performance/OC tool, turn the memory clock down by about 500 MHz while in a game, and see what happens to the framerate.

If you're talking about system RAM, then no, that speed doesn't matter much for interfacing with the GPU; at least comparing low-speed DDR3 to high-speed DDR3, the benchmarks I've seen might show a 1 FPS bump, from 59 to 60.


In addition to what others have said, I'd like to add blending to the list, since at minimum it is a read-modify-write, which roughly doubles the bandwidth requirements.
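A tiny sketch of that read-modify-write cost (the 4-byte-per-pixel colour target is an illustrative assumption):

```python
def pass_bytes(pixels, bytes_per_pixel=4, blending=False):
    # Opaque rendering: each covered pixel is written once.
    # Alpha blending: the existing colour must be read first, then the
    # blended result written back (read-modify-write), doubling traffic.
    reads = pixels * bytes_per_pixel if blending else 0
    writes = pixels * bytes_per_pixel
    return reads + writes

opaque  = pass_bytes(1920 * 1080)                 # write-only
blended = pass_bytes(1920 * 1080, blending=True)  # read + write per pixel
```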


Thanks everyone. I'm learning graphics from the very start and I'm having an extremely hard time finding basic resources that explain exactly how it all works - all the actual details of the pipeline and the graphics workflow. I'm willing to buy any book, but I can't find anything that explains it at the level of graphics cores, caches, memory, and memory controllers - what actually happens with the byte arrays containing textures, how schedulers work and what they do, core architecture things.

 

The reason I'm asking (from a higher-level view) is that I'm interested in why HBM is beneficial and when it stops being so. I presumed that the interface width of the memory enables us to transfer more data in a shorter time, meaning that many post-processing effects that operate on fully composed images probably get faster with more bandwidth, since every read from memory into registers is faster, and every write as well.

 

How much would the Fury line lose if it used GDDR5 instead of HBM?

Edited by AnthPrime


I presumed that interface width of the memory enables us to transfer more data in a shorter time

It enables you to transfer more data in the same time, not in a shorter time. That's a very important distinction.
Think of the problem as a truck travelling 500 km, a trip that takes 5 hours. The truck can only hold 1 tonne of cargo. If you use two trucks, you can send twice the amount of cargo, but it will still take 5 hours.
 

Why am I asking this is (higher level view) because I'm interested in why HBM is beneficial and when does it stop being such.

It depends on something we call the "bottleneck". A game that performs a lot of reads and writes may be bandwidth limited, so memory with higher bandwidth will make it run faster.
But if another game mostly executes a lot of math (which uses the ALU units Hodgman describes), then higher bandwidth won't do jack squat, because that's not the bottleneck.
 
Going back to the truck example:

You have to transfer 2 tonnes of cargo, and you have one truck. This is your bottleneck. You need 5 hours to travel 500 km and deliver 1 tonne, then another 5 hours to get back and load the rest, then 5 more hours to travel the 500 km again. In total, the travelling took 15 hours with one truck.
If you use two trucks, you'll be done in 5 hours. Memory bandwidth and bus bandwidth behave more or less the same way: you can send more data in the same amount of time, so doubling the amount of data you can transfer lets you finish sooner - but only if it's the bottleneck. And you can never go under 5 hours for a single trip. (Why, you ask? Because GPUs can't send data faster than the speed of light.)
 
Now let's add the "ALU" to the example. Suppose all you have to send in the truck is a machine that weighs only 70 kg (that's 0.07 tonnes). However, disassembling the machine for transport and loading it into the truck takes you 8 hours. The truck then makes its 5-hour journey. Total time = 13 hours.
You could use two trucks... but it would still take 13 hours, because an extra truck doesn't help you disassemble the machine at all. What you need is an extra pair of hands, not another truck. The bottleneck here is disassembling the machine, not transportation.
 
In this example people = ALU; trucks = bandwidth.
More people = you can disassemble and load the machine into the truck faster.
More trucks = you can send more cargo per trip.
 
More ALU = you can do more math operations in the same amount of time.
More bandwidth = you can do more loads and store from/to memory in the same amount of time.
 
So, to answer your question: does an increase of bandwidth make a game run faster? It depends.
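That "it depends" can be written down as a little bottleneck calculation: whichever takes longer - the math or the memory traffic - sets the total time, just like the slower of disassembly vs. driving in the truck example. The peak figures below are hypothetical, not any real GPU's specs.

```python
def bottleneck(flops, bytes_moved, peak_flops_per_s, peak_bytes_per_s):
    """Return which resource limits a workload, and the time it takes.

    A workload needs `flops` math operations and `bytes_moved` bytes of
    memory traffic; the slower of ALU time vs. memory time decides the
    total time, so speeding up the other resource changes nothing.
    """
    t_alu = flops / peak_flops_per_s
    t_mem = bytes_moved / peak_bytes_per_s
    return ("ALU", t_alu) if t_alu >= t_mem else ("bandwidth", t_mem)

# A post-processing pass: little math, lots of reads/writes.
kind, t = bottleneck(flops=1e9, bytes_moved=2e9,
                     peak_flops_per_s=8e12,   # hypothetical 8 TFLOP/s GPU
                     peak_bytes_per_s=5e11)   # hypothetical 500 GB/s memory
print(kind, t)  # bandwidth-bound: doubling bandwidth would roughly halve t
```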


Thanks everyone. I'm learning graphics from the very start and I'm having an extremely hard time finding basic resources that explain exactly how it all works - all the actual details of the pipeline and the graphics workflow. I'm willing to buy any book, but I can't find anything that explains it at the level of graphics cores, caches, memory, and memory controllers - what actually happens with the byte arrays containing textures, how schedulers work and what they do, core architecture things.

https://developer.nvidia.com/content/life-triangle-nvidias-logical-pipeline

http://fgiesen.wordpress.com/2011/07/09/a-trip-through-the-graphics-pipeline-2011-index/

http://s09.idav.ucdavis.edu/talks/02_kayvonf_gpuArchTalk09.pdf

http://www.cs.virginia.edu/~gfx/papers/pdfs/59_HowThingsWork.pdf


I'm into learning graphics from the very start and I'm having  an extremely hard time actually finding basic resources to learn just exactly how it all works, all the actual details of the pipeline, graphics workflow.

This is a good overview of the nitty gritty details that you don't really need to know :D
https://fgiesen.wordpress.com/2011/07/09/a-trip-through-the-graphics-pipeline-2011-index/
 

Why am I asking this is (higher level view) because I'm interested in why HBM is beneficial and when does it stop being such. I presumed that interface width of the memory enables us to transfer more data in a shorter time, meaning that probably many post-processing effects that operate on fully composed images, get faster with more bandwidth since every read of the memory into registers is faster, and every write also.

Most of the time you can ignore the actual name of the RAM - DDR3/GDDR5/HBM2/etc... - and just look at the performance stats.
e.g. The Wii U has 2GB of DDR3 with a maximum bandwidth of 12.8 GB/s. I don't really care that it's DDR3, but that 12.8 GB/s number is important. If you're running at 60 frames per second, that's 218.5 MB/frame...
So even though there's 2GB of RAM available, the bandwidth number tells us that we can only access under 10% of it in any one frame. And that's the theoretical max bandwidth, not the actual performance that your game will see. The max bandwidth can only be achieved if you write a program that does nothing but transfer data around. Real programs tend to alternate between doing some processing and doing some data transfers, and they have bottlenecks and stalls, etc., causing real performance to always fall short of theory.
Meanwhile, the Xbox360 has 512MB of GDDR3 with a maximum bandwidth of 22.4 GB/s -- or at 60Hz, that's 382.3 MB/frame -- so even though the WiiU has 4x more memory, the amount that it's actually able to touch in any one frame is less!
 
For most purposes, just looking at that max bandwidth figure is enough to give you a ballpark performance metric of how much data you should be able to access. If you then work out how much data you need to access in order to implement your algorithms, you'll be able to tell if it's theoretically possible or not.
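The per-frame budget arithmetic above is just this (using GB = 1024 MB, which matches the figures quoted):

```python
def mb_per_frame(gb_per_s, fps=60):
    # Theoretical max bytes you can touch in one frame, in MB,
    # given a peak bandwidth in GB/s and a target framerate.
    return gb_per_s * 1024 / fps

print(f"Wii U:    {mb_per_frame(12.8):.1f} MB/frame")  # the 218.5 MB figure above
print(f"Xbox 360: {mb_per_frame(22.4):.1f} MB/frame")  # the 382.3 MB figure above
```

Divide your algorithm's estimated bytes-per-frame by this number to see how much of the budget it eats.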
 
It's only when doing very low level optimizations that the other properties of RAM will be of importance to you. Two of them could be:
* the latency of a memory fetch -- including in a "cache miss" situation and a "cache hit" situation. The latency between a processor and RAM tends to be extremely high these days, as much as 1000 cycles, so there's usually a small cache bolted onto the processor to speed things up. If a value is already present in the cache, it might only take 10-100 cycles to fetch the data. Caches often have multiple levels of different sizes -- smaller caches closer to the processor, and larger caches further away and sometimes shared between a few processing cores.
One of the biggest areas of low-level code optimization these days is deliberately organizing your data (and the order of your processing operations) in such a way to maximize cache hits and avoid cache misses -- also known as Data oriented Design by some people.
* when doing even lower level optimization, the actual physical structure of the RAM (or cache) can be important. Some memory systems have different memory buses for different physical parts of the RAM. e.g. Texture A is fetched over bus#1 and Texture B is fetched over bus#2. These buses may operate asynchronously and in parallel, so doing 10x fetches from Texture A might take twice as long as doing 5x fetches from Texture A plus 5x fetches from Texture B, even though the same total amount of data has been moved. Variations of this issue are "bank conflicts", "conflict miss" (for associative caches), or "false sharing" (for SMT cache lines).
Check out http://www.futurechips.org/chip-design-for-all/what-every-programmer-should-know-about-the-memory-system.html
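The cache-hit point from the first bullet can be illustrated with a toy model that just counts distinct cache lines touched. The 64-byte line size is a typical assumption, and the 64-byte "particle struct" is hypothetical; this is a sketch of why data layout matters, not a measurement.

```python
CACHE_LINE = 64  # bytes; a typical cache-line size

def lines_touched(num_elements, stride, field_size):
    """Count distinct cache lines hit when reading one field_size-byte
    field from each of num_elements records laid out stride bytes apart."""
    lines = set()
    for i in range(num_elements):
        addr = i * stride
        for b in range(field_size):
            lines.add((addr + b) // CACHE_LINE)
    return len(lines)

# Array-of-structs: a 4-byte position inside a 64-byte struct, so every
# element drags a whole new cache line through the memory system.
aos = lines_touched(1000, stride=64, field_size=4)
# Struct-of-arrays: positions packed tightly, 16 of them per cache line.
soa = lines_touched(1000, stride=4, field_size=4)
print(aos, soa)  # the packed layout touches ~16x fewer lines
```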
 

How much would the Fury line lose if it used GDDR5 instead of HBM?

It depends on the shaders that you're running on it :D
If the shaders are completely ALU bottlenecked, e.g. procedurally generating all the graphics with no need for memory, then there would probably be no performance difference!
