Ed Welch

AMD's High Bandwidth Cache Controller


I was just looking at a video that explains some of the new features in AMD's Vega GPU (https://www.youtube.com/watch?v=V9yTlRlxSVc). I couldn't quite figure out how the "High Bandwidth Cache Controller" works. I'm assuming that the game developer has to specifically mark assets as controlled by HBCC for it to work. From then on, the GPU and driver handle loading and allocation on demand. It seems like a very similar concept to "mega-textures", except that it's controlled by the hardware.

 

From the sounds of it, their HBM isn't treated as "memory", but as an L3 cache in the cache hierarchy.
Normally when you perform a texture fetch, the GPU will read a small block of texels from RAM (either GPU RAM or system RAM), store that block in L2, and then move 1-4 texels from L2 to L1/the texture filtering unit.
It sounds like these GPUs will insert their HBM between L2 and RAM, and the driver will automatically populate this L3 with resources that are requested by L2.

e.g. L1 requests 4 pixels from L2, L2 grabs a 64 pixel block from L3, L3 grabs a whole mipmap from RAM.
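
To make that example concrete, here's a toy model of the fetch granularities. This is only a sketch of my guess, not how the hardware is documented to work: the `CacheLevel` class and `simulate` function are made-up names, the caches never evict anything, and the "whole mipmap" is assumed to be 256x256 texels.

```python
from dataclasses import dataclass, field
from typing import Optional

MIP_TEXELS = 256 * 256  # assumed size of "a whole mipmap"; not an AMD figure


@dataclass
class CacheLevel:
    """Hypothetical cache level that fetches `line_size` texels on a miss."""
    name: str
    line_size: int
    backing: Optional["CacheLevel"] = None
    lines: set = field(default_factory=set)
    misses: int = 0

    def read(self, addr: int) -> None:
        line = addr // self.line_size
        if line in self.lines:
            return  # hit: nothing is requested from the level below
        self.misses += 1
        if self.backing is not None:
            # Miss: pull the whole line from the next level down.
            self.backing.read(line * self.line_size)
        self.lines.add(line)  # toy model: unlimited capacity, no eviction


def simulate(num_texels: int) -> dict:
    """Read texels 0..num_texels-1 in order and count misses per level."""
    l3 = CacheLevel("L3", MIP_TEXELS)   # misses here go out to RAM
    l2 = CacheLevel("L2", 64, l3)       # 64-texel blocks
    l1 = CacheLevel("L1", 4, l2)        # 4 texels at a time
    for addr in range(num_texels):
        l1.read(addr)
    return {"L1": l1.misses, "L2": l2.misses, "L3": l3.misses}


if __name__ == "__main__":
    print(simulate(256))
```

Reading 256 consecutive texels in this model costs 64 L1 misses and 4 L2 misses, but only a single trip to RAM, which is the whole point of putting a big L3 in the middle.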

I haven't looked in depth, but I saw they've done work on the address space management, so more guessing:
It would also be possible for them to store some mipmaps/slices of a texture in GPU RAM, and some in system RAM, without the cache controller or game being aware. The entire texture is allocated as a contiguous block of virtual address pages, but some of those pages could be backed by system RAM and others by GPU RAM.
You could move a resource from system to GPU without a hitch by performing an async copy from one physical location to the other, and only updating the virtual page mapping when the copy is complete.

If someone tries to fetch from a resource that's currently being moved, they'll just get the system RAM copy, which is slower. But if this magic L3 HBM is caching whole resource slices, then that would greatly mitigate the slowness.
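
Here's a rough sketch of the remap-after-copy idea, again just to illustrate my guess. `PagedResource` and its methods are hypothetical names, not a real driver API, and the "async copy" is simply modelled as a begin step and a finish step.

```python
SYSTEM, GPU = "system RAM", "GPU RAM"


class PagedResource:
    """Toy page table for one resource: virtual page index -> physical location."""

    def __init__(self, num_pages: int) -> None:
        # Every page starts out backed by system RAM.
        self.page_table = [SYSTEM] * num_pages
        self.in_flight = set()

    def begin_move(self, page: int) -> None:
        # The copy to GPU RAM starts. The mapping is left untouched,
        # so anyone fetching this page still reads the system RAM copy.
        self.in_flight.add(page)

    def finish_move(self, page: int) -> None:
        # The copy is complete: only now is the virtual page remapped.
        self.in_flight.remove(page)
        self.page_table[page] = GPU

    def fetch(self, page: int) -> str:
        # Returns which physical memory would service the fetch.
        return self.page_table[page]


tex = PagedResource(num_pages=4)
tex.begin_move(2)
during = tex.fetch(2)   # still served from system RAM while the copy runs
tex.finish_move(2)
after = tex.fetch(2)    # served from GPU RAM once the mapping is updated
```

Since the mapping only changes once the copy has finished, a fetch never sees a half-copied page; the worst case is a slower read from the old location.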
