Who needs TurboCache and HyperMemory?

Quote:Original post by Anonymous Poster
The best analogy for that would be sticking a horse inside a car: it won't make the car go faster; in fact, it may make it go slower.


No, I wouldn't say that. I suspect NVIDIA and ATI don't do that on a 6600 or similar card (maybe they already have) simply because they think the onboard memory is enough. TC and HM are just an upgraded version of AGP.


I would point out that this thread wasn't meant to be a debate about those TC cards. Obviously they fall far behind cards equipped with enough local memory.

What I want to say is that TC and HM are technically rubbish, even compared to integrated graphics. As PCIe keeps boosting transfer speeds, AGP becomes useless, so TC and HM are the wrong effort in the wrong direction. These specialized techniques waste system memory twice over: the more they are used, the more memory is wasted. That is the case whenever virtual video memory is used, and we will always need VVM for many reasons.
(Don't confuse virtual video memory with AGP memory; the latter could be thought of as extended video memory.)


Actually it doesn't waste any memory. Even with a 512 MB graphics card, either the API or the driver keeps a copy of every resource in system memory. With TurboCache/HyperMemory you just access those copies directly, so it's a very effective way of saving on memory costs. With some fast (though small) VRAM close to the graphics chip acting like a cache to hide access latencies (hence NVIDIA's naming), it can be faster than having a lot of slower (cheap) VRAM.
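For what it's worth, here is a minimal D3D9 sketch of what that system-memory copy means in practice. The function name, texture size, and format are arbitrary; this only illustrates the managed pool, not whatever the driver does for TC/HM internally.

// Minimal D3D9 sketch: a D3DPOOL_MANAGED resource always has a system
// memory backing copy kept by the runtime, which is the copy referred
// to above. Names and sizes here are illustrative only.
#include <d3d9.h>

IDirect3DTexture9* CreateManagedTexture(IDirect3DDevice9* device)
{
    IDirect3DTexture9* tex = NULL;
    // The runtime keeps a system-memory copy of this texture and uploads
    // it to VRAM on demand; on a TC/HM board the "VRAM" portion may itself
    // live in system memory reached over PCIe.
    HRESULT hr = device->CreateTexture(
        256, 256, 1, 0, D3DFMT_A8R8G8B8,
        D3DPOOL_MANAGED, &tex, NULL);
    return SUCCEEDED(hr) ? tex : NULL;
}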

These cards are a good deal for those wishing to spend the minimum!
Quote:Original post by C0D1F1ED
Actually it doesn't waste any memory. Even with a 512 MB graphics card either the API or the driver keeps a copy of every resource in system memory.


And I'm Mother Teresa.

Quote:Original post by Serenade
How much do TurboCache and HyperMemory boost GPU's access to system memory?
I mean, since the speed boost is mainly caused by PCIE, why do we need TC and HM?

TC and HM are just technologies to make this as transparent as possible, by prefetching, caching, and pushing unused data out to system memory. Yes, it all relies on PCI-E, but they're real attempts at overcoming the shortcomings you mention. Without HM/TC, the 3D API would know nothing about it, the API would report the smaller amount of video RAM, and everything would be terribly inefficient. With HM/TC, the driver is aware of what's happening and just tries to manage it as efficiently as possible. It's the simple realization that without it, you have DX/OGL reporting ridiculously small amounts of video RAM to games (which would then refuse to run, or act oddly); so instead, the driver grabs a chunk of system memory and tries its best to make things work.
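As a small illustration (a D3D9 sketch assuming you already have a device; the function name is made up and this is not TC/HM-specific code), the number a game sees is simply whatever the driver chooses to report:

// Minimal D3D9 sketch: what a game sees is whatever the driver reports.
// On a TurboCache/HyperMemory board this figure includes the chunk of
// system memory the driver manages, not just the local VRAM.
#include <d3d9.h>
#include <stdio.h>

void PrintReportedVideoMemory(IDirect3DDevice9* device)
{
    // Returns an estimate of available texture memory, rounded to MB.
    UINT bytes = device->GetAvailableTextureMem();
    printf("Reported texture memory: %u MB\n", bytes / (1024 * 1024));
}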

Quote:Improving bus transfer speed is the right way to go.

Nope. That will never solve the problem. Bus transfer speed is not the main problem; latency is.
Transferring data from (fast) VRAM to the neighboring GPU is *much* faster than transferring it from (slow) system RAM, over the PCIe bus, to the GPU, not so much because of the bandwidth, but because of the latency.
Besides, on low-end cards, "making a faster bus" is not exactly the solution you want. Low end kinda implies you want to make do with *cheap* hardware.

TC/HM are attempts at hiding the latency. Of course, they're far from perfect, but they work a lot better than a similar card without them would.

Think of it as a crude virtual memory for graphics cards.

Windows swaps pages out to the HDD to save RAM. If it didn't, you'd be screwed once you tried to keep more in memory than you had RAM for, or you'd have to manually swap out the stuff you didn't need, which would be horribly complicated and inefficient. So it's the job of Windows to try to anticipate what will be needed and keep that in physical memory, while swapping out less used stuff.

HM/TC does *exactly* the same. Instead of just letting the card naively use system memory over the PCIe bus, it tries to ensure that only rarely used data is "swapped out", thus improving performance.

So, are you going to tell us that virtual memory is a bad idea? I'm sure that would save Microsoft a lot of work if they'd known that.
Quote:Original post by Spoonbender
So, are you going to tell us that virtual memory is a bad idea?


No, I didn't mean that. I mean two things:
1) TC and HM may not be virtual memory but rather extended memory.
Just think about this: a TC card with 32 MB of local memory has to deal with a 128 MB texture, and unfortunately that texture is being sampled randomly. Even if a virtual memory technique is used, it's no better than using system memory directly. The latter is what I call extended memory.

2) Even if the virtual memory technique proves better in that case, it is implemented internally by the driver, and that's wrong. It's D3D (or OGL) that should implement virtual video memory; then D3D could make things efficient and without waste.
Think about this: you have to create a 1 GB texture and modify it on the fly. VVM could make that work (as long as physical memory is sufficient and the whole texture is not accessed at once). But with VVM done by the driver (TC and HM), you just couldn't, for two reasons (see the locking sketch after this list):
1. The driver is not able to allocate that much system memory.
2. The D3D locking operation doesn't know which part is in real video memory and which part is in system memory. To make locking more efficient, either D3D or engine programmers have to build VVM all over again. That's where the waste comes from.
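To make the locking point concrete, here is a minimal D3D9 sketch. The huge texture, the function name, and the rectangle are hypothetical, and it assumes a lockable (e.g. managed) texture; the sub-rectangle passed to LockRect is exactly the kind of information a D3D-level VVM could use to page in, or mark dirty, only the touched region.

// Minimal D3D9 sketch: locking only a sub-rectangle of a large texture.
// A VVM layer implemented in D3D could use the RECT to page in / mark
// dirty only the touched region. Texture and coordinates are hypothetical;
// assumes a lockable pool (e.g. D3DPOOL_MANAGED).
#include <d3d9.h>
#include <string.h>

void TouchRegion(IDirect3DTexture9* hugeTexture)
{
    RECT region = { 0, 0, 256, 256 };   // only this part will be modified
    D3DLOCKED_RECT lr;
    if (SUCCEEDED(hugeTexture->LockRect(0, &lr, &region, 0)))
    {
        // Write into the locked region only; the rest of the texture
        // never needs to be resident anywhere.
        for (int y = 0; y < 256; ++y)
            memset((BYTE*)lr.pBits + y * lr.Pitch, 0xFF, 256 * 4);
        hugeTexture->UnlockRect(0);
    }
}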

That's why TC and HM are useless. At least, when you talk about VVM.
Quote:Original post by Serenade
That's why TC and HM are useless. At least, when you talk about VVM.


This topic is asking the wrong question. "Who needs TurboCache and HyperMemory?" Come on... People who have it will use it, since it's there.

From your point of view (that of a developer, I guess), you should treat a TurboCache card as a regular video card with the advertised amount of memory. The mechanism is totally transparent to the user and the programmer. It's a bit slow, but simply because it's at the low end of the spectrum; it's still faster than a 5200 with 256 MB of memory on board. If you plan to make a game that scales on PC, you'll probably have to handle plenty of boards that are way slower and crappier feature-wise (Intel graphics).
There are a few things that I think need to be clarified:

First, current video cards have a notion of "thread" that is used by the pixel pipelines. Each pipeline, or quad pipeline, has a state (register values and a program counter) that comprises a thread. The rasterizer spawns threads as it rasterizes primitives; early Z can be performed before adding the thread to a sort of to-do buffer. The pipelines then switch between the threads in the to-do buffer and work on each until it finishes or is blocked on a fetch latency, at which point they switch to another thread. With this, the latency problem is significantly reduced, even when using local video memory.
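Purely to illustrate the scheduling idea (a toy C++ simulation with made-up numbers, not how any real GPU works internally), switching to another ready thread whenever one stalls on a fetch keeps the hardware busy:

// Toy illustration of latency hiding by thread switching (not real GPU code).
// Each "thread" alternates between ALU work and stalling on a fake fetch;
// the scheduler simply skips stalled threads so the pipeline stays busy.
#include <vector>
#include <cstdio>

struct PixelThread {
    int workLeft;   // ALU operations still to execute
    int stallLeft;  // cycles remaining on an outstanding fetch
};

int main() {
    std::vector<PixelThread> threads(8, PixelThread{100, 0});
    int cycles = 0, busyCycles = 0;
    bool anyLeft = true;
    while (anyLeft) {
        anyLeft = false;
        bool didWork = false;
        for (PixelThread& t : threads) {
            if (t.workLeft <= 0) continue;          // thread finished
            anyLeft = true;
            if (t.stallLeft > 0) { --t.stallLeft; continue; } // waiting on memory
            if (!didWork) {                         // issue one op for the first ready thread
                --t.workLeft;
                didWork = true;
                if (t.workLeft % 10 == 0) t.stallLeft = 20;   // pretend a texture fetch stalls it
            }
        }
        ++cycles;
        if (didWork) ++busyCycles;
    }
    std::printf("utilization: %d/%d cycles busy\n", busyCycles, cycles);
}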

Secondly, it would be difficult for the D3D runtime to manage virtual memory, since VM needs some kind of hardware support to do the address translation. There would be a need for a driver VM API.

Third, the driver can know what memory has been modified by a lock/write by first looking at which regions have been locked or invalidated. It can also VirtualProtect the pages in the locked region and catch which ones are written to.
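Here is a rough user-mode sketch of that page-protection trick. It assumes 4 KB pages and a 16-page region, and the handler name is made up; a driver would do the equivalent at a much lower level, this is just to show the mechanism.

// User-mode illustration: write-protect a region, catch the access
// violation on the first write to each page, record the page as dirty,
// then let the write continue.
#include <windows.h>
#include <stdio.h>

static BYTE*  g_base = NULL;
static SIZE_T g_size = 0;
static bool   g_dirty[16] = { false };

static LONG WINAPI OnWrite(PEXCEPTION_POINTERS info)
{
    if (info->ExceptionRecord->ExceptionCode == EXCEPTION_ACCESS_VIOLATION &&
        info->ExceptionRecord->ExceptionInformation[0] == 1)   // write access
    {
        BYTE* addr = (BYTE*)info->ExceptionRecord->ExceptionInformation[1];
        if (addr >= g_base && addr < g_base + g_size)
        {
            SIZE_T page = (SIZE_T)(addr - g_base) / 4096;
            g_dirty[page] = true;                    // remember the dirty page
            DWORD old;
            VirtualProtect(g_base + page * 4096, 4096, PAGE_READWRITE, &old);
            return EXCEPTION_CONTINUE_EXECUTION;     // retry the faulting write
        }
    }
    return EXCEPTION_CONTINUE_SEARCH;
}

int main()
{
    g_size = 16 * 4096;
    g_base = (BYTE*)VirtualAlloc(NULL, g_size, MEM_COMMIT, PAGE_READONLY);
    AddVectoredExceptionHandler(1, OnWrite);

    // Each of these writes faults once; the handler marks the page dirty.
    g_base[0] = 1;
    g_base[5 * 4096 + 7] = 2;

    for (int i = 0; i < 16; ++i)
        if (g_dirty[i]) printf("page %d was written\n", i);

    VirtualFree(g_base, 0, MEM_RELEASE);
    return 0;
}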

Fourth, for Mother Teresa: if you take a look at the VirtualQuery results over the in-process addressable space, you will see that a significant amount of address space gets mapped as video memory is used. On ATI cards this is a full 256 MB chunk of contiguous memory (at least on the R9800 and X800); on NV cards it is just pieces of memory mapped here and there. Even though the driver is supposed to operate in a different address space, I'm sure this amounts to some kind of file mapping on the swap file to share address space between the driver and the app. This memory is only released when resources are freed from video memory. It took me some time to perform the experiments to check this, but I suggest you try it too.
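For anyone who wants to repeat the experiment, here is a rough Win32 sketch. The filter on committed MEM_MAPPED regions is a simplification I chose for illustration, so expect to interpret the numbers rather than take them at face value.

// Rough sketch of the experiment described above: walk the process address
// space with VirtualQuery and sum up committed, mapped regions. Run it
// before and after creating D3D resources and compare the totals.
#include <windows.h>
#include <stdio.h>

int main()
{
    SYSTEM_INFO si;
    GetSystemInfo(&si);

    MEMORY_BASIC_INFORMATION mbi;
    SIZE_T mappedTotal = 0;
    BYTE* addr = (BYTE*)si.lpMinimumApplicationAddress;

    while (addr < (BYTE*)si.lpMaximumApplicationAddress &&
           VirtualQuery(addr, &mbi, sizeof(mbi)) == sizeof(mbi))
    {
        // MEM_MAPPED regions include section/file mappings; memory shared
        // with the driver would show up as regions that appear only after
        // video resources are created.
        if (mbi.State == MEM_COMMIT && mbi.Type == MEM_MAPPED)
            mappedTotal += mbi.RegionSize;
        addr = (BYTE*)mbi.BaseAddress + mbi.RegionSize;
    }
    printf("committed mapped memory: %lu MB\n",
           (unsigned long)(mappedTotal / (1024 * 1024)));
    return 0;
}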

Finally, TC, or any kind of RAM-to-video-memory virtual mapping, is the way to go, but today it is only used on low-end hardware with barely enough video memory for a frame buffer and a few textures. It cannot really be used easily on high-end cards (say, 512 MB of VRAM) because the in-process addressable space is only 2 GB, and we are reaching that limit; 64-bit systems will help a lot in that area. It is also true (even though I've played the devil's advocate a few lines earlier) that API changes could help streamline a VMM system. It would be better, for example, to have tile-based texture formats instead of plain scanline-based ones, and other things of the like.
oops, that was me
Act of War - EugenSystems/Atari
This thread has grown quite long, but it still seems my points aren't accepted.
So I have to argue further.

First, we need extended video memory.
That is, when more than the onboard memory is required at one time, we must use system memory directly. That system memory should be treated as part of the REAL video memory, and this is what I believe TC and HM are doing.

Second, we need virtual video memory.
Not only do we use VVM to extend real video memory, we also use it to avoid blocking. That is, VVM is REAL video memory plus a system memory backup, where the contents of REAL video memory are a subset of the system backup. Lock/modify is applied to the system backup first to avoid blocking; only the upload touches the REAL video memory.
D3D is suitable to do this.
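Here is a rough D3D9 sketch of that lock-the-backup-then-upload idea. The function name and the two textures are hypothetical (one created in D3DPOOL_SYSTEMMEM, one in D3DPOOL_DEFAULT), and this is just one way to express it with today's API.

// Rough D3D9 sketch of the "system backup + upload" idea described above:
// modify a D3DPOOL_SYSTEMMEM copy without stalling the GPU, then let
// UpdateTexture push the dirty data to the D3DPOOL_DEFAULT (video) copy.
#include <d3d9.h>
#include <string.h>

void ModifyAndUpload(IDirect3DDevice9* device,
                     IDirect3DTexture9* sysBackup,   // D3DPOOL_SYSTEMMEM
                     IDirect3DTexture9* vidTexture)  // D3DPOOL_DEFAULT
{
    // Lock/modify the system-memory backup; the GPU is never blocked here.
    D3DLOCKED_RECT lr;
    if (SUCCEEDED(sysBackup->LockRect(0, &lr, NULL, 0)))
    {
        memset(lr.pBits, 0, lr.Pitch);  // touch the first row, for example
        sysBackup->UnlockRect(0);
    }
    // Only the upload interfaces with the real video memory copy.
    device->UpdateTexture(sysBackup, vidTexture);
}

Incidentally, this is close to what the managed pool already does behind the scenes, which is part of the earlier argument in this thread.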

The reason I dislike TC and HM is that I think even the first part could be accomplished by D3D.


Quote:
Secondly, it would be difficult for the D3D runtime to manage virtual memory, since VM needs some kind of hardware support to do the address translation. There would be a need for a driver VM API.

Given what I stated above, I don't think it would be that hard for D3D to determine which part of memory is allocated as REAL video memory, without a driver VM API.




Quote:Original post by Serenade
First, we need extended video memory.
That is, when more than the onboard memory is required at one time, we must use system memory directly. That system memory should be treated as part of the REAL video memory, and this is what I believe TC and HM are doing.


The APIs have been perfectly capable of that for some time now. What TC and HM actually are is a way of leveraging the PCIe bus to get exceedingly fast (relative to AGP) transfers between system memory and the GPU (presumably through some sort of DMA mechanism), while cutting costs by shipping video cards with only enough graphics memory to hold their framebuffer.

[Edited by - Promit on September 29, 2005 7:41:18 AM]
SlimDX | Ventspace Blog | Twitter | Diverse teams make better games. I am currently hiring capable C++ engine developers in Baltimore, MD.

This topic is closed to new replies.
