What's a cache miss?

I just read this in the nVidia engine design thing:
quote: Consider: every time you follow a pointer, you can assume a CPU cache miss. The deeper your tree, the more cache misses you will take. Cache misses can be more expensive than intersection tests. Therefore, shallower tree types will perform better on high-polygon-count scenes.
I have no clue what they're talking about. [edited by - Promit on October 10, 2003 4:30:37 PM]
SlimDX | Ventspace Blog | Twitter | Diverse teams make better games. I am currently hiring capable C++ engine developers in Baltimore, MD.
The CPU cache is basically a special, high-speed memory area directly on the CPU die. It is a lot faster to access than RAM, so the CPU keeps recently used code there and tries to predict what your program will run next. When the code it needs isn't in the cache, that's a cache miss, and the CPU has to go out to system memory for the next block of code. The same thing happens for data: some data is kept in the cache, but if the data your code needs isn't there, the CPU has to fetch it from system memory. This is Very Slow and, in general, a Bad Thing.
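To make that concrete, here's a minimal sketch (the array size is arbitrary, and the exact speed difference depends on your CPU). Both functions add up the same ints, but one walks memory in the order it is laid out while the other jumps around and misses the cache constantly:

#include <cstdio>

const int N = 1024;
static int grid[N][N];

// Walks memory sequentially: after each cache miss, the next
// several accesses hit the same cache line for free.
long long SumRowMajor()
{
    long long sum = 0;
    for (int row = 0; row < N; ++row)
        for (int col = 0; col < N; ++col)
            sum += grid[row][col];
    return sum;
}

// Jumps 4KB (one whole row) per access: nearly every access
// lands on a different cache line and misses.
long long SumColMajor()
{
    long long sum = 0;
    for (int col = 0; col < N; ++col)
        for (int row = 0; row < N; ++row)
            sum += grid[row][col];
    return sum;
}

int main()
{
    printf("%lld %lld\n", SumRowMajor(), SumColMajor());
    return 0;
}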

Wielder of the Sacred Wands
[Work - ArenaNet] [Epoch Language] [Scribblings]

So what can we do to exploit this feature?
All I got from the paper is that you should play with the order of variable declarations and try to use smaller datatypes when possible.
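For instance, I'm guessing the declaration-order tip is about padding. A sketch (exact sizes depend on your compiler's alignment rules):

struct Padded        // typically 12 bytes on a 32-bit compiler
{
    char flag;       // 1 byte, then 3 bytes of padding
    int  count;      // 4 bytes
    char tag;        // 1 byte, then 3 bytes of padding
};

struct Reordered     // typically 8 bytes: same members, better order
{
    int  count;      // 4 bytes
    char flag;       // 1 byte
    char tag;        // 1 byte, then 2 bytes of padding
};

An array of the reordered struct packs half again as many elements into each cache line.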

Another question, what is "aliasing" ?
The first doc I came to think of that explains some of this:
http://en.tldp.org/HOWTO/Unix-and-Internet-Fundamentals-HOWTO/memory-management.html
quote:Original post by Anonymous Poster
So what can we do to exploit this feature?
All I got from the paper is that you should play with the order of variable declarations and try to use smaller datatypes when possible.


On modern CPUs, there are several factors you have to juggle (I'm by no means discussing them all in this post). It is not always best to choose the smallest data type, because smaller data types aren't always the fastest. But if your program frequently uses, say, 1MB of data, and you could use bytes instead of 32-bit values to represent it, you would cut that data down to 256KB, which is much more cache friendly. That's just one example. You could even try packing to sub-byte sizes, because unpacking values from the L1 cache can be far faster than reading the data straight from memory.
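A sketch mirroring that 1MB example (the names are mine):

const int COUNT = 256 * 1024;

int           wide[COUNT];    // 32-bit values: 1MB total
unsigned char narrow[COUNT];  // same values as bytes: 256KB total
                              // (only works if every value fits in 0..255)

long long SumNarrow()
{
    long long sum = 0;
    for (int i = 0; i < COUNT; ++i)
        sum += narrow[i];     // each cache line now delivers 4x as many elements
    return sum;
}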

Other things you have to worry about are branch mispredictions. Basically, when your program comes to a conditional (if, for, while, etc.), the CPU takes an educated guess at whether the test will be true or false, and then goes on processing as if its assumption is correct. If the assumption turns out to be false, the CPU has to backtrack a little and redo some work. So sometimes you can get extra speed by turning simple if statements into, say, a table lookup. Just make sure your table is in the cache. Usually, if your branches follow a pattern, the CPU can predict them accurately. For instance, I have a chess program, and usually there will be a pattern of "white to move, black to move, white to move, black to move, ...", so if I have something like if (white_to_move) { ... } else { ... }, then the compiler shouldn't have any problem predicting that one.
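Here's a sketch of the if-into-table idea (the piece values are just the usual material scores; nothing from a real engine):

enum { PAWN, KNIGHT, BISHOP, ROOK, QUEEN };

// Branchy version: the CPU has to guess which test succeeds.
int ScoreBranchy(int piece)
{
    if (piece == PAWN)   return 100;
    if (piece == KNIGHT) return 300;
    if (piece == BISHOP) return 325;
    if (piece == ROOK)   return 500;
    return 900;  // queen
}

// Table version: no branches, just one load, which is fast
// as long as the table stays in the cache.
static const int kScore[5] = { 100, 300, 325, 500, 900 };

int ScoreTable(int piece)
{
    return kScore[piece];
}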

There are lots of things to consider, and you can end up going around in circles for years trying to figure out the fastest way. My advice, like most people's, is to 1) learn efficient ways of doing things (which algorithms are the most efficient, and so on), and 2) not worry about efficiency beyond that point until you can run your code through a profiler. Most people just say "use a profiler", but that doesn't mean you should ignore efficiency altogether before you profile. Just don't make it a primary goal. Learning to code efficiently before you reach for the profiler usually comes from experience, reading well-written code, and writing code yourself.

quote:Another question, what is "aliasing"?


Aliasing is when the same variable has two names. For instance, two pointers could point to the same variable. If the compiler can assume that no aliasing occurs (something you usually have to tell it), then it can do more optimizations.
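To see why that matters, here's a sketch (__restrict is a common compiler extension; the exact spelling varies by compiler):

// Without a no-alias guarantee, the compiler must assume a write
// through dst could change what src points at, so it can't freely
// reorder or vectorize the loads and stores.
void Scale(float* dst, const float* src, float s, int n)
{
    for (int i = 0; i < n; ++i)
        dst[i] = src[i] * s;
}

// Promising the pointers don't overlap frees the compiler to
// unroll, vectorize, and keep values in registers.
void ScaleNoAlias(float* __restrict dst, const float* __restrict src,
                  float s, int n)
{
    for (int i = 0; i < n; ++i)
        dst[i] = src[i] * s;
}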
quote:Original post by Russell
then the compiler shouldn't have any problem predicting that one.
Just nitpicking here, but the compiler doesn't do prediction; that's the processor (unless I misunderstood your reasoning).
Right, the CPU does the branch prediction. Just typed the wrong word.
quote:
Aliasing is when the same variable has two names. For instance, two pointers could point to the same variable. If the compiler can assume that no aliasing occurs (something you usually have to tell it), then it can do more optimizations.

Depending on how it's used, it could also mean the jagged edges you get when an image is sampled onto a pixel grid (not really programming related, but you never know).

Aliasing is a very general term, applicable to dozens, if not hundreds, of different scenarios. The use here is a shortened form of 'variable aliasing', or maybe 'memory aliasing' (which is essentially the same thing).


Accessing main memory takes time. It is accessed through a bus, so first you must strobe the bus with the address of the memory you want to read. Then the other side has to pick up the address, decode it to find the correct dword in its chips, and put the contents of that dword on the bus for the CPU to read. To make this process more efficient, CPUs read a whole cache line at a time (128k now!) and keep it on-chip (no bus strobing to access it next time). The thinking is, if you access this dword, odds are good you'll access the next dword too. The memory transfer can then be done in a burst, reducing overhead.
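If you're curious, here's a sketch of how you could observe the line size yourself (the buffer size is my choice; time each call with a clock or profiler):

const int BYTES = 16 * 1024 * 1024;
static char buffer[BYTES];

// Touch one byte out of every 'stride' bytes. Up to the cache line
// size, doubling the stride halves the touches but fetches the same
// number of lines, so total time stays roughly flat; past the line
// size, each touch pulls its own line and total time finally drops.
void TouchEveryStride(int stride)
{
    for (int i = 0; i < BYTES; i += stride)
        buffer[i]++;
}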
- The trade-off between price and quality does not exist in Japan. Rather, the idea that high quality brings on cost reduction is widely accepted.-- Tajima & Matsubara
quote:Original post by Magmai Kai Holmlor
To make this process more efficient, CPUs read a whole cache line at a time (128k now!)
Did you mean 128k or 128 bytes?

