Archived

This topic is now archived and is closed to further replies.

What's a cache miss?

This topic is 5371 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

Recommended Posts

I just read this in the nVidia engine design paper:
quote:
Consider every time you follow a pointer, you can assume a CPU cache miss. The deeper your tree, the more cache misses you will take. Cache misses can be more expensive than intersection tests. Therefore, shallower tree types will perform better on high-polygon-count scenes.
I have no clue what they're talking about. [edited by - Promit on October 10, 2003 4:30:37 PM]

Share on other sites
The CPU cache is basically a special, high-speed memory area directly on the CPU die. It is a lot faster to access than RAM, so the CPU keeps recently used code and data there, betting that your program will need them again soon. When the program needs something that is not in the cache, that is called a cache miss, and the CPU has to go out to system memory to fetch it. The same thing happens for both code and data: a hit is fast, a miss means a slow trip to RAM. This is Very Slow and, in general, a Bad Thing.
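To make the above concrete, here is a minimal sketch (my example, not from the post) of how access order alone changes cache behavior. Both functions compute the same sum over a row-major matrix; the first walks memory sequentially and uses every loaded cache line fully, while the second jumps `cols` elements per step and, for large matrices, can miss the cache on nearly every access.

```cpp
#include <cstdint>
#include <vector>

// Row-by-row walk over a row-major matrix: stride 1, cache friendly.
uint64_t sum_row_major(const std::vector<uint32_t>& m, int rows, int cols) {
    uint64_t s = 0;
    for (int r = 0; r < rows; ++r)
        for (int c = 0; c < cols; ++c)
            s += m[r * cols + c];   // consecutive addresses
    return s;
}

// Column-by-column walk over the same data: stride `cols`, cache hostile
// once a column no longer fits in the cache.
uint64_t sum_col_major(const std::vector<uint32_t>& m, int rows, int cols) {
    uint64_t s = 0;
    for (int c = 0; c < cols; ++c)
        for (int r = 0; r < rows; ++r)
            s += m[r * cols + c];   // jumps cols elements each step
    return s;
}
```

Both return the same answer; only the memory access pattern (and therefore the miss rate on large inputs) differs.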

Share on other sites
So what can we do to exploit this feature?
All I got from the paper is that you should play with the order of variable declarations and try to use smaller datatypes when possible.

Another question, what is "aliasing" ?

Share on other sites
quote:
Original post by Anonymous Poster
So what can we do to exploit this feature?
All I got from the paper is that you should play with the order of variable declarations and try to use smaller datatypes when possible.

On modern CPUs, there are several factors you have to juggle (I'm by no means discussing them all in this post). It is not always best to choose the smallest data type, because smaller data types aren't always the fastest for the CPU to operate on. But cache footprint matters too: if your program frequently uses, say, 1MB of data, and you could use bytes instead of 32-bit values to represent that data, you could cut it down to 256KB, which would be much more cache friendly. That's just one example. You could even try packing it to sub-byte sizes, because unpacking data that is already in the L1 cache will be way faster than reading the data straight from memory.
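A small sketch of the footprint argument above (my example, not from the original poster): 262,144 values stored as 32-bit ints occupy 1MB, but if every value fits in a byte, the same values stored as `uint8_t` occupy 256KB, so four times as many fit in each cache line.

```cpp
#include <cstdint>
#include <vector>

// Narrow a 32-bit value array down to bytes. Assumes every value is
// known to be < 256 (e.g. a per-triangle material index in a scene
// with few materials) -- otherwise the cast would silently truncate.
std::vector<uint8_t> narrow_copy(const std::vector<uint32_t>& wide) {
    std::vector<uint8_t> out;
    out.reserve(wide.size());
    for (uint32_t v : wide)
        out.push_back(static_cast<uint8_t>(v)); // assumes v < 256
    return out;
}
```

The trade-off is exactly the one described above: the narrow copy is a quarter of the size, at the cost of a range restriction and possible conversion work when the values are used.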

Other things you have to worry about are branch mispredictions. Basically, when your program comes to a conditional (if, for, while, etc.) the CPU takes an educated guess at whether the test will be true or false, and then goes on processing as if its assumption is correct. If the assumption turns out to be false, the CPU has to backtrack a little and redo some work. So sometimes you can get extra speed from turning simple if statements into, say, a table lookup. Just make sure your table is in the cache. Usually, if your branches are predictable, then the CPU can predict them accurately. For instance, I have a chess program, and usually there will be a pattern of "white to move, black to move, white to move, black to move, ...", and if I have something like if (white_to_move) { ... } else { ... }; then the compiler shouldn''t have any problem predicting that one.
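Here is a minimal sketch of the if-to-table-lookup idea mentioned above (my example, not from the original poster). The payoff comes when the branch is *unpredictable*; a regular pattern like the chess example is cheap either way.

```cpp
// Branchy version: if `is_white` flips randomly call to call, the
// CPU's branch predictor mispredicts often and pays a penalty.
int score_branchy(bool is_white, int material) {
    if (is_white) return material;
    else          return -material;
}

// Table version: no branch at all -- the sign is fetched from a tiny
// two-entry table (which easily stays resident in the cache).
int score_table(bool is_white, int material) {
    static const int sign[2] = { -1, +1 }; // index 0 = black, 1 = white
    return sign[is_white] * material;
}
```

Both functions compute the same result; the table version just trades a conditional jump for an always-taken load and multiply.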

There are lots of things to consider, and you can end up going around in circles for years trying to figure out what the fastest way is. My advice, like most people's, is to 1) try to learn efficient ways of doing things, what algorithms are the most efficient, and so on, and 2) don't worry about efficiency beyond that point until you can run it through a profiler. Most people just say "use a profiler", but that doesn't mean you should ignore efficiency altogether before you use a profiler. Just don't make it a primary goal. Usually learning how to code efficiently before you use a profiler comes from experience, reading well-written code, and writing code yourself.

quote:
Another question, what is "aliasing"?

Aliasing is when the same variable has two names. For instance, two pointers could point to the same variable. If the compiler can assume that no aliasing occurs (something you usually have to tell it), then it can do more optimizations.
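A tiny demonstration of the point above (my example, not from the post): when two pointers may refer to the same object, a write through one changes what a read through the other sees, so the compiler cannot keep the value in a register across statements unless it can prove the pointers differ.

```cpp
// If a == b, the first write doubles *b, so the second statement adds
// the NEW value, not the original one. The compiler must therefore
// reload *b after the write unless it knows a and b never alias.
void add_twice(int* a, const int* b) {
    *a += *b;  // if a == b, *b is now doubled
    *a += *b;  // adds the possibly-updated value
}
```

With `x = 1`, `add_twice(&x, &x)` leaves `x == 4`, whereas with two distinct ints both starting at 1 the destination ends at 3. C99's `restrict` qualifier (commonly available as the `__restrict` extension in C++ compilers) is the usual way to promise the compiler that no such aliasing occurs.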

Share on other sites
quote:
Original post by Russell
then the compiler shouldn''t have any problem predicting that one.
Just nitpicking here, but the compiler doesn't do prediction, that's the processor (unless I misunderstood your reasoning).

Share on other sites
Right, the CPU does the branch prediction. Just typed the wrong word.

Share on other sites
quote:

Aliasing is when the same variable has two names. For instance, two pointers could point to the same variable. If the compiler can assume that no aliasing occurs (something you usually have to tell it), then it can do more optimizations.

Depending on how it's used, it could also mean aligned to a grid (not really programming related, but you never know).

Share on other sites
Aliasing is a very general term applicable to dozens, if not hundreds, of different scenarios. The use here is a shortened form of 'variable aliasing', or maybe 'memory aliasing' (which is essentially the same thing).

Accessing main memory takes time. It is accessed through a bus, so first you must strobe the bus with the address of the memory you want to read. Then the other side has to pick up this address, decode it to find the correct dword, and put the contents of that dword on the bus for the CPU to read next. To make this process more efficient, they read cache lines at a time (128k now!) and keep them in the CPU (no bus transaction needed to access them next time). The thought is, if you access this dword, odds are good you'll access the next dword too. The memory transfer can then be done in a burst, reducing overhead.
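Tying this back to the quote that started the thread, here is a sketch (my example, not from the original poster) of why pointer chasing is the worst case for this burst-transfer scheme: each `next` pointer can land on a cold cache line anywhere in memory, while a contiguous array is consumed one fully-used cache line at a time.

```cpp
#include <vector>

struct Node { int value; Node* next; };

// Pointer chasing: every hop may be a cache miss, and the next
// address isn't even known until the current node has been loaded.
int sum_list(const Node* n) {
    int s = 0;
    for (; n; n = n->next) s += n->value;
    return s;
}

// Contiguous traversal: the hardware can stream whole cache lines in
// bursts, and several values arrive with each line.
int sum_array(const std::vector<int>& v) {
    int s = 0;
    for (int x : v) s += x;
    return s;
}
```

Same result either way; the difference is purely in how many trips to main memory each traversal costs, which is exactly the argument for shallower trees in the nVidia quote.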

Share on other sites
quote:
Original post by Magmai Kai Holmlor
To make this process more efficient, they read cache lines at a time (128k now!)
Did you mean 128k or 128 bytes?
