Nicholas Kong

What does the dimensions of the cache size and numbers of the bus speed tell me about the computer?


Recommended Posts

So I did research on how to use a program called CPUID CPU-Z to pull up this information about my computer.

 

From what I researched, bus speed dictates how fast information travels in your computer, which sounds very similar to a highway. What is considered "fast", and is that universal among computers? Is it really as straightforward as saying the higher the bus speed, the faster information travels? What normally makes it slow or fast, or keeps it stable? Mine seems to fluctuate.

 

I am confused about the dimensions of the cache. From what I researched, a cache stores things so that when they need to be used again they can be retrieved easily. So is a cache similar to an array?

 

Cache Sizes:
  L1 Data: 2 x 32 kBytes, 8-way
  L1 Inst: 2 x 32 kBytes, 8-way
  Level 2: 4096 kBytes, 16-way

Bus Speed: 199.511 MHz

 

I saw the diagram at http://en.wikipedia.org/wiki/Front-side_bus and realized that they are connected together. But I do not understand the meaning behind the "2 x 32" and the "8-way" or "16-way". And what is the "L1 Inst" cache?


Swap space = 10,000,000 ns


Fixed that for ya. Virtual memory has a very precise meaning: using the disk is an implementation feature of particular operating systems, not the sole component of the concept.


This is a fairly involved discussion. Here is the 10,000-foot view.

 

*snip*

 

Anyways, there's more to your question and I'll try to answer later when I have more time.  An awesome book to cut your teeth on:  Computer Architecture, A Quantitative Approach by Hennessy & Patterson.  It is considered a classic.

Thanks for the book recommendation.


Cache is high speed memory.

 

You want all your data to be as close to the processor as possible.  Here are some approximate numbers:

 

*snip*

Interesting, I am understanding it a bit more as I continue reading this portion. Thanks frob!


1 processor cycle = 0.3 nanoseconds.
L1 cache hit = 4 cycles (1.2 ns)
L2 cache hit = 10 cycles (3 ns)
L3 cache hit, line unshared = 40 cycles (12 ns)
L3 cache hit, line shared in another core = 65 cycles (20 ns)
L3 cache hit, line modified in another core = 75 cycles (22.5 ns)
Remote L3 cache hit = 100-300 cycles (30-90 ns)
Local DRAM = 60 ns
Remote DRAM = 100 ns
Swapped to disk = 10,000,000 ns

Just to add some data points to this table, and drive the point home:
An integer instruction has a throughput of 1 to 4 cycles. So an increment may take 1.0 ns (3 cycles).

A SIMD instruction will be the same, except being SIMD, the throughput can be as much as 8 times more per operand. So with a vectorizing compiler, and ideal conditions, an increment would take about 130 ps.

So as a consequence, a single instruction like ++i may take anywhere from 130 ps to 10 ms.

EDIT: Divide that by the number of real cores, if you want to count MIMD. That means a 6-core desktop processor like the i7-3930K can do 20 ps throughput for increment instructions, best case.
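
To see these numbers bite in practice, here is a minimal sketch (my own illustration, not from the table above) that increments every element of a large array twice: once sequentially, where the prefetcher keeps data in cache, and once as a dependent random walk, where nearly every access misses. The 64 MB working-set size and compile flags are assumptions; adjust for your machine.

    // Minimal sketch: the same ++ costs wildly different amounts depending
    // on where the data lives. Compile with e.g. g++ -O2 -std=c++17 demo.cpp
    #include <chrono>
    #include <cstdio>
    #include <numeric>
    #include <random>
    #include <utility>
    #include <vector>

    int main() {
        // 64 MB of ints: far larger than any L3 cache, so the random walk
        // below misses cache on almost every step.
        const std::size_t n = 16 * 1024 * 1024;
        std::vector<int> data(n, 1);

        // Sequential pass: the hardware prefetcher streams data in, so each
        // increment costs close to the raw ALU throughput discussed above.
        auto t0 = std::chrono::steady_clock::now();
        for (std::size_t i = 0; i < n; ++i) ++data[i];
        auto t1 = std::chrono::steady_clock::now();

        // Build one big cycle with Sattolo's algorithm so the walk is a
        // dependent pointer chase that visits every element exactly once.
        std::vector<std::size_t> next(n);
        std::iota(next.begin(), next.end(), std::size_t{0});
        std::mt19937_64 rng{42};
        for (std::size_t i = n - 1; i > 0; --i) {
            std::uniform_int_distribution<std::size_t> pick(0, i - 1);
            std::swap(next[i], next[pick(rng)]);
        }

        // Random walk: each load depends on the previous one, so every
        // cache/DRAM miss is paid in full (tens of ns per increment).
        std::size_t idx = 0;
        auto t2 = std::chrono::steady_clock::now();
        for (std::size_t i = 0; i < n; ++i) { ++data[idx]; idx = next[idx]; }
        auto t3 = std::chrono::steady_clock::now();

        using ns = std::chrono::nanoseconds;
        std::printf("sequential: %.2f ns/increment\n",
                    std::chrono::duration_cast<ns>(t1 - t0).count() / double(n));
        std::printf("random:     %.2f ns/increment\n",
                    std::chrono::duration_cast<ns>(t3 - t2).count() / double(n));
    }

On typical hardware the two timings differ by an order of magnitude or more, which is the whole point of the table.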


 

*snip*

 

It is hard to truly give a universal cost to instructions, except perhaps for embedded processing systems.

 

The CPU pipeline amortizes the cost of a sequential flow of instructions, assuming they have no hazards. Even though a single instruction takes several pipeline stages to execute, the next instruction in sequence executes at the same time as the first and comes out one cycle later.

 

For example, if you have an 11-stage pipeline and you issue one instruction, it will take 11 clock cycles. However, 11 instructions with no hazards will take only 21 CPU cycles to execute, rather than 11 x 11 = 121 cycles.
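
That arithmetic generalizes; here is a tiny sketch of the ideal, hazard-free timing rule (my own illustration, not from the post):

    // Ideal, hazard-free pipeline timing: the first instruction takes the
    // full pipeline depth, then one instruction completes per cycle.
    #include <cstdio>

    unsigned pipelined_cycles(unsigned stages, unsigned instructions) {
        return instructions == 0 ? 0 : stages + (instructions - 1);
    }

    int main() {
        std::printf("1 instruction:   %u cycles\n", pipelined_cycles(11, 1));  // 11
        std::printf("11 instructions: %u cycles\n", pipelined_cycles(11, 11)); // 21, not 121
    }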

 

A further thing to muddy the waters: modern CPUs have multiple dispatch, even for SISD. Multiple ALUs in several different pipelines can shorten that 21-cycle cost even further.


When they talk about cache associativity they are referring to how the data is stored within the cache.  As a gross simplification, when you have a large number of cache bins you need to keep track of which bin contains which data. When you have 128KB of memory kept in 64-byte bins that is a lot of bins. On the one hand you want all your data in cache memory.  On the other hand you don't want to search every bin in order to find which bin actually holds your data. If you have a 16-way set associative cache it means you need to sort through 1/16th of the bins to find the data.

I can understand the desire to simplify things here, but I think you've oversimplified to the point where you've inverted what associativity means. A 16-way associative cache doesn't sort through 1/16th of the bins, it searches 16 bins. An 8-way associative cache would search through 8 bins.

To try again (still simplified): the idea behind a cache is that it stores the most recently used memory with the idea that if you've used it recently you're likely to want to use it again in the near future. One way to handle this is that when you access some memory, if it isn't already in the cache you find the bin that has gone the longest without being accessed and replace that bin with the new memory. Because any bin can be used for any memory address this is called a fully-associative cache - every memory can be associated with any bin. The down side here is that it takes effort to figure out which memory address has gone without access the longest and which bin currently has which memory address, which slows the cache down.

The opposite approach is called direct-mapped or one-way associative: every memory address can only ever be found in one of the bins. Now you no longer need to keep track of age or hunt down the bin for a given memory address. On the down side, if you alternate between two memory addresses that map to the same bin, the cache might as well not exist because every access will generate a cache miss.

Between these two approaches are the n-way associative caches. Let's say you have a two-way associative cache. Here every memory address has two associated bins. If you want to see which bin a given memory address is in, you only need to check two bins. It's also a lot easier to keep track of which of those two bins has gone longest without being accessed. And since two memory addresses that map to the same set of bins can now be in the cache at the same time, it's harder to hit usage patterns that generate a cache miss on every access. So replace two with some value n to get your n-way associative caches, like the 8-way and the 16-way. As n gets bigger the circuitry gets more complex, but as n gets smaller it's easier to hit memory access patterns that the cache can't handle efficiently.
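
To make that concrete, here is a minimal sketch (my own, heavily simplified) of how an address picks its set in an n-way cache. The numbers match the original poster's L1 data cache (32 KB per core, 8-way); the 64-byte line size is a typical value I'm assuming, since the CPU-Z summary above doesn't show it.

    // Simplified n-way set-associative mapping: an address selects one set,
    // and the hardware then compares tags against all kWays lines in that
    // set -- 8 comparisons here, not 1/8th of the cache.
    #include <cstddef>
    #include <cstdint>
    #include <cstdio>

    constexpr std::size_t kCacheBytes = 32 * 1024;                        // 32 KB L1 data
    constexpr std::size_t kLineBytes  = 64;                               // assumed line size
    constexpr std::size_t kWays       = 8;                                // bins checked per lookup
    constexpr std::size_t kSets       = kCacheBytes / kLineBytes / kWays; // = 64 sets

    std::size_t set_index(std::uint64_t addr) {
        // Drop the offset-within-line bits, then wrap into the set count.
        return (addr / kLineBytes) % kSets;
    }

    int main() {
        // Addresses exactly kSets * kLineBytes (4 KB) apart map to the same
        // set, so more than kWays such addresses start evicting each other.
        std::printf("set of 0x1000: %zu\n", set_index(0x1000)); // 0
        std::printf("set of 0x2000: %zu\n", set_index(0x2000)); // 0 again
    }

This is also why access patterns with a 4 KB stride behave so badly on a cache shaped like this: every access fights over the same handful of bins.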
