#Actual Matias Goldberg

Posted 29 December 2012 - 05:01 PM

Yep, a 4-core CPU is not much different from 4 single-core PCs running at the same time, except for some very basic synchronization mechanisms and the fact that the 4 cores share the same RAM and bus.

For example, in the typical Intel Core 2 Quad, each core has its own L1 cache, while the L2 cache is shared by each group of 2 cores.

In the typical Intel Core i7, I don't remember exactly how the L2 is arranged (I believe each core has its own), but there is an additional L3 cache shared by all cores.
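
To make the "4 single-core PCs sharing the same RAM" picture a bit more concrete, here is a minimal C++ sketch. The helper name parallel_sum and the default of 4 workers are my own assumptions for illustration, not something from a real codebase. Each thread reads its own slice of a vector that lives in shared RAM and writes only to its own result slot, so the only synchronization needed is the final join:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <thread>
#include <vector>

// Hypothetical helper: sums `data` using `workers` threads.
// Every thread reads a different slice of the shared vector and
// writes only to its own slot in `partial`, so no locks are needed.
long long parallel_sum(const std::vector<int>& data, unsigned workers = 4) {
    assert(workers > 0);
    std::vector<long long> partial(workers, 0);
    std::vector<std::thread> pool;
    const std::size_t chunk = (data.size() + workers - 1) / workers;

    for (unsigned w = 0; w < workers; ++w) {
        const std::size_t first = std::min(data.size(), w * chunk);
        const std::size_t last = std::min(data.size(), first + chunk);
        pool.emplace_back([&data, &partial, w, first, last] {
            partial[w] = std::accumulate(data.begin() + first,
                                         data.begin() + last, 0LL);
        });
    }
    for (std::thread& t : pool) t.join();  // the only synchronization point
    return std::accumulate(partial.begin(), partial.end(), 0LL);
}
```

Calling parallel_sum on a vector holding 1..1000 gives the same total as a single-threaded sum. Whether it is actually faster depends on the data size and on how many real cores the OS hands to the threads.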

Do not confuse "multithreading using multiple cores" with "instruction-level parallelism" or with "multithreading using hyperthreading".

The first one is the one described above.

The second one has nothing to do with cores or multithreading and has been present in single-core CPUs since the original Pentium. Each core has more than one pipeline, so it can execute multiple instructions at the same time. If you have, for example, "add eax, ecx" followed by "add edi, ecx" (i.e. a = a + b and c = c + b), both instructions are independent, so each pipeline takes one of them and both execute at the same time. This may also be called "instruction pairing". Not all instruction combinations can be paired, because some CPU architectures don't clone the whole pipeline, just a part of it (usually for cheaper manufacturing or to keep power consumption down).

Note that the CPU won't pair two dependent instructions. For example, "add eax, ecx" followed by "add edi, eax" can't be executed in parallel, because the second instruction depends on the result of the first one. Compiler optimizations rearrange instructions to interleave unrelated operations so that the chance of instruction pairing is maximized.
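
The same idea can be sketched in C++ (the function names are made up for illustration). The first loop is one long dependency chain, because every addition needs the previous total. The second keeps two accumulators that never depend on each other, which is the kind of independent work a CPU can overlap. Treat it as a sketch of the concept only: an optimizing compiler may already transform or vectorize either loop, so no particular speedup is implied.

```cpp
#include <cstddef>
#include <vector>

// One accumulator: each addition depends on the previous one,
// so the additions form a single dependency chain.
long long sum_single_chain(const std::vector<int>& v) {
    long long total = 0;
    for (std::size_t i = 0; i < v.size(); ++i)
        total += v[i];
    return total;
}

// Two accumulators: `even` and `odd` never depend on each other,
// giving the CPU two independent chains that it may execute in parallel.
long long sum_two_chains(const std::vector<int>& v) {
    long long even = 0, odd = 0;
    std::size_t i = 0;
    for (; i + 1 < v.size(); i += 2) {
        even += v[i];      // independent of the line below
        odd += v[i + 1];   // independent of the line above
    }
    if (i < v.size()) even += v[i];  // leftover element for odd-sized input
    return even + odd;
}
```

Both functions return the same result for any input; only the shape of the dependency chain differs.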

Also keep in mind that most modern x86/x64 CPUs implement something called "Out of Order Execution" (OoOE), which is basically a component in the CPU that looks ahead at the instructions that are about to be executed. If it finds one that is independent (i.e. one that doesn't depend on the result of eax), it executes that instruction at the same time as "add eax, ecx" and then keeps the result in temporary hardware storage until the instructions that come before it in program order have actually been executed. OoOE may also kick in when one of the pipelines is stalled waiting for data to arrive from RAM. In other words, it does the same thing the compiler does (reorder operations to maximize instruction pairing and minimize stalls) but at the hardware level, and you have little to no control over it (except for instructions that issue a memory barrier).

When dealing with lock-less multithreaded (as in multi-core) code, OoOE can be a PITA if one doesn't make correct use of memory barriers (which stop the OoOE unit from reordering memory accesses across the barrier), because it may lead to race conditions.
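
As a hedged illustration of what "correct use of memory barriers" can look like, here is a small C++11 sketch using std::atomic with release/acquire ordering, which is the portable way to ask both the compiler and the CPU not to reorder across a point. The function name publish_and_read and the variables are hypothetical, chosen just for this example:

```cpp
#include <atomic>
#include <thread>

// Hypothetical example: one thread publishes `value`, another reads it.
int publish_and_read(int value) {
    int payload = 0;                 // plain, non-atomic data
    std::atomic<bool> ready{false};  // flag that guards the payload
    int seen = -1;

    std::thread producer([&] {
        payload = value;  // 1. write the data
        // 2. release: the write above may not be moved below this store
        ready.store(true, std::memory_order_release);
    });
    std::thread consumer([&] {
        // 3. acquire: the read below may not be moved above this load
        while (!ready.load(std::memory_order_acquire))
            std::this_thread::yield();
        seen = payload;  // 4. guaranteed to observe `value`
    });

    producer.join();
    consumer.join();
    return seen;
}
```

Without the release/acquire pair (for example with a plain bool flag), the consumer could legally see the flag set before the payload write becomes visible, which is exactly the kind of race condition described above.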

The third one, hyperthreading, is Intel's implementation of simultaneous multithreading (SMT) and is a fake of the first one (multi-core). When the CPU is only using one of the pipelines described under instruction-level parallelism and the OoOE couldn't do anything to prevent it, we've got a lot of idle pipelines.

Because those idle pipelines are "almost" an extra core, Hyperthreading kicks in and simulates an extra core to execute a parallel thread, but it is nowhere near the level of a real additional core because: 1. Not all parts of the pipeline are duplicated, and 2. The main thread may actually start using those unused pipelines, leaving no spare components to dedicate to the hyperthreaded thread.

Intel claims Hyperthreading yields an average gain of 15% in performance.

Btw, if you open Task Manager on a single-core system with hyperthreading, it will tell you there are two cores (the real one + the fake one). On an Intel Core i7 with 8 cores, it will tell you there are 16 cores (8 real ones + 8 fake ones).
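
You can see the same number from code. A minimal sketch, assuming a C++11 compiler (the wrapper name logical_processor_count is made up): std::thread::hardware_concurrency() reports the number of hardware threads, which on typical implementations counts the hyperthreaded "fake" cores too. The standard only calls the value a hint and allows it to return 0 when the count is unknown, so the wrapper falls back to 1 in that case.

```cpp
#include <thread>

// Hypothetical wrapper: number of logical processors reported by the
// implementation, falling back to 1 when the value is not available.
unsigned logical_processor_count() {
    const unsigned n = std::thread::hardware_concurrency();
    return n == 0 ? 1u : n;
}
```

Note that this does not tell you how many of those are real cores; for that you need OS-specific or CPUID-based queries.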

Not all CPUs come with HT; in fact, most of them don't.

