
Multithreading Nowadays

This topic is 647 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

If you intended to correct an error in the post then please contact us.


Hi. I don't have in-depth hardware knowledge, but I've noticed that a lot of the fairly recent Intel CPUs I've looked at don't have HyperThreading support, and the specs say the number of threads is equal to the number of cores. Does that mean dividing the processing work in my application (like physics calculations that don't involve I/O or networking) into multiple threads will not provide any performance benefit, and that multi-processing is where the gains are? Thanks.


Hyperthreading can also be a double-edged sword at times. It is like a second processor core, but not exactly the same thing. I really haven't done much work with hyperthreaded processors, and all I honestly remember about the edge cases is that they exist. If you are going to start programming with the aim of providing strong support for Intel's tech, then it is probably a good idea to spend some time digging around with Google for the various pitfalls of hyperthreading.


I think at least for games, another significant problem with hyperthreading is that many workloads don't scale perfectly with core count. See https://en.wikipedia.org/wiki/Amdahl's_law for one cause of poor scaling.

 

That is, even if you compare, say, two-core performance to four-core performance without any hyperthreading, the four cores probably won't be exactly double the speed of two cores. They might be, say, 1.8 times faster instead.

 

This means that the performance benefit from hyperthreading has to be higher than the overheads from using more threads, if it's going to actually improve performance.
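To put numbers on the 1.8x figure: Amdahl's law says that if a fraction p of the single-core run time can be spread perfectly over n cores, the best possible speedup is 1 / ((1 - p) + p / n). The sketch below is only an illustration of that formula; the 95% parallel fraction is a hypothetical value, not a measurement from any real game.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Ideal speedup on `cores` cores when `parallel_fraction` of the
    single-core run time parallelises perfectly (Amdahl's law)."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


def scaling_ratio(parallel_fraction: float, fewer: int, more: int) -> float:
    """How many times faster `more` cores are than `fewer` cores."""
    return amdahl_speedup(parallel_fraction, more) / amdahl_speedup(parallel_fraction, fewer)


if __name__ == "__main__":
    # Hypothetical frame where 95% of the work can run in parallel.
    print(f"4 cores vs 2 cores: {scaling_ratio(0.95, 2, 4):.2f}x")  # prints 1.83x
```

Even with only 5% serial work, doubling from two to four cores gives about 1.83x rather than 2x, and that is before counting any threading overhead.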


Even without hyperthreading, you often get a measurable benefit from using more threads than cores, because whenever one thread blocks there is another ready thread to swap in. This is especially true when your application has different workloads, instead of always doing the same computation on different data. In a way, hyperthreading is just a way to make this more efficient in hardware. However, with or without hyperthreading, the optimal number of worker threads is something you can often measure directly; knowing about hyperthreading just explains the results.
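A common starting point for "more threads than cores" is the sizing rule of thumb popularized by Brian Goetz's Java Concurrency in Practice: threads = cores × target utilization × (1 + wait time / compute time). With no waiting it collapses to one thread per core. The sketch below pairs that heuristic with a tiny harness that simply times a few candidate pool sizes and keeps the fastest, in the spirit of measuring directly. The function names and the sleep-based task are hypothetical illustrations. Note that in CPython the GIL stops CPU-bound Python threads from running in parallel, so this harness is only meaningful here for blocking work; a native engine would run the same measure-and-pick loop over its own job system.

```python
import math
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable


def suggested_threads(cores: int, wait_time: float, compute_time: float,
                      target_utilization: float = 1.0) -> int:
    """Rule of thumb: cores * utilization * (1 + wait/compute).

    Pure computation (wait_time == 0) gives one thread per core; the longer
    each task is blocked relative to how long it computes, the more threads
    it takes to always have a ready thread to swap in."""
    if compute_time <= 0:
        raise ValueError("compute_time must be positive")
    ratio = wait_time / compute_time
    return max(1, math.ceil(cores * target_utilization * (1.0 + ratio)))


def time_pool(task: Callable[[], None], num_tasks: int, workers: int) -> float:
    """Wall-clock seconds to run `num_tasks` calls of `task` on `workers` threads."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(task) for _ in range(num_tasks)]
        for future in futures:
            future.result()  # propagate any exception raised by the task
    return time.perf_counter() - start


def best_thread_count(task: Callable[[], None], num_tasks: int,
                      candidates: Iterable[int]) -> int:
    """Measure each candidate pool size and return the fastest one."""
    timings: Dict[int, float] = {w: time_pool(task, num_tasks, w) for w in candidates}
    return min(timings, key=lambda w: timings[w])


if __name__ == "__main__":
    # Hypothetical blocking task: sleep stands in for waiting on I/O or a lock.
    def blocking_task() -> None:
        time.sleep(0.05)

    print(suggested_threads(cores=4, wait_time=40.0, compute_time=10.0))
    print(best_thread_count(blocking_task, num_tasks=16, candidates=[1, 4, 16]))
```

The heuristic only gives a starting guess; the measured result is what should decide the final count.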



Even without hyperthreading, you often have a measurable benefit from using more threads than cores because there's always a busy thread to swap in.
This is situation-dependent of course, but my experience on my current project disagrees. If I have more threads than cores, the overhead of the extra context switches seems to cause an overall performance loss. I found that when running my threads with 100% workloads (no idle time), the performance of my game and system-wide OS responsiveness suffered greatly if I created one thread per hardware thread.

On Intel Hyperthreaded CPUs, I've ended up running one thread per core (not two!), and on other CPUs, I run one thread per core minus one (e.g. on an 8-core, I'll run 7 threads), which leaves a bit of spare CPU time for the OS and other applications, even when I'm maxing out my threads with 100% workloads.
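The thread-count policy described above can be written as a small function. This is a sketch of that policy only, with a hypothetical function name. One practical wrinkle: Python's `os.cpu_count()` reports logical CPUs (hardware threads), not physical cores, so the physical count has to come from elsewhere; the third-party `psutil.cpu_count(logical=False)` is one option, and the placeholder below is an assumption.

```python
import os


def worker_thread_count(physical_cores: int, logical_cores: int) -> int:
    """Busy "main & worker" threads to run under the policy above.

    With Hyper-Threading (more logical than physical cores): one thread per
    physical core, leaving the sibling hardware threads for the OS, drivers
    and middleware. Without it: one fewer than the core count, so some CPU
    time stays free. Never returns less than 1."""
    if logical_cores > physical_cores:
        return physical_cores
    return max(1, physical_cores - 1)


if __name__ == "__main__":
    logical = os.cpu_count() or 1
    # Placeholder assumption: the standard library cannot report physical
    # cores, so treat the machine as having no Hyper-Threading.
    physical = logical
    print(worker_thread_count(physical, logical))
```

Extra "mostly sleeping" threads (audio middleware, driver threads) are deliberately not counted here; the spare core or sibling hardware threads are what they run on.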

 

Note that those are my "main & worker" threads, anyway. I also have a bunch of extra "mostly sleeping" threads (e.g. middleware like FMOD, or your NVIDIA graphics drivers, will create a bunch of their own threads internally), which can also run on that spare AMD core, or on the Intel "hyperthreads" :)


Oh, there was the other part of the question.

 

The original question also mentioned "I've noticed that a lot of the rather recent intel CPUs I've looked at don't have HyperThreading support."

 

The only ones I'm aware of that have done that are the low-power system-on-a-chip (SoC) designs, low-power both in energy consumption and in processing power. Both the Silvermont and Goldmont lines are meant for x86-family tablets and embedded systems. Their architectures are much reduced from their high-power desktop brethren: no hyperthreading, much smaller caches, lower clock speeds, no virtualization systems, etc.

 

All of the Core processors, those in the i3, i5, or i7 families, support hyperthreading as far as I can tell. It can be disabled in the BIOS settings, but the chips still support it.



All of the Core processors, those in the i3, i5, or i7 families, support hyperthreading as far as I can tell. It can be disabled in the BIOS settings, but the chips still support it.

Most of them do now, but they didn't for a long time.

 

For example, my i5-4570 doesn't, as Intel only offered hyperthreading on the i7s back then.


The biggest benefit is that you get two sets of decoders.

Are you sure about that? Going by CPU articles, the decoder is shared between threads.
I'm very sure that each hyperthreaded processor has its own set of four decoders making a total of eight, not four decoders shared.

Each HT processor decodes into its own register alias table, feeding a combined reorder buffer (ROB), and both share the out-of-order core's work queue, called the reservation station. Decoding each instruction stream is an independent act.

As long as there is space on the reservation station and the system isn't blocked too severely by data cache misses or other problems, having two sets of decoders tends to mean both of the HT processors are working interleaved on the same underworked core. Generally the core is still underworked, waiting idly for data to come in, but now it can have two sets of instructions to work on while still waiting around for data.


I'm very sure that each hyperthreaded processor has its own set of four decoders making a total of eight, not four decoders shared.


Do you have a source on this?  Something like this maybe: http://www.realworldtech.com/haswell-cpu/2/
Well, I was using the Intel reference manual for the chipset family which I think I was pretty clear about referencing in the article, but your link works just as well. That link shows the four decoders that feed into the ROB, and each HT processor has its own set that feeds into a shared ROB.


All of the Core processors, those in the i3, i5, or i7 families, support hyperthreading as far as I can tell. It can be disabled in the BIOS settings, but the chips still support it.
 

The i3 and i7 support hyperthreading; the i5s don't, at least on desktop.

 

 

 


Well, I was using the Intel reference manual for the chipset family which I think I was pretty clear about referencing in the article

There are no references in your article, unless I'm missing them somewhere in the middle.

 

 

On this page: table 1 doesn't seem to list decoders under replicated functionality.  https://software.intel.com/en-us/articles/performance-insights-to-intel-hyper-threading-technology/

 

It's also not listed as replicated in section 2.6.1.1 of the Intel® 64 and IA-32 Architectures Optimization Reference Manual here: http://www.intel.com/content/www/us/en/architecture-and-technology/64-ia-32-architectures-optimization-manual.html

Edited by Infinisearch



That's a mobile CPU.

Sure, but it's not like they build a separate i5 chip for desktop without hyperthreading. I'd honestly expect that i5s and i3s are just lower-binned i7 parts with cores, hyperthreading, and/or cache disabled.

 

What's interesting is that they don't lock the hyperthreading out on the dual-core i5's that go in laptops. Too much of a performance degradation relative to the competition?



Sure, but it's not like they build a separate i5 chip for desktop without hyperthreading.

Actually, I think they do and always have. Pentiums and i5s have no hyperthreading, i3s and i7s have it, and that's how it's always been on desktop. On laptops it's muddier. But as far as I know, i5s are a separate chip.


But as far as I know i5's are a separate chip.

 

I'm having trouble tracking down sources, but the Wikipedia page mentions the following about the older Nehalem-based i5 processors:

The same processors with different sets of features (Hyper-Threading and other clock frequencies) enabled are sold as Core i7-8xx and Xeon 3400-series processors

It's possible they've drifted further apart since then, but I doubt it. Chip yields for highly clocked quad-core processors are terrible; what else would you do with all the chips that have one dead core?
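The binning argument is easy to put rough numbers on. If each core on a die is independently good with some probability y, the number of working cores follows a binomial distribution. The 90% per-core yield below is purely hypothetical, and real defect and clock-speed binning is more complicated (shared cache, uncore, frequency limits), so this only illustrates why partially defective dies are worth selling as a lower tier.

```python
from math import comb
from typing import List


def working_core_distribution(cores: int, per_core_yield: float) -> List[float]:
    """P(exactly k of `cores` cores work) for k = 0..cores, assuming each core
    is independently good with probability `per_core_yield`."""
    if not 0.0 <= per_core_yield <= 1.0:
        raise ValueError("per_core_yield must be in [0, 1]")
    return [comb(cores, k) * per_core_yield ** k * (1 - per_core_yield) ** (cores - k)
            for k in range(cores + 1)]


if __name__ == "__main__":
    dist = working_core_distribution(4, 0.9)  # hypothetical 90% per-core yield
    print(f"all 4 cores good: {dist[4]:.4f}")
    print(f"exactly 3 good:   {dist[3]:.4f}")
```

With these made-up numbers, about 66% of dies are fully working quad-cores and another 29% have exactly one bad core, a large pool that would otherwise be scrapped.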

Edited by swiftcoder


Another thing: there is not just one set of technologies here, or one specific chipset. Intel introduced Hyper-Threading in 2002 and has since incorporated it into many of its processors, with variations between different microarchitectures.

 

What is true of one chip or line of chips may not be true of another. Many features have changed and evolved over the years since it was introduced.


Does that mean dividing the processing work in my application (like doing physics calculations that don't involve IO/networking) into multiple threads will not provide any performance benefits


Hyperthreading is/was almost as good as doubling cores for multi-threaded applications which rely heavily on memory accesses, locking, and input/output work (reading files, querying databases, etc). Database and game servers probably benefited a lot from the introduction of hyperthreading.
But since the two logical cores on a physical core share its floating-point units, hyperthreading actually tended to (marginally) slow down math-heavy applications like the one you're describing. Even math-heavy programs stall sometimes, though, so unless you're on a dedicated machine without any background processes whatsoever, you might as well let them use processor time that would otherwise go to waste.
https://arstechnica.com/civis//viewtopic.php?f=8&t=1289011

4 physical cores are better than 2 hyperthreaded cores, all other things being equal. But 2 hyperthreaded cores are better than 2 normal cores in the vast majority of cases.
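Both points (memory-heavy code gains more than math-heavy code, and the 4 physical > 2 hyperthreaded > 2 normal ordering) fall out of a deliberately crude model. Assume each hardware thread is stalled, for example waiting on memory, a fraction s of the time, independently of its sibling, and that a core only idles when all of its threads are stalled at once. A core then does useful work 1 - s of the time with one thread and 1 - s² with two. This is a toy model with hypothetical names, not how Intel describes the hardware: it ignores the two threads competing for the shared floating-point units and caches, which is why real math-heavy code can even lose a little.

```python
def core_utilization(stall_fraction: float, threads_per_core: int) -> float:
    """Toy model: each hardware thread is independently stalled (e.g. waiting
    on memory) `stall_fraction` of the time; the core only sits idle when
    all of its threads are stalled at once."""
    if not 0.0 <= stall_fraction < 1.0:
        raise ValueError("stall_fraction must be in [0, 1)")
    return 1.0 - stall_fraction ** threads_per_core


def throughput(physical_cores: int, stall_fraction: float, hyperthreading: bool) -> float:
    """Useful work per cycle across the chip, in units of one never-stalled core."""
    threads_per_core = 2 if hyperthreading else 1
    return physical_cores * core_utilization(stall_fraction, threads_per_core)


if __name__ == "__main__":
    for stall in (0.0, 0.3, 0.6):  # 0.0 ~ pure math, 0.6 ~ memory/lock heavy
        print(f"stall={stall}: "
              f"4 cores={throughput(4, stall, False):.2f}, "
              f"2 cores + HT={throughput(2, stall, True):.2f}, "
              f"2 cores={throughput(2, stall, False):.2f}")
```

For any stall fraction strictly between 0 and 1, 4(1 - s) > 2(1 - s²) > 2(1 - s), matching the ordering above; at s = 0 (pure computation) hyperthreading adds nothing in this model.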
