# Counting cores and hardware threads


I've been using the two techniques (below) to determine how many threads to launch in my thread pool.
With high workloads, I've found that my game actually runs better with one thread per core, not one thread per hardware-thread (aka hyperthreaded cores x2), so I want a code routine that can tell me how many cores the user's CPU has (not how many HW-threads/hyperthreads it has).

I've only tested them on Intel, and they've worked so far.

The problem is that I just built a new PC with an AMD FX(tm)-8350 Eight-Core Processor, which should give numCores == 8, numThreads == 8...
However, on this CPU, I get numCores == 4, numThreads == 8, which makes my engine think that it's a hyper-threaded quad-core, so my thread-pool only launches 4 threads!

If I run CPU-Z, then their GUI correctly shows 8 cores and 8 HW-threads... so my code must be faulty?
Does anyone else have any routines like this to examine the user's CPU capabilities?
My code is below:
Technique #1 (minus error checking and freeing of the temp buffer) - counts the number of physical cores.

	// Look the function up dynamically, since older versions of Windows don't export it.
	typedef BOOL (WINAPI *GlpiFn)( PSYSTEM_LOGICAL_PROCESSOR_INFORMATION, PDWORD );
	GlpiFn getLogicalProcessorInformation = (GlpiFn)GetProcAddress( GetModuleHandleA("kernel32"), "GetLogicalProcessorInformation" );
	DWORD bufferSize = 0;
	getLogicalProcessorInformation( 0, &bufferSize ); // first call fails and reports the required size in bytes
	SYSTEM_LOGICAL_PROCESSOR_INFORMATION* buffer = (SYSTEM_LOGICAL_PROCESSOR_INFORMATION*)malloc( bufferSize );
	getLogicalProcessorInformation( buffer, &bufferSize );
	uint numCores = 0;
	for( byte* end = ((byte*)buffer) + bufferSize; (byte*)buffer < end; ++buffer )
	{
		// One RelationProcessorCore record is reported per physical core.
		numCores += ( buffer->Relationship == RelationProcessorCore ) ? 1 : 0;
	}
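Each `RelationProcessorCore` record also carries a `ProcessorMask` with one bit set per logical processor belonging to that core, so the same loop can yield the hardware-thread count as well. Here is a minimal, platform-independent sketch of that counting step. `CpuCounts`, `CountSetBits` and `CountFromCoreMasks` are hypothetical names made up for illustration, and the masks are assumed to have been copied out of the records beforehand.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical result type for this sketch.
struct CpuCounts
{
    unsigned numCores;
    unsigned numThreads;
};

// Count the set bits in an affinity mask (one bit per logical processor).
unsigned CountSetBits( std::uint64_t mask )
{
    unsigned count = 0;
    for( ; mask != 0; mask &= mask - 1 ) // each pass clears the lowest set bit
        ++count;
    return count;
}

// coreMasks: one entry per RelationProcessorCore record, holding that record's ProcessorMask.
CpuCounts CountFromCoreMasks( const std::vector<std::uint64_t>& coreMasks )
{
    CpuCounts counts = { 0, 0 };
    for( std::uint64_t mask : coreMasks )
    {
        counts.numCores   += 1;                    // one record per reported core
        counts.numThreads += CountSetBits( mask ); // logical processors on that core
    }
    return counts;
}
```

A CPU reported as four cores with two logical processors each (which is what the FX-8350 result above looks like) would give masks along the lines of `0x3, 0xC, 0x30, 0xC0`; those exact values are an assumption, not something read from the machine.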

Technique #2 - only shows the number of hardware threads.

	SYSTEM_INFO si;
	GetSystemInfo( &si );
	uint numThreads = si.dwNumberOfProcessors;

Obviously the above code is for Windows. If you've got tips for other platforms that would also be helpful though
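For a portable baseline, standard C++11 offers `std::thread::hardware_concurrency()`. Like Technique #2 it reports hardware threads rather than physical cores, and it is only a hint that may return 0 when the value can't be determined. A minimal sketch, where `GetNumHardwareThreads` is a hypothetical wrapper name:

```cpp
#include <thread>

// Portable count of hardware threads (logical processors), not physical cores.
unsigned GetNumHardwareThreads()
{
    unsigned n = std::thread::hardware_concurrency(); // may be 0 if unknown
    return n != 0 ? n : 1;                            // fall back to a single thread
}
```

This doesn't answer the cores-versus-threads question, but it gives a sane default on platforms where no topology query is available.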


##### Share on other sites
Just a thought -- AMD doesn't implement Hyperthreading (or equivalent under a different name) in any of their CPUs right?

So maybe I can use the CPUID instruction to detect if it's an AMD CPU, and if so I use the number of HW-threads, otherwise I use the number of cores...

Damn this is ugly.

	static bool IsAMD()
	{
		// CPUID leaf 0 returns the 12-byte vendor string in EBX, EDX, ECX (in that order).
		static const char AuthenticAMD[] = "AuthenticAMD";
		s32 CPUInfo[4]; // EAX, EBX, ECX, EDX
		__cpuid( CPUInfo, 0 ); // MSVC intrinsic from <intrin.h>; it takes two arguments
		return memcmp( &CPUInfo[1], AuthenticAMD,     4 ) == 0  // EBX == "Auth"
		    && memcmp( &CPUInfo[3], AuthenticAMD + 4, 4 ) == 0  // EDX == "enti"
		    && memcmp( &CPUInfo[2], AuthenticAMD + 8, 4 ) == 0; // ECX == "cAMD"
	}
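The register ordering is the easy part to get wrong here: the vendor string comes back in EBX, EDX, ECX order, not EBX, ECX, EDX. Below is a small sketch that separates the string assembly from the CPUID call, so the comparison can be checked without running on a particular CPU. `BuildVendorString` and `IsVendor` are hypothetical helper names, and the register values are assumed to come from CPUID leaf 0.

```cpp
#include <cstdint>
#include <cstring>

// Rebuild the 12-character vendor string from the CPUID leaf 0 registers.
// The bytes are laid out in EBX, EDX, ECX order. 'out' must hold 13 chars.
void BuildVendorString( std::uint32_t ebx, std::uint32_t edx, std::uint32_t ecx, char out[13] )
{
    std::memcpy( out + 0, &ebx, 4 );
    std::memcpy( out + 4, &edx, 4 );
    std::memcpy( out + 8, &ecx, 4 );
    out[12] = '\0';
}

// True if the registers spell out the given vendor name, e.g. "AuthenticAMD".
bool IsVendor( std::uint32_t ebx, std::uint32_t edx, std::uint32_t ecx, const char* vendor )
{
    char name[13];
    BuildVendorString( ebx, edx, ecx, name );
    return std::strcmp( name, vendor ) == 0;
}
```

Using `memcpy` instead of casting the char array to an integer pointer also sidesteps alignment and aliasing concerns.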

##### Share on other sites

Have you seen this?

##### Share on other sites

Have you seen this?

Nope, thanks. But it reports the same thing -- 4 hyperthreaded cores, instead of 8 non-hyperthreaded cores.

I read somewhere that the 8-core FX series has 4 'modules', each with 2 integer cores and a shared floating point core per module. So maybe that has something to do with this.

Yeah, this is part of their AVX implementation IIRC. They can either do very-wide SIMD on one core at a time (out of two), or they can do not-so-wide SIMD on both at once... or something like that.
So they're kinda hyperthreaded when it comes to float ops, but not at all hyperthreaded with anything else -- unlike real Intel hyperthreading, where everything including the L1 caches is shared.

##### Share on other sites

The L1 instruction cache, instruction decoder, and branch prediction are shared too. So it's really a hybrid. But if in your case it makes more sense to treat them as separate physical cores, then I think you'll just have to assume that on AMD the number of physical cores equals the number of virtual cores.
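That rule boils down to a one-line policy. A minimal sketch, assuming the core and thread counts and the vendor check come from routines like the ones earlier in the thread; `ChoosePoolSize` is a hypothetical name, and the policy is only the heuristic suggested here, not a measured result:

```cpp
// Hypothetical helper: pick the worker count for the thread pool.
unsigned ChoosePoolSize( unsigned numCores, unsigned numThreads, bool isAMD )
{
    // On AMD, assume every hardware thread is backed by its own integer core,
    // so use them all. Otherwise use one worker per reported physical core.
    unsigned size = isAMD ? numThreads : numCores;
    return size != 0 ? size : 1; // never return an empty pool
}
```

With the numbers from the first post, `ChoosePoolSize( 4, 8, true )` picks 8 workers on the FX-8350, while a hyper-threaded quad-core Intel chip still gets 4.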

##### Share on other sites

Ah interesting, I didn't notice/know that.
It's a big L1 instruction cache though; you often see L1 caches about 32KB in size, but the (shared by two cores) instruction cache is 64KB -- arguably big enough to share.
However, the (non-shared) L1 data caches are only 16KB each.

Yeah I found via testing on my Intel CPUs that my game ran better with "numCores" threads in the pool rather than "numHwThreads" in the pool. I'll have to do a bunch more testing on these new AMD chips and see if they actually run better for me or not with 8 or 4 threads...
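One way to structure that kind of testing is to time the same workload at each candidate pool size and keep the fastest. The sketch below only covers the bookkeeping: `TimeSeconds` and `PickBestThreadCount` are hypothetical names, and `measure` is an assumed callback that runs your own workload with the given number of threads and returns the elapsed seconds.

```cpp
#include <chrono>
#include <functional>
#include <vector>

// Time one call of 'work' in seconds using a monotonic clock.
double TimeSeconds( const std::function<void()>& work )
{
    auto start = std::chrono::steady_clock::now();
    work();
    std::chrono::duration<double> elapsed = std::chrono::steady_clock::now() - start;
    return elapsed.count();
}

// Return the candidate thread count with the lowest measured time,
// or 0 if 'candidates' is empty.
unsigned PickBestThreadCount( const std::vector<unsigned>& candidates,
                              const std::function<double(unsigned)>& measure )
{
    bool     haveBest = false;
    unsigned best     = 0;
    double   bestTime = 0.0;
    for( unsigned count : candidates )
    {
        double t = measure( count ); // seconds taken with 'count' threads
        if( !haveBest || t < bestTime )
        {
            haveBest = true;
            best     = count;
            bestTime = t;
        }
    }
    return best;
}
```

In practice `measure` would wrap something like `TimeSeconds` around a representative frame or job batch, ideally averaged over several runs, since a single timing is noisy.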

##### Share on other sites

Is this with the "Core parking" Windows patch applied?

I am surprised FX is reporting 4 cores; AMD basically shot itself in the foot for months by insisting their logical cores were real hardware cores!

##### Share on other sites

If you've got tips for other platforms that would also be helpful though

Mac OS X:

	#include <sys/sysctl.h>

	int numCores = 0, numThreads = 0;
	size_t lenNumCores = sizeof(numCores);
	size_t lenNumThreads = sizeof(numThreads);

	sysctlbyname("hw.physicalcpu", &numCores, &lenNumCores, nullptr, 0);
	sysctlbyname("hw.logicalcpu", &numThreads, &lenNumThreads, nullptr, 0);

##### Share on other sites

After reading http://blog.stuffedcow.net/2011/08/hyperthreading-performance/ where someone ran multiple CPU benchmarks with and without hyperthreaded cores, and only one of them got better when not using them, I doubt that recommending against them by default makes sense without thoroughly benchmarking your own program.
