How to detect the number of physical processors/cores on Windows, Mac and Linux

Tags: c++, windows, macos, assembly, hyperthreading

I have a multi-threaded C++ application that runs on Windows, Mac and a few Linux flavors.

To make a long story short: for it to run at maximum efficiency, I have to be able to instantiate a single thread per physical processor/core. Creating more threads than there are physical processors/cores degrades the performance of my program considerably. I can already correctly detect the number of logical processors/cores on all three of these platforms. To detect the number of physical processors/cores correctly, I have to detect whether hyper-threading is supported AND active.

My question therefore is whether there is a way to detect that hyper-threading is supported and enabled. If so, how exactly?

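I don't know that all three expose the information in the same way, but if you can safely assume that the NT kernel reports device information according to the POSIX standard (which NT supposedly supports), then you could work off that standard.

However, differences in device management are often cited as one of the stumbling blocks of cross-platform development. I would implement this as three strands of logic at best; I would not try to write one piece of code that handles all platforms uniformly.

Granted, all of that assumes C++. For ASM, I presume you will only be running on x86 or amd64 CPUs? You would still need two branch paths, one per architecture, and you would need to test Intel separately from AMD (IIRC), but by and large you just check the CPUID. Is that what you are trying to find? The CPUID from ASM on Intel/AMD family CPUs?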

A Windows-specific solution is described here:
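As an illustration of what a Windows-specific approach can look like, here is a minimal sketch built on the Win32 call GetLogicalProcessorInformation. It is not necessarily the solution the answer above refers to. The function name countPhysicalCoresWin32 is made up for this example, and the code only sees the calling thread's processor group (at most 64 logical processors). On non-Windows builds it simply returns 0.

```cpp
#include <vector>
#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical helper: counts RelationProcessorCore entries, i.e. physical cores
// in the current processor group. Returns 0 on failure or on non-Windows builds.
unsigned countPhysicalCoresWin32() {
#ifdef _WIN32
    DWORD bytes = 0;
    // First call with no buffer: expected to fail and report the required size.
    GetLogicalProcessorInformation(nullptr, &bytes);
    if (GetLastError() != ERROR_INSUFFICIENT_BUFFER || bytes == 0) return 0;

    std::vector<SYSTEM_LOGICAL_PROCESSOR_INFORMATION> info(
        bytes / sizeof(SYSTEM_LOGICAL_PROCESSOR_INFORMATION));
    if (!GetLogicalProcessorInformation(info.data(), &bytes)) return 0;

    unsigned cores = 0;
    const size_t count = bytes / sizeof(SYSTEM_LOGICAL_PROCESSOR_INFORMATION);
    for (size_t i = 0; i < count; ++i) {
        // One entry of this kind is reported per physical core.
        if (info[i].Relationship == RelationProcessorCore) ++cores;
    }
    return cores;
#else
    return 0;  // not available on this platform
#endif
}
```

If the result is smaller than the logical processor count reported by GetSystemInfo, the cores are running more than one hardware thread each.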

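For Linux, look at the /proc/cpuinfo file. I am not running Linux right now, so I can't give you more detail. You can count the physical and logical processor instances. If the logical count is twice the physical count, then HT is enabled (this holds only for x86).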

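Note that this does not give the number of physical cores as intended, but the number of logical cores.

If you can use C++11 (thanks to alfC's comment below):

#include <iostream>
#include <thread>

int main() {
  std::cout << std::thread::hardware_concurrency() << std::endl;
  return 0;
}

EDIT: This is no longer 100% correct, due to Intel's ongoing befuddlement.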
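As I understand the question, you are asking how to detect the number of CPU cores versus CPU threads, which is different from detecting the number of logical and physical cores in a system. CPU cores are often not considered physical cores by the OS unless they have their own package or die. So an OS will report that a Core 2 Duo, for example, has 1 physical and 2 logical CPUs, and an Intel P4 with hyper-threading will be reported in exactly the same way, even though 2 hyper-threads and 2 CPU cores are very different things performance-wise.

I struggled with this until I pieced together the solution below, which I believe works for both AMD and Intel processors. As far as I know, and I could be wrong, AMD does not yet have CPU threads, but they have provided a way to detect them that I assume will work on future AMD processors which may have CPU threads.

In short, here are the steps using the CPUID instruction:

1. Detect the CPU vendor using CPUID function 0
2. Check the HTT bit (bit 28) in the CPU features EDX from CPUID function 1
3. Get the logical core count from EBX[23:16] from CPUID function 1
4. Get the actual non-threaded CPU core count
   - If vendor == 'GenuineIntel', this is 1 plus EAX[31:26] from CPUID function 4
   - If vendor == 'AuthenticAMD', this is 1 plus ECX[7:0] from CPUID function 0x80000008

Sounds difficult, but here is a hopefully platform-independent C++ program that does the trick: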
#include <cstring>
#include <iostream>
#include <string>
#ifdef _WIN32
#include <intrin.h> // __cpuidex
#endif

using namespace std;


void cpuID(unsigned i, unsigned regs[4]) {
#ifdef _WIN32
  // __cpuidex lets us set ECX (the sub-leaf) to zero, which function 4 needs
  __cpuidex((int *)regs, (int)i, 0);

#else
  asm volatile
    ("cpuid" : "=a" (regs[0]), "=b" (regs[1]), "=c" (regs[2]), "=d" (regs[3])
     : "a" (i), "c" (0));
  // ECX is set to zero for CPUID function 4
#endif
}


int main(int argc, char *argv[]) {
  unsigned regs[4];

  // Get vendor: the 12-character string is stored in EBX, EDX, ECX (in that order)
  char vendor[12];
  cpuID(0, regs);
  memcpy(vendor + 0, &regs[1], 4); // EBX
  memcpy(vendor + 4, &regs[3], 4); // EDX
  memcpy(vendor + 8, &regs[2], 4); // ECX
  string cpuVendor = string(vendor, 12);

  // Get CPU features
  cpuID(1, regs);
  unsigned cpuFeatures = regs[3]; // EDX

  // Logical core count per CPU
  cpuID(1, regs);
  unsigned logical = (regs[1] >> 16) & 0xff; // EBX[23:16]
  cout << " logical cpus: " << logical << endl;
  unsigned cores = logical;

  if (cpuVendor == "GenuineIntel") {
    // Get DCP cache info
    cpuID(4, regs);
    cores = ((regs[0] >> 26) & 0x3f) + 1; // EAX[31:26] + 1

  } else if (cpuVendor == "AuthenticAMD") {
    // Get NC: Number of CPU cores - 1
    cpuID(0x80000008, regs);
    cores = ((unsigned)(regs[2] & 0xff)) + 1; // ECX[7:0] + 1
  }

  cout << "    cpu cores: " << cores << endl;

  // Detect hyper-threads  
  bool hyperThreads = cpuFeatures & (1 << 28) && cores < logical;

  cout << "hyper-threads: " << (hyperThreads ? "true" : "false") << endl;

  return 0;
}

Intel(R) Core(TM)2 Quad CPU Q8400 @ 2.66GHz:
 logical cpus: 4
    cpu cores: 4
hyper-threads: false

Intel(R) Xeon(R) CPU E5520 @ 2.27GHz (w/ x2 physical CPU packages):

Intel(R) Pentium(R) 4 CPU 3.00GHz:
 logical cpus: 2
    cpu cores: 1
hyper-threads: true

OpenMP should do the trick:
// test.cpp
#include <omp.h>
#include <iostream>

using namespace std;

int main(int argc, char** argv) {
  int nThreads = omp_get_max_threads();
  cout << "Can run as many as: " << nThreads << " threads." << endl;
}

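With a GCC-based compiler, build it with OpenMP enabled:
$ g++ -fopenmp -o test test.cpp

(You may also need to tell the compiler to link the stdc++ library):
$ g++ -fopenmp -o test test.cpp -lstdc++

As far as I know, OpenMP was designed to solve this kind of problem.

The currently top-voted answer, which uses CPUID, appears to be obsolete. It reports the wrong number of both logical and physical processors. This appears to be confirmed by another answer.

Specifically, using CPUID.1.EBX[23:16] to get the logical processors, or CPUID.4.EAX[31:26]+1 to get the physical ones on Intel processors, does not give the correct result on any Intel processor I have.

For Intel, CPUID.Bh should be used. The solution does not appear to be trivial. For AMD a different solution is needed.

Intel has published source code that reports the correct number of physical and logical cores as well as the correct number of sockets. I tested it on an Intel system with 80 logical cores, 40 physical cores and 4 sockets.

There is also source code for AMD. It gave the correct result on my single-socket Intel system but not on my four-socket system. I don't have an AMD system to test with.

I have not yet dissected the source code to find a simple CPUID-based answer (if one exists). It seems that if the solution can change (as it appears to have), the best approach is to use a library or an OS call.

Edit:

Here is a solution for Intel processors with CPUID leaf 11 (Bh). The way to do this is to loop over the logical processors, get the x2APIC ID of each logical processor from CPUID, and count the number of x2APIC IDs whose least significant bit is zero. On systems without hyper-threading the x2APIC ID is always even. On systems with hyper-threading each x2APIC ID has an even and an odd version.

#ifdef _MSC_VER
#include <intrin.h>
#endif

// input:  eax = functionnumber, ecx = 0
// output: eax = output[0], ebx = output[1], ecx = output[2], edx = output[3]
static inline void cpuid(int output[4], int functionnumber) {
#ifdef _MSC_VER
    __cpuidex(output, functionnumber, 0);
#else
    __asm__ __volatile__("cpuid"
        : "=a" (output[0]), "=b" (output[1]), "=c" (output[2]), "=d" (output[3])
        : "a" (functionnumber), "c" (0));
#endif
}

int getNumCores(void) {
    // Assuming an Intel processor with CPUID leaf 11
    int cores = 0;
    #pragma omp parallel reduction(+:cores)
    {
        int regs[4];
        cpuid(regs, 11);
        if (!(regs[3] & 1)) cores++;  // EDX holds the x2APIC ID; count the even ones
    }
    return cores;
}

The threads must be bound for this to work, and OpenMP does not bind threads by default. Setting export OMP_PROC_BIND=true binds them, or they can be bound in code.

I tested this on my system with 4 cores and 8 hyper-threads.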
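The even/odd test described above assumes exactly two hardware threads per core. A slightly more general variant, sketched here under the assumption that the x2APIC IDs have already been collected (one per logical processor, with threads bound), is to drop the SMT bits and count the distinct values that remain. The shift width would come from CPUID leaf 0Bh, sub-leaf 0, EAX[4:0]; the function name is hypothetical and the sketch does not issue CPUID itself.

```cpp
#include <cstdint>
#include <set>
#include <vector>

// Hypothetical helper: counts physical cores from a list of x2APIC IDs.
// smtShift is the number of low bits that identify the hardware thread
// within a core (must be less than 32); 1 corresponds to the even/odd test.
unsigned countCoresFromApicIds(const std::vector<std::uint32_t>& apicIds,
                               unsigned smtShift) {
    std::set<std::uint32_t> coreIds;
    for (std::uint32_t id : apicIds) {
        coreIds.insert(id >> smtShift);  // strip the thread bits, keep the core/package bits
    }
    return static_cast<unsigned>(coreIds.size());
}
```

With IDs 0..7 and a shift of 1 this yields 4 cores; with hyper-threading disabled the IDs would be 0, 2, 4, 6 and the result is still 4, which matches the behaviour the even-ID count relies on.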

$ sysctl hw
hw.ncpu: 24
hw.activecpu: 24
hw.physicalcpu: 12  <-- number of cores
hw.physicalcpu_max: 12
hw.logicalcpu: 24   <-- number of cores including hyper-threaded cores
hw.logicalcpu_max: 24
hw.packages: 2      <-- number of CPU packages
hw.ncpu = 24
hw.availcpu = 24
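The same values can be read from code. Here is a minimal sketch, assuming macOS and the sysctlbyname call with the hw.physicalcpu key shown in the listing above; the function name is made up for this example, and on other platforms it just returns -1.

```cpp
#include <cstddef>
#ifdef __APPLE__
#include <sys/types.h>
#include <sys/sysctl.h>
#endif

// Hypothetical helper: returns hw.physicalcpu on macOS,
// or -1 on failure or when built for another platform.
int queryPhysicalCpuCount() {
#ifdef __APPLE__
    int count = 0;
    size_t size = sizeof(count);
    if (sysctlbyname("hw.physicalcpu", &count, &size, nullptr, 0) != 0) {
        return -1;  // the key could not be read
    }
    return count;
#else
    return -1;      // sysctl key not available here
#endif
}
```

Querying "hw.logicalcpu" the same way and comparing the two numbers shows whether hyper-threading is active.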

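#include <hwloc.h>
#ifdef _OPENMP
#include <omp.h>
#endif

// Returns the number of physical cores reported by hwloc. Falls back to
// OpenMP's processor count (logical), or to 1, if the topology is unavailable.
int getPhysicalProcessorCount()
{
    int nPhysicalProcessorCount = 0;

    hwloc_topology_t sTopology;

    if (hwloc_topology_init(&sTopology) == 0)
    {
        if (hwloc_topology_load(sTopology) == 0)
        {
            nPhysicalProcessorCount =
                hwloc_get_nbobjs_by_type(sTopology, HWLOC_OBJ_CORE);
        }

        // Destroy the topology even when loading failed
        hwloc_topology_destroy(sTopology);
    }

    if (nPhysicalProcessorCount < 1)
    {
#ifdef _OPENMP
        nPhysicalProcessorCount = omp_get_num_procs();
#else
        nPhysicalProcessorCount = 1;
#endif
    }

    return nPhysicalProcessorCount;
}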

// EDIT: includes

#include <cstdint>
#include <cstdio>

#ifdef _WIN32
    #include <windows.h>
    #include <intrin.h>
#elif defined(__APPLE__)
    #include <sys/param.h>
    #include <sys/sysctl.h>
    #include <unistd.h>
#else
    #include <unistd.h>
#endif

int main()
{
    uint32_t registers[4];
    unsigned logicalcpucount;
    unsigned physicalcpucount;

#ifdef _WIN32
    SYSTEM_INFO systeminfo;
    GetSystemInfo( &systeminfo );
    logicalcpucount = systeminfo.dwNumberOfProcessors;

    __cpuid( (int *)registers, 1 );
#else
    logicalcpucount = sysconf( _SC_NPROCESSORS_ONLN );

    __asm__ __volatile__ ("cpuid " :
                          "=a" (registers[0]),
                          "=b" (registers[1]),
                          "=c" (registers[2]),
                          "=d" (registers[3])
                          : "a" (1), "c" (0));
#endif

    // Caveat: the HTT bit (EDX bit 28) only says the package can hold more than
    // one logical processor. It is also set on multi-core CPUs without
    // hyper-threading, so halving the count is a heuristic, not a guarantee.
    unsigned CPUFeatureSet = registers[3];
    bool hyperthreading = CPUFeatureSet & (1 << 28);

    if (hyperthreading){
        physicalcpucount = logicalcpucount / 2;
    } else {
        physicalcpucount = logicalcpucount;
    }

    fprintf (stdout, "LOGICAL: %u\n", logicalcpucount);
    fprintf (stdout, "PHYSICAL: %u\n", physicalcpucount);

    return 0;
}

#include <iostream>
#include <boost/thread.hpp>

int main()
{
    std::cout << boost::thread::physical_concurrency();
    return 0;
}

$ python -c "import psutil; print(psutil.cpu_count(logical=False))"
4

#include <stdio.h>
#include <libcpuid.h>

int main(void)
{
    struct cpu_raw_data_t raw;                                             // holds the raw CPUID data
    struct cpu_id_t data;                                                  // holds the decoded CPU information

    if (!cpuid_present()) {                                                // check for CPUID presence
        printf("Sorry, your CPU doesn't support CPUID!\n");
        return -1;
    }

    if (cpuid_get_raw_data(&raw) < 0) {                                    // obtain the raw CPUID data
        printf("Sorry, cannot get the CPUID raw data.\n");
        printf("Error: %s\n", cpuid_error());                              // cpuid_error() gives the last error description
        return -2;
    }

    if (cpu_identify(&raw, &data) < 0) {                                   // identify the CPU, using the given raw data.
        printf("Sorry, CPU identification failed.\n");
        printf("Error: %s\n", cpuid_error());
        return -3;
    }

    printf("Found: %s CPU\n", data.vendor_str);                            // print out the vendor string (e.g. `GenuineIntel')
    printf("Processor model is `%s'\n", data.cpu_codename);                // print out the CPU code name (e.g. `Pentium 4 (Northwood)')
    printf("The full brand string is `%s'\n", data.brand_str);             // print out the CPU brand string
    printf("The processor has %dK L1 cache and %dK L2 cache\n",
        data.l1_data_cache, data.l2_cache);                                // print out cache size information
    printf("The processor has %d cores and %d logical processors\n",
        data.num_cores, data.num_logical_cpus);                            // print out CPU cores information

    return 0;
}