Using C++11 with GCC 4.8.0 and multithreading

c++ multithreading c++11

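I created a simple program to measure threading performance. To illustrate my point, I cut out parts of a larger program. Hopefully it isn't too horrible to read.

The program is as follows: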

#include <sstream>
#include <thread>
#include <list>
#include <map>
#include <mutex>
#include <condition_variable>
#include <iostream>
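#include <chrono>    // std::chrono clocks and durations
#include <cstring>   // std::strcpy
#include <stdint.h>  // int64_t
#include <stdlib.h>  // malloc, free, atoi
#include <string.h>  // strlen, memcpy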

std::mutex m_totalTranMutex;
int m_totalTrans = 0;
bool m_startThreads = false;
std::condition_variable m_allowThreadStart;
std::mutex m_threadStartMutex;
std::map<int,std::thread::native_handle_type> m_threadNativeHandles;

char *my_strdup(const char *str) 
{
    size_t len = strlen(str);

    char *x = (char *)malloc(len+1); 

    if(x == nullptr) 
        return nullptr; 

    memcpy(x,str,len+1);

    return x;
}


void DoWork()
{
    char abc[50000];
    char *s1, *s2;

    std::strcpy(abc, "12345");
    std::strcpy(abc+20000, "12345");

    s1 = my_strdup(abc);
    s2 = my_strdup(abc);

    free(s1);
    free(s2);
}

void WorkerThread(int threadID)
{
    {
        std::unique_lock<std::mutex> lk(m_threadStartMutex);
        m_allowThreadStart.wait(lk, []{return m_startThreads;});
    }

    double transPerSec = 1.0 / 99999; // floating-point division; 1 / 99999 is integer division and yields 0
    int transactionCounter = 0;
    int64_t clockTicksUsed = 0;

    std::thread::native_handle_type handle = m_threadNativeHandles[threadID];

    std::chrono::high_resolution_clock::time_point current = std::chrono::high_resolution_clock::now();
    std::chrono::high_resolution_clock::time_point start = std::chrono::high_resolution_clock::now();
    std::chrono::high_resolution_clock::time_point end = start + std::chrono::minutes(1);

    int random_num_loops = 0;
    double interarrivaltime = 0.0;
    double timeHolderReal = 0.0;
    while(current < end)
    {
            std::chrono::high_resolution_clock::time_point startWork = std::chrono::high_resolution_clock::now();

            for(int loopIndex = 0; loopIndex < 100; ++loopIndex)
            {
                for(int alwaysOneHundred = 0; alwaysOneHundred < 100; ++alwaysOneHundred)
                {
                    DoWork();
                }
            }

            std::chrono::high_resolution_clock::time_point endWork = std::chrono::high_resolution_clock::now();

            ++transactionCounter;
            clockTicksUsed += std::chrono::duration_cast<std::chrono::milliseconds>(endWork - startWork).count();

        current = std::chrono::high_resolution_clock::now();
    }

    std::lock_guard<std::mutex> tranMutex(m_totalTranMutex);
    std::cout << "Thread " << threadID << " finished with  " << transactionCounter << " transaction." << std::endl;
    m_totalTrans += transactionCounter;
}

int main(int argc, char *argv[])
{
    std::stringstream ss;

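    // Guard against a missing argument before reading argv[1]
    if(argc < 2)
    {
        std::cerr << "usage: " << argv[0] << " <numthreads>" << std::endl;
        return 1;
    }

    int numthreads = atoi(argv[1]);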

    std::list<std::thread> threads;

    int threadIds = 1;
    for(int i = 0; i < numthreads; ++i)
    {
        threads.push_back(std::thread(&WorkerThread, threadIds));
        m_threadNativeHandles.insert(std::make_pair(threadIds, threads.rbegin()->native_handle()));
        ++threadIds;
    }

    {
        std::lock_guard<std::mutex> lk(m_threadStartMutex);
        m_startThreads = true;
    }

    m_allowThreadStart.notify_all();

    //Join until completion
    for(std::thread &th : threads)
    {
        th.join();
    }


    ss << "TotalTran" << std::endl
       << m_totalTrans << std::endl;

    std::cout << ss.str();

}

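These numbers look rather poor. On this system, I expected that going from one thread doing X work to two threads doing 2X work would come closer to doubling the throughput. The threads do each complete the same amount of work, but not nearly as much total work gets done in a minute as I expected.

When I moved to Solaris, it got even stranger.

On Solaris 11, using GCC 4.8.0, I build this program as follows: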

gcc -o simple simpleThreads.cpp -I. -std=c++11 -DSOLARIS=1 -lstdc++ -lm

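When I run "./simple 1", I get:

For "./simple 2", I get:

On Solaris, the two-thread case is much slower. I can't figure out what I'm doing wrong. I'm new to C++11 constructs and to threads, so it's a double whammy. gcc -v shows that the thread model is posix. Any help would be appreciated.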

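You should at least turn on optimization.

The strcpy and memcpy calls only copy six characters, so the only significant work in this program is the calls to malloc. Hammering malloc from multiple threads doesn't tell you much about thread performance.

I did enable optimization, and I'm setting aside the question of how useful the work is. I'm still puzzled by the results. If I run the program with 2 threads, I get the results above. If I run two processes in parallel instead of one process with two threads, I get the results I expect: each process performs about 20000 transactions.

There are other malloc implementations, such as tcmalloc and jemalloc, that perform better for multithreaded applications. The root of the problem is locking in the malloc implementation.

To add to Alex's comment: the regular malloc library is a poor choice for threaded applications. libmtmalloc, for example, is designed specifically for multithreading. All you have to do is link against that library. Oh, and read the man page for libmtmalloc; it has plenty of tuning options.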
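One way to check whether the allocator, rather than the threading itself, limits scaling is to time the same loop with and without heap allocation. The following is only a minimal sketch under stated assumptions: alloc_work, stack_work and run_threads are hypothetical names that do not appear in the question, the workload is a simplified stand-in for DoWork(), and an optimizing compiler may still transform such small loops, so the timings are indicative at best.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <thread>
#include <vector>

// Hypothetical allocation-heavy work: one small malloc/free pair per call.
inline void alloc_work()
{
    char *p = static_cast<char *>(std::malloc(6));
    if (p != nullptr)
    {
        std::memcpy(p, "12345", 6);
        char *volatile escaped = p; // let the pointer escape so the pair is less likely to be elided
        (void)escaped;
        std::free(p);
    }
}

// Hypothetical allocation-free work: the same copy into a stack buffer.
inline void stack_work()
{
    char buf[6];
    std::memcpy(buf, "12345", 6);
    volatile char sink = buf[0]; // keep the copy observable
    (void)sink;
}

// Runs `work` `iterations` times on each of `num_threads` threads.
// Returns the elapsed wall-clock time in milliseconds and stores the
// number of completed calls per thread in `done`.
template <typename Work>
long long run_threads(int num_threads, long iterations, Work work, std::vector<long> &done)
{
    done.assign(static_cast<std::size_t>(num_threads), 0L);
    const auto start = std::chrono::steady_clock::now();

    std::vector<std::thread> threads;
    for (int t = 0; t < num_threads; ++t)
    {
        threads.emplace_back([&done, t, iterations, work] {
            long count = 0;
            for (long i = 0; i < iterations; ++i)
            {
                work();
                ++count;
            }
            done[t] = count; // each thread writes only its own slot
        });
    }
    for (std::thread &th : threads)
    {
        th.join();
    }

    const auto end = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::milliseconds>(end - start).count();
}
```

Calling run_threads with 1 and then 2 threads for alloc_work, and again for stack_work, gives a rough comparison. If the elapsed time grows with the thread count only for alloc_work, that would be consistent with contention inside malloc, as the comments suggest. Each thread performs a fixed number of iterations here, so ideal scaling shows up as a roughly constant elapsed time.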

simplethread 1
Thread 1 finished with  1667 transaction.
TotalTran
1667

simplethread 2
Thread 1 finished with  1037 transaction.
Thread 2 finished with  1030 transaction.
TotalTran
2067

simplethread 3
Thread 3 finished with  824 transaction.
Thread 2 finished with  830 transaction.
Thread 1 finished with  837 transaction.
TotalTran
2491

simplethread 4
Thread 3 finished with  688 transaction.
Thread 2 finished with  693 transaction.
Thread 1 finished with  704 transaction.
Thread 4 finished with  691 transaction.
TotalTran
2776

simplethread 8
Thread 2 finished with  334 transaction.
Thread 6 finished with  325 transaction.
Thread 7 finished with  346 transaction.
Thread 1 finished with  329 transaction.
Thread 8 finished with  329 transaction.
Thread 3 finished with  338 transaction.
Thread 5 finished with  331 transaction.
Thread 4 finished with  330 transaction.
TotalTran
2662

E:\Development\Projects\Applications\CPUBenchmark\Debug>simplethread 16
Thread 16 finished with  163 transaction.
Thread 15 finished with  169 transaction.
Thread 12 finished with  165 transaction.
Thread 9 finished with  170 transaction.
Thread 10 finished with  166 transaction.
Thread 4 finished with  164 transaction.
Thread 13 finished with  166 transaction.
Thread 8 finished with  165 transaction.
Thread 6 finished with  165 transaction.
Thread 5 finished with  168 transaction.
Thread 2 finished with  161 transaction.
Thread 1 finished with  159 transaction.
Thread 7 finished with  160 transaction.
Thread 11 finished with  161 transaction.
Thread 14 finished with  163 transaction.
Thread 3 finished with  161 transaction.
TotalTran
2626

Thread 1 finished with  19686 transaction.
TotalTran
19686

Thread 1 finished with  5248 transaction.
Thread 2 finished with  2484 transaction.
TotalTran
7732