Concurrency issues with C++ std::map insert/erase

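I am writing a threaded application that processes a list of resources and may or may not place a result item into a container (std::map) for each resource. The resources are processed in multiple threads.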

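The result container is then traversed, and each item is handled by a separate thread that takes an item, updates a MySQL database (using the mysqlcppconn API), removes the item from the container, and continues.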

For simplicity, here is an overview of the logic:

queueWorker() - thread
    getResourcesList() - seeds the global queue

databaseWorker() - thread
    commitProcessedResources() - commits results to a database every n seconds

processResources() - thread x <# of processor cores>
    processResource()
    queueResultItem()

And a pseudo-implementation to show what I am doing:

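/* not the actual structs, just simplified for illustration */
#include <map>
#include <string>
#include <utility>
#include <pthread.h>

using std::string;
using std::make_pair;

struct queue_item_t {
    int id;
    string hash;
    string text;
};

struct result_item_t {
    string hash; // hexadecimal sha1 digest
    int state;
};

std::map< string, queue_item_t > queue;
std::map< string, result_item_t > results;
pthread_mutex_t resultQueueMutex = PTHREAD_MUTEX_INITIALIZER; // guards results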

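void queueResultItem (result_item_t result); // defined below

bool processResource (queue_item_t *item)
{
    result_item_t result;

    if (some_stuff_that_doesnt_apply_to_all_resources)
    {
        result.hash = item->hash;
        result.state = 1;

        /* PROBLEM IS HERE */
        queueResultItem(result);
        return true;  // a result was queued (return value is illustrative)
    }

    return false; // nothing to queue for this resource
}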

void commitProcessedResources ()
{
    pthread_mutex_lock(&resultQueueMutex);

    // this can take a while since there

    for (std::map< string, result_item_t >::iterator it = results.begin(); it != results.end();)
    {
        // do mysql stuff that takes a while

        results.erase(it++);
    }

    pthread_mutex_unlock(&resultQueueMutex);
}

void queueResultItem (result_item_t result)
{
    pthread_mutex_lock(&resultQueueMutex);

    results.insert(make_pair(result.hash, result));

    pthread_mutex_unlock(&resultQueueMutex);
}

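As noted in processResource(), the problem is that while commitProcessedResources() is running and holding resultQueueMutex, we wait here for queueResultItem() to return, because it tries to lock the same mutex and so blocks until the commit is finished, which can take a while.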

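Since only a limited number of threads are running, once all of them are waiting for queueResultItem() to finish, no more work gets done until the mutex is released and becomes available to queueResultItem().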

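So my question is: how do I best implement this? Is there a standard container that supports concurrent insertion and removal, or is there something else I am not aware of?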

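Strictly speaking, each queue item does not need its own unique key, as it has here with std::map, but I prefer it that way: multiple resources can produce the same result, and I would rather send only unique results to the database, even though it uses INSERT IGNORE to skip any duplicates.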

I am fairly new to C++, so unfortunately I do not know what to search for on Google :(

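You do not have to hold the lock on the queue for the entire time you are processing it in commitProcessedResources(). You can swap the queue with an empty one: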

void commitProcessedResources ()
{
    std::map< string, result_item_t > queue2;
    pthread_mutex_lock(&resultQueueMutex);
    // XXX Do a quick swap.
    queue2.swap (results);
    pthread_mutex_unlock(&resultQueueMutex);

    // this can take a while since there

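    for (std::map< string, result_item_t >::iterator it = queue2.begin();
        it != queue2.end(); ++it) // advance here, since nothing is erased in the body
    {
        // do mysql stuff that takes a while

        // XXX You do not need this: queue2 is local and is destroyed on return.
        //results.erase(it++);
    }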
}
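
To make the swap idea concrete, here is a minimal, self-contained sketch of it. It reuses the names from the question (results, resultQueueMutex, queueResultItem, commitProcessedResources); the helper takePendingResults() and the typedef result_map_t are hypothetical names introduced only for this illustration, and the loop body is a stand-in for the real database work.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <pthread.h>
#include <string>
#include <utility>

struct result_item_t {
    std::string hash; // hexadecimal sha1 digest
    int state;
};

typedef std::map<std::string, result_item_t> result_map_t; // hypothetical alias

static result_map_t results;
static pthread_mutex_t resultQueueMutex = PTHREAD_MUTEX_INITIALIZER;

// Producer side: the lock is held only for the insert itself.
void queueResultItem(const result_item_t &result) {
    pthread_mutex_lock(&resultQueueMutex);
    results.insert(std::make_pair(result.hash, result)); // an existing hash is left untouched
    pthread_mutex_unlock(&resultQueueMutex);
}

// Hypothetical helper: take everything queued so far and leave an empty map behind.
// Swapping two maps is a constant-time operation, so the lock is held only briefly.
result_map_t takePendingResults() {
    result_map_t batch;
    pthread_mutex_lock(&resultQueueMutex);
    batch.swap(results);
    pthread_mutex_unlock(&resultQueueMutex);
    return batch;
}

// Consumer side: process the private batch with no lock held.
// Returns the number of items handled (a stand-in for the database commit).
std::size_t commitProcessedResources() {
    result_map_t batch = takePendingResults();
    std::size_t committed = 0;
    for (result_map_t::const_iterator it = batch.begin(); it != batch.end(); ++it) {
        // the slow MySQL update for it->second would go here
        ++committed;
    }
    return committed;
}
```

One consequence of this design is that the map only removes duplicates within a single batch. The same hash can show up again in a later batch, which is where the INSERT IGNORE mentioned in the question still helps.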

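You need a synchronization mechanism (i.e. a mutex) for this to work correctly. However, the goal in parallel programming is to minimize the critical section (i.e. the amount of code executed while you hold the lock).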

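That said, if your MySQL queries can run in parallel without synchronization (i.e. multiple calls do not conflict with each other), move them out of the critical section. This greatly reduces the overhead. For example, a simple refactoring like the following does that: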

void commitProcessedResources ()
{
    // MOVING THIS LOCK

    // this can take a while since there
    pthread_mutex_lock(&resultQueueMutex);
    std::map<string, result_item_t>::iterator end = results.end();
    std::map<string, result_item_t>::iterator begin = results.begin();
    pthread_mutex_unlock(&resultQueueMutex);

    for (std::map< string, result_item_t >::iterator it = begin; it != end;)
    {
        // do mysql stuff that takes a while

        pthread_mutex_lock(&resultQueueMutex); // Is this the only place we need it?
        // This is a MUCH smaller critical section
        results.erase(it++);
        pthread_mutex_unlock(&resultQueueMutex); // Unlock or everything will block until end of loop
    }

    // MOVED UNLOCK
}

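This gives you concurrent, "live" access to the data across multiple threads. That is, the map is updated after each write completes, and the current contents can be read elsewhere.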

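Up to and including C++03, the standard did not define anything at all about threads or thread safety (and since you are using pthreads, I assume that is the version you are working with).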

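It is therefore up to you to lock the shared map so that only one thread accesses it at any given time. Otherwise you can corrupt its internal data structures, and the map will no longer be valid.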
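
As a rough sketch of what "only one thread at a time" can look like with pthreads in C++03, the wrapper below routes every access to the map through a single mutex. ScopedLock and LockedResults are hypothetical names made up for this example, and the value type is simplified to an int state. The scoped guard is just one way to make sure the mutex is released on every return path.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <pthread.h>
#include <string>
#include <utility>

// Hypothetical RAII helper: locks in the constructor, unlocks in the destructor.
class ScopedLock {
public:
    explicit ScopedLock(pthread_mutex_t &m) : mutex_(m) { pthread_mutex_lock(&mutex_); }
    ~ScopedLock() { pthread_mutex_unlock(&mutex_); }
private:
    ScopedLock(const ScopedLock &);            // non-copyable (C++03 style)
    ScopedLock &operator=(const ScopedLock &);
    pthread_mutex_t &mutex_;
};

// Hypothetical wrapper: the map is private, so callers cannot bypass the mutex.
class LockedResults {
public:
    LockedResults() { pthread_mutex_init(&mutex_, NULL); }
    ~LockedResults() { pthread_mutex_destroy(&mutex_); }

    // Returns true if the hash was new, false if it was already present.
    bool insert(const std::string &hash, int state) {
        ScopedLock lock(mutex_);
        return items_.insert(std::make_pair(hash, state)).second;
    }

    // Returns true if an entry was removed.
    bool erase(const std::string &hash) {
        ScopedLock lock(mutex_);
        return items_.erase(hash) > 0;
    }

    std::size_t size() {
        ScopedLock lock(mutex_);
        return items_.size();
    }

private:
    LockedResults(const LockedResults &);            // non-copyable
    LockedResults &operator=(const LockedResults &);
    pthread_mutex_t mutex_;
    std::map<std::string, int> items_;
};
```

Each method holds the lock only for a single map operation, so this protects the map's internal structure but does not make a sequence of calls atomic as a whole.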

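Alternatively (and this is what I usually prefer), you can have the worker threads put their data into a thread-safe queue, and have a single thread take data from that queue and put it into the map. Because that part is single-threaded, you no longer need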