C: Why do consumer threads stop before the producer threads have finished?
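I recently wrote a bounded lock-free queue and ran some tests on it. In the test, some threads produce primes (starting at some number, counting up by 6 times the number of producer-thread pairs, checking each number with a deterministic Miller-Rabin test, and inserting the primes into the queue) and some threads consume primes (removing elements from the queue and checking that they are prime). The producer threads come in pairs: one thread of each pair produces primes congruent to 1 mod 6 and the other produces primes congruent to 5 mod 6 (every number congruent to 0, 2, 3, or 4 mod 6 is composite, except 2 and 3), and the main thread produces 2 and 3. There is a global counter of how many threads have not finished producing. Each time a producer thread or the main thread finishes producing primes, it atomically decrements this counter. The consumer threads loop while it is not 0.
To determine whether the primes actually make it through the queue, I compute the 0th through 3rd moments of the primes produced and consumed by each thread, and check that the sum of the producers' moments equals the sum of the consumers' moments. The nth moment is just the sum of the nth powers, so this means that the number of primes, their sum, the sum of their squares, and the sum of their cubes all match. If the sequences are permutations of each other, all of the moments match; so although I would need to check the first n moments to be sure that sequences of length n really are permutations, having the first 4 match means the chance that the sequences differ is very small.
My lock-free queue actually works, but for some reason the consumer threads all stop while there are still elements in the queue. I don't understand why, because a producer thread only decrements the producing counter after it has inserted all of its primes into the queue, and the producing counter can only reach 0 after every producing thread has decremented it. So whenever the producing counter is 0, all elements have been inserted into the queue. If a consumer then tries to remove an element, it should succeed, because a remove only fails if queue.full (the number of elements in the queue) is 0. So when the producing counter is 0, consumers should be able to consume successfully until queue.full is 0, and should not check the producing counter and return before the queue is exhausted. They only check the producing counter when a remove fails (in case the consumers are faster than the producers and empty the queue).
However, when I make the while loop around the remove check queue.full in addition to the producing counter, the consumers do not return early. That is, when I change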
__atomic_load_n(&producing, __ATOMIC_SEQ_CST)
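to
__atomic_load_n(&producing, __ATOMIC_SEQ_CST) || __atomic_load_n(&Q.full, __ATOMIC_SEQ_CST)
it just works. Note that my code uses a fair number of gcc extensions, such as attributes, __atomic builtins, __auto_type, statement expressions, 128-bit integers, __builtin_ctzll, and "\e"; C99 features such as designated initializers and compound literals; and pthreads. I also use sequentially consistent memory order and the strong compare-and-swap everywhere, even though weaker versions should work, because I don't want extra problems while I still have this one. Here is the header, queue.h: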
#ifndef __QUEUE_H__
#define __QUEUE_H__
#include <stddef.h>
#include <inttypes.h>
typedef struct __attribute__((__designated_init__)){//using positional initializers for a struct is terrible
void *buf;
uint8_t *flags;//insert started, insert complete, remove started
size_t cap, full;
uint64_t a, b;
} queue_t;
typedef struct __attribute__((__designated_init__)){
size_t size;
} queue_ft;//this struct serves as a class for queue objects: any data specific to the object goes in the queue_t struct and any shared data goes here
int queue_insert(queue_t*, const queue_ft*, void *elem);
int queue_remove(queue_t*, const queue_ft*, void *out);
int queue_init(queue_t*, const queue_ft*, size_t reserve);
void queue_destroy(queue_t*, const queue_ft*);
#endif
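As a side note on the moment check described in the question, here is a minimal sketch of the idea, independent of the queue. The helper names (moments_add, moments_of, moments_equal) are hypothetical and not part of the original code; the arithmetic wraps modulo 2^64, as it does with uint64_t in the test program.

```c
#include <stddef.h>
#include <stdint.h>

#define NUM_MOMENTS 4

/* Add n^0, n^1, n^2, n^3 to moments[0..3]; unsigned arithmetic wraps mod 2^64. */
void moments_add(uint64_t moments[NUM_MOMENTS], uint64_t n)
{
    uint64_t power = 1; /* n^0 */
    for (size_t i = 0; i < NUM_MOMENTS; ++i) {
        moments[i] += power;
        power *= n;
    }
}

/* Compute the power sums of values[0..count-1] into moments. */
void moments_of(const uint64_t *values, size_t count, uint64_t moments[NUM_MOMENTS])
{
    for (size_t i = 0; i < NUM_MOMENTS; ++i) {
        moments[i] = 0;
    }
    for (size_t i = 0; i < count; ++i) {
        moments_add(moments, values[i]);
    }
}

/* Return 1 if every moment agrees, 0 otherwise. */
int moments_equal(const uint64_t a[NUM_MOMENTS], const uint64_t b[NUM_MOMENTS])
{
    for (size_t i = 0; i < NUM_MOMENTS; ++i) {
        if (a[i] != b[i]) {
            return 0;
        }
    }
    return 1;
}
```

Because each moment is a plain sum, the order in which values arrive does not matter, which is why producers and consumers can each accumulate their own partial sums and compare totals at the end.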
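Why do the consumer threads return before the queue is empty, even though they wait for the producers to finish, when they loop on producing, but do the right thing when they loop on producing || Q.full?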
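Because "no more producers" means that no new entries will be added to the queue; it does not mean that the queue is already empty.
Consider the case where the producers are faster than the consumers. They add their items to the queue and then exit. At that point there are items in the queue, but the active producer count is zero. If the consumers only check whether there are active producers, they will miss the items that are already in the queue.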
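The scenario above can be replayed in a small single-threaded model. This is only a sketch under stated assumptions: sim_state and both loop functions are hypothetical stand-ins for the real atomics and queue, and the functions are meant to be called only once the producers have already finished (producing == 0), since nothing else runs to change the state.

```c
#include <stddef.h>

/* Hypothetical model of the shared state (not the real queue). */
typedef struct {
    int producing; /* producers that have not finished yet */
    size_t full;   /* items currently sitting in the queue */
} sim_state;

/* Consumer loop that only watches the producer count.
 * Returns the number of items it consumed. */
size_t consume_while_producing(sim_state *s)
{
    size_t consumed = 0;
    while (s->producing) {
        if (s->full) {
            --s->full;
            ++consumed;
        }
    }
    return consumed;
}

/* Consumer loop that also keeps going while items remain. */
size_t consume_while_producing_or_full(sim_state *s)
{
    size_t consumed = 0;
    while (s->producing || s->full) {
        if (s->full) {
            --s->full;
            ++consumed;
        }
    }
    return consumed;
}
```

Starting from a state with no active producers and five queued items, the first loop exits immediately and leaves all five items behind, while the second drains the queue.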
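Note that the check
if ((active producers) || (items in queue))
is the correct version in C99. (The || operator has a sequence point after the evaluation of its left-hand side. That is, the right-hand side is never evaluated before the left-hand side.)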
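The evaluation order of || can be observed with a small sketch. The logging helper and the check function below are hypothetical names used only for illustration; they record which operand was evaluated and in what order.

```c
#include <stddef.h>

#define MAX_CALLS 4

char call_log[MAX_CALLS]; /* 'P' = producers operand, 'Q' = queue operand */
size_t call_count;

/* Record that an operand was evaluated, then yield its value. */
int logged(char tag, int value)
{
    if (call_count < MAX_CALLS) {
        call_log[call_count++] = tag;
    }
    return value;
}

/* Mirrors the shape of: (active producers) || (items in queue). */
int check(int active_producers, int items_in_queue)
{
    call_count = 0;
    return logged('P', active_producers) || logged('Q', items_in_queue);
}
```

When active_producers is nonzero, only the left operand is evaluated. When it is zero, the left operand is evaluated first and the right operand second, never the other way around.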
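If you only check for active producers, you miss the case where the producers are faster than the consumers, and you exit while there are still items in the queue.
If you only check for items in the queue, you miss the case where producers are still adding things to the queue.
If you check whether the queue is empty first, you open a race window. After the consumer checks whether the queue is empty, but before it checks whether there are active producers, a producer can add one or more items to the queue and exit.
You need to check for active producers first. If there are active producers and the queue is currently empty, the consumer must wait for new items to arrive in the queue (until either the active producer count drops to zero or a new item arrives). If there are no active producers, the consumer must check whether there are items in the queue. No active producers means that no new items will appear in the queue, but it does not mean that the queue is already empty.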
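The race window described above can be replayed deterministically by injecting the producer's last step between the consumer's two reads. This is a single-threaded model, not real concurrent code: shared_state, producer_finishes, and the two keep_going functions are hypothetical names, and real code would use atomic loads (such as __atomic_load_n) for each read.

```c
#include <stddef.h>

/* Hypothetical model of the shared state. */
typedef struct {
    int producing; /* active producer count */
    size_t full;   /* items in the queue */
} shared_state;

/* The last producer's final step: insert one item, then retire. */
void producer_finishes(shared_state *s)
{
    ++s->full;
    --s->producing;
}

/* Queue checked first: the producer's step fits between the two reads.
 * Returns 1 if the consumer should keep looping, 0 if it would exit. */
int keep_going_queue_first(shared_state *s, void (*between)(shared_state *))
{
    int has_items = s->full != 0;          /* read 1: queue looks empty */
    if (between) {
        between(s);                        /* another thread runs here */
    }
    int has_producers = s->producing != 0; /* read 2: nobody is left */
    return has_items || has_producers;
}

/* Producers checked first: a stale "still producing" only costs one
 * more loop iteration; it never strands an item. */
int keep_going_producers_first(shared_state *s, void (*between)(shared_state *))
{
    int has_producers = s->producing != 0; /* read 1 */
    if (between) {
        between(s);                        /* another thread runs here */
    }
    int has_items = s->full != 0;          /* read 2 */
    return has_producers || has_items;
}
```

With one active producer and an empty queue, the queue-first variant decides to exit even though an item has just been inserted, while the producers-first variant keeps looping and will find the item on its next pass.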
The implementation, queue.c:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <inttypes.h>
#include "queue.h"
int queue_insert(queue_t *self, const queue_ft *ft, void *elem){
uint64_t i;
while(1){
uint8_t flag = 0;
if(__atomic_load_n(&self->full, __ATOMIC_SEQ_CST) == self->cap){
return 0;
}
i = __atomic_load_n(&self->b, __ATOMIC_SEQ_CST);
if(__atomic_compare_exchange_n(self->flags + i, &flag, 0x80, 0, __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST)){//set the insert started flag if all flags are clear
break;
}
}
__atomic_fetch_add(&self->full, 1, __ATOMIC_SEQ_CST);
uint64_t b = i;
while(!__atomic_compare_exchange_n(&self->b, &b, (b + 1)%self->cap, 0, __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST));//increase the b endpoint of the queue with wraparaound
memcpy(self->buf + i*ft->size, elem, ft->size);//actually insert the item. accesses to the buffer mirror accesses to the flags so this is safe
__atomic_thread_fence(__ATOMIC_SEQ_CST);
__atomic_store_n(self->flags + i, 0xc0, __ATOMIC_SEQ_CST);//set the insert completed flag
return 1;
}
int queue_remove(queue_t *self, const queue_ft *ft, void *out){
uint64_t i;
while(1){
uint8_t flag = 0xc0;
if(!__atomic_load_n(&self->full, __ATOMIC_SEQ_CST)){
return 0;
}
i = __atomic_load_n(&self->a, __ATOMIC_SEQ_CST);
if(__atomic_compare_exchange_n(self->flags + i, &flag, 0xe0, 0, __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST)){//set the remove started flag if insert started and insert completed are set but the other flags are clear
break;
}
}
__atomic_fetch_sub(&self->full, 1, __ATOMIC_SEQ_CST);
uint64_t a = i;
while(!__atomic_compare_exchange_n(&self->a, &a, (a + 1)%self->cap, 0, __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST));//increase the a endpoint of the queue with wraparound
memcpy(out, self->buf + i*ft->size, ft->size);//actually remove the item.
__atomic_thread_fence(__ATOMIC_SEQ_CST);
__atomic_store_n(self->flags + i, 0x00, __ATOMIC_SEQ_CST);//clear all the flags to mark the remove as completed
return 1;
}
int queue_init(queue_t *self, const queue_ft *ft, size_t reserve){
void *buf = malloc(reserve*ft->size);
if(!buf){
return 0;
}
uint8_t *flags = calloc(reserve, sizeof(uint8_t));
if(!flags){
free(buf);
return 0;
}
*self = (queue_t){
.buf=buf,
.flags=flags,
.cap=reserve,
.full=0,
.a=0,.b=0
};
return 1;
}
void queue_destroy(queue_t *self, const queue_ft *ft){
free(self->buf);
free(self->flags);
}
The test program, test_queue_pc.c:
#define _POSIX_C_SOURCE 201612UL
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <inttypes.h>
#include <pthread.h>
#include <math.h>
#include <time.h>
#include "queue.h"
//Generate primes up to this number. Note 78498 is the number of primes below 1000000; this is hard coded because the queue does not support growing yet.
#define MAX 1000000
#define QUEUE_SIZE 78498
#define NUM_PRODUCER_PAIRS 3
#define NUM_CONSUMERS 2
//Every producer and consumer thread calculates the 0th through 3rd moments of the sequence of primes it sees, as well as testing them for primality.
//The nth moment is the sum of the nth powers, thus, the order does not matter and if the primes are the same in both the producers and the consumers
//then the sums of the moments will also be the same. I check that the 0th through 3rd moments match which means it is nearly certain the primes go through
//the queue.
#define NUM_MOMENTS 4
//Deterministic Miller Rabin witnesses (see https://en.wikipedia.org/wiki/Miller–Rabin_primality_test)
#define DMR_PRIMES (uint64_t[]){2, 13, 23, 1662803}
#define DMR_PRIMES_C 4
//Macro to split an integer into three parts. The first part has the 2**0, 2**3, 2**6, ..., 2**57 bits of the original and 0 elsewhere.
//The second part has the 2**1, 2**4, 2**7, ..., 2**58 bits of the original and 0 elsewhere. The last part has the 2**2, ..., 2**59 bits.
//Bits 2**60 and above are lost (harmless here because MAX is far below 2**60). The masks represent the sums of geometric sequences. The original number can be obtained by bitwise or or xor on the parts.
//I spread the uint64_t's (which are unsigned long longs) over 3 uint64_t's so that they take up 24 bytes and memcpy'ing them happens in multiple steps.
//This spreading is only done on primes that have been produced before they are put into the queue. The consumers then recombine and verify them.
#define SPREAD_EMPLACE(n) ({__auto_type _n = (n); &(spread_integer){(_n)&(((1ULL<<60)-1)/7), (_n)&(((1ULL<<61)-2)/7), (_n)&(((1ULL<<62)-4)/7)};})
typedef struct{
uint64_t x, y, z;
} spread_integer;
queue_ft spread_integer_ft = {.size= sizeof(spread_integer)};
queue_t Q;
//Start producing count at 1 + (NUM_PRODUCER_PAIRS << 1) because main generates 2 and 3 and reduce it by 1 every time a producer thread finishes
int producing = 1 + (NUM_PRODUCER_PAIRS << 1);
//Uses the binary algorithm for modular exponentiation (https://en.wikipedia.org/wiki/Exponentiation_by_squaring)
//It is a helper function for isPrime
uint64_t powmod(unsigned __int128 b, uint64_t e, uint64_t n){
unsigned __int128 r = 1;
b %= n;
while(e){
if(e&1){
r = r*b%n;
}
e >>= 1;
b = b*b%n;
}
return (uint64_t)r;
}
//uses deterministic Miller Rabin primality test
int isPrime(uint64_t n){
uint64_t s, d;//s, d | 2^s*d = n - 1
if(n%2 == 0){
return n == 2;
}
--n;
s = __builtin_ctzll(n);
d = n>>s;
++n;
for(uint64_t i = 0, a, x; i < DMR_PRIMES_C; ++i){
a = DMR_PRIMES[i];
if(a >= n){
break;
}
x = powmod(a, d, n);
if(x == 1 || x == n - 1){
goto CONTINUE_WITNESSLOOP;
}
for(a = 0; a < s - 1; ++a){
x = powmod(x, 2, n);
if(x == 1){
return 0;
}
if(x == n - 1){
goto CONTINUE_WITNESSLOOP;
}
}
return 0;
CONTINUE_WITNESSLOOP:;
}
return 1;
}
void *produce(void *_moments){
uint64_t *moments = _moments, n = *moments;//the output argument for the 0th moment serves as the input argument for the number to start checking for primes at
*moments = 0;
for(; n < MAX; n += 6*NUM_PRODUCER_PAIRS){//the producers are paired so one of every pair generates primes equal to 1 mod 6 and the other equal to 5 mod 6. main generates 2 and 3 the only exceptions
if(isPrime(n)){
for(uint64_t m = 1, i = 0; i < NUM_MOMENTS; m *= n, ++i){
moments[i] += m;
}
if(!queue_insert(&Q, &spread_integer_ft, SPREAD_EMPLACE(n))){
fprintf(stderr, "\e[1;31mERROR: Could not insert into queue.\e[0m\n");
exit(EXIT_FAILURE);
}
}
}
__atomic_fetch_sub(&producing, 1, __ATOMIC_SEQ_CST);//this thread is done generating primes; reduce producing counter by 1
return moments;
}
void *consume(void *_moments){
uint64_t *moments = _moments;
while(__atomic_load_n(&producing, __ATOMIC_SEQ_CST) || __atomic_load_n(&Q.full, __ATOMIC_SEQ_CST)){//busy loop while some threads are still producing or the queue still holds items
spread_integer xyz;
if(queue_remove(&Q, &spread_integer_ft, &xyz)){
uint64_t n = xyz.x | xyz.y | xyz.z;
if(isPrime(n)){
for(uint64_t m = 1, i = 0; i < NUM_MOMENTS; m *= n, ++i){
moments[i] += m;
}
}else{
fprintf(stderr, "\e[1;31mERROR: Generated a prime that fails deterministic Miller Rabin.\e[0m\n");
exit(EXIT_FAILURE);
}
}
}
return moments;
}
int main(void){
if(!queue_init(&Q, &spread_integer_ft, QUEUE_SIZE)){
fprintf(stderr, "\e[1;31mERROR: Could not initialize queue.\e[0m\n");
exit(EXIT_FAILURE);
}
pthread_t producers[NUM_PRODUCER_PAIRS << 1], consumers[NUM_CONSUMERS];
uint64_t moments[(NUM_PRODUCER_PAIRS << 1) + 1 + NUM_CONSUMERS + 1][NUM_MOMENTS] = {};//the 2 extras are because main produces the primes 2 and 3 and consumes primes the consumers leave behind
for(size_t i = 0; i < NUM_CONSUMERS; ++i){//create consumers first to increase likelihood of causing bugs
if(pthread_create(consumers + i, NULL, consume, moments[(NUM_PRODUCER_PAIRS << 1) + 1 + i])){
fprintf(stderr, "\e[1;31mERROR: Could not create consumer thread.\e[0m\n");
exit(EXIT_FAILURE);
}
}
for(size_t i = 0; i < NUM_PRODUCER_PAIRS; ++i){
moments[i << 1][0] = 5 + 6*i;
if(pthread_create(producers + (i << 1), NULL, produce, moments[i << 1])){
fprintf(stderr, "\e[1;31mERROR: Could not create producer thread.\e[0m\n");
exit(EXIT_FAILURE);
}
moments[(i << 1) + 1][0] = 7 + 6*i;
if(pthread_create(producers + (i << 1) + 1, NULL, produce, moments[(i << 1) + 1])){
fprintf(stderr, "\e[1;31mERROR: Could not create producer thread.\e[0m\n");
exit(EXIT_FAILURE);
}
}
for(uint64_t n = 2; n < 4; ++n){
for(uint64_t m = 1, i = 0; i < NUM_MOMENTS; m *= n, ++i){
moments[NUM_PRODUCER_PAIRS << 1][i] += m;
}
if(!queue_insert(&Q, &spread_integer_ft, SPREAD_EMPLACE(n))){
fprintf(stderr, "\e[1;31mERROR: Could not insert into queue.\e[0m\n");
exit(EXIT_FAILURE);
}
}
__atomic_fetch_sub(&producing, 1, __ATOMIC_SEQ_CST);
uint64_t c = 0;
for(size_t i = 0; i < NUM_CONSUMERS; ++i){//join consumers first to bait bugs. Note consumers should not finish until the producing counter reaches 0
void *_c;
if(pthread_join(consumers[i], &_c)){
fprintf(stderr, "\e[1;31mERROR: Could not join consumer thread.\e[0m\n");
exit(EXIT_FAILURE);
}
c += (uintptr_t)_c;
}
for(size_t i = 0; i < NUM_PRODUCER_PAIRS << 1; ++i){
if(pthread_join(producers[i], NULL)){
fprintf(stderr, "\e[1;31mERROR: Could not join producer thread.\e[0m\n");
exit(EXIT_FAILURE);
}
}
//this really should not be happening because the consumer threads only return after the producing counter reaches 0,
//which only happens after all of the producer threads are done inserting items into the queue.
if(Q.full){
fprintf(stdout, "\e[1;31mWTF: Q.full != 0\nproducing == %d\e[0m\n", producing);
}
while(Q.full){
spread_integer xyz;
if(!queue_remove(&Q, &spread_integer_ft, &xyz)){
fprintf(stderr, "\e[1;31mERROR: Could not remove from non empty queue.\e[0m\n");
exit(EXIT_FAILURE);
}
uint64_t n = xyz.x | xyz.y | xyz.z;
if(isPrime(n)){
for(uint64_t m = 1, i = 0; i < NUM_MOMENTS; m *= n, ++i){
moments[(NUM_PRODUCER_PAIRS << 1) + 1 + NUM_CONSUMERS][i] += m;
}
}else{
fprintf(stderr, "\e[1;31mERROR: Generated a prime that fails deterministic Miller Rabin.\e[0m\n");
exit(EXIT_FAILURE);
}
}
queue_destroy(&Q, &spread_integer_ft);
for(uint64_t i = 0, p, c, j; i < NUM_MOMENTS; ++i){
for(j = p = 0; j < (NUM_PRODUCER_PAIRS << 1) + 1; ++j){
p += moments[j][i];
}
for(c = 0; j < (NUM_PRODUCER_PAIRS << 1) + 1 + NUM_CONSUMERS + 1; ++j){
c += moments[j][i];
}
printf("Moment %"PRIu64" %"PRIu64" -> %"PRIu64"\n", i, p, c);
}
}
Compiled with:
gcc -o test_queue_pc queue.c test_queue_pc.c -Wall -std=c99 -g -O0 -pthread -fuse-ld=gold -flto -lm