C++: atomics and cross-thread visibility

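The C++ atomics (<atomic>) family provides 3 benefits: indivisibility of primitive instructions (no dirty reads), memory ordering (for both the CPU and the compiler), and cross-thread visibility/change propagation.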

c++ multithreading c++11


I'm not sure about the third bullet, so take a look at the following example:

#include <atomic>
#include <thread>

std::atomic_bool a_flag = ATOMIC_VAR_INIT(false);
struct Data {
    int x;
    long long y;
    char const* z;
} data;

void thread0()
{
    // due to "release" the data will be written to memory
    // exactly in the following order: x -> y -> z
    data.x = 1;
    data.y = 100;
    data.z = "foo";
    // there can be an arbitrary delay between the write
    // to any of the members and its visibility in other
    // threads (which don't synchronize explicitly)

    // atomic_bool guarantees that the write to "a_flag"
    // will be clean, thus no other thread will ever read some
    // strange mixture of 4 bits + 4 bits
    a_flag.store(true, std::memory_order_release);
}

void thread1()
{
    while (a_flag.load(std::memory_order_acquire) == false) {};
    // "acquire" on a "released" atomic guarantees that all the writes from
    // thread0 (thus data members modification) will be visible here
}

void thread2()
{
    while (data.y != 100) {};
    // not "acquiring" the "a_flag" doesn't guarantee that I will see all the
    // memory writes, but when I see y == 100 I know I can assume that
    // prior writes have been done due to "release ordering" => assert(x == 1)
}

int main()
{
    // run the three functions concurrently
    std::thread t0(thread0);
    std::thread t1(thread1);
    std::thread t2(thread2);

    // join
    t0.join();
    t1.join();
    t2.join();

    return 0;
}


First, please validate my assumptions in the code (especially thread2).

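Second, my questions are:

How does the a_flag write propagate to other cores?

Does std::atomic synchronize the a_flag in the writer's cache with the other cores' caches (using MESI or any other mechanism), or is the propagation automatic?

Assuming that on a particular machine a write to the flag is atomic (think int_32 on x86), and we don't have any private memory to synchronize (we only have a flag), do we need to use atomics?

Taking into consideration the most popular CPU architectures (x86, x64, ARM v, IA-64), is cross-core visibility (I'm not considering reordering right now) automatic (but possibly delayed), or do you need to issue specific commands to propagate any piece of data?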

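The cores themselves don't matter. The question is "how do all cores eventually see the same memory updates", and that is something your hardware does for you (e.g. via a cache coherency protocol). There is only one memory, so the main concern is caching, which is the hardware's private business.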

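This question seems unclear. What matters is that the store to and the load from a_flag form an acquire-release pair. That pair is a synchronization point which causes the effects of thread0 and thread1 to appear in a particular order: everything in thread0 before the store happens-before everything in thread1 after the loop.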

Yes, otherwise there would be no synchronization point.

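You don't need any "commands" in C++. C++ isn't even aware that it's running on any particular kind of CPU; with enough imagination you could probably run a C++ program on a Rubik's cube. A C++ compiler chooses the instructions needed to implement the synchronization behaviour described by the C++ memory model. On x86 that means emitting lock-prefixed instructions and memory fences where required, and not reordering instructions too much. Since x86 has a strongly ordered memory model, the code above should produce minimal extra code compared to the naive, incorrect version without atomics.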

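Putting thread2 into your code gives the entire program undefined behaviour: it reads data.y while thread0 writes it, with no synchronization between them, which is a data race.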

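Just for fun, and to illustrate that it's instructive to see for yourself what is going on, I compiled your code in three variants (I added a global int x, and in thread1 I added x = data.y;):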

Acquire/release: (your code)

Sequentially consistent: (remove the explicit orderings)

"Naive": (just use bool)

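As you can see, there isn't much difference. The "incorrect" version actually looks mostly correct, except that the loop never reloads the flag: it tests it once with cmp on a memory operand and, if it was false, spins forever on jmp .L4. The sequentially consistent version hides its overhead in the xchg instruction, which carries an implicit lock prefix, and doesn't seem to need any explicit fences.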

Acquire/release (your code):

thread0:
    mov DWORD PTR data, 1
    mov DWORD PTR data+4, 100
    mov DWORD PTR data+8, 0
    mov DWORD PTR data+12, OFFSET FLAT:.LC0
    mov BYTE PTR a_flag, 1
    ret

thread1:
.L14:
    movzx   eax, BYTE PTR a_flag
    test    al, al
    je  .L14
    mov eax, DWORD PTR data+4
    mov DWORD PTR x, eax
    ret

Sequentially consistent:

thread0:
    mov eax, 1
    mov DWORD PTR data, 1
    mov DWORD PTR data+4, 100
    mov DWORD PTR data+8, 0
    mov DWORD PTR data+12, OFFSET FLAT:.LC0
    xchg    al, BYTE PTR a_flag
    ret

thread1:
.L14:
    movzx   eax, BYTE PTR a_flag
    test    al, al
    je  .L14
    mov eax, DWORD PTR data+4
    mov DWORD PTR x, eax
    ret

"Naive" (plain bool):

thread0:
    mov DWORD PTR data, 1
    mov DWORD PTR data+4, 100
    mov DWORD PTR data+8, 0
    mov DWORD PTR data+12, OFFSET FLAT:.LC0
    mov BYTE PTR a_flag, 1
    ret

thread1:
    cmp BYTE PTR a_flag, 0
    jne .L3
.L4:
    jmp .L4
.L3:
    mov eax, DWORD PTR data+4
    mov DWORD PTR x, eax
    ret