C++ concurrent appends to a file: writes lost
Does appending to a file support concurrent appenders? I tested with concurrent threads, one fstream per thread. I found that the data is not corrupted, but some writes are lost: after the writes complete, the file size is smaller than expected. The writes do not overlap. If instead I write through each fstream with explicit seeks, coordinating the offset at which each thread writes, then no writes are lost. Here is the sample code:
#include <fstream>
#include <iostream>
#include <vector>
#include <thread>
#include <chrono>
#include <string>
#include <cstdio>
#include "gtest/gtest.h"

using namespace std;

void append_concurrently(string filename, const int data_in_gb, const int num_threads, const char start_char,
                         bool stream_cache = true) {
    const int offset = 1024;
    const long long num_records_each_thread = (data_in_gb * 1024 * ((1024 * 1024) / (num_threads * offset)));
    {
        auto write_file_fn = [&](int index) {
            // each thread has its own handle
            fstream file_handle(filename, fstream::app | fstream::binary);
            if (!stream_cache) {
                file_handle.rdbuf()->pubsetbuf(nullptr, 0); // no buffering in fstream
            }
            vector<char> data(offset, (char)(index + start_char));
            for (long long i = 0; i < num_records_each_thread; ++i) {
                file_handle.write(data.data(), offset);
                if (!file_handle) {
                    std::cout << "File write failed: "
                              << file_handle.fail() << " " << file_handle.bad() << " " << file_handle.eof() << std::endl;
                    break;
                }
            }
            // file_handle.flush();
        };

        auto start_time = chrono::high_resolution_clock::now();
        vector<thread> writer_threads;
        for (int i = 0; i < num_threads; ++i) {
            writer_threads.push_back(std::thread(write_file_fn, i));
        }
        for (int i = 0; i < num_threads; ++i) {
            writer_threads[i].join();
        }
        auto end_time = chrono::high_resolution_clock::now();

        std::cout << filename << " Data written : " << data_in_gb << " GB, " << num_threads << " threads "
                  << ", cache " << (stream_cache ? "true " : "false ") << ", size " << offset << " bytes ";
        std::cout << "Time taken: " << (end_time - start_time).count() / 1000 << " micro-secs" << std::endl;
    }
    {
        ifstream file(filename, fstream::in | fstream::binary);
        file.seekg(0, ios_base::end);
        // This EXPECT_EQ FAILS as the file size is smaller than EXPECTED
        EXPECT_EQ(num_records_each_thread * num_threads * offset, file.tellg());
        file.seekg(0, ios_base::beg);
        EXPECT_TRUE(file);

        char data[offset]{ 0 };
        for (long long i = 0; i < (num_records_each_thread * num_threads); ++i) {
            file.read(data, offset);
            EXPECT_TRUE(file || file.eof()); // should be able to read until eof
            char expected_char = data[0];    // should not have any interleaving of data
            bool same = true;
            for (auto & c : data) {
                same = same && (c == expected_char) && (c != 0);
            }
            EXPECT_TRUE(same); // THIS PASSES
            if (!same) {
                std::cout << "corruption detected !!!" << std::endl;
                break;
            }
            if (file.eof()) { // THIS FAILS as the file size is smaller
                EXPECT_EQ(num_records_each_thread * num_threads, i + 1);
                break;
            }
        }
    }
}

TEST(fstream, file_concurrent_appends) {
    string filename = "file6.log";
    const int data_in_gb = 1;
    {
        // trunc file before write threads start
        {
            fstream file(filename, fstream::in | fstream::out | fstream::trunc | fstream::binary);
        }
        append_concurrently(filename, data_in_gb, 4, 'B', false);
    }
    std::remove(filename.c_str());
}
A 4 KB buffer size does not reproduce this problem.
Running main() from gtest_main.cc
Note: Google Test filter = *file_conc*_*append*
[==========] Running 1 test from 1 test case.
[----------] Global test environment set-up.
[----------] 1 test from fstream
[ RUN ] fstream.file_concurrent_appends
file6.log Data written : 1 GB, 1 threads , cache true , size 512 bytes Time taken: 38069289 micro-secs
d:\projs\logpoc\tests\test.cpp(279): error: Expected: num_records_each_thread * num_threads * offset
Which is: 1073741824
To be equal to: file.tellg()
Which is: 1073737728
d:\projs\logpoc\tests\test.cpp(301): error: Expected: num_records_each_thread * num_threads
Which is: 2097152
To be equal to: i + 1
Which is: 2097145
EDIT 2:
Closing the file_handle after joining all the threads, so that the data in its internal buffer is flushed, resolves the problem above.

Answer: According to the documentation, the functions provided by std::fstream are generally thread-safe. However, if every thread uses its own std::fstream object, then, as far as the C++ standard library is concerned, these are distinct streams and no synchronization will take place. Only the operating system's kernel is aware that all the file handles point to the same file, so any synchronization would have to be done by the kernel. But the kernel may not even be aware that a write is supposed to go to the end of the file: depending on your platform, the kernel may only receive write requests for particular file positions. If the end of the file has meanwhile been moved by another thread's append, then the position of a thread's earlier write request may no longer be the end of the file.
According to the documentation, opening a file in append mode causes the stream to seek to the end of the file before every write. That behavior seems to be exactly what you want. However, for the reasons stated above, it will probably only work if all threads share the same std::fstream object. In that case, the std::fstream object should be able to synchronize all of the writes; in particular, it should be able to perform the seek to the end of the file and the subsequent write atomically.

Comments:

@Andreas Wenzel: Using the same fstream from multiple threads is not supported, because it keeps an internal position. I opened the file in append mode, so I expected the OS to take care of the offset. I did try sharing the fstream: with a 512-byte buffer size, I consistently see eight writes (4 KB in total) being lost.

@AshishNegi: I suspect your call to file_handle.rdbuf()->pubsetbuf(nullptr, 0) may be failing. According to the documentation, that function must be called before the file is opened. Since you do not appear to flush the output buffer, I suspect the missing 4096 bytes are still sitting in it.

@AshishNegi: After further research, I found that gcc requires the function to be called before the file is opened, while other compilers also allow calling it afterwards. I am not sure that page is still up to date, but the situation does not appear to have changed: with libstdc++ (used by gcc), setbuf or pubsetbuf must be called before any file is opened; otherwise, the call has no effect. See the documentation for more information.

@Andreas Wenzel: You are right about the earlier file handles used for writing not being flushed/closed. Thank you, that solves the mystery.