Reading an "ASCII" text file with std::ifstream in C++

c++ character-encoding

I have an Arabic text file (ASCII) that contains: 121101 الزبون كمال 121102 الزبون سعيد 121103

I want to read this file with std::ifstream in C++:

std::ifstream ifs(file.GetFileName());
std::string content((std::istreambuf_iterator<char>(ifs)), std::istreambuf_iterator<char>());

When I watch the content variable in the VS IDE, I see a character encoding error: 121101 ÇáÒÈæä ßãÇá 121102 ÇáÒÈæä ÓÚíÏ 121103ÒÒæÚÒÒÒæÚÒÒÒ

I also tried std::wifstream:

std::wifstream ifs2(file.GetFileName());
std::string content2((std::istreambuf_iterator<wchar_t>(ifs2)), std::istreambuf_iterator<wchar_t>());

I get the same error. Can anyone help me?

Thanks.

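Why not use FILE* instead? For example, this is an excerpt from my code, where I read my game's save.ini, which contains the different save game entries. What I like about fopen() is that you can actually state the encoding of the file (UTF-8, UTF-16, etc.). Note that the ccs= flag in the mode string is an extension of the Microsoft C runtime.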

FILE* pFini = fopen("save\\save.ini", "rt, ccs=UTF-8");
int line = 0;
if (pFini == NULL)
{
    cout
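After some clarification, the OP wants to:
write a generic function that reads UTF-8 and ANSI files
To be able to process the content in the same way in both cases, I suggest converting it to a UTF-16 encoded std::wstring. The OP appears to be developing for the Windows platform, where UTF-16 is the encoding that most APIs expect. On other platforms (Linux) it may be more appropriate to convert everything to UTF-8.
Reading an ANSI text file into a UTF-16 encoded wstring
To be able to decode ANSI (also known as extended ASCII), we have to know the code page of the file. The code page (or, more precisely, the locale) can be defined through the stream's imbue() method.
The following example reads the content of a text file encoded with ANSI code page 1256 and displays the text using MessageBoxW(), which requires a UTF-16 encoded string: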
#include <fstream>
#include <iterator>
#include <locale>
#include <string>
#include <codecvt>
#include <Windows.h>

int main()
{
    // Use wifstream because we want to read content into a wstring.
    std::wifstream f{"test.txt"};

    // Define the code page of the text file (1256 = Arabic)
    f.imbue( std::locale( ".1256" ) );

    // Read the whole file into a wstring.
    // The stream converts from ANSI to UTF-16 encoding.
    std::wstring s{ std::istreambuf_iterator<wchar_t>( f ), std::istreambuf_iterator<wchar_t>() };

    // Display the string which is now UTF-16 encoded.    
    ::MessageBoxW( NULL, s.c_str(), L"test", 0 );

    return 0;
}

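When a UTF-8 file is read through the std::codecvt_utf8_utf16 facet, the std::consume_header flag skips the BOM if one is present.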
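Notes:

The code examples have been tested under Windows 10 with VS2017 and a German localization.
For brevity I have omitted error handling. You should check the stream state after opening the stream and after reading from it.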
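
The stream-state checks mentioned in the note could look roughly like the sketch below. It is not part of the original answer: the helper name read_file_checked is hypothetical, and it reads raw bytes through a narrow binary stream rather than a wifstream, so that it runs on any platform. It reads with read() instead of stream buffer iterators because read() updates the stream state, which is what the check relies on.

```cpp
#include <cstddef>
#include <fstream>
#include <stdexcept>
#include <string>

// Hypothetical helper (an assumption, not from the original answer):
// reads a whole file as raw bytes and reports open and read failures.
std::string read_file_checked(const std::string& path)
{
    std::ifstream f(path, std::ios::binary);
    if (!f)  // check the stream state right after opening
        throw std::runtime_error("cannot open file: " + path);

    std::string content;
    char buffer[4096];
    // read() sets eofbit and failbit on the last, partial chunk;
    // gcount() still tells us how many bytes of that chunk are valid.
    while (f.read(buffer, sizeof buffer) || f.gcount() > 0)
        content.append(buffer, static_cast<std::size_t>(f.gcount()));

    if (f.bad())  // badbit signals a real I/O error, not just end of file
        throw std::runtime_error("I/O error while reading: " + path);

    return content;
}
```

A caller would wrap read_file_checked("test.txt") in a try/catch for std::runtime_error. The is_open() or operator! check after opening applies in the same way to std::wifstream.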

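Creating a generic solution
The code examples above require that you know the encoding of the text file in advance. Detecting the encoding of a text file in a truly generic way is a difficult task, because there is no standard way to do it. It cannot be done reliably, so you have to use some heuristics.
If you can make some assumptions about the files you have to process, you can write a simple detection function. Suppose the files only fall into the following categories:

ANSI encoded files with code page 1256
UTF-8 encoded files with a BOM (byte order mark)

You can then use std::ifstream to read the first 3 bytes of the file and compare them with {0xEF, 0xBB, 0xBF}. If they are equal, you can be relatively sure that the file is UTF-8 encoded, because a file in another encoding is unlikely to start with these bytes. If they are not equal, you assume code page 1256.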
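
A minimal sketch of such a detection function, under exactly the two-category assumption above. The name starts_with_utf8_bom is hypothetical and does not come from the original answer; the function only looks for the BOM and does not validate the rest of the file.

```cpp
#include <fstream>
#include <string>

// Hypothetical helper (an assumption, not from the original answer):
// returns true if the file starts with the UTF-8 byte order mark EF BB BF.
bool starts_with_utf8_bom(const std::string& path)
{
    // Binary mode so that the bytes arrive exactly as stored on disk.
    std::ifstream f(path, std::ios::binary);

    char bom[3] = {};
    if (!f.read(bom, 3))
        return false;  // file could not be opened or is shorter than 3 bytes

    // char may be signed, so compare through unsigned char.
    return static_cast<unsigned char>(bom[0]) == 0xEF
        && static_cast<unsigned char>(bom[1]) == 0xBB
        && static_cast<unsigned char>(bom[2]) == 0xBF;
}
```

The caller could then choose which imbue() call to apply to the wifstream: the UTF-8 facet when the function returns true, and the code page 1256 locale otherwise. A UTF-8 file saved without a BOM would be misclassified as ANSI by this heuristic.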
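ASCII has no Arabic characters. Find out how the file is actually encoded and read it that way, but iostreams are not very good at this, so you may need to use OS-specific functions or another library to do it.
I know ASCII has a code page parameter, how do I pass this parameter to ifstream?!
You can use wifstream::imbue() to do that, see my answer.
The question is tagged C++ and asks about ifstream, so a C solution doesn't help.
Pfft, just trying to help with what I know. What works in C also works in C++. Personally I use both, why not.
My function sometimes reads UTF-8 files and sometimes ASCII files, so I used ifstream and std::string.
Thanks @zett42, your code works fine, but why does it not work when I try the imbue method with std::ifstream? std::ifstream ifs("test.txt"); ifs.imbue(std::locale(".1256")); std::string content((std::istreambuf_iterator<char>(ifs)), std::istreambuf_iterator<char>());
@Bassam If you use ifstream instead of wifstream, there is no conversion, so imbue() does nothing. You get a string in the same encoding as the file. What do you want to do with the content?
@Bassam I read under another answer that your function sometimes reads UTF-8 files and sometimes ASCII files. If you like, I can add an example that reads a UTF-8 file with wifstream.
My goal is a generic function that reads UTF-8, ANSI, and Unicode files and puts the whole file content into the content variable. Is that possible, and is there a way to determine the file encoding? Or do I have to write several functions for this? Please also edit your answer to include a UTF-8 file reading example :)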
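
The point made in the comments, that a narrow ifstream hands back the file's bytes without any conversion, can be illustrated with a small sketch. The helper name narrow_read_as_hex is hypothetical and not from the thread. In code page 1256 the bytes C7 E1 D2 C8 E6 E4 spell "الزبون"; a viewer that interprets the same bytes as Windows-1252 shows "ÇáÒÈæä", which matches what the question reports from the debugger.

```cpp
#include <cstdio>
#include <fstream>
#include <iterator>
#include <string>

// Hypothetical helper (an assumption, not from the thread): reads a file
// through a narrow std::ifstream and returns its bytes as uppercase hex
// separated by spaces, for example "C7 E1".
std::string narrow_read_as_hex(const std::string& path)
{
    std::ifstream f(path, std::ios::binary);
    std::string bytes{ std::istreambuf_iterator<char>(f),
                       std::istreambuf_iterator<char>() };

    std::string hex;
    for (unsigned char c : bytes) {
        char buf[4];  // two hex digits, one space, terminating null
        std::snprintf(buf, sizeof buf, "%02X ", static_cast<unsigned>(c));
        hex += buf;
    }
    if (!hex.empty())
        hex.pop_back();  // drop the trailing space
    return hex;
}
```

If the file holds the word above in code page 1256, the function returns "C7 E1 D2 C8 E6 E4": the std::string contains the stored bytes unchanged, and only the interpretation of those bytes differs between the file's author and the debugger.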
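To read a UTF-8 encoded file instead, replace the imbue() call in the example above with the following one, which installs a std::codecvt_utf8_utf16 facet:
    f.imbue( std::locale( f.getloc(),
        new std::codecvt_utf8_utf16< wchar_t, 1114111UL, std::consume_header> ) );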