C++: reading from a file puts three odd characters at the start

c++ file-io

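When I read a file string by string, the >> operation gets the first string, but it starts with "ï»¿". Suppose the first string is "street"; it then comes out as "ï»¿street".

All the other strings are fine. I tried different txt files and the result is the same: the first string starts with "ï»¿". What is the problem?

Here is my code: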

#include <iostream>
#include <fstream>
#include <string>
#include <vector>
using namespace std;

int cube(int x){ return (x*x*x);}

int main(){

int maxChar;
int lineLength=0;
int cost=0;

cout<<"Enter the max char per line... : ";
cin>>maxChar;
cout<<endl<<"Max char per line is : "<<maxChar<<endl;

fstream inFile("bla.txt",ios::in);

if (!inFile) {
    cerr << "Unable to open file datafile.txt";
    exit(1);   // call system to stop
}

while(!inFile.eof()) {
    string word;

    inFile >> word;
    cout<<word<<endl;
    cout<<word.length()<<endl;
    if(word.length()+lineLength<=maxChar){
        lineLength +=(word.length()+1);
    }
    else {
        cost+=cube(maxChar-(lineLength-1));
        lineLength=(word.length()+1);
    }   
}

}

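You are seeing a UTF-8 byte order mark (BOM). It was added by the application that created the file.
To detect and ignore the marker, you can try this (untested) function: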
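#include <cstring>
#include <istream>

// Returns true and leaves the stream just past the marker if the input
// starts with a UTF-8 BOM; otherwise rewinds to the start and returns false.
bool SkipBOM(std::istream & in)
{
    char test[4] = {0};
    in.read(test, 3);
    if (std::strcmp(test, "\xEF\xBB\xBF") == 0)
        return true;
    in.clear();   // input shorter than 3 bytes leaves failbit set
    in.seekg(0);
    return false;
}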
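
To see why the first word comes out three characters too long, a small sketch may help. It assumes the file starts with the UTF-8 BOM bytes EF BB BF and, rather than touching the stream, strips them from the first token after it has been read. The helper name stripUtf8Bom is hypothetical and does not come from the original post.

```cpp
#include <string>

// Hypothetical helper (not from the original post): removes a leading
// UTF-8 BOM (bytes EF BB BF) from a string, if present.
// Returns true when a BOM was found and removed.
bool stripUtf8Bom(std::string& s)
{
    static const std::string bom = "\xEF\xBB\xBF";
    if (s.compare(0, bom.size(), bom) == 0) {
        s.erase(0, bom.size());
        return true;
    }
    return false;
}
```

With the question's example, the first token read from the file has length 9 (three BOM bytes plus "street"); after stripUtf8Bom it is "street" with length 6. A viewer that decodes those three bytes as Latin-1 displays them as ï»¿.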

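Here are two other ideas:
If you are the one creating the file, save the file length along with the file, and when reading it, cut off any prefix using this simple calculation: trueFileLength - savedFileLength = numOfBytesToCut
Write your own prefix when saving the file; when reading, search for it and remove everything found before it.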
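Building on Mark Ransom's excellent answer above, adding this code will skip the BOM (byte order mark) on an existing stream. Call it right after opening the file.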
// Skips the Byte Order Mark (BOM) that defines UTF-8 in some text files.
void SkipBOM(std::ifstream &in)
{
    char test[3] = {0};
    in.read(test, 3);
    if ((unsigned char)test[0] == 0xEF && 
        (unsigned char)test[1] == 0xBB && 
        (unsigned char)test[2] == 0xBF)
    {
        return;
    }
    in.clear();   // a file shorter than 3 bytes leaves failbit set
    in.seekg(0);
}
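
The casts to unsigned char in the function above matter because whether plain char is signed is implementation-defined. The sketch below contrasts the two comparisons; the function names isEfPlain and isEfCast are hypothetical and only serve the illustration.

```cpp
// Hypothetical helpers illustrating the comparison; not from the original post.

// Compares the char after ordinary integer promotion. Where char is signed,
// the byte 0xEF is typically held as -17, which promotes to int -17 and
// never equals 239, so this returns false on such platforms.
bool isEfPlain(char c)
{
    int promoted = c;
    return promoted == 0xEF;
}

// Converts to unsigned char first, so the byte 0xEF compares equal to 239
// regardless of the signedness of char.
bool isEfCast(char c)
{
    return static_cast<unsigned char>(c) == 0xEF;
}
```

On a platform with signed char, isEfPlain('\xEF') is false while isEfCast('\xEF') is true, which is why the uncast comparison fails to detect the marker there.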

Usage:
ifstream in(path);
SkipBOM(in);
string line;
while (getline(in, line))
{
    // Process lines of input here.
}

Also, you might want to point out that he is reading the file incorrectly; it should be while (inFile >> word) rather than while (!inFile.eof()).
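@vk7x The code above will not work unless you add "(unsigned char)" casts in the if statement, e.g. if ((unsigned char)test[0] == 0xEF && (unsigned char)test[1] == 0xBB && (unsigned char)test[2] == 0xBF). Alternatively, compare against -17, -69 and -65. See my answer below.
@Contango, I don't know why it took me so long to see your comment, but thanks. I came up with a completely different way to solve the problem; see my latest edit.
An aside: never use .eof() as a loop condition. It almost always produces buggy code, as it does in your case. Prefer performing the input operation in the loop condition: string word; while (inFile >> word) { ... }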