Why am I getting extra index values when multimapping words to lines from a text file in C++?

Tags: c++, indexing, multimap

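I'm working on a multimap program that reads a text file, removes the punctuation, and then builds an index of the lines on which each word appears. The code compiles and runs, but the output is not what I want. I'm fairly sure the problem is related to punctuation: every time a word is followed by a period, that word is counted twice, even though I exclude punctuation. The program then prints the last word several times, claiming it appears on lines that do not exist in the file. Any help would be greatly appreciated.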

Input file:

dogs run fast.
dogs bark loud.
cats sleep hard.
cats are not dogs.
Thank you.
#
C++ code:

#include <iostream>
#include <string>
#include <fstream>
#include <sstream>
#include <map>

using namespace std;

int main(){

    ifstream input;
    input.open("NewFile.txt");
    if ( !input )
    {
        cout << "Error opening file." << endl;
        return 0;
    }

    multimap< string, int, less<string> >  words;
    int line; //int variable line
    string word;//string variable word

    // For each line of text, the length of input, increment line
    for (line = 1; input; line++)
    {
        char buf[ 255 ];//create a character with space of 255
        input.getline( buf, 128 );//buf is pointer to array of chars where
        //extracted, 128 is maximum num of chars to write to s.

        // Discard all punctuation characters, leaving only words
        for ( char *p = buf;
              *p != '\0';
              p++ )

        {
            if ( ispunct( *p ) )
                *p = ' ';
        }
        //

        istringstream i( buf );

        while ( i )
        {
            i >> word;
            if ( word != "" )
            {
                words.insert( pair<const string,int>( word, line ) );
            }
        }
    }

    input.close();

    // Output results
    multimap< string, int, less<string> >::iterator it1;
    multimap< string, int, less<string> >::iterator it2;



    for ( it1 = words.begin(); it1 != words.end(); )
    {

        it2 = words.upper_bound( (*it1).first );
        cout << (*it1).first << " : ";

        for ( ; it1 != it2; it1++ )
        {
            cout << (*it1).second << " ";
        }
        cout << endl;
    }

    return 0;
}
Expected output:

Thank : 5
are : 4
bark : 2
cats : 3 4
dogs : 1 2 4 
fast : 1 
hard : 3 
loud : 2 
not : 4
run : 1
sleep : 3
you : 5 

Thanks in advance for your help.

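You are not removing the punctuation, you are replacing it with spaces. After the last word on a line, istringstream tries to parse the remaining whitespace and fails. A failed extraction leaves word holding its previous value, so that word is inserted a second time. Check whether the extraction succeeded before inserting, like this: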

i >> word;
if (!i.fail()) {
    words.insert(pair<const string, int>(word, line));
}

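Since you are using C++, it is more convenient to avoid raw pointers and rely on standard library functions instead. I would rewrite part of the code like this: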

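// Needs <algorithm>, <cctype> and <iterator> in addition to the original includes.
// For each line of text; getline itself is the loop condition, so the loop
// stops as soon as no further line can be read
std::string buf;
for (line = 1; std::getline(input, buf); line++)
{
    istringstream i( buf );

    while ( i )
    {
        i >> word;
        // Only use word if the extraction actually succeeded
        if (!i.fail()) {
            // Copy the word without its punctuation characters
            // (a lambda replaces std::ptr_fun, which was removed in C++17)
            std::string cleanWord;
            std::remove_copy_if(word.begin(), word.end(),
                                std::back_inserter(cleanWord),
                                [](unsigned char c) { return std::ispunct(c) != 0; }
            );
            if (!cleanWord.empty()) {
                words.insert(pair<const string, int>(cleanWord, line));
            }
        }
    }
}

input.close();

// Output results
multimap< string, int, less<string> >::iterator it1;
multimap< string, int, less<string> >::iterator it2;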

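What do you see when you step through this in a debugger?

@RichardCritten Ah! For some reason it adds an extra count to the map at the end of each sentence. It executes line 44, words.insert(pair<const string,int>(word, line)), one extra time. Why does it do that? Shouldn't it stop, since the punctuation has been removed?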