Counting unique words in a string in C++


c++ c++11


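I want to count how many unique words there are in a string "s", where punctuation and newline characters (\n) separate the words. So far I have used the logical OR operator to check how many word separators the string contains, and added 1 to the result to get the number of words in s.

My current code returns 12 as the word count. Since 'ab', 'AB', 'aB' and 'Ab' (and likewise 'zzzz') are all the same word rather than unique ones, how can I ignore the case variants of a word? I followed a link, but it deals with counting unique items in a vector, whereas I am using a string rather than a vector.

Here is my code: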

#include <iostream>
#include <string>
using namespace std;

bool isWordSeparator(char & c) {

    return c == ' ' || c == '-' || c == '\n' || c == '?' || c == '.' || c == ','
    || c == '?' || c == '!' || c == ':' || c == ';';
}

int countWords(string s) {
    int wordCount = 0;

    if (s.empty()) {
        return 0;
    }

    // Counts separators, not words: every separator bumps the counter.
    for (int x = 0; x < s.length(); x++) {
        if (isWordSeparator(s.at(x))) {
            wordCount++;
        }
    }

    return wordCount+1;
}

int main() {
    string s = "ab\nAb!aB?AB:ab.AB;ab\nAB\nZZZZ zzzz Zzzz\nzzzz";
    int number_of_words = countWords(s);

    cout << "Number of Words: " << number_of_words  << endl;

    return 0;

}

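You can create a set of strings, remember the position of the last separator (starting from 0), and use substring to extract each word and insert it into the set. When you are done, just return the size of the set.

Using a split function simplifies the whole operation, because it tokenizes the string for you (std::string itself has no split member; boost::split is one option). You then only need to insert all elements of the returned array into the set and, again, return its size.

Edit: as per the comments, you need a custom comparator so that the comparison ignores case.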
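
The set-plus-substring idea above can be sketched with the standard library alone. This is only an illustration under stated assumptions: the names CaseInsensitiveLess, isSeparator and countUniqueWords are made up for this example, and the separator test (any whitespace or punctuation) is an assumption rather than the asker's exact list.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <set>
#include <string>

// Hypothetical comparator (not from the original answer): orders strings
// while ignoring case, so "ab" and "AB" are treated as the same key.
struct CaseInsensitiveLess {
    bool operator()(const std::string& a, const std::string& b) const {
        return std::lexicographical_compare(
            a.begin(), a.end(), b.begin(), b.end(),
            [](unsigned char x, unsigned char y) {
                return std::tolower(x) < std::tolower(y);
            });
    }
};

// Assumed separator test: any whitespace or punctuation character.
bool isSeparator(char c) {
    const unsigned char uc = static_cast<unsigned char>(c);
    return std::isspace(uc) || std::ispunct(uc);
}

std::size_t countUniqueWords(const std::string& s) {
    std::set<std::string, CaseInsensitiveLess> words;
    std::size_t start = 0;  // index just past the last separator seen
    for (std::size_t i = 0; i <= s.size(); ++i) {
        // The end of the string is treated as one final separator.
        if (i == s.size() || isSeparator(s[i])) {
            if (i > start) {  // skip empty tokens between adjacent separators
                words.insert(s.substr(start, i - start));
            }
            start = i + 1;
        }
    }
    return words.size();
}
```

For the string from the question, countUniqueWords(s) should return 2, because every variant of "ab" and of "zzzz" collapses into a single key. Adjacent separators such as "?!" produce no empty words thanks to the i > start check.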

The way to make the code case-insensitive is tolower().

You can apply it to the original string with std::transform:
std::transform(s.begin(), s.end(), s.begin(), ::tolower);
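
One caveat with the one-liner above: the <cctype> functions have undefined behavior when handed a negative char value, which can occur for non-ASCII bytes on platforms where char is signed. A slightly more defensive sketch follows; the helper name toLowerCopy is made up for this example.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical helper: returns a lowercase copy of s.
std::string toLowerCopy(std::string s) {
    // Taking the character as unsigned char keeps std::tolower well defined
    // for every byte value; the result is narrowed back to char explicitly.
    std::transform(s.begin(), s.end(), s.begin(), [](unsigned char c) {
        return static_cast<char>(std::tolower(c));
    });
    return s;
}
```

For plain ASCII input such as the question's string, both versions behave the same.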

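I should add that the current code is closer to C than to C++; perhaps you should look at what the standard library has to offer.

I suggest istringstream + istream_iterator for tokenizing, and either unique_copy or set for eliminating duplicates.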
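
A minimal sketch of that suggestion, assuming separators are first normalized to spaces so that the stream can split on whitespace. It uses std::sort followed by std::unique rather than unique_copy, and the function name countUniqueWordsSorted is made up for this example.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

std::size_t countUniqueWordsSorted(std::string s) {
    // Normalize: separators become spaces, letters become lowercase.
    for (char& c : s) {
        const unsigned char uc = static_cast<unsigned char>(c);
        c = (std::isspace(uc) || std::ispunct(uc))
                ? ' '
                : static_cast<char>(std::tolower(uc));
    }

    // Tokenize on whitespace with istringstream + istream_iterator.
    std::istringstream in(s);
    std::istream_iterator<std::string> first(in), last;
    std::vector<std::string> words(first, last);

    // std::unique only collapses adjacent duplicates, so sort first.
    std::sort(words.begin(), words.end());
    words.erase(std::unique(words.begin(), words.end()), words.end());
    return words.size();
}
```

Swapping the vector plus sort/unique for a std::set gives the same count; the sorted-vector form is shown here mainly to make the "sort before unique" requirement visible.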
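
While splitting the string into words, insert all the words into a std::set. This eliminates the duplicates. Then it is just a matter of calling set::size() to get the number of unique words.

In my solution I use the boost::split() function from Boost (header <boost/algorithm/string.hpp>), because it is almost standard nowadays.

The explanation is in the code comments.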
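#include <cctype>
#include <iostream>
#include <string>
#include <set>
#include <boost/algorithm/string.hpp>
using namespace std;

// Function suggested by user 'mshrbkv':
bool isWordSeparator(char c) {
    // Cast to unsigned char: passing a negative char value to the
    // <cctype> functions is undefined behavior.
    unsigned char uc = static_cast<unsigned char>(c);
    return std::isspace(uc) || std::ispunct(uc);
}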

// This is used to make the set case-insensitive.
// Alternatively you could call boost::to_lower() to make the
// string all lowercase before calling boost::split(). 
struct IgnoreCaseCompare { 
    bool operator()( const std::string& a, const std::string& b ) const {
        return boost::ilexicographical_compare( a, b );
    }
};

int main()
{
    string s = "ab\nAb!aB?AB:ab.AB;ab\nAB\nZZZZ zzzz Zzzz\nzzzz";

    // Define a set that will contain only unique strings, ignoring case.
    set< string, IgnoreCaseCompare > words;

    // Split the string by using your isWordSeparator function
    // to define the delimiters. token_compress_on collapses multiple
    // consecutive delimiters into only one. 
    boost::split( words, s, isWordSeparator, boost::token_compress_on );

    // Now the set contains only the unique words.
    cout << "Number of Words: " << words.size() << endl;
    for( auto& w : words )
        cout << w << endl;

    return 0;
}

First of all, I would suggest rewriting isWordSeparator as follows:
bool isWordSeparator(char c) {
    return std::isspace(c) || std::ispunct(c);
}

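because your current implementation does not handle all punctuation and whitespace characters, such as \t or +.

Also, incrementing wordCount every time isWordSeparator is true is incorrect, for example if you have something like ?! in the string: two adjacent separators are counted as if a word sat between them.

So a less error-prone approach is to replace all separators with spaces and then iterate over the words, inserting them into an (unordered) set: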
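#include <algorithm>
#include <cctype>
#include <iterator>
#include <sstream>
#include <string>
#include <unordered_set>

// Relies on isWordSeparator() as defined above.
int countWords(std::string s) {
    // Turn every separator into a space and lowercase everything else.
    // The explicit "-> char" is needed because ' ' is a char while
    // std::tolower returns an int.
    std::transform(s.begin(), s.end(), s.begin(), [](char c) -> char {
        if (isWordSeparator(c)) {
            return ' ';
        }

        return static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    });

    std::unordered_set<std::string> uniqWords;

    // Extract the whitespace-delimited words and insert them into the set.
    std::stringstream ss(s);
    std::copy(std::istream_iterator<std::string>(ss),
              std::istream_iterator<std::string>(),
              std::inserter(uniqWords, uniqWords.end()));

    return static_cast<int>(uniqWords.size());
}

You can treat the string much like a vector: the standard algorithms work on any sequence container, including std::string.

You could tokenize the string first; then you need to lowercase all the words and add them to a set, which removes the duplicates, and return the size of the set. Give it a try, and let us know if you get stuck somewhere.

Voting to close as too broad; I can imagine more than ten ways to do this. It is a good textbook problem, though. Even Knuth wrote about the problem in his The Art of Computer Programming.

Note that std::unique works on consecutive duplicates, so aa bb aA would treat the third aA as unique. Sort the list first, or use std::set. The std::set has to know that the strings are case-insensitive, which requires a second template argument (a comparator) for std::set.

You suggested using C++ istringstream and istream_iterator for tokenizing. Could you share any online resources where I can learn them easily?

@Naz AlIslam Well, this is the reference source that comes to my mind, and it also suggests some books. I am not sure what "easily" would be, though :) For now, I have added a link to a working example to my answer.

Note that I used a lambda to combine your isWordSeparator with tolower, if you ...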