Word count algorithm in C#


c# .net


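I'm looking for a good word count class or function. When I copy and paste something from the internet and compare the result of my custom word count algorithm with MS Word, mine is always off by a little more than 10%. I think that's too much. Do you know of an accurate word count algorithm in C#?

Use String.Split. Split on punctuation, spaces (removing the empty entries that repeated spaces produce), and any other characters you consider to be "word splitters".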

What have you tried?

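I did see that the previous poster got hammered for posting links, but here are a few examples that use regular expressions or character matching. Hope this helps, and that nobody gets hurt (X-)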

As @astander suggested, you can do a String.Split as follows:

string[] a = s.Split(
    new char[] { ' ', ',', ';', '.', '!', '"', '(', ')', '?' },
    StringSplitOptions.RemoveEmptyEntries);

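By passing in an array of characters, you can split on several word separators at once. Removing empty entries keeps you from counting non-words.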

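You also need to check for newlines, tabs, and non-breaking spaces. I found it best to copy the source text into a StringBuilder and replace all newlines, tabs, and sentence-ending characters with spaces. Then split the string on spaces.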
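The StringBuilder approach described above could be sketched roughly like this. The class name `NormalizedWordCounter` and the exact set of replaced characters are assumptions made for illustration, not part of the original answer. Note that treating `.` as a separator also splits numbers such as `3.14` into two words.

```csharp
using System;
using System.Text;

public static class NormalizedWordCounter
{
    // Hypothetical helper: turns newlines, tabs, non-breaking spaces and
    // sentence-ending characters into plain spaces, then splits on spaces.
    public static int Count(string text)
    {
        if (string.IsNullOrEmpty(text)) return 0;

        var sb = new StringBuilder(text);
        sb.Replace("\r\n", " ")      // Windows line breaks
          .Replace('\r', ' ')
          .Replace('\n', ' ')
          .Replace('\t', ' ')
          .Replace('\u00A0', ' ')    // non-breaking space
          .Replace('.', ' ')         // sentence-ending characters (assumed set)
          .Replace('!', ' ')
          .Replace('?', ' ');

        // RemoveEmptyEntries discards the empty strings left by runs of spaces.
        return sb.ToString()
                 .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
                 .Length;
    }
}
```

For example, `NormalizedWordCounter.Count("one\ttwo\r\nthree")` evaluates to 3.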

I had the same problem in ClipFlair, where I needed to calculate WPM (words per minute) for movie captions, so I came up with the following:

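You can define this static extension method in a static class, then add a using clause for that static class's namespace in any class that needs the extension method. Call the extension method as s.WordCount(), where s is a string (an identifier [variable/constant] or a literal).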
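To illustrate the calling convention just described, here is a minimal sketch of an extension method of this kind. It is not the ClipFlair code: the names `StringExtensions` and `CountWords` and the word-boundary rule (a letter or digit followed by the end of the string, whitespace, or punctuation) are assumptions. The class is left in the global namespace for brevity, so no using clause is needed here.

```csharp
using System;

public static class StringExtensions
{
    // Hypothetical rule: a word ends wherever a letter or digit is followed
    // by the end of the string, whitespace, or punctuation.
    public static int CountWords(this string s)
    {
        if (string.IsNullOrEmpty(s)) return 0;

        int last = s.Length - 1;
        int count = 0;
        for (int i = 0; i <= last; i++)
        {
            bool isWordChar = char.IsLetterOrDigit(s[i]);
            bool endsHere = i == last
                            || char.IsWhiteSpace(s[i + 1])
                            || char.IsPunctuation(s[i + 1]);
            if (isWordChar && endsHere) count++;
        }
        return count;
    }
}
```

With this rule, `"Hello, world!".CountWords()` evaluates to 2. Because an apostrophe counts as punctuation, a contraction such as "don't" is counted as two words, which may or may not match what you want.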

public static int WordCount(this string s)
{
    int last = s.Length - 1;
    int count = 0;
    for (int i = 0; i ...

Use a regular expression to find the words (e.g. [\w]+), then just count the matches:

public static Regex regex = new Regex(
  "[\\w]+",
RegexOptions.Multiline
| RegexOptions.CultureInvariant
| RegexOptions.Compiled
);

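regex.Matches(_someString).Count

This is a stripped-down version of a C# class I made for counting words, Asian words, characters, and so on. Its results are almost identical to Microsoft Word's. I developed the original code to count words in Microsoft Word documents.
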
    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Text;
    using System.Text.RegularExpressions;
    namespace BL {
    public class WordCount 
    {

    public int NonAsianWordCount { get; set; }
    public int AsianWordCount { get; set; }
    public int TextLineCount { get; set; }
    public int TotalWordCount { get; set; }
    public int CharacterCount { get; set; }
    public int CharacterCountWithSpaces { get; set; }


    //public string Text { get; set; }

    public WordCount() {}


    public void GetCountWords(string s)
    {
        #region Regular Expression Collection
        string asianExpression = @"[\u3001-\uFFFF]";
        string englishExpression = @"[\S]+";
        string LineCountExpression = @"[\r]+";
        #endregion


        #region Asian Character
        MatchCollection asiancollection = Regex.Matches(s, asianExpression);

        AsianWordCount = asiancollection.Count; //Asian Character Count

        s = Regex.Replace(s, asianExpression, " ");

        #endregion 


        #region English Characters Count
        MatchCollection collection = Regex.Matches(s, englishExpression);
        NonAsianWordCount = collection.Count;
        #endregion

        #region Text Lines Count
        MatchCollection Lines = Regex.Matches(s, LineCountExpression);
        TextLineCount = Lines.Count;
        #endregion

        #region Total Character Count

        CharacterCount = AsianWordCount;
        CharacterCountWithSpaces = CharacterCount;

        foreach (Match word in collection)
        {
            CharacterCount += word.Value.Length ;
            CharacterCountWithSpaces += word.Value.Length + 1;
        }

        #endregion

        #region Total Word Count
        TotalWordCount = AsianWordCount + NonAsianWordCount;
        #endregion
    }
}
}

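Is your algorithm consistently too high or too low? Or does it vary?

Are you counting only words, or does the pasted content also contain markup?

Why are you using the MS Word word count as your benchmark for accuracy? Subtle differences in what counts as a "word" can lead to significant differences in the total. 10% isn't surprising. What you are seeing may be perfectly accurate, just slightly different.

Have you considered that the MS Word counting algorithm might not be that accurate?

My algorithm is always too high. I only count spaces.

Not because of the link itself, but because it links to a Google search for the obvious: the link adds no value.

@astander Don't forget the StringSplitOptions.RemoveEmptyEntries option of Split; otherwise "word1, word2" or "word1? word2" will count as 3 words!

This is good, but you should also account for newlines. If you type a word, press Enter, type a word, press Enter, it will return a count of 0. One overload of Split() accepts a string array, so you can change this array from chars to strings and add Environment.NewLine (or "\r\n" and "\n").

Unless your input contains very limited formatting, you may need to cast a wider net: consider curly and angle brackets, dashes (though these may produce false positives), and other punctuation.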
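The last two comments can be combined into a rough sketch that uses the Split overload taking a string array. The class name `SplitWordCounter` and the separator list are illustrative assumptions; dashes are deliberately left out because, as noted, they may produce false positives.

```csharp
using System;

public static class SplitWordCounter
{
    // Assumed separator set: whitespace variants, line breaks, brackets and
    // common punctuation. Extend or trim it to suit your input.
    private static readonly string[] Separators =
    {
        " ", "\t", "\u00A0", "\r\n", "\n", "\r",
        ",", ";", ":", ".", "!", "?", "\"",
        "(", ")", "[", "]", "{", "}", "<", ">"
    };

    public static int Count(string text)
    {
        if (string.IsNullOrEmpty(text)) return 0;

        // RemoveEmptyEntries drops the empty strings produced when two
        // separators are adjacent, e.g. the ", " in "word1, word2".
        return text.Split(Separators, StringSplitOptions.RemoveEmptyEntries).Length;
    }
}
```

With these separators, both `SplitWordCounter.Count("word1, word2")` and `SplitWordCounter.Count("word1\nword2\n")` evaluate to 2.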

using System;

public static class WordCount
{
    public static int Count(string text)
    {
        // Nothing to count in null, empty, or whitespace-only text.
        if (string.IsNullOrWhiteSpace(text))
        {
            return 0;
        }

        // Split on spaces, or on any other characters you consider word splitters.
        // RemoveEmptyEntries keeps runs of spaces from being counted as words.
        return text.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries).Length;
    }
}