C#: how to read text word by word


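I'm working with a txt or htm file. Currently I'm going through the document char by char using a for loop, but I need to go through the text word by word, and then within each word char by char. How can I do this?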

for (int i = 0; i < text.Length; i++)
{}


Use text.Split(' ') to split it on spaces into an array of words, then iterate over the array.
You can split on spaces:
string[] words = text.Split(' ');

will give you an array of words, which you can then iterate over:
foreach (string word in words)
{
    // do something with each word
}
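Note that Split(' ') produces empty strings when words are separated by more than one space, and it does not split on tabs or line breaks. A minimal sketch, assuming any whitespace character should count as a separator; WordSplitter and SplitWords are hypothetical names:

```csharp
using System;

public static class WordSplitter
{
    // Splits on any whitespace character and drops the empty strings
    // that consecutive separators would otherwise produce.
    public static string[] SplitWords(string text)
    {
        // A null separator array means "split on every char.IsWhiteSpace character".
        // The cast picks the char[] overload instead of the string[] one.
        return text.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);
    }
}
```

For example, SplitWords("one  two\tthree") returns { "one", "two", "three" }, with no empty entry for the double space.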

I think you can use Split:
var words = reader.ReadToEnd().Split(' ');

or use:
foreach (string word in text.Split(' '))
    foreach (char c in word)
    {
        // do something with each character
    }
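The same nested loop as a complete, compilable sketch. WordWalker and Visit are hypothetical names, the callback stands in for whatever per-character work you need, and, unlike Split(' '), it splits on any whitespace and skips empty entries:

```csharp
using System;

public static class WordWalker
{
    // Walks the text word by word, then each word character by character,
    // invoking the callback with the word's index and the current character.
    public static void Visit(string text, Action<int, char> onChar)
    {
        string[] words = text.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);
        for (int w = 0; w < words.Length; w++)
        {
            foreach (char c in words[w])
            {
                onChar(w, c);
            }
        }
    }
}
```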

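You can split the string on whitespace, but you will have to deal with punctuation and HTML markup (you said you're working with txt and htm files).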
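For plain text, one way to deal with punctuation is to trim it from both ends of each whitespace-separated piece. A sketch under that assumption (WordCleaner and CleanWords are hypothetical names); it keeps inner punctuation such as the apostrophe in "I'm" and does nothing about HTML tags, which need a real parser:

```csharp
using System;
using System.Collections.Generic;

public static class WordCleaner
{
    // Splits on whitespace, then trims leading and trailing punctuation from each piece,
    // so "file." becomes "file" and "(really)" becomes "really".
    public static List<string> CleanWords(string text)
    {
        var result = new List<string>();
        foreach (string raw in text.Split((char[])null, StringSplitOptions.RemoveEmptyEntries))
        {
            string word = TrimPunctuation(raw);
            if (word.Length > 0) // a piece made only of punctuation, like "--", is dropped
                result.Add(word);
        }
        return result;
    }

    private static string TrimPunctuation(string s)
    {
        int start = 0, end = s.Length - 1;
        while (start <= end && char.IsPunctuation(s[start])) start++;
        while (end >= start && char.IsPunctuation(s[end])) end--;
        return s.Substring(start, end - start + 1);
    }
}
```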
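A simple way is to use Split() with no arguments, which splits on whitespace characters:
I would read each whole line first.
To parse the HTML, I would use a robust HTML parsing library.
You can use a dedicated HTML parser; if that feels like overkill, there are lighter-weight alternatives.
Once you have defined what a word is, you can split the text content of each node into words.
Maybe something like:
What about regular expressions?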
using System;
using System.Linq;
using System.Text.RegularExpressions;

namespace ConsoleApplication58
{
    class Program
    {
        static void Main()
        {
            string input =
                @"I'm working with a txt or htm file. And currently I'm looking up the document char by char, using for loop, but I need to look up the text word by word, and then inside the word char by char. How can I do this?";
            var list = from Match match in Regex.Matches(input, @"\b\S+\b")
                       select match.Value; //Get IEnumerable of words
            foreach (string s in list) 
                Console.WriteLine(s); //doing something with it
            Console.ReadKey();
        }
    }
}

It works with any delimiter, and it's a quick way to get this done.
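If you would rather define a word by what it contains than by what surrounds it, you can match runs of letters instead of \S+. A sketch assuming a word is a run of Unicode letters, optionally joined by ASCII apostrophes (the pattern and the names RegexWords/Words are illustrative choices, not from the answer above):

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class RegexWords
{
    // \p{L}+ matches a run of letters in any script; (?:'\p{L}+)* allows "I'm" or "don't".
    private static readonly Regex WordPattern = new Regex(@"\p{L}+(?:'\p{L}+)*");

    public static List<string> Words(string input)
    {
        var words = new List<string>();
        foreach (Match match in WordPattern.Matches(input))
            words.Add(match.Value); // punctuation, digits and spaces are never part of a match
        return words;
    }
}
```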
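Here is my implementation of a lazy extension method for StreamReader. The idea is not to load the whole file into memory, which matters especially when the file is one single line.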
using System.IO;
using System.Text;

// Extension methods must live in a non-generic static class.
public static class StreamReaderExtensions
{
    // Reads one character at a time, building a word until whitespace or end of stream (-1).
    // StreamReader.Read() already returns a decoded char, so no extra decoding is needed;
    // the encoding argument is optional and unused, kept so calls like ReadWord(Encoding.Default) compile.
    public static string ReadWord(this StreamReader stream, Encoding encoding = null)
    {
        var word = new StringBuilder();
        int c;
        while ((c = stream.Read()) != -1)
        {
            char chr = (char)c;
            if (char.IsWhiteSpace(chr))
                break; // end of word
            word.Append(chr); // append the char to our word
        }
        return word.ToString();
    }
}
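The same lazy idea can also be sketched as an iterator over any TextReader (a StreamReader from File.OpenText works, and so does a StringReader for testing). LazyWords and ReadWords are hypothetical names; this version only buffers the current word, skips the empty words that runs of whitespace would produce, and still yields the last word when the input does not end in whitespace:

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text;

public static class LazyWords
{
    // Reads one character at a time, so even a huge single-line file is never
    // held in memory as a whole; only the word currently being built is buffered.
    public static IEnumerable<string> ReadWords(TextReader reader)
    {
        var current = new StringBuilder();
        int c;
        while ((c = reader.Read()) != -1) // Read() returns a decoded char, or -1 at end of input
        {
            if (char.IsWhiteSpace((char)c))
            {
                if (current.Length > 0)
                {
                    yield return current.ToString();
                    current.Clear();
                }
            }
            else
            {
                current.Append((char)c);
            }
        }
        if (current.Length > 0)
            yield return current.ToString(); // last word when input does not end in whitespace
    }
}
```

Usage would look like using (var reader = File.OpenText(path)) { foreach (var word in LazyWords.ReadWords(reader)) foreach (char c in word) { ... } }, where path is your file.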

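You need a way to delimit the words in the file. Whitespace might work, but I can see problems with punctuation and so on. Use a regular expression that matches a pattern representing a word, then go through each match char by char.
@Alan I wouldn't recommend that; use a proper HTML parser for .NET instead.
@Alan For text files it may work fine, but I think it's safe to assume his .htm files contain HTML markup, which is very hard to parse with regular expressions.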
using (StreamReader sr = new StreamReader(path)) 
{
    while (sr.Peek() >= 0) 
    {
        string line = sr.ReadLine();
        string[] words = line.Split();
        foreach(string word in words)
        {
            foreach(Char c in word)
            {
                // ...
            }
        }
    }
}

HtmlDocument doc = new HtmlDocument();
doc.LoadHtml(text);

foreach(HtmlNode node in doc.DocumentNode.SelectNodes("//text()"))
{
    var nodeText = node.InnerText;
}

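using System.Collections.Generic;
using System.Text.RegularExpressions;
using HtmlAgilityPack;

static IEnumerable<string> WordsInHtml(string text)
{
    // Split on any separator (\p{Z}), swallowing adjacent non-letter characters such as punctuation
    var splitter = new Regex(@"[^\p{L}]*\p{Z}[^\p{L}]*");

    HtmlDocument doc = new HtmlDocument();
    doc.LoadHtml(text);

    // SelectNodes returns null (not an empty collection) when nothing matches
    var textNodes = doc.DocumentNode.SelectNodes("//text()");
    if (textNodes == null)
        yield break;

    foreach (HtmlNode node in textNodes)
    {
        foreach (var word in splitter.Split(node.InnerText))
        {
            if (word.Length > 0) // skip empty pieces from leading or trailing separators
                yield return word;
        }
    }
}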

foreach(var word in WordsInHtml(text))
{
    foreach(var c in word)
    {
        // enumerates word by word, then char by char
    }
}

using (var s = File.OpenText(file))
{
    while (!s.EndOfStream)
    {
        foreach (char c in s.ReadWord(Encoding.Default))
        {
            // do something with each character of the word
        }
    }
}