C#: Reading a specific paragraph from HTML in ASP.NET


c# asp.net


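The code below reads paragraphs from an HTML page. It works well, but how can I get the paragraphs one at a time? And if I only need to save paragraph 2 or paragraph 5, how can I select a paragraph by its number?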

public string GetParagraphs(string html, int numberOfParagraphs)
{
    const string paragraphSeparator = "</p>";
    var paragraphs = html.Split(new[] { paragraphSeparator }, StringSplitOptions.RemoveEmptyEntries);
    return string.Join("", paragraphs.Take(numberOfParagraphs).Select(paragraph => paragraph + paragraphSeparator));
}

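Apart from the fact that this code is fundamentally broken (you cannot reliably split on </p>, and not all the HTML you find in the wild is valid HTML), it looks like you are simply looking for the Skip method. Combined with Take, it lets you select a paragraph by its position.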
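The answer's original Skip snippet did not survive on this page, so the following is only a minimal sketch of what such a call could look like. It keeps the question's split-on-</p> approach, limitations included, and the names ParagraphSplitter, GetParagraph, and the 1-based paragraphNumber parameter are hypothetical, not taken from the original answer.

```csharp
using System;
using System.Linq;

public static class ParagraphSplitter
{
    // Hypothetical helper: returns the paragraph at the given 1-based position,
    // or an empty string if the position is out of range.
    public static string GetParagraph(string html, int paragraphNumber)
    {
        const string paragraphSeparator = "</p>";
        if (paragraphNumber < 1)
        {
            return string.Empty;
        }

        var paragraphs = html.Split(new[] { paragraphSeparator }, StringSplitOptions.RemoveEmptyEntries);

        // Skip the paragraphs before the requested one, then take exactly one.
        var match = paragraphs
            .Skip(paragraphNumber - 1)
            .Take(1)
            .Select(paragraph => paragraph + paragraphSeparator)
            .FirstOrDefault();

        return match ?? string.Empty;
    }
}
```

Calling GetParagraph(html, 2) or GetParagraph(html, 5) would then pick out a single paragraph. Because this still splits on a literal string, anything that follows the last </p> (for example closing body tags) is counted as a trailing "paragraph", and differently cased or malformed closing tags are missed.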

If you want to do this properly, use HtmlAgilityPack. Once you have it, you can do something like the following:

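      // Requires: using System.IO; using HtmlAgilityPack;
      HtmlAgilityPack.HtmlDocument htmlDoc = new HtmlAgilityPack.HtmlDocument();
      htmlDoc.OptionFixNestedTags = true;
      htmlDoc.Load(new StringReader(PageContent));
      if (htmlDoc.DocumentNode != null)
      {
        // SelectNodes can return null (rather than an empty collection) when nothing matches
        HtmlNodeCollection nodes = htmlDoc.DocumentNode.SelectNodes(XPath);
        if (nodes != null)
        {
          // Work with nodes selected via XPath here
        }
      }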

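The PageContent variable should contain the entire HTML content of the web page. The XPath variable is a simple XPath query; for example, //p gives you all paragraphs.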
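To tie this back to the question of picking only paragraph 2 or 5, here is a rough sketch that selects all paragraphs with //p and then indexes into the result. It assumes the HtmlAgilityPack NuGet package is referenced; the ParagraphReader class and GetParagraphsByNumber method are hypothetical names introduced for illustration.

```csharp
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

public static class ParagraphReader
{
    // Hypothetical helper: returns the text of the requested paragraphs,
    // identified by 1-based position. Out-of-range numbers are ignored.
    public static List<string> GetParagraphsByNumber(string pageContent, params int[] paragraphNumbers)
    {
        var htmlDoc = new HtmlDocument();
        htmlDoc.OptionFixNestedTags = true;
        htmlDoc.LoadHtml(pageContent);

        // "//p" selects every paragraph element in document order.
        HtmlNodeCollection nodes = htmlDoc.DocumentNode.SelectNodes("//p");
        if (nodes == null)
        {
            // No paragraphs were found.
            return new List<string>();
        }

        return paragraphNumbers
            .Where(number => number >= 1 && number <= nodes.Count)
            .Select(number => nodes[number - 1].InnerText)
            .ToList();
    }
}
```

For example, ParagraphReader.GetParagraphsByNumber(PageContent, 2, 5) would return the text of the second and fifth paragraphs, if they exist. Swapping InnerText for OuterHtml should keep the surrounding markup instead of just the text.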
