C#: Unable to extract the <link> element using HtmlAgilityPack and XPath

c# xpath rss

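I am using Html Agility Pack to select text data from an RSS XML feed. For every other node type (title, pubdate, guid, etc.) I can select the inner text using XPath conventions; however, querying "//link" or "item/link" returns empty strings.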

public static IEnumerable<string> ExtractAllLinks(string rssSource)
{
    //Create a new document.
    var document = new HtmlDocument();
    //Populate the document with an rss file.
    document.LoadHtml(rssSource);
    //Select out all of the required nodes.
    var itemNodes = document.DocumentNode.SelectNodes("item/link");
    //If zero nodes were found, return an empty list, otherwise return the content of those nodes.
    return itemNodes == null ? new List<string>() : itemNodes.Select(itemNode => itemNode.InnerText).ToList();
}

<site><link>Hello World</link><name>Fred</name></site>

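Does anyone know why this element behaves differently from the others?

Additional: running "item/link" returns zero nodes. Running "//link" returns the correct number of nodes, but the inner text is zero characters long.

With the test data <site><link>Hello World</link><name>Fred</name></site>, "//name" returns one record containing "Fred", whereas "//link" returns one record with an empty string.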

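I believe this is because of the word "link". If I change it to "linkz", it works fine.

The workaround below works well. However, I would like to understand why searching for "//link" does not work the way it does for other elements.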

public static IEnumerable<string> ExtractAllLinks(string rssSource)
{
    rssSource = rssSource.Replace("<link>", "<link-renamed>");
    rssSource = rssSource.Replace("</link>", "</link-renamed>");
    //Create a new document.
    var document = new HtmlDocument();
    //Populate the document with an rss file.
    document.LoadHtml(rssSource);
    //Select out all of the required nodes.
    var itemNodes = document.DocumentNode.SelectNodes("//link-renamed");
    //If zero nodes were found, return an empty list, otherwise return the content of those nodes.
    return itemNodes == null ? new List<string>() : itemNodes.Select(itemNode => itemNode.InnerText).ToList();
}

If you print DocumentNode.OuterHtml, you will see the problem:

var html = @"<site><link>Hello World</link><name>Fred</name></site>";
var doc = new HtmlDocument();
doc.LoadHtml(html);
Console.WriteLine(doc.DocumentNode.OuterHtml);

Output:

<site><link>Hello World<name>Fred</name></site>
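
The output above shows that the closing </link> tag has been dropped: the parser appears to treat link as an empty element, so "Hello World" is no longer its child and InnerText comes back empty. A minimal sketch of one way around this, assuming the "link" entry in the static HtmlNode.ElementsFlags dictionary is what marks the tag as special: remove that entry before parsing. The class name LinkDemo is only illustrative, and because the dictionary is static the change applies to every document parsed afterwards in the same process.

```csharp
using System;
using HtmlAgilityPack;

public static class LinkDemo
{
    public static void Main()
    {
        var html = @"<site><link>Hello World</link><name>Fred</name></site>";

        // Assumption: "link" is registered in HtmlNode.ElementsFlags as a special tag.
        // Removing the entry makes the parser treat <link> like an ordinary element.
        // This is a global (static) setting, so do it once before any parsing.
        HtmlNode.ElementsFlags.Remove("link");

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Print the parsed markup, then the text inside the first <link> node.
        Console.WriteLine(doc.DocumentNode.OuterHtml);
        Console.WriteLine(doc.DocumentNode.SelectSingleNode("//link").InnerText);
    }
}
```

With the entry removed, the parsed markup keeps the closing tag and the inner text is available, as in the following output: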

<site><link>Hello World</link><name>Fred</name></site>
Hello World

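*) link is one of the special tags included by default in the ElementsFlags dictionary. The complete list of these special tags can be seen in the HtmlAgilityPack source code; it includes several other commonly used tags.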

Thanks! I had a feeling this had something to do with "reserved" or "special" words, but Google was not my friend today. Marked as accepted.