C#: How do I extract text from HTML with HtmlAgilityPack in this example?
I want to extract text from HTML source code. I am trying to do this with C# and the HtmlAgilityPack DLL. The source is:
<table>
<tr>
<td class="title">
<a onclick="func1">Here 2</a>
</td>
<td class="arrow">
<img src="src1" width="9" height="8" alt="Down">
</td>
<td class="percent">
<span>39%</span>
</td>
<td class="title">
<a onclick="func2">Here 1</a>
</td>
<td class="arrow">
<img src="func3" width="9" height="8" alt="Up">
</td>
<td class="percent">
<span>263%</span>
</td>
</tr>
</table>
Here 2
39%
Here 1
263%
How can I get the text "Here 1" and "Here 2" from the table?
HtmlDocument htmlDoc = new HtmlDocument();
htmlDoc.LoadHtml("web page string");
var xyz = from x in htmlDoc.DocumentNode.DescendantNodes()
where x.Name == "td" && x.Attributes.Contains("class")
where x.Attributes["class"].Value == "title"
select x.InnerText;
It's not very pretty, but it should work.
XPath version:
HtmlDocument doc = new HtmlDocument();
doc.LoadHtml(t);
//this simply works because InnerText recursively includes the text of all child nodes
HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes("//td[@class='title']");
//but to be more accurate you can use the next line instead
//HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes("//td[@class='title']/a");
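string result = ""; //must be initialized, otherwise += does not compile
if (nodes != null) //SelectNodes returns null when nothing matches
foreach (HtmlNode item in nodes)
result += item.InnerText;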
For the LINQ version, just change the var Nodes = ... line to:
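var Nodes = from x in htmlDoc.DocumentNode.DescendantNodes()
//GetAttributeValue returns the default ("") instead of throwing when a td has no class attribute
where x.Name == "td" && x.GetAttributeValue("class", "") == "title"
select x.InnerText;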
How do I get the cell text? Use InnerText. You can also use text() in the XPath, e.g. "//td[@class='title']/a/text()"
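To make the text() suggestion concrete, here is a minimal sketch of a complete console program. It assumes the HtmlAgilityPack NuGet package is referenced. The names Program and ExtractTitles and the shortened inline HTML are only for illustration and are not from the original posts.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack; // assumption: the HtmlAgilityPack NuGet package is installed

public class Program
{
    // Hypothetical helper: returns the link text of every <td class="title"> cell.
    public static List<string> ExtractTitles(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var titles = new List<string>();

        // text() selects the text nodes directly under each <a>.
        // SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection nodes =
            doc.DocumentNode.SelectNodes("//td[@class='title']/a/text()");
        if (nodes == null)
            return titles;

        foreach (HtmlNode node in nodes)
            titles.Add(node.InnerText.Trim());

        return titles;
    }

    public static void Main()
    {
        // Shortened version of the table from the question.
        string html =
            "<table><tr>" +
            "<td class=\"title\"><a onclick=\"func1\">Here 2</a></td>" +
            "<td class=\"percent\"><span>39%</span></td>" +
            "<td class=\"title\"><a onclick=\"func2\">Here 1</a></td>" +
            "<td class=\"percent\"><span>263%</span></td>" +
            "</tr></table>";

        foreach (string title in ExtractTitles(html))
            Console.WriteLine(title);
    }
}
```

For the sample table this should list the two title cells in document order ("Here 2", then "Here 1"). Collecting the values in a list instead of concatenating them into one string keeps each cell separate, which is usually easier to work with afterwards.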