C# HtmlAgilityPack: unable to get a nested element's InnerText from a table
I am using HtmlAgilityPack to get data from a table with the following structure:
<table>
<tbody class="border_tbody">
<tr style="height:55px;">
<th class="heading_one" colspan="2">Heading 1</th>
<th class="heading_two">Heading 2</th>
<th class="heading_three">heading 3</th>
</tr>
<tr>
<td class="ro">
<a href="go/a/a.com" target="_blank">
<img src="images/vendors_images/vendors_ficon/a.png" height="17px" width="17px" alt="a" title="a">
</a>
</td>
<td td="" class="l no_border">
<a href="go/a/a.com" target="_blank">
Vendor name
</a>
</td>
<td class="l lo" style="text-align: center;"><a href="go/a/a.com" target="_blank">15%</a></td>
<td class="l bonus_amount">
<a href="go/a/a.com" class="apply_text" target="_blank">
<div class="coupon_div">
<span class="coupon_span">
<span class="card_secondary_text">$10</span>
</span>
</div>
</a>
</td>
</tr>
<tr>
<td class="ro">
<a href="go/a/a.com" target="_blank">
<img src="images/vendors_images/vendors_ficon/a.png" height="17px" width="17px" alt="a" title="a">
</a>
</td>
<td td="" class="l no_border">
<a href="go/a/a.com" target="_blank">
Vender name
</a>
</td>
<td class="l lo" style="text-align: center;"><a href="go/a/a.com" target="_blank">6%</a></td>
<td class="l" style="text-align: center;"></td>
</tr>
<tr>
<td class="ro">
<a href="go/a/a.com" target="_blank">
<img src="images/vendors_images/vendors_ficon/a.png" height="17px" width="17px" alt="a a" title="a a">
</a>
</td>
<td td="" class="l no_border">
<a href="go/a/a.com" target="_blank">
Vendor name
</a>
</td>
<td class="l lo" style="text-align: center;"><a href="go/a/a.com" target="_blank">5%</a></td>
<td class="l bonus_amount">
<a href="apply/a" class="apply_text" target="_blank">
<div class="coupon_div">
<span class="coupon_span">
<span class="card_secondary_text">$50</span> - Apply
</span>
</div>
</a>
</td>
</tr>
</tbody>
</table>
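For the sample HTML given in the question, the class name inside the fourth cell appears to always be the same, so you can select on it directly. If it is not, you can iterate over all descendant nodes of the cell and look for text that starts with a dollar sign: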
using System;
using System.Linq;
using HtmlAgilityPack;

// html holds the markup shown in the question
HtmlDocument webDoc = new HtmlDocument();
webDoc.LoadHtml(html);
// Note: SelectNodes returns null (not an empty collection) when nothing matches
foreach (var table in webDoc.DocumentNode.SelectNodes("//table/tbody"))
{
    // position() > 1 skips the header row
    foreach (var tr in table.SelectNodes("tr[position() > 1]"))
    {
        // [1] class name in HTML sample always the same
        var rateTwo = tr.SelectSingleNode("td[4]//span[@class='card_secondary_text']");
        Console.WriteLine("Method 1 Coupon: {0}",
            rateTwo != null ? rateTwo.InnerText : "none"
        );
        // [2] brute force - all descendants
        var rateTwo2 = tr.SelectSingleNode("td[4]").Descendants();
        if (rateTwo2.Any())
        {
            foreach (var child in rateTwo2)
            {
                // Only element nodes whose text begins with "$" count as a coupon
                if (child.InnerText.StartsWith("$") && child.NodeType == HtmlNodeType.Element)
                    Console.WriteLine("Method 2 Coupon: {0}", child.InnerText);
            }
        }
        else
        {
            Console.WriteLine("Method 2: No coupon");
        }
    }
}
Output:
Method 1 Coupon: $10
Method 2 Coupon: $10
Method 1 Coupon: none
Method 2: No coupon
Method 1 Coupon: $50
Method 2 Coupon: $50
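In the third row the amount is followed by extra text ("$50" and then " - Apply"), so reading the InnerText of the whole cell or of an outer element would return more than the dollar figure. If only the amount is wanted, one option is to pull it out of the flattened text with a regular expression. This is a minimal sketch under that assumption: the class name CouponText, the method ExtractAmount, and the pattern itself are hypothetical names chosen for illustration, not part of HtmlAgilityPack.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CouponText
{
    // Assumed format: "$" followed by digits, optional thousands groups, optional decimals.
    private static readonly Regex Amount =
        new Regex(@"\$\d+(?:,\d{3})*(?:\.\d+)?", RegexOptions.Compiled);

    // Returns the first dollar amount found in the text, or null when there is none.
    public static string ExtractAmount(string cellText)
    {
        if (string.IsNullOrWhiteSpace(cellText))
            return null;

        Match match = Amount.Match(cellText);
        return match.Success ? match.Value : null;
    }
}
```

It could be called as CouponText.ExtractAmount(cell.InnerText) ?? "none", where cell is the node selected with "td[4]". Because it works on plain strings, it does not care how deeply the amount is nested in the markup.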
You can't get text from the cell because there is no text... OK, so what is the problem? What is your question?