C#: Using XPath with HtmlAgilityPack to select table elements between two known elements

I'm trying to select elements from a table with this layout:

<tbody>
  <tr class="header">
    <th colspan="4">Tier 1</th>
  </tr>
  <tr>
    <td><a>First Thing</a></td>
    <td><a>Second Thing</a></td>
    <td><a>Third Thing</a></td>
    <td></td>
  </tr>
  <tr>
    <td><a>Fourth Thing</a></td>
    <td><a>Fifth Thing</a></td>
    <td><a>Sixth Thing</a></td>
    <td></td>
  </tr>

  <tr class="header">
    <th colspan="4">Tier 2</th>
  </tr>
  <tr>
    <td><a>First Thing</a></td>
    <td><a>Second Thing</a></td>
    <td><a>Third Thing</a></td>
    <td></td>
  </tr>
  <tr>
    <td><a>Fourth Thing</a></td>
    <td><a>Fifth Thing</a></td>
    <td><a>Sixth Thing</a></td>
    <td></td>
  </tr>
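
I want to select all the values between the tr class=header rows. I will need to do this 5 times, because the real table has 6 tiers (not all listed here, since it is too long), and finally I need to select from the last header to the bottom of the table.

I should mention that I am using the Agility Pack in MVC, so XPath seemed like a good choice. So far I have been able to isolate the headers with //tr[@class='header']//th. The main problem seems to be that the nodes I want are siblings of each other, rather than children, which would make traversal easier.

In the end, I want to give all tier 1 elements a value of 1 in my data structure, all tier 2 elements a value of 2, and so on, for later comparison.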
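Because the rows are siblings, one way to stay entirely in XPath 1.0 is to count the header rows that precede each data row: a row preceded by exactly n headers belongs to tier n, and the last tier simply runs to the end of the table. The sketch below is an illustration, not the method used in the answer that follows. It uses System.Xml's XmlDocument on a trimmed, well-formed copy of the table so that it runs without extra packages, and it assumes HtmlAgilityPack's SelectNodes accepts the same XPath 1.0 expression. The class and method names are hypothetical.

```csharp
using System.Collections.Generic;
using System.Xml;

// Hypothetical helper: groups the <a> texts of a table by tier using only XPath 1.0.
public static class TierXPath
{
    public static Dictionary<int, List<string>> ReadTiers(string xml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);

        // One tier per header row.
        int tierCount = doc.SelectNodes("//tr[@class='header']").Count;
        var tiers = new Dictionary<int, List<string>>();

        for (int n = 1; n <= tierCount; n++)
        {
            // Non-header rows preceded by exactly n header siblings belong to tier n.
            // The last tier has no closing header, so it extends to the end of the table.
            string xpath = "//tr[not(@class='header')]" +
                           "[count(preceding-sibling::tr[@class='header'])=" + n + "]/td/a";

            var values = new List<string>();
            foreach (XmlNode a in doc.SelectNodes(xpath))
                values.Add(a.InnerText);   // results come back in document order

            tiers[n] = values;
        }
        return tiers;
    }
}
```

Each SelectNodes call rescans the table, which is harmless for six tiers; for a large table a single pass over the rows would be cheaper.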

First, use an extension method to split the rows into tiers:

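using System;
using System.Collections.Generic;
using System.Linq;

// Extension methods must live in a static class; the class name is arbitrary.
public static class EnumerableExtensions
{
    // Splits a sequence into batches. A new batch starts at every element for which
    // separator returns true (here: each header row), so each batch begins with its header.
    public static IEnumerable<IEnumerable<T>> SplitBy<T>(
        this IEnumerable<T> source, Func<T, bool> separator)
    {
        List<T> batch = new List<T>();

        using (var iterator = source.GetEnumerator())
        {
            while (iterator.MoveNext())
            {
                // A separator closes the current batch (never emit an empty one).
                if (separator(iterator.Current) && batch.Any())
                {
                    yield return batch;
                    batch = new List<T>();
                }

                batch.Add(iterator.Current);
            }
        }

        // The last batch (last header to end of table) has no closing separator.
        if (batch.Any())
            yield return batch;
    }
}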
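If all you need is a tier number for each value, the batches are optional: a single pass that increments a counter at every header row assigns tier 1, 2, 3 and so on directly, and rows after the last header fall into the last tier automatically. A minimal sketch of that alternative (not part of the original answer), assuming the rows have already been reduced to a hypothetical (IsHeader, Links) tuple, for example from each tr's class attribute and its td/a texts:

```csharp
using System.Collections.Generic;

// Hypothetical alternative to SplitBy: one pass, counting header rows as they appear.
public static class TierNumbering
{
    // Each row is modelled as (IsHeader, Links); in real code these values would be
    // read from HtmlAgilityPack row nodes (assumption, not shown here).
    public static List<(int Tier, string Value)> Number(
        IEnumerable<(bool IsHeader, string[] Links)> rows)
    {
        var result = new List<(int Tier, string Value)>();
        int tier = 0; // stays 0 until the first header row is seen

        foreach (var row in rows)
        {
            if (row.IsHeader)
            {
                tier++;   // every header opens the next tier
                continue;
            }
            foreach (var link in row.Links)
                result.Add((tier, link));
        }
        return result;
    }
}
```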
Second, extract the data from each tier:

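// Assumes doc is an HtmlAgilityPack HtmlDocument loaded with the page and that
// the rows sit under a single <tbody>. Each batch starts with its header row.
var tiers = doc.DocumentNode.SelectNodes("//tbody/tr")
               .SplitBy(tr => tr.GetAttributeValue("class", "") == "header");

var result = from t in tiers
             let tier = t.First().SelectSingleNode("th").InnerText
             // SelectNodes returns null when nothing matches, so guard rows without links
             from a in t.Skip(1).SelectMany(tr => tr.SelectNodes("td/a") ?? Enumerable.Empty<HtmlNode>())
             select new {
                 Tier = tier,
                 Value = a.InnerText
             };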
The result is:

[
  { Tier: "Tier 1", Value: "First Thing" },
  { Tier: "Tier 1", Value: "Second Thing" },
  { Tier: "Tier 1", Value: "Third Thing" },
  { Tier: "Tier 1", Value: "Fourth Thing" },
  { Tier: "Tier 1", Value: "Fifth Thing" },
  { Tier: "Tier 1", Value: "Sixth Thing" },
  { Tier: "Tier 2", Value: "First Thing" },
  { Tier: "Tier 2", Value: "Second Thing" },
  { Tier: "Tier 2", Value: "Third Thing" },
  { Tier: "Tier 2", Value: "Fourth Thing" },
  { Tier: "Tier 2", Value: "Fifth Thing" },
  { Tier: "Tier 2", Value: "Sixth Thing" }
]
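The question asks for a numeric tier (1, 2, and so on) rather than the label "Tier 1". Assuming every header text ends with its number, a small hypothetical helper can convert the label, for example Tier = TierLabels.ParseTierNumber(tier) in the query above; alternatively, the index overload of Select (tiers.Select((t, i) => ...)) gives each batch its 1-based position as i + 1.

```csharp
using System.Globalization;

public static class TierLabels
{
    // Hypothetical helper: "Tier 1" -> 1. Assumes the label ends with an integer
    // separated from the rest of the text by a space.
    public static int ParseTierNumber(string label)
    {
        string trimmed = label.Trim();
        int lastSpace = trimmed.LastIndexOf(' ');
        string digits = trimmed.Substring(lastSpace + 1); // whole string if there is no space
        return int.Parse(digits, NumberStyles.Integer, CultureInfo.InvariantCulture);
    }
}
```

int.Parse throws a FormatException if a header does not end with a number, which makes unexpected header text fail loudly instead of silently producing a wrong tier.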

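What is the expected result of parsing this HTML? What do you mean by "all values between tr class=header"? There are only two such nodes. Do you want their values?
To make a list. It only has one generic parameter.
This actually repeats 6 times; I just didn't want to copy the whole thing over and over. There are 6 tiers.
Er, sorry: a list.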