Parsing HTML using XPath?


I'd like help with the following problem: I am getting the data, but the values I get are duplicated. Thank you very much.

HTML:

<div id="items" style="width: 940px; height: 2176px; position: relative;">
        <div class="item masonry-brick" style="top: 0px; right: 0px; position: absolute;">
        <div class="picture">
            <a title="bikini" class="image" href="...-bikini.html">
                <img alt="bikini" src="...13508.jpg">
            </a>
            <div class="item-content">
                <h2><a href="...bikini.html">bikini</a></h2>
                <div class="item_social">
                    <ul>
                        <li><i class="fa fa-eye"></i><span>6</span></li>
                        <li><i class="fa fa-thumbs-o-up"></i><span>0</span></li>
                        <li><i class="fa fa-comments"></i><span>0</span></li>
                    </ul>
                </div>
                <div class="author-post">
                    <a class="author" href="....nuong" rel="nofollow">
                        <img class="author_avatar" alt="nương" src="....ae3c3d8a6a.png">

                        <span class="author_name">nương</span>
                        <ul class="author_item">
                            <li><span>13 giờ trước </span></li>
                        </ul>
                    </a>
                </div>
            </div>
        </div>
    </div>
//..... more item masonry-brick
 </div>

My C# code parses the page, but the data it returns contains duplicated images and text, even though the number of items is correct:

HtmlDocument htmlDocument = new HtmlDocument();
htmlDocument.LoadHtml(htmlPage);
List<Data> datas = new List<Data>();
foreach (var div in htmlDocument.DocumentNode.SelectNodes("//div[starts-with(@class, 'item')]"))

{
    Data newdata = new Data();
    newdata.Imgsrc = div.SelectSingleNode("//div[@class='picture']//img").Attributes["src"].Value;
    newdata.Title = div.SelectSingleNode("//div[@class='item-content']//h2").InnerText.Trim();
    newdata.Summary = div.SelectSingleNode("//div[@class='author-post']//span").InnerText.Trim();
    datas.Add(newdata);
}
lstDatas.ItemsSource = datas;
Thank you!

You need to add a period (.) at the beginning of each XPath to tell it that the search is relative to the current div context:

foreach (var div in htmlDocument.DocumentNode.SelectNodes("//div[starts-with(@class, 'item')]"))
{
    Data newdata = new Data();
    newdata.Imgsrc = div.SelectSingleNode(".//div[@class='picture']//img").Attributes["src"].Value;
    newdata.Title = div.SelectSingleNode(".//div[@class='item-content']//h2").InnerText.Trim();
    newdata.Summary = div.SelectSingleNode(".//div[@class='author-post']//span").InnerText.Trim();
    datas.Add(newdata);
}
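Otherwise, the XPath is evaluated against the entire HtmlDocument and returns the first matching node over and over in every iteration, which is why you get those duplicates.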
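To see the difference side by side, here is a minimal sketch that runs the same loop once with "//" and once with ".//". The two-item HTML string, the class name RelativeXPathDemo and the Extract helper are made up for illustration, and it assumes the HtmlAgilityPack NuGet package is referenced. It also matches "item" as a whole class token, because starts-with(@class, 'item') would additionally select the item-content and item_social divs in the markup from the question, and it guards against null results instead of dereferencing them directly.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack; // assumption: the HtmlAgilityPack NuGet package is referenced

public static class RelativeXPathDemo
{
    // Hypothetical two-item sample, trimmed down from the markup in the question.
    private const string Html = @"
<div id='items'>
  <div class='item masonry-brick'>
    <div class='picture'><img alt='first' src='first.jpg'></div>
    <div class='item-content'><h2><a href='#'>first</a></h2></div>
  </div>
  <div class='item masonry-brick'>
    <div class='picture'><img alt='second' src='second.jpg'></div>
    <div class='item-content'><h2><a href='#'>second</a></h2></div>
  </div>
</div>";

    // Returns "src|title" for each item, using the given XPath prefix ("//" or ".//").
    public static List<string> Extract(string prefix)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(Html);

        var results = new List<string>();

        // Match "item" as a whole class token so that 'item-content' is not selected.
        var items = doc.DocumentNode.SelectNodes(
            "//div[contains(concat(' ', normalize-space(@class), ' '), ' item ')]");
        if (items == null) return results; // SelectNodes returns null when nothing matches

        foreach (var div in items)
        {
            // With "//" these start from the document root; with ".//" they start from div.
            var img = div.SelectSingleNode(prefix + "div[@class='picture']//img");
            var h2 = div.SelectSingleNode(prefix + "div[@class='item-content']//h2");

            // Fall back to an empty string when a node is missing.
            var src = img?.GetAttributeValue("src", "") ?? "";
            var title = h2?.InnerText.Trim() ?? "";
            results.Add(src + "|" + title);
        }
        return results;
    }

    public static void Main()
    {
        // Absolute paths: every iteration returns the first item's values.
        Console.WriteLine(string.Join(", ", Extract("//")));
        // Relative paths: each iteration returns its own item's values.
        Console.WriteLine(string.Join(", ", Extract(".//")));
    }
}
```

With the "//" prefix both entries should describe the first item, while with ".//" each entry describes its own item. The null checks are a design choice for this sketch: SelectNodes and SelectSingleNode return null when nothing matches, so calling .Attributes or .InnerText on the result directly would throw for any item that lacks one of the expected child elements.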