C#: scraping URLs with htmlagility


c# html url


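OK, so I have a list of URLs on this web page, and I'd like to know how to grab those URLs and add them to an ArrayList.

I only want the URLs in the list; look at the page and you'll see what I mean. I tried doing it myself, but for whatever reason it grabs every other URL except the ones I need.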

   http://pastebin.com/a7hJnXPP

If you only want the links in the list, the following code should work (assuming you have already loaded the page into an `HtmlDocument`):

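```csharp
using System.Collections.Generic;
using HtmlAgilityPack;

// animePage is an HtmlDocument that has already been loaded with the list page.
List<string> hrefList = new List<string>(); // Make a list, because lists are cool.
foreach (HtmlNode node in animePage.DocumentNode.SelectNodes("//a[contains(@href, 'id=')]"))
{
    // Prepend animenewsnetwork.com to the href value and add it to the list.
    hrefList.Add("http://www.animenewsnetwork.com" + node.GetAttributeValue("href", "null"));
}
```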
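The snippet above builds absolute URLs by string concatenation, which only works when every `href` starts with `/`, and the `"null"` default turns a link without an `href` into `http://www.animenewsnetwork.comnull`. A minimal sketch of an alternative, assuming the `href` strings have already been pulled out of the matched nodes: resolve each one against the page URL with `System.Uri`, skip empty values, and drop duplicates. The sample `href` values and the `ToAbsolute` helper are hypothetical, made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
#nullable enable

// Hypothetical href values, as they might come back from
// node.GetAttributeValue("href", null); a missing attribute yields null.
string?[] rawHrefs =
{
    "/encyclopedia/anime.php?id=1234",
    "anime.php?id=5678",
    "/encyclopedia/anime.php?id=1234", // duplicate of the first link
    null
};

// The page the links were scraped from; relative hrefs are resolved against it.
var pageUri = new Uri("http://www.animenewsnetwork.com/encyclopedia/anime.php?list=A");

// Hypothetical helper: turn raw href values into a de-duplicated list of absolute URLs.
List<string> ToAbsolute(IEnumerable<string?> hrefs, Uri baseUri) =>
    hrefs
        .Where(h => !string.IsNullOrEmpty(h))          // skip <a> tags without an href
        .Select(h => new Uri(baseUri, h).AbsoluteUri) // resolve "/x" and "x" the way a browser would
        .Distinct()                                    // drop repeated links
        .ToList();

List<string> hrefList = ToAbsolute(rawHrefs, pageUri);
foreach (var url in hrefList)
    Console.WriteLine(url);
```

`new Uri(baseUri, href)` follows the normal relative-URL rules, so `anime.php?id=5678` resolves against the `/encyclopedia/` directory of the page URL instead of being glued directly onto the host name.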

`//a[contains(@href, 'id=')]`

Breaking this XPath down:

`//a` selects all `<a>` nodes...

`[contains(@href, 'id=')]` ...that contain the text `id=` in their `href` attribute.
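To see what the expression does in isolation, here is a small sketch that runs the same XPath over a hand-written, well-formed sample. It uses `System.Xml.XmlDocument` only because that ships with .NET; the markup and `href` values are invented, and with Html Agility Pack you would call `SelectNodes` on `doc.DocumentNode` instead. As far as I know, Html Agility Pack evaluates XPath through the same `System.Xml.XPath` engine, so the expression should behave the same way there.

```csharp
using System;
using System.Linq;
using System.Xml;

// Invented, well-formed sample standing in for the real page.
const string sample = @"<html><body>
  <a href='/news/'>News</a>
  <div class='lst'>
    <a href='/encyclopedia/anime.php?id=1'>Title A</a>
    <a href='/encyclopedia/anime.php?id=2'>Title B</a>
  </div>
  <a href='/encyclopedia/anime.php?list=B'>B</a>
</body></html>";

var xml = new XmlDocument();
xml.LoadXml(sample);

// //a                      -> every <a> element anywhere in the document
// [contains(@href, 'id=')] -> ...keeping only those whose href contains the text "id="
var matches = xml.SelectNodes("//a[contains(@href, 'id=')]")
    .Cast<XmlElement>()
    .Select(a => a.GetAttribute("href"))
    .ToList();

foreach (var href in matches)
    Console.WriteLine(href);
```

`contains()` is a plain substring test, so an `href` such as `?fid=3` would match as well; if that matters, a tighter predicate such as `contains(@href, '?id=')` may be safer.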

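That should be enough to get you going.

By the way, I'd suggest not showing each link in its own message box, since there are about 500 links on that page. 500 links = 500 message boxes :(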
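If the goal is just to eyeball the results, one option is to build a single summary string and show it once. A minimal sketch; the 500 generated URLs are placeholders for the scraped list, and `Summarize` is a hypothetical helper name.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Placeholder stand-in for the ~500 scraped links.
List<string> hrefList = Enumerable.Range(1, 500)
    .Select(i => $"http://www.animenewsnetwork.com/encyclopedia/anime.php?id={i}")
    .ToList();

// Hypothetical helper: build ONE summary string instead of one dialog per link.
string Summarize(IReadOnlyList<string> links, int maxShown)
{
    var text = string.Join(Environment.NewLine, links.Take(maxShown));
    if (links.Count > maxShown)
        text += Environment.NewLine + $"... and {links.Count - maxShown} more";
    return $"{links.Count} links found:" + Environment.NewLine + text;
}

string summary = Summarize(hrefList, 10);
Console.WriteLine(summary);
// In a WinForms app this single string could go to one MessageBox.Show(summary) call,
// or the full list could go into a multiline TextBox or ListBox.
```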

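How do you know what to put in the `SelectSingleNode` or `SelectNodes` part? Why `//div[@class='1st']`? Why did you do that?

I opened the page in Chrome and inspected it. PS: it's `lst`, not `1st`.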

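```csharp
using System.Linq;
using System.Net;

string[] links;
using (var wc = new WebClient())
{
    var doc = new HtmlAgilityPack.HtmlDocument();
    doc.LoadHtml(wc.DownloadString("http://www.animenewsnetwork.com/encyclopedia/anime.php?list=A"));
    // The anime list sits in <div class="lst"> (lowercase L); SelectSingleNode returns null if it is missing.
    links = doc.DocumentNode.SelectSingleNode("//div[@class='lst']")
        .Descendants("a")
        .Select(x => x.GetAttributeValue("href", ""))
        .Where(href => href.Length > 0) // skip <a> tags without an href
        .ToArray();
}
```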
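This answer narrows the search to the list container first and then takes its `<a>` descendants, which is what keeps the unrelated site links out. A small sketch of that idea with `System.Xml` on invented markup, also showing that `@class='lst'` is an exact string comparison: it skips a `1st` div (digit one) but would also miss a div whose class attribute is `lst extra`. The token-match predicate is a common XPath 1.0 idiom, not something taken from the answer, and `Hrefs` is a hypothetical helper.

```csharp
using System;
using System.Linq;
using System.Xml;

// Invented, well-formed sample; not the real page markup.
const string sample = @"<html><body>
  <div class='1st'><a href='/wrong'>digit one</a></div>
  <div class='lst'><a href='/encyclopedia/anime.php?id=1'>A</a></div>
  <div class='lst extra'><a href='/encyclopedia/anime.php?id=2'>B</a></div>
</body></html>";

var xml = new XmlDocument();
xml.LoadXml(sample);

// Hypothetical helper: href values of every element matched by an XPath.
string[] Hrefs(string xpath) =>
    xml.SelectNodes(xpath).Cast<XmlElement>().Select(a => a.GetAttribute("href")).ToArray();

// Exact match: the whole class attribute must equal "lst".
var exact = Hrefs("//div[@class='lst']//a");

// Token match: also accepts class="lst extra", like a CSS .lst selector would.
var token = Hrefs("//div[contains(concat(' ', normalize-space(@class), ' '), ' lst ')]//a");

Console.WriteLine("exact: " + string.Join(", ", exact));
Console.WriteLine("token: " + string.Join(", ", token));
```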
