C#: Getting the HTML of a link (content) from a website

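What I want is to open a link found on a website (in its HtmlContent) and get the HTML of the newly opened site.

I have www.google.com, and now I want to find all of its links. For each link, I want the HTML content of the site it points to.

This is how I am doing it: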

    foreach (String link in GetLinksFromWebsite(htmlContent))
    {
        using (var client = new WebClient())
        {
            htmlContent = client.DownloadString("http://" + link);
        }

        foreach (Match treffer in istBildURL)
        {
            string bildUrl = treffer.Groups[1].Value;
            bildLinks.Add(bildUrl);
        }
    }

    public static List<String> GetLinksFromWebsite(string htmlSource)
    {
        string linkPattern = "<a href=\"(.*?)\">(.*?)</a>";
        MatchCollection linkMatches = Regex.Matches(htmlSource, linkPattern, RegexOptions.Singleline);
        List<string> linkContents = new List<string>();
        foreach (Match match in linkMatches)
        {
            linkContents.Add(match.Value);
        }
        return linkContents;
    }

Another problem is that I only get plain links, not LinkButtons (ASP.NET). How can I solve this?

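Steps to follow:

1. Download the Html Agility Pack.
2. Reference the downloaded assembly in your project.
3. Throw out everything in your project that starts with the word regex or regular expression and deals with parsing HTML (regex patterns are brittle against real-world HTML). In your case, that is the body of the GetLinksFromWebsite method.
4. Replace what you threw away with a simple call to the Html Agility Pack parser.

Here is an example; the complete program is listed at the end of this page.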

You really should use an HTML parser, such as HtmlAgilityPack.
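
To illustrate why a parser is the safer choice, here is a minimal sketch that runs the pattern from the question against a few anchors. The sample markup is made up for this illustration, and the LinkButton line only assumes the typical shape ASP.NET renders (an anchor with an id and a javascript: postback href). The class name RegexLinkDemo and the helper CountMatches are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

static class RegexLinkDemo
{
    // The pattern from the question: it only matches when href is the
    // first attribute, directly after "<a ", and is written in double quotes.
    const string LinkPattern = "<a href=\"(.*?)\">(.*?)</a>";

    // Hypothetical helper: counts how many anchors the pattern finds in the given HTML.
    public static int CountMatches(string html)
    {
        return Regex.Matches(html, LinkPattern, RegexOptions.Singleline).Count;
    }

    static void Main()
    {
        // Sample markup invented for this sketch.
        string[] samples =
        {
            "<a href=\"/plain\">plain</a>",
            "<a class=\"nav\" href=\"/with-class\">attribute before href</a>",
            "<a href='/single-quotes'>single quotes</a>",
            "<a id=\"LinkButton1\" href=\"javascript:__doPostBack('LinkButton1','')\">LinkButton</a>",
        };

        foreach (string html in samples)
        {
            Console.WriteLine("{0} match(es): {1}", CountMatches(html), html);
        }
    }
}
```

Only the first sample is matched. The other three are ordinary anchors that an XPath query such as //a[@href] would still find, which may also be why the LinkButtons are missing from the regex results.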

Thx 4 the answer, I will test it and give feedback. If it works, I will mark it as the answer :D

using System;
using System.Collections.Generic;
using System.Linq;
using System.Net;
using HtmlAgilityPack;

class Program
{
    static void Main()
    {
        using (var client = new WebClient())
        {
            var htmlSource = client.DownloadString("http://www.stackoverflow.com");
            foreach (var item in GetLinksFromWebsite(htmlSource))
            {
                // TODO: you could easily write a recursive function
                // that will call itself here and retrieve the respective contents
                // of the site ...
                Console.WriteLine(item);
            }
        }
    }

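    // Parses the HTML with Html Agility Pack and returns the href value of every <a> element.
    public static List<String> GetLinksFromWebsite(string htmlSource)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(htmlSource);

        // SelectNodes returns null (not an empty collection) when nothing matches.
        var anchors = doc.DocumentNode.SelectNodes("//a[@href]");
        if (anchors == null)
        {
            return new List<String>();
        }

        return anchors
            .Select(node => node.Attributes["href"].Value)
            .ToList();
    }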
}
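
The TODO in Main leaves open how to turn each href into something that can be downloaded. The href values are often relative ("/questions") or not web pages at all ("mailto:", "javascript:"), so prefixing "http://" as in the question will not work in general. Below is a minimal sketch, assuming the hrefs have already been extracted; the class name LinkResolver, the method ResolveLinks, and the sample hrefs are hypothetical and use only System.Uri from the base class library.

```csharp
using System;
using System.Collections.Generic;

static class LinkResolver
{
    // Hypothetical helper: turns raw href values into absolute http/https URLs,
    // resolved against the page they were found on. Duplicates are dropped.
    public static List<Uri> ResolveLinks(Uri pageUri, IEnumerable<string> hrefs)
    {
        var result = new List<Uri>();
        var seen = new HashSet<string>();

        foreach (string href in hrefs)
        {
            Uri absolute;

            // Resolves relative hrefs against pageUri; absolute hrefs are kept as they are.
            if (!Uri.TryCreate(pageUri, href, out absolute))
            {
                continue;
            }

            // Skip anything that is not a web page, e.g. mailto: or javascript: links.
            if (absolute.Scheme != Uri.UriSchemeHttp && absolute.Scheme != Uri.UriSchemeHttps)
            {
                continue;
            }

            // HashSet.Add returns false for URLs that were already collected.
            if (seen.Add(absolute.AbsoluteUri))
            {
                result.Add(absolute);
            }
        }

        return result;
    }

    static void Main()
    {
        var page = new Uri("http://www.stackoverflow.com");

        // Sample hrefs invented for this sketch.
        var hrefs = new[]
        {
            "/questions",
            "https://example.com/a",
            "mailto:someone@example.com",
            "javascript:__doPostBack('LinkButton1','')",
            "/questions",
        };

        foreach (Uri link in ResolveLinks(page, hrefs))
        {
            Console.WriteLine(link.AbsoluteUri);
        }
    }
}
```

Each resulting Uri can then be passed to client.DownloadString(link) inside the loop to fetch that page's HTML. Note that javascript: postback links, such as those from a LinkButton, are skipped here because there is no URL to download behind them.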