C#: How can I use HtmlAgilityPack to extract all http links from an HTML file, and then extract only the http links between specific markers?



I am trying the following code:

private void htmlparsing(string htmlfile)
{
    List<string> test = new List<string>();
    HtmlDocument doc = new HtmlDocument();
    doc.Load(htmlfile);
    // SelectNodes returns null when nothing matches, so guard before iterating.
    HtmlNodeCollection anchors = doc.DocumentNode.SelectNodes("//a[@href]");
    if (anchors != null)
    {
        foreach (HtmlNode link in anchors)
        {
            HtmlAttribute att = link.Attributes["href"];
            test.Add(att.Value);
        }
    }
    doc.Save(@"d:\file.htm");
}


This is the HTML file I am working with:

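When I set a breakpoint and inspect the list test after the method has run, I see 154 links, but it does not contain these links from the HTML file's content: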

","

There are many links like this, about 61-62 of them, and none of them appear in the list test.

Second, these links sit between:

var images = new Array(

and, at the end:

))

So as a first step I want to get all the http links from the HTML file. Second, I want to filter the HTML file and get only the http links located between var images = new Array( and

);

Hope this works for you. This code only handles the links inside var images = new Array().


Isolate the problem. Test with only the relevant HTML, replace the URLs with http://example.com, and show the HTML here.

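List<string> test = new List<string>();
const string marker = "var images = new Array(";
string extractUrls = YourHtmlInText; // the full HTML text as a string
int start = extractUrls.IndexOf(marker);
// Look for the terminating ';' only after the marker; -1 means "not found".
int end = start < 0 ? -1 : extractUrls.IndexOf(";", start);
if (end >= 0)
{
    // Skip exactly the marker's length (the original added the length of a string with a stray leading space).
    int from = start + marker.Length;
    extractUrls = extractUrls.Substring(from, end - from).Replace(")", "").Trim();
    string[] urls = extractUrls.Split(',');
    foreach (string url in urls)
    {
        test.Add(url.Trim().Replace("\"", ""));
    }
}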
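
A note on why the list built from "//a[@href]" misses those links: that query only returns <a> elements, so URLs that appear as string literals inside a <script> block (such as the var images = new Array(...) list) are never visited. Below is a minimal sketch of one way to cover both steps by scanning the raw file text with a regular expression instead. The class name LinkExtractor, the method names, and the URL pattern are assumptions made for illustration and are not part of Html Agility Pack; the pattern is deliberately simple and may need tightening for real pages.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of Html Agility Pack.
public static class LinkExtractor
{
    // Assumed pattern: http:// or https:// up to the first whitespace,
    // quote, or angle bracket.
    private static readonly Regex UrlPattern =
        new Regex(@"https?://[^\s""'<>]+", RegexOptions.IgnoreCase);

    // Step 1: every http(s) URL anywhere in the text,
    // including URLs inside <script> blocks.
    public static List<string> AllLinks(string html)
    {
        var links = new List<string>();
        foreach (Match m in UrlPattern.Matches(html))
        {
            links.Add(m.Value);
        }
        return links;
    }

    // Step 2: only the URLs between "var images = new Array(" and the next ");".
    // Returns an empty list when either delimiter is missing.
    public static List<string> ImageArrayLinks(string html)
    {
        const string marker = "var images = new Array(";
        int start = html.IndexOf(marker, StringComparison.Ordinal);
        if (start < 0)
        {
            return new List<string>();
        }
        start += marker.Length;

        int end = html.IndexOf(");", start, StringComparison.Ordinal);
        if (end < 0)
        {
            return new List<string>();
        }
        return AllLinks(html.Substring(start, end - start));
    }
}
```

To try it, the file can be read with System.IO.File.ReadAllText(htmlfile) and the resulting string passed to LinkExtractor.AllLinks or LinkExtractor.ImageArrayLinks. Searching for the exact marker text assumes the page writes the declaration with that spacing; if the spacing varies, the marker itself would have to become a regular expression.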