C#: How can I download an entire web page's content, including images that are loaded from inside JavaScript?


At the moment my code downloads the images from a website like this:

using System;
using System.Collections.Generic;
using System.IO;
using System.Net;
using HtmlAgilityPack;
using DannyGeneral;

namespace GatherLinks
{
    class RetrieveWebContent
    {
        // Number of images downloaded so far.
        int images;

        public RetrieveWebContent()
        {
            images = 0;
        }

        // Returns the src value of every <img> in the page at "address" and
        // saves each image that resolves to an http(s) URL under d:\MyImages\.
        // Returns null if the page itself cannot be loaded.
        public List<string> retrieveImages(string address)
        {
            List<string> imgList = new List<string>();
            HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();

            using (WebClient wc = new WebClient())
            {
                try
                {
                    using (Stream page = wc.OpenRead(address))
                    {
                        doc.Load(page);
                    }
                }
                catch (WebException)
                {
                    Logger.Write("There Was Problem Loading The Page: " + address);
                    return null;
                }

                HtmlNodeCollection imgs = doc.DocumentNode.SelectNodes("//img[@src]");
                if (imgs == null) return imgList;

                Uri baseUri = new Uri(address);
                foreach (HtmlNode img in imgs)
                {
                    string src = img.Attributes["src"].Value;
                    imgList.Add(src);

                    // Resolve relative and protocol-relative paths against the page URL.
                    Uri imageUri;
                    if (!Uri.TryCreate(baseUri, src, out imageUri)) continue;
                    if (imageUri.Scheme != Uri.UriSchemeHttp && imageUri.Scheme != Uri.UriSchemeHttps) continue;

                    // LocalPath excludes the query string, so the file name stays valid.
                    string fileName = Path.GetFileName(imageUri.LocalPath);
                    if (fileName.Length == 0) continue;

                    try
                    {
                        wc.DownloadFile(imageUri, Path.Combine(@"d:\MyImages", fileName));
                        images++;
                    }
                    catch (WebException)
                    {
                        // Log the failure and carry on with the remaining images.
                        Logger.Write("There Was Problem Downloading The Image: " + fileName);
                    }
                }
            }

            return imgList;
        }
    }
}


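But in many cases the images are loaded by JavaScript, so they are not in the HTML I fetch and cannot be downloaded this way. How can I download the images, or the complete website content including images and everything else, so that I end up with the whole site and its content tree on my hard disk and can browse it offline?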
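Keeping the "content tree" on disk comes down to mapping each resource URL to a local file path. The following is only a sketch under stated assumptions: the class `MirrorPaths`, the method `ToLocalPath`, and the `index.html` default name are hypothetical and not part of the original code. Query strings are ignored, so two URLs that differ only in their query would map to the same file.

```csharp
using System;
using System.IO;

static class MirrorPaths
{
    // Hypothetical helper: maps a resource URL to a file path under rootDir,
    // using the host name and the URL path segments as folders.
    public static string ToLocalPath(Uri resource, string rootDir)
    {
        // AbsolutePath is percent-encoded and has no query string; decode it.
        string urlPath = Uri.UnescapeDataString(resource.AbsolutePath);

        // A URL ending in "/" names a directory; store it as index.html (assumption).
        if (urlPath.EndsWith("/"))
            urlPath += "index.html";

        string path = Path.Combine(rootDir, resource.Host);
        string[] segments = urlPath.Split(new[] { '/' }, StringSplitOptions.RemoveEmptyEntries);
        foreach (string segment in segments)
        {
            // Never let a decoded segment climb out of rootDir.
            if (segment == "." || segment == "..") continue;

            // Replace characters that the local file system does not allow.
            string clean = segment;
            foreach (char c in Path.GetInvalidFileNameChars())
                clean = clean.Replace(c, '_');

            path = Path.Combine(path, clean);
        }
        return path;
    }
}
```

Before saving a file, the target folder has to exist; `Directory.CreateDirectory(Path.GetDirectoryName(localPath))` creates any missing parent folders.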

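I would use an actual browser and then save the images from there. Take a look at the solution in Ruby. This library helps you automate a browser, and I would combine it with ... to achieve the above.

A Python equivalent also exists.

Webdriver does not support a save feature yet, but the older "Watir" does. You may also want to look at tools that offer some browser automation in the JavaScript language.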
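Since the question is in C#, the same idea can be sketched with the Selenium WebDriver .NET bindings: let a real browser run the scripts, read the rendered DOM back, and parse it with Html Agility Pack as before. This is a rough sketch, not a tested solution. It assumes the Selenium.WebDriver package, a local Chrome with a matching driver, and the hypothetical names `RenderedPage`, `ExtractImageUrls` and `GetImageUrls`. WebDriver is used only to obtain the HTML; the images would still be downloaded with `WebClient`, as in the question.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;
using OpenQA.Selenium;
using OpenQA.Selenium.Chrome;

static class RenderedPage
{
    // Collects absolute image URLs from HTML, resolving each src against baseUri.
    public static List<string> ExtractImageUrls(string html, Uri baseUri)
    {
        var urls = new List<string>();
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        HtmlNodeCollection imgs = doc.DocumentNode.SelectNodes("//img[@src]");
        if (imgs == null) return urls;

        foreach (HtmlNode img in imgs)
        {
            // Attribute values may contain entities such as &amp;.
            string src = HtmlEntity.DeEntitize(img.GetAttributeValue("src", ""));
            Uri absolute;
            if (Uri.TryCreate(baseUri, src, out absolute))
                urls.Add(absolute.AbsoluteUri);
        }
        return urls;
    }

    // Loads the page in Chrome so that its scripts run, then parses the result.
    public static List<string> GetImageUrls(string address)
    {
        using (IWebDriver driver = new ChromeDriver())
        {
            driver.Navigate().GoToUrl(address);
            // PageSource reflects the DOM at this moment; images that scripts
            // add later would need an explicit wait before this line.
            return ExtractImageUrls(driver.PageSource, new Uri(driver.Url));
        }
    }
}
```

Each URL in the returned list can then be passed to `WebClient.DownloadFile`.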


Sounds a bit suspicious... what is it for?

If nothing above works for you, you can try a browser extension for Firefox.