C#: How to get a web page's content and save it to a string variable

c# asp.net

How can I get a web page's content using ASP.NET? I need to write a program that fetches a page's HTML and stores it in a string variable.

WebClient client = new WebClient();
string content = client.DownloadString(url);

Pass the URL of the page you want to fetch. You can parse the result with Html Agility Pack.

I have run into problems with WebClient.DownloadString before. If you need to, you can try the following instead:

// Requires: using System.IO; using System.Net;
WebRequest request = WebRequest.Create("http://www.google.com");
string html;
// Dispose the response, stream and reader once the body has been read
using (WebResponse response = request.GetResponse())
using (Stream data = response.GetResponseStream())
using (StreamReader sr = new StreamReader(data))
{
    html = sr.ReadToEnd();
}

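I recommend not using WebClient.DownloadString. This is because (at least in .NET 3.5) DownloadString is not smart enough to consume/remove the BOM, should one be present. As a result, when UTF-8 data is returned (at least without a charset), the BOM (ï»¿) can incorrectly show up as part of the string. Ick!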

Instead, this slight variation will work correctly with BOMs:

string ReadTextFromUrl(string url) {
    // WebClient is still convenient
    // Assume UTF8, but detect BOM - could also honor response charset I suppose
    using (var client = new WebClient())
    using (var stream = client.OpenRead(url))
    using (var textReader = new StreamReader(stream, Encoding.UTF8, true)) {
        return textReader.ReadToEnd();
    }
}
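
The BOM behaviour described above can be checked offline, without any web request. The following is only a minimal sketch: the class name BomDemo and both helper methods are hypothetical names introduced for illustration, and a MemoryStream stands in for the HTTP response stream. It compares decoding the raw bytes directly with reading them through the same StreamReader configuration used in ReadTextFromUrl.

```csharp
using System;
using System.IO;
using System.Text;

static class BomDemo
{
    // Hypothetical helper: decodes bytes with the same StreamReader
    // settings as ReadTextFromUrl (UTF-8 default, BOM detection on).
    public static string DecodeWithBomDetection(byte[] bytes)
    {
        using (var stream = new MemoryStream(bytes))
        using (var reader = new StreamReader(stream, Encoding.UTF8, true))
        {
            return reader.ReadToEnd();
        }
    }

    // Hypothetical helper: builds a UTF-8 payload that starts with a BOM,
    // standing in for a server response body.
    public static byte[] WithUtf8Bom(string text)
    {
        byte[] bom = Encoding.UTF8.GetPreamble(); // the bytes EF BB BF
        byte[] body = Encoding.UTF8.GetBytes(text);
        byte[] all = new byte[bom.Length + body.Length];
        Buffer.BlockCopy(bom, 0, all, 0, bom.Length);
        Buffer.BlockCopy(body, 0, all, bom.Length, body.Length);
        return all;
    }

    static void Main()
    {
        byte[] payload = WithUtf8Bom("<html></html>");

        // Decoding the raw bytes keeps the BOM as a leading U+FEFF character.
        string raw = Encoding.UTF8.GetString(payload);

        // Reading through StreamReader consumes the BOM.
        string viaReader = DecodeWithBomDetection(payload);

        Console.WriteLine(raw.Length);       // 14 (13 characters plus U+FEFF)
        Console.WriteLine(viaReader.Length); // 13
    }
}
```

If the two lengths differ by one, the reader has swallowed the BOM, which is the behaviour the answer relies on.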

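Can you elaborate on the problems you ran into?

@Greg, it was a performance-related issue. I never really resolved it, but WebClient.DownloadString would take 5-10 seconds to pull down the HTML, whereas WebRequest/WebResponse was almost instant. I just wanted to offer another alternative in case the OP has similar problems or wants more control over the request/response.

@Scott - +1 for finding this. I just ran some tests. DownloadString takes much longer on first use (5299 ms for DownloadString vs 200 ms for WebRequest). Tested on 50 BBC, 50 CNN and 50 other RSS feed URLs, using different URLs to avoid caching. After the initial load, DownloadString was 20 ms faster for BBC and 300 ms faster for CNN. For the other RSS feed, WebRequest was 3 ms faster. Generally, I think I'll use WebRequest for single requests and DownloadString when looping through URLs.

This worked great for me, thanks! To save others some searching, WebRequest is in System.Net and Stream is in System.IO.

Scott, @HockeyJ - I don't know what has changed since you used WebClient, but when I tested it (using .NET 4.5.2) it was fast enough: 950 ms (still a bit slower than a single WebRequest, which took 450 ms, but certainly not 5-10 seconds).

Unfortunately, DownloadString (as of .NET 3.5) is not smart enough to handle BOMs. I've included another option in my answer.

No vote, because there's no using (WebClient client = new WebClient()) {} :)

This is equivalent to the answer Steven Spielberg posted 3 minutes earlier, so no +1.

File a bug report.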
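
For anyone who wants to repeat the timing comparison from the comments, here is a rough sketch of a Stopwatch harness. It is not the code the commenters used: the class FetchTiming, its methods and the default URL are all assumptions made for illustration, and the numbers will depend on network, proxy settings and .NET version. Each path is timed twice in one process because the comments report that the first call is the slow one.

```csharp
using System;
using System.Diagnostics;
using System.IO;
using System.Net;

static class FetchTiming
{
    // Hypothetical helper: runs any fetch delegate and returns the
    // elapsed wall-clock time in milliseconds.
    public static long TimeMs(Func<string> fetch, out string result)
    {
        Stopwatch sw = Stopwatch.StartNew();
        result = fetch();
        sw.Stop();
        return sw.ElapsedMilliseconds;
    }

    // The WebClient.DownloadString approach from the first answer.
    public static string ViaDownloadString(string url)
    {
        using (var client = new WebClient())
        {
            return client.DownloadString(url);
        }
    }

    // The WebRequest/WebResponse approach from the second answer.
    public static string ViaWebRequest(string url)
    {
        WebRequest request = WebRequest.Create(url);
        using (WebResponse response = request.GetResponse())
        using (var reader = new StreamReader(response.GetResponseStream()))
        {
            return reader.ReadToEnd();
        }
    }

    static void Main(string[] args)
    {
        // Assumed default URL; pass another one as the first argument.
        string url = args.Length > 0 ? args[0] : "http://www.google.com";
        string html;

        for (int run = 1; run <= 2; run++)
        {
            long a = TimeMs(() => ViaDownloadString(url), out html);
            long b = TimeMs(() => ViaWebRequest(url), out html);
            Console.WriteLine("run {0}: DownloadString {1} ms, WebRequest {2} ms", run, a, b);
        }
    }
}
```

Whichever approach runs first in a process may absorb shared start-up cost, so swapping the order of the two calls is worth trying before drawing conclusions.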
