Downloading a file over HTTPS with .NET (part 2)
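I regularly have to perform the following steps by hand in a web browser:

1. Go to an https website
2. Log in on a web form
3. Click a link to download a large file (135 MB)

I would like to automate this process with .NET.

I posted about this here a few days ago. Thanks to a piece of code from Rubens Farias, I can now perform steps 1 and 2 above. After step 2 I can read the HTML of the page that contains the URL of the file to download (using afterLoginPage = reader.ReadToEnd()). That page is only shown after an authorized login, which confirms that step 2 succeeded.

My question now is, of course, how to perform step 3. I have tried a few things, without success. The code so far: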
Imports System.Collections.Generic
Imports System.IO
Imports System.Linq
Imports System.Net
Imports System.Text
Imports System.Text.RegularExpressions

Dim httpsSite As String = "https://www.test.test/user/login" ' enter correct address
Dim formPage As String = ""
Dim afterLoginPage As String = ""

' Get postback data and cookies
Dim cookies As New CookieContainer()
Dim getRequest As HttpWebRequest = DirectCast(WebRequest.Create(httpsSite), HttpWebRequest)
getRequest.CookieContainer = cookies
getRequest.Method = "GET"
Dim wp As WebProxy = New WebProxy("[our proxies IP address]", [our proxies port number])
wp.Credentials = CredentialCache.DefaultCredentials
getRequest.Proxy = wp

' Read the login form; disposing the response releases the connection
Using form As HttpWebResponse = DirectCast(getRequest.GetResponse(), HttpWebResponse)
    Using response As New StreamReader(form.GetResponseStream(), Encoding.UTF8)
        formPage = response.ReadToEnd()
    End Using
End Using

' Collect the form fields to post back
Dim inputs As New Dictionary(Of String, String)()
inputs.Add("form_build_id", "[some code I'd like to keep secret]")
inputs.Add("form_id", "user_login")
For Each input As Match In Regex.Matches(formPage, "<input.*?name=""(?<name>.*?)"".*?(?:value=""(?<value>.*?)"".*?)? />", RegexOptions.IgnoreCase Or RegexOptions.ECMAScript)
    If input.Groups("name").Value <> "form_build_id" AndAlso _
       input.Groups("name").Value <> "form_id" Then
        ' The indexer overwrites instead of throwing if a field name repeats
        inputs(input.Groups("name").Value) = input.Groups("value").Value
    End If
Next
inputs("name") = "[our login name]"
inputs("pass") = "[our login password]"

' Build the url-encoded form body (HttpUtility needs a reference to System.Web)
Dim buffer As Byte() = Encoding.UTF8.GetBytes( _
    [String].Join("&", _
        Array.ConvertAll(Of KeyValuePair(Of String, String), String)(inputs.ToArray(), _
            Function(item As KeyValuePair(Of String, String)) item.Key & "=" & System.Web.HttpUtility.UrlEncode(item.Value))))

Dim postRequest As HttpWebRequest = DirectCast(WebRequest.Create(httpsSite), HttpWebRequest)
postRequest.CookieContainer = cookies
postRequest.Method = "POST"
postRequest.ContentType = "application/x-www-form-urlencoded"
postRequest.Proxy = wp

' send username/password
Using stream As Stream = postRequest.GetRequestStream()
    stream.Write(buffer, 0, buffer.Length)
End Using

' get response from login page
Using reader As New StreamReader(postRequest.GetResponse().GetResponseStream(), Encoding.UTF8)
    afterLoginPage = reader.ReadToEnd()
End Using

Are you passing the cookies along when you download the file?

As I said in a comment on that question, you just need to use the DownloadFile method:
using (WebClient client = new WebClient())
    client.DownloadFile(
        "http://www.google.com/", "google_homepage.html");
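Just replace "http://www.google.com/" with the address of your file.

Sorry, you need to use HttpWebRequest instead, so that the login cookies can be sent with the request: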
string fileAddress = "http://www.google.com/";
HttpWebRequest client = (HttpWebRequest)WebRequest.Create(fileAddress);
// Reuse the CookieContainer that was filled during the login
client.CookieContainer = cookies;
int read = 0;
byte[] buffer = new byte[1024];
using (FileStream download =
    new FileStream("google_homepage.html", FileMode.Create))
using (WebResponse response = client.GetResponse())
using (Stream stream = response.GetResponseStream())
{
    // Copy the response to disk in small chunks instead of buffering the whole file
    while ((read = stream.Read(buffer, 0, buffer.Length)) != 0)
    {
        download.Write(buffer, 0, read);
    }
}
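
You need to keep the session/authentication cookie that the login form returned to you. Basically, take the cookie from the response to the authentication form and send it back when you perform step 3. A simple way to do this is to extend WebClient so that it carries the cookies, which gives you much simpler code than the approach above.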
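
A minimal sketch of such a cookie-aware WebClient, assuming the login has already filled a CookieContainer (called cookies in the question). The names CookieAwareWebClient, DownloadExample and DownloadWithCookies are hypothetical, chosen only for illustration; the sketch relies on overriding WebClient.GetWebRequest to attach the container to every request.

```csharp
using System;
using System.Net;

// Hypothetical helper: a WebClient that sends and stores cookies
// through a shared CookieContainer.
public class CookieAwareWebClient : WebClient
{
    private readonly CookieContainer cookies;

    public CookieAwareWebClient(CookieContainer cookies)
    {
        // Fall back to an empty container if none is supplied
        this.cookies = cookies ?? new CookieContainer();
    }

    public CookieContainer Cookies
    {
        get { return cookies; }
    }

    protected override WebRequest GetWebRequest(Uri address)
    {
        WebRequest request = base.GetWebRequest(address);
        HttpWebRequest http = request as HttpWebRequest;
        if (http != null)
        {
            // Attach the login cookies to every HTTP(S) request
            http.CookieContainer = cookies;
        }
        return request;
    }
}

public static class DownloadExample
{
    // Assumed usage: cookies and proxy come from the login code in the question;
    // fileAddress is the URL parsed out of the page shown after login.
    public static void DownloadWithCookies(
        CookieContainer cookies, IWebProxy proxy, string fileAddress, string targetPath)
    {
        using (CookieAwareWebClient client = new CookieAwareWebClient(cookies))
        {
            client.Proxy = proxy;
            client.DownloadFile(fileAddress, targetPath);
        }
    }
}
```

With this in place, step 3 should reduce to a single DownloadWithCookies call after the login POST, provided the site accepts the same cookies for the download URL.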
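
Instead of trying to send web requests over HTTPS, you could also automate Internet Explorer.
There is an explanation of this using PowerShell, but you can do the same in C# by accessing Internet Explorer as a COM object.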
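
This approach works well if you only need one file and do not have to worry about memory leaks.
I converted this code to VB.NET and I get a protocol violation error: "Cannot send a content-body with this verb-type." at Dim stream As Stream = client.GetRequestStream()
You should get the ResponseStream directly; if you need to send parameters, use the querystring.
Never mind, I did not convert the code correctly. It works now, thanks again.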
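
The protocol violation reported in the comments can be illustrated with a short sketch: a GET request has no body, so asking for its request stream is expected to fail before anything is sent, whereas GetResponse().GetResponseStream() is the call to use for a download. The class and method names here are hypothetical, and the URL is only a placeholder.

```csharp
using System;
using System.Net;

public static class VerbBodyCheck
{
    // Returns true if GetRequestStream is rejected for a GET request.
    // The check is expected to happen locally, before any connection is opened.
    public static bool RequestStreamRejectedForGet(string url)
    {
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
        request.Method = "GET";
        try
        {
            // Wrong call for a download: this asks for a stream to write a request body
            using (request.GetRequestStream()) { }
            return false;
        }
        catch (ProtocolViolationException)
        {
            // "Cannot send a content-body with this verb-type."
            return true;
        }
    }
}
```

In the VB.NET conversion, the equivalent fix is to read from client.GetResponse().GetResponseStream() rather than client.GetRequestStream().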