C# HttpClient: scraping data from a website that requires login


I want to scrape some data from the following website:

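The website contains data about table tennis. The current season is accessible without logging in; earlier seasons are only accessible after logging in. For the current season I have already written code that fetches the data, and it works well. I'm using HttpClient (from System.Net.Http) together with HtmlDocument from HtmlAgilityPack. The code looks like this: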

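            // HttpClient is from System.Net.Http; HtmlDocument is from HtmlAgilityPack
            HttpClient http = new HttpClient();
            var response = await http.GetByteArrayAsync(website);
            // Decode the whole body: the byte count must be response.Length, not response.Length - 1
            String source = Encoding.UTF8.GetString(response, 0, response.Length);
            // Note: decoding entities before parsing can turn &lt; into real markup;
            // decoding individual InnerText values after parsing is usually safer
            source = WebUtility.HtmlDecode(source);
            HtmlDocument resultat = new HtmlDocument();
            resultat.LoadHtml(source);

            // ... scan resultat.DocumentNode to extract the relevant data ...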
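Passing `response.Length - 1` as the byte count to `Encoding.GetString` silently drops the last byte of the page: usually just a closing `>`, but it can also corrupt a trailing multi-byte character such as `ü`. A minimal check of that behaviour, using a hypothetical helper class `BodyDecoding`:

```csharp
using System.Text;

static class BodyDecoding
{
    // Decodes the entire response body as UTF-8 (count = full length).
    public static string DecodeAll(byte[] body) =>
        Encoding.UTF8.GetString(body, 0, body.Length);

    // Reproduces the off-by-one call: count = Length - 1 drops the final byte.
    public static string DecodeDroppingLastByte(byte[] body) =>
        Encoding.UTF8.GetString(body, 0, body.Length - 1);
}
```

If the last byte is half of a UTF-8 sequence, the truncated call yields the replacement character U+FFFD instead of the original letter.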

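Now I want to get the data that requires a login. Does anyone know how to log in to the website and fetch the data? The login is done by clicking "Ergebnishistorie freischalten…" and then entering a username and password.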

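There are many ways to log in to a website, depending on the authentication method the particular site uses (forms authentication, Basic authentication, Windows authentication, etc.). Most websites use forms authentication.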
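For contrast, Basic authentication needs no cookies at all: the credentials travel in an `Authorization` header on every request. A minimal sketch (the class name `BasicAuthSketch` is hypothetical, and this only applies to sites that actually use Basic auth, which the login form discussed here does not appear to):

```csharp
using System;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Text;

static class BasicAuthSketch
{
    // Builds the Basic authentication header: "Basic " + Base64("username:password").
    public static AuthenticationHeaderValue Create(string username, string password)
    {
        string raw = username + ":" + password;
        string encoded = Convert.ToBase64String(Encoding.UTF8.GetBytes(raw));
        return new AuthenticationHeaderValue("Basic", encoded);
    }

    // Attaches the header to every request sent by this client.
    public static void Apply(HttpClient client, string username, string password) =>
        client.DefaultRequestHeaders.Authorization = Create(username, password);
}
```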

To log in to a standard forms-authentication website with HttpClient, you need to set a CookieContainer, because the authentication data is stored in cookies.
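Roughly, `HttpClientHandler` stores each `Set-Cookie` value it receives in the `CookieContainer` and then adds a matching `Cookie` header to later requests for the same site. A minimal offline sketch of that round trip (the class `CookieSketch` and the cookie name `session` are hypothetical):

```csharp
using System;
using System.Net;

static class CookieSketch
{
    // Stores a Set-Cookie value as if it came from a response to `site`,
    // then returns the Cookie header a later request to `site` would carry.
    public static string Roundtrip(CookieContainer jar, Uri site, string setCookieHeader)
    {
        jar.SetCookies(site, setCookieHeader);
        return jar.GetCookieHeader(site);
    }
}
```

Because the container is keyed by host, a cookie set by `wttv.click-tt.de` is not sent to any other domain.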

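In your specific example, the login form sends a POST over HTTPS to whichever page you are on; I'll use one of those pages as an example. This is the code that makes the requests with HttpClient: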

// requires: using System; using System.Collections.Generic; using System.Net; using System.Net.Http;
var baseAddress = new Uri("https://wttv.click-tt.de/");
var cookieContainer = new CookieContainer();
using (var handler = new HttpClientHandler() { CookieContainer = cookieContainer })
using (var client = new HttpClient(handler) { BaseAddress = baseAddress })
{
    // Usually I make a standard request without authentication first, e.g. to the home page.
    // This stores initial cookie values that may be used in the login request and checked by the server.
    var homePageResult = await client.GetAsync("/");
    homePageResult.EnsureSuccessStatusCode();

    var content = new FormUrlEncodedContent(new[]
    {
        // The keys must match the name attributes of the login form's <input /> tags; here the tag is <input type="text" name="username">
        new KeyValuePair<string, string>("username", "username"),
        new KeyValuePair<string, string>("password", "password"),
    });
    var loginResult = await client.PostAsync("/cgi-bin/WebObjects/nuLigaTTDE.woa/wa/teamPortrait?teamtable=1673669&pageState=rueckrunde&championship=SK+Bez.+BB+13%2F14&group=204559", content);
    loginResult.EnsureSuccessStatusCode();

    // Make the subsequent web requests with the same HttpClient object
}
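`FormUrlEncodedContent` takes care of escaping the form values, so credentials containing spaces, `&` or `=` are sent intact. A minimal sketch that builds the login body offline (the class `LoginFormSketch` is hypothetical; the field names `username` and `password` are the ones assumed in the code above and must match the real form):

```csharp
using System.Collections.Generic;
using System.Net.Http;

static class LoginFormSketch
{
    // Builds an application/x-www-form-urlencoded POST body for a login form.
    // The keys must match the name="..." attributes of the form's <input> tags.
    public static FormUrlEncodedContent Build(string user, string password) =>
        new FormUrlEncodedContent(new[]
        {
            new KeyValuePair<string, string>("username", user),
            new KeyValuePair<string, string>("password", password),
        });
}
```

Spaces are encoded as `+` and reserved characters are percent-encoded, which is what a browser does when it submits the same form.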

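Could you use the .NET WebBrowser class, get past the login screen there, then redirect to the desired URL and grab the HTML output?

@Robert: How would that work with WebBrowser?

WebBrowser is a .NET control. It's like a built-in web browser. You can basically go to the login page, insert the credentials and submit the login form. If authentication succeeds, you can navigate to the desired URL and simply scrape it.

1. There is no separate login page; the login happens inside the website. 2. The URL can't be reused afterwards (if you open that URL directly, you are still logged out). The only difference between the URL when logged out and when logged in is: http.... (logged out); https... (logged in).

Dear Stefano, thanks for the quick reply! I used your code above and then tried adding the code from my post after it (reusing the existing HttpClient object instead of creating a new one). Unfortunately, an error occurs when running the code ("A task was canceled") at the line var response.... Maybe I misunderstood something? What is the expected value of loginResult?

The code works fine in a console application: . However, I don't have a real username and password to test with. The expected value of loginResult is the HTML of the page you get after logging in to the website. I edited the post and added a simple HttpWebRequest example.

Dear Stefano, thank you very much! If you have time, I'd be glad if you could also help me log in to the following page: I tried several things but couldn't get it to work.

You have to change the login URL and the login parameters and, most importantly, you have to enable automatic decompression. See the updated code here: and the updated answer.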
using System;
using System.IO;
using System.Net;

var cookieContainer = new CookieContainer();

HttpWebRequest request = (HttpWebRequest)HttpWebRequest.Create("https://wttv.click-tt.de/");
request.CookieContainer = cookieContainer;
//set the user agent and accept header values, to simulate a real web browser
request.UserAgent = "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.101 Safari/537.36";
request.Accept = "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8";


//SET AUTOMATIC DECOMPRESSION
request.AutomaticDecompression = DecompressionMethods.Deflate | DecompressionMethods.GZip;

Console.WriteLine("FIRST RESPONSE");
Console.WriteLine();
using (WebResponse response = request.GetResponse())
{
    using (StreamReader sr = new StreamReader(response.GetResponseStream()))
    {
        Console.WriteLine(sr.ReadToEnd());
    }
}

request = (HttpWebRequest)HttpWebRequest.Create("https://wttv.click-tt.de/cgi-bin/WebObjects/nuLigaTTDE.woa/wa/teamPortrait?teamtable=1673669&pageState=rueckrunde&championship=SK+Bez.+BB+13%2F14&group=204559");
//set the cookie container object
request.CookieContainer = cookieContainer;
request.UserAgent = "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.101 Safari/537.36";
request.Accept = "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8";

//set method POST and content type application/x-www-form-urlencoded
request.Method = "POST";
request.ContentType = "application/x-www-form-urlencoded";

//SET AUTOMATIC DECOMPRESSION
request.AutomaticDecompression = DecompressionMethods.Deflate | DecompressionMethods.GZip;

//insert your username and password (URL-encode them, in case they contain &, = or spaces)
string data = string.Format("username={0}&password={1}",
    WebUtility.UrlEncode("username"), WebUtility.UrlEncode("password"));
byte[] bytes = System.Text.Encoding.UTF8.GetBytes(data);

request.ContentLength = bytes.Length;

using (Stream dataStream = request.GetRequestStream())
{
    //the using block closes the stream, so no explicit Close() is needed
    dataStream.Write(bytes, 0, bytes.Length);
}

Console.WriteLine("LOGIN RESPONSE");
Console.WriteLine();
using (WebResponse response = request.GetResponse())
{
    using (StreamReader sr = new StreamReader(response.GetResponseStream()))
    {
        Console.WriteLine(sr.ReadToEnd());
    }
}

//request = (HttpWebRequest)HttpWebRequest.Create("INTERNAL PROTECTED PAGE ADDRESS");
//After a successful login, you must use the same cookie container for all requests
//request.CookieContainer = cookieContainer;

//....
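The same fix carries over to HttpClient: automatic decompression is a property of `HttpClientHandler`, and the cookie container and browser-like headers can be configured once per client. A minimal sketch (the `ClientFactory` class and `CreateBrowserLikeClient` method are hypothetical). Whether this alone explains the "A task was canceled" error from the comments is not certain; HttpClient typically reports a request timeout with that same exception.

```csharp
using System;
using System.Net;
using System.Net.Http;

static class ClientFactory
{
    // One handler and one CookieContainer for the whole session, so the login
    // cookie is replayed on every later request (like sharing cookieContainer above).
    public static HttpClient CreateBrowserLikeClient(CookieContainer cookies)
    {
        var handler = new HttpClientHandler
        {
            CookieContainer = cookies,
            UseCookies = true,
            // HttpClient counterpart of request.AutomaticDecompression
            AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate,
        };
        var client = new HttpClient(handler) { BaseAddress = new Uri("https://wttv.click-tt.de/") };
        // Same User-Agent and Accept values as the HttpWebRequest example
        client.DefaultRequestHeaders.TryAddWithoutValidation("User-Agent",
            "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.101 Safari/537.36");
        client.DefaultRequestHeaders.TryAddWithoutValidation("Accept",
            "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8");
        return client;
    }
}
```

The flow is then unchanged: GET the home page, POST the login form, and request the protected pages with the same client instance.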