XPath with HtmlAgilityPack: selecting a single node returns null

I am trying to get a node with XPath, but the node comes back null and I don't know why.

        WebClient wc = new WebClient();
        string nodeValue;
        string htmlCode = wc.DownloadString("http://www.freeproxylists.net/fr/?c=&pt=&pr=&a%5B%5D=0&a%5B%5D=1&a%5B%5D=2&u=50");
        HtmlAgilityPack.HtmlDocument html = new HtmlAgilityPack.HtmlDocument();
        html.LoadHtml(htmlCode);
        HtmlNode node = html.DocumentNode.SelectSingleNode("//table[@class='DataGrid']/tbody/tr[@class='Odd']/td/a");
        nodeValue = (node.InnerHtml);

Comparing your XPath with the HTML you are trying to extract information from, I see at least two errors in it.

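There are three things I would check: 1) Does the response arrive before the timeout kicks in (that is, when debugging, can you see the htmlCode string being set)? 2) If you want to use XPath, is the response well-formed XML (for me, that page gives validation errors)? 3) If you are selecting a single node, make sure your XPath does not match more than one. Starting the expression with // will very likely match several nodes, so add [1] at the end to force only the first match to be returned. I have not taken the time to write code for this.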
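The checks above can be sketched on a small sample. This is only an illustration under stated assumptions: the markup in `Sample` is hypothetical (not the real page), and it uses the standard `System.Xml` XPath engine on a well-formed snippet instead of HtmlAgilityPack so that it runs without extra packages. It shows one common cause of a null result, a `/tbody` step in the XPath when the downloaded markup has no `<tbody>`, together with a null check and a `(...)[1]` selection. Whether this is what happens on the page in the question is not confirmed here.

```csharp
using System;
using System.Xml;

public static class XPathChecks
{
    // Hypothetical sample: a table served WITHOUT <tbody>, as raw HTML often is
    // (browser dev tools may still display one, because the browser inserts it).
    public const string Sample =
        "<html><body><table class='DataGrid'>" +
        "<tr class='Odd'><td><a href='/a'>first</a></td></tr>" +
        "<tr class='Odd'><td><a href='/b'>second</a></td></tr>" +
        "</table></body></html>";

    // Returns the inner text of the first node matching xpath,
    // or null when nothing matches (instead of throwing on a null node).
    public static string FirstText(string xml, string xpath)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);
        XmlNode node = doc.SelectSingleNode(xpath);
        return node == null ? null : node.InnerText;
    }
}
```

With this sample, `XPathChecks.FirstText(XPathChecks.Sample, "//table[@class='DataGrid']/tbody/tr[@class='Odd']/td/a")` finds nothing because there is no `tbody` element to step through, while `"(//table[@class='DataGrid']//tr[@class='Odd']/td/a)[1]"` selects the first anchor. Wrapping the path in parentheses before `[1]` applies the index to the whole result set rather than to each parent's children.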

    //using System; using System.Collections.Generic; using System.Text; using HtmlAgilityPack;
    private void HtmlParser(string url)
    {
        HtmlDocument htmlDoc = new HtmlAgilityPack.HtmlDocument();
        htmlDoc.OptionFixNestedTags=true;

        GetHTML(url);
        htmlDoc.Load("x.html", Encoding.ASCII, true);      

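        HtmlNodeCollection nodes = htmlDoc.DocumentNode.SelectNodes("//table[@class='DataGrid']/descendant::*/tr[@class='Odd']/td/script");
        List<string> urls = new List<string>();

        //SelectNodes can return null (not an empty collection) when nothing matches
        if (nodes == null)
        {
            Console.WriteLine("No matching nodes found.");
            return;
        }

        foreach(HtmlNode x in nodes)
        {
            urls.Add(ConvertStringToUrl(x.InnerText));
        }

        Console.WriteLine(ReadingTheAnchor(urls[0]));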
    }

    private string ConvertStringToUrl(string octUrl)
    {

        octUrl = octUrl.Replace("IPDecode(\"", "");
        octUrl = octUrl.Remove(octUrl.Length -2);
        octUrl = octUrl.Replace("%", "");
        string ascii = string.Empty;

        for (int i = 0; i < octUrl.Length; i += 2)
        {
            String hs = string.Empty;

            hs   = octUrl.Substring(i,2);
            uint decval =   System.Convert.ToUInt32(hs, 16);
            char character = System.Convert.ToChar(decval);
            ascii += character;

        }
        //ascii now holds the decoded <a> element with the link; each one can be read as a separate HTML fragment containing just an <a>
        Console.WriteLine(ascii);
        return ascii;
    }

    private string ReadingTheAnchor(string anchor)
    {
        //returns url of anchor
        HtmlDocument anchorHtml = new HtmlAgilityPack.HtmlDocument();
        anchorHtml.LoadHtml(anchor);
        HtmlNode h = anchorHtml.DocumentNode.SelectSingleNode("a");
        //SelectSingleNode returns null when the fragment has no top-level <a>
        return h == null ? "" : h.GetAttributeValue("href", "");
    }

    //using OpenQA.Selenium; using OpenQA.Selenium.Firefox;
    private void GetHTML(string url)
    {
        using (var driver = new FirefoxDriver())
        {
            driver.Navigate().GoToUrl(url);
            Console.Clear();
            System.IO.File.WriteAllText("x.html", driver.PageSource);
        }
    }
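
As an alternative to the manual hex loop in ConvertStringToUrl, the decoding step could be sketched with the framework's own percent-decoding. This assumes the script text has the form `IPDecode("%3c%61...")`, which is inferred from the string replacements in the code above rather than confirmed against the page; the class and method names are hypothetical.

```csharp
using System;

public static class IpDecodeSketch
{
    // Extracts the quoted argument of IPDecode("...") and percent-decodes it.
    // Returns an empty string when the expected pattern is not present.
    public static string Decode(string scriptText)
    {
        const string prefix = "IPDecode(\"";
        int start = scriptText.IndexOf(prefix, StringComparison.Ordinal);
        if (start < 0) return string.Empty;
        start += prefix.Length;

        // The argument ends at the next double quote after the prefix.
        int end = scriptText.IndexOf('"', start);
        if (end < 0) return string.Empty;

        string encoded = scriptText.Substring(start, end - start);
        return Uri.UnescapeDataString(encoded);
    }
}
```

For example, `IpDecodeSketch.Decode("IPDecode(\"%3c%61%3e\")")` yields `<a>`. Searching for the closing quote instead of trimming a fixed number of characters keeps the sketch tolerant of trailing whitespace or a semicolon after the call.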