C#: Trying to extract data from a web page using HtmlAgilityPack


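I am trying to extract data from a web page (the URL appears in the source code below).

I have marked the required field in the attached picture. XPath: /html/body/table[2]/tbody/tr/td[2]/table/tbody/tr[3]/td/p/table/tbody/tr/td/table/tbody/tr[2]/td[2]/font

Error: an exception is thrown and no data is found with that XPath. "An unhandled exception of type 'System.Net.WebException' occurred in HtmlAgilityPack.dll"

Source code: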

static void Main(string[] args)
{
    /************************************************************************/
    string tickerid = "Bse_Prc_tick";
    HtmlAgilityPack.HtmlDocument doc = new HtmlWeb().Load(@"http://www.dsebd.org/displayCompany.php?name=NBL", "GET");

    if (doc != null)
    {
        // Fetch the stock price from the Web page
        string stockprice = doc.DocumentNode.SelectSingleNode(string.Format("./html/body/table[2]/tbody/tr/td[2]/table/tbody/tr[3]/td1/p1/table1/tbody/tr/td1/table/tbody/tr[2]/td[2]/font", tickerid)).InnerText;
        Console.WriteLine(stockprice);
    }
    Console.WriteLine("ReadKey Starts........");
    Console.ReadKey();
}

Wrap the code in a try-catch to get more information about the exception.
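
A minimal sketch of what that could look like, using WebClient (as in the answer below) rather than HtmlWeb. The class name FetchDiagnostics and the helper DescribeFailure are made up for illustration; the idea is only to print the WebException status and message instead of letting the program crash:

```csharp
using System;
using System.Net;

public static class FetchDiagnostics
{
    // Hypothetical helper: turns an exception into a one-line description.
    // For a WebException it includes Status (Timeout, NameResolutionFailure,
    // ProtocolError, ...), which usually says more than the message alone.
    public static string DescribeFailure(Exception ex)
    {
        var webEx = ex as WebException;
        if (webEx != null)
        {
            return "WebException (" + webEx.Status + "): " + webEx.Message;
        }
        return ex.GetType().Name + ": " + ex.Message;
    }

    public static void Main()
    {
        var url = @"http://www.dsebd.org/displayCompany.php?name=NBL";
        try
        {
            using (var client = new WebClient())
            {
                string html = client.DownloadString(url);
                Console.WriteLine("Downloaded " + html.Length + " characters");
            }
        }
        catch (Exception ex)
        {
            // Report the failure instead of crashing with an unhandled exception
            Console.WriteLine(DescribeFailure(ex));
        }
    }
}
```

If the download itself fails, the status shows whether the problem is the network or the server, which is a separate issue from the XPath not matching.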

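OK, I checked it. The XPath being used is simply incorrect. The real fun starts when you try to find out where the error is.

Just look at the source of the page you are working with: apart from many errors that get in the way of XPath, it even contains multiple html tags.

Chrome DevTools, and the tool you used, work on a DOM tree that has already been corrected by the browser (everything packed into a single html node, some tbody elements added, and so on).

Because the HTML structure is simply broken, the tree that HtmlAgilityPack parses from it is broken too.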
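
The tbody effect can be reproduced on a small scale. This is a sketch only, assuming the HtmlAgilityPack NuGet package is referenced; the sample markup, the class name TbodyDemo and the helper SelectText are invented for illustration and are not taken from the real page:

```csharp
using System;
using HtmlAgilityPack; // NuGet package: HtmlAgilityPack

public static class TbodyDemo
{
    // Hypothetical markup as a server might send it: the table has no <tbody>.
    public const string RawHtml =
        "<html><body><table><tr><td>Last Trade:</td>" +
        "<td><font>12.30</font></td></tr></table></body></html>";

    // Returns the trimmed text of the first node matching the XPath,
    // or null when nothing matches (SelectSingleNode returns null then).
    public static string SelectText(string html, string xpath)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        HtmlNode node = doc.DocumentNode.SelectSingleNode(xpath);
        return node == null ? null : node.InnerText.Trim();
    }

    public static void Main()
    {
        // XPath as a browser tool would report it, including the tbody it added
        Console.WriteLine(SelectText(RawHtml, "//table/tbody/tr/td[2]/font") ?? "(no match)");

        // XPath that follows the markup actually sent by the server
        Console.WriteLine(SelectText(RawHtml, "//table/tr/td[2]/font") ?? "(no match)");
    }
}
```

The first query is expected to find nothing, because HtmlAgilityPack works on the markup as sent and does not add tbody the way a browser does. Note also that calling .InnerText directly on the result of SelectSingleNode, as the question's code does, throws a NullReferenceException when the path does not match, so a null check is worth adding.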

In this case you can use a RegExp, or simply search the source for known elements (which is much faster, but less flexible).

For example:

...
using System;
using System.Net; // required for WebClient
...
        class Program
        {
            // entry point of the console app
            static void Main(string[] args)
            {
                // URL to download
                // "var" lets the compiler infer the type (string here)
                var url = @"http://www.dsebd.org/displayCompany.php?name=NBL";

                // WebClient implements IDisposable; the using block calls Dispose()
                // as soon as the block ends instead of leaving cleanup to the garbage collector
                using (WebClient client = new WebClient())
                {
                    // download the response as a string; in this case it is the page's HTML
                    string htmlCode = client.DownloadString(url);

                    // find the "Last Trade:" label; IndexOf returns -1 when it is missing,
                    // and Substring(-1) would throw, so check first
                    int labelIndex = htmlCode.IndexOf("Last Trade:");
                    if (labelIndex < 0)
                    {
                        Console.WriteLine("\"Last Trade:\" was not found in the page");
                    }
                    else
                    {
                        // drop everything before the label so the anchors below
                        // are searched for only after it
                        htmlCode = htmlCode.Substring(labelIndex);
                        // take the text between the two anchors, then trim surrounding whitespace
                        htmlCode = htmlCode.Substring("2\">", "</font></td>").Trim();
                        Console.WriteLine(htmlCode);
                    }
                }
                Console.ReadLine();
            }
        }
        // The extension class below adds the Substring(from, until) overload used above.
        // It is copied from http://stackoverflow.com/a/17253735/3147740
        public static class StringExtensions
        {
            /// <summary>
            /// takes a substring between two anchor strings (or the end of the string if that anchor is null)
            /// </summary>
            /// <param name="this">a string</param>
            /// <param name="from">an optional string to search after</param>
            /// <param name="until">an optional string to search before</param>
            /// <param name="comparison">an optional comparison for the search</param>
            /// <returns>a substring based on the search</returns>
            public static string Substring(this string @this, string from = null, string until = null, StringComparison comparison = StringComparison.InvariantCulture)
            {
                var fromLength = (from ?? string.Empty).Length;
                var startIndex = !string.IsNullOrEmpty(from)
                    ? @this.IndexOf(from, comparison) + fromLength
                    : 0;

                if (startIndex < fromLength) { throw new ArgumentException("from: Failed to find an instance of the first anchor"); }

                var endIndex = !string.IsNullOrEmpty(until)
                ? @this.IndexOf(until, startIndex, comparison)
                : @this.Length;

                if (endIndex < 0) { throw new ArgumentException("until: Failed to find an instance of the last anchor"); }

                var subString = @this.Substring(startIndex, endIndex - startIndex);
                return subString;
            }
        }
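
The RegExp option mentioned above could be sketched roughly as follows. The sample fragment, the class name RegexExtract and the helper ValueAfterLabel are assumptions made for illustration; the real page's markup may differ, so the pattern would need adjusting against the actual source:

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexExtract
{
    // Hypothetical helper: returns the text of the first <font> element that
    // follows the given label, or null when the pattern does not match.
    public static string ValueAfterLabel(string html, string label)
    {
        // Escape the label, skip lazily to the next <font ...> opening tag,
        // then capture everything up to the closing </font>.
        string pattern = Regex.Escape(label) + @".*?<font[^>]*>(.*?)</font>";
        Match m = Regex.Match(html, pattern,
            RegexOptions.Singleline | RegexOptions.IgnoreCase);
        return m.Success ? m.Groups[1].Value.Trim() : null;
    }

    public static void Main()
    {
        // Made-up fragment shaped like the anchors used in the code above
        string sample =
            "<td><font size=\"2\">Last Trade:</font></td>" +
            "<td><font size=\"2\"> 12.30 </font></td>";

        Console.WriteLine(ValueAfterLabel(sample, "Last Trade:") ?? "(no match)");
    }
}
```

Compared with the plain string search, the pattern tolerates different attributes on the font tag, but it is still tied to the page layout and breaks if the markup around the label changes.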

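Are you sure the XPath is correct? The Chrome F12 tools show a different path to the field you marked.

I got the XPath from a Chrome extension called "XPath Helper". It should not be wrong. I am checking it anyway; hopefully I can find the correct path. @PTwr

@Leon: I "fixed" the XPath problem, see the edited post.

It works very well. Thank you for your valuable time. Forgive my ignorance, but your code is a bit complicated for me, as I am a beginner. I find XPath a bit simpler. Thanks anyway. It will work, and I will learn the method. @PTwr

@Leon Then I'll add some comments.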