C# HTML Agility Pack - Filter Href Value Results

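I am building a web scraper. The text below is the output of the code given at the end of this question, which collects the href values of all links on a page.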

I only want to keep the values that contain docid=.

index.php?pageid=a45475a11ec72b843d74959b60fd7bd64556e8988583f

#

_documents.php#summary

index.php?pageid=a45475a11ec72b843d74959b60fd7bd64579b861c1d7b

#

index.php?pageid=a45475a11ec72b843d74959b60fd7bd64579e0509c7f0&apform=0

decisions.php?doctype=decisions/Signed Resolutions&docid=12637784353880003271#sam

decisions.php?doctype=decisions/Signed Resolutions&docid=1263778902166932156#sam

?doctype=decisions/Signed Resolutions&year=1986&month=January#head

?doctype=decisions/Signed Resolutions&year=1986&month=February#head

Here is the code:

        string url = urlTextBox.Text;
        string sourceCode = Extractor.getSourceCode(url);

        HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
        doc.LoadHtml(sourceCode);
        List<string> links = new List<string>();

        // SelectNodes returns null (not an empty collection) when nothing
        // matches, so test its result instead of `links`, which is never null.
        HtmlAgilityPack.HtmlNodeCollection anchors = doc.DocumentNode.SelectNodes("//a[@href]");
        if (anchors != null)
        {
            foreach (HtmlAgilityPack.HtmlNode nd in anchors)
            {
                links.Add(nd.Attributes["href"].Value);
            }
        }
        else
        {
            MessageBox.Show("No Links Found");
        }

        // `links` is always non-null here; check whether anything was collected.
        if (links.Count > 0)
        {
            foreach (string str in links)
            {
                richTextBox9.Text += str + "\n";
            }
        }
        else
        {
            MessageBox.Show("No Link Values Found");
        }

How can I do this?

Why not simply replace this:

    links.Add(nd.Attributes["href"].Value);

with this:

    if (nd.Attributes["href"].Value.Contains("docid="))
        links.Add(nd.Attributes["href"].Value);
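The same filter can also be applied after the loop, to the list of href strings that has already been collected. The sketch below is a minimal illustration using LINQ; `LinkFilter` and `FilterDocIdLinks` are hypothetical names, not part of HTML Agility Pack. Like the `Contains` check above, it uses `string.Contains(string)`, which is an ordinal, case-sensitive match.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper (not an HTML Agility Pack API): keeps only the
// href values that contain "docid=".
public static class LinkFilter
{
    public static List<string> FilterDocIdLinks(IEnumerable<string> hrefs)
    {
        // string.Contains(string) is ordinal and case-sensitive,
        // so a value containing "DocId=" would NOT be kept.
        // Null entries are skipped instead of throwing.
        return hrefs
            .Where(h => h != null && h.Contains("docid="))
            .ToList();
    }
}
```

In the question's code, this could be called as `links = LinkFilter.FilterDocIdLinks(links);` just before the values are written to `richTextBox9`.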

I made a few edits here. Please double-check :)
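Alternatively, the filter can be pushed into the query itself: HTML Agility Pack's `SelectNodes` takes an XPath 1.0 expression, so `//a[contains(@href, 'docid=')]` could be passed in place of `//a[@href]`. To keep the sketch below runnable without the HAP package, it evaluates that same expression with the standard `System.Xml.XmlDocument` on a small, well-formed sample. Real-world pages are rarely valid XML, so in the scraper itself the expression would go to HAP rather than `XmlDocument`. `XPathLinkFilter` and `SelectDocIdHrefs` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

// Hypothetical helper: demonstrates the XPath filter on well-formed XML via
// System.Xml. The same expression string could be given to HAP's SelectNodes.
public static class XPathLinkFilter
{
    // Matches <a> elements whose href attribute contains "docid=".
    public const string DocIdXPath = "//a[contains(@href, 'docid=')]";

    public static List<string> SelectDocIdHrefs(string xml)
    {
        var document = new XmlDocument();
        document.LoadXml(xml); // throws XmlException if the input is not well-formed

        var result = new List<string>();
        XmlNodeList nodes = document.SelectNodes(DocIdXPath);
        if (nodes == null)
        {
            return result;
        }

        foreach (XmlNode node in nodes)
        {
            if (node is XmlElement element)
            {
                // Entities such as &amp; are already decoded in the attribute value.
                result.Add(element.GetAttribute("href"));
            }
        }
        return result;
    }
}
```

Note that in XML input a literal `&` must be written as `&amp;`; the returned href values contain the decoded `&`.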