Capturing the rel, type and href of a link in C#

c# parsing

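I have a string that should contain a list of items in the form <link rel="{0}" type="{1}" href="{2}">, where {0}, {1} and {2} are all strings, and I basically want to extract them.

I want to do this as part of solving an HTML parsing problem, and I have heard that parsing HTML with regular expressions is a bad idea.

I don't even know how to do this with a regular expression.

This is as far as I have got: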

string format = "<link rel=\".*\" type=\".*\" href=\".*\">";
Regex reg = new Regex(format);
MatchCollection matches = reg.Matches(input, 0);
foreach (Match match in matches)
{
    string rel = string.Empty;
    string type = string.Empty;
    string href = string.Empty;
    //not sure what to do here to get these values for each from the match
}

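Going by what my research has turned up so far, I may be using regular expressions completely wrong.

How would you do this, either with the approach I chose or with an HTML parser?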

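Use the HTML Agility Pack library to parse the HTML.

You are better off using a real HTML parser such as the HTML Agility Pack, which you can download.

One major reason not to use regular expressions for HTML parsing is that the markup may be malformed (it almost always is), and malformed markup can break a regex-based parser.

Once the document is loaded, use XPath to select the nodes you need and read their attributes into variables: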

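// Requires: using HtmlAgilityPack;
HtmlDocument htmlDoc = new HtmlDocument();
htmlDoc.LoadHtml(pageMarkup);
// SelectNodes returns null (not an empty collection) when nothing matches.
HtmlNodeCollection nodes = htmlDoc.DocumentNode.SelectNodes("//link");
string rel = string.Empty;

if (nodes != null && nodes[0].Attributes["rel"] != null)
{
    // Attributes["rel"] is an HtmlAttribute; its text is in .Value.
    rel = nodes[0].Attributes["rel"].Value;
}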
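
For completeness, if the input is known to follow exactly the format shown in the question, the regex approach can also be made to work. The following is only a minimal sketch under that assumption: the sample input string and the class name are made up for illustration, and the pattern fails as soon as attributes are reordered, use single quotes, or the tag ends in "/>", which is why the parser remains the safer choice. It uses named capture groups, and [^"]* in place of the greedy .* so that a match cannot run past the closing quote.

```csharp
using System;
using System.Text.RegularExpressions;

class LinkRegexSketch
{
    static void Main()
    {
        // Hypothetical sample input in the exact format from the question.
        string input =
            "<link rel=\"stylesheet\" type=\"text/css\" href=\"site.css\">" +
            "<link rel=\"alternate\" type=\"application/rss+xml\" href=\"/feed\">";

        // Each named group captures everything up to the next double quote.
        Regex reg = new Regex(
            "<link rel=\"(?<rel>[^\"]*)\" type=\"(?<type>[^\"]*)\" href=\"(?<href>[^\"]*)\">");

        foreach (Match match in reg.Matches(input))
        {
            // Groups["name"].Value holds the text captured by that group.
            string rel = match.Groups["rel"].Value;
            string type = match.Groups["type"].Value;
            string href = match.Groups["href"].Value;
            Console.WriteLine($"{rel} | {type} | {href}");
        }
    }
}
```

With the sample input above, the loop runs twice, once per link element, and each iteration fills rel, type and href from the named groups.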

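Thanks. I've accepted your answer because it includes useful code and you explained why to use a parser instead of a regex. Thanks also to Rony for the link to the HTML Agility Pack, which I have just downloaded.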