Extracting links with a regular expression in C#

c# regex

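For the past two hours I have been trying to solve this, but I can't seem to find a solution.

I need to extract links from an HTML file. There are more than 100 links, but only 25 of them are valid. The valid links are placed inside a specific piece of markup, which the pattern below targets.

First, I had (and still have) a problem with double quotes in a verbatim string. So I replaced the verbatim string with a "regular" string, so that I could use \" for ", but the problem is that the Regex I wrote doesn't work: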

Match LinksTemp = Regex.Match(
                              htmlCode,
                              "<td><a href=\"(.*)\">",
                              RegexOptions.IgnoreCase);

With this, I get " as output instead of http://www.google.com

Does anyone know how to solve this, and how to use double quotes in a verbatim string (e.g. @"das"sa)?

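Example of escaping double quotes in a verbatim string (double the quote):

@"some""test"

Regex example (the pattern here is an assumption: the question's regex with a lazy quantifier, (.*?) instead of (.*)):

var match = Regex.Match(html, "<td><a href=\"(.*?)\">",
                        RegexOptions.Singleline); // spelling error
var url = match.Groups[1].Value;

Also, if you want to get every element, you may want to use Regex.Matches(...) instead of Regex.Match(...).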
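The two points in this answer can be checked with a small console sketch. It assumes a made-up sample input with two links; the class name `QuoteAndLazyDemo` and the helper `FirstHref` are hypothetical, not from the answer. It shows that `\"` in a regular string and `""` in a verbatim string produce the same pattern, and that the greedy `(.*)` runs on to the last `">` in the input while the lazy `(.*?)` stops at the first.

```csharp
using System;
using System.Text.RegularExpressions;

public static class QuoteAndLazyDemo
{
    // The same pattern as a regular string (\" escapes the quote) ...
    public const string RegularPattern = "<td><a href=\"(.*?)\">";
    // ... and as a verbatim string ("" escapes the quote).
    public const string VerbatimPattern = @"<td><a href=""(.*?)"">";
    // The greedy pattern from the question, kept for comparison.
    public const string GreedyPattern = "<td><a href=\"(.*)\">";

    // Hypothetical helper: returns the first captured href, or "" when nothing matches.
    public static string FirstHref(string html, string pattern)
    {
        Match match = Regex.Match(html, pattern,
            RegexOptions.IgnoreCase | RegexOptions.Singleline);
        return match.Success ? match.Groups[1].Value : string.Empty;
    }

    public static void Main()
    {
        // Assumed sample input: two links on one line.
        string html = "<td><a href=\"http://www.google.com\"></td>" +
                      "<td><a href=\"http://example.com\"></td>";

        Console.WriteLine(RegularPattern == VerbatimPattern);  // same pattern text
        Console.WriteLine(FirstHref(html, VerbatimPattern));   // lazy: first link only
        Console.WriteLine(FirstHref(html, GreedyPattern));     // greedy: spans both links
    }
}
```

With a single link in the input both quantifiers capture the same text; the difference only appears once a second `">` follows on the matched stretch of input.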

If you want to get every element, use code like this:
string htmlCode = "<td><a href=\" www.aa.pl \"><td> <a href=\" www.cos.com \"><td>";
Regex r = new Regex( "<a href=\"(.*?)\">", RegexOptions.IgnoreCase );
MatchCollection mc = r.Matches(htmlCode);

foreach ( Match m1 in mc ) {                
   MessageBox.Show( m1.Groups[1].ToString() );
}
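For use outside a WinForms application, the same loop can collect the links into a list instead of showing message boxes. The following is only a sketch: the `LinkExtractor` class and `ExtractLinks` method are hypothetical names, and it swaps the lazy `(.*?)` for a negated character class `[^"]*`, which is a different technique from the one in the answer. It also trims the captured value, since the sample hrefs contain surrounding spaces.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class LinkExtractor
{
    // Hypothetical helper: returns every href value found in <a href="..."> tags.
    public static List<string> ExtractLinks(string html)
    {
        // [^"]* stops at the first closing quote, so it cannot run past the attribute.
        var regex = new Regex(@"<a href=""([^""]*)"">", RegexOptions.IgnoreCase);
        var links = new List<string>();

        foreach (Match m in regex.Matches(html))
        {
            // Groups[1] is the text inside the parentheses; Trim removes the padding spaces.
            links.Add(m.Groups[1].Value.Trim());
        }
        return links;
    }

    public static void Main()
    {
        // Same sample input as in the answer.
        string htmlCode = "<td><a href=\" www.aa.pl \"><td> <a href=\" www.cos.com \"><td>";

        foreach (string link in ExtractLinks(htmlCode))
        {
            Console.WriteLine(link);
        }
    }
}
```

Like the original pattern, this only matches tags where `href` is the first and only attribute and is written with double quotes.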

Why not parse it with an HTML parser, which is both good and fast?
For example:
string HTML = "<td><a href='http://www.google.com'>";

HtmlDocument doc = new HtmlDocument();
doc.LoadHtml(HTML);
HtmlNodeCollection a = doc.DocumentNode.SelectNodes("//a[@href]");

string url = a[0].GetAttributeValue("href", null);

Console.WriteLine(url);
Console.ReadLine();

You need to import HtmlAgilityPack (using HtmlAgilityPack;).
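Groups[1] is used because Groups[0] holds the entire matched string, while Groups[1] holds the text captured between the first pair of parentheses (). If your regular expression had two pairs of parentheses, one around href and one around the address, then Groups[1] would be "href" and Groups[2] would be your www address.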
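The group numbering can be seen in a short sketch. The pattern below is an illustrative assumption with two capturing groups (the exact regex from this answer did not survive), and the `GroupDemo` class and `FirstMatch` method are hypothetical names.

```csharp
using System;
using System.Text.RegularExpressions;

public static class GroupDemo
{
    // Assumed pattern: group 1 captures the attribute name, group 2 its value.
    public const string Pattern = "<a (href)=\"(.*?)\">";

    // Hypothetical helper: returns the first match of Pattern in the given HTML.
    public static Match FirstMatch(string html)
    {
        return Regex.Match(html, Pattern, RegexOptions.IgnoreCase);
    }

    public static void Main()
    {
        Match m = FirstMatch("<td><a href=\"http://www.google.com\">");

        Console.WriteLine(m.Groups[0].Value); // the entire matched text
        Console.WriteLine(m.Groups[1].Value); // first pair of parentheses
        Console.WriteLine(m.Groups[2].Value); // second pair of parentheses
    }
}
```

Groups are numbered by the position of their opening parenthesis, counted from the left, so adding a group earlier in the pattern shifts the index of every group after it.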