C#: How do I parse web links from a string?


c# .net


I'm trying to parse/extract the links in this string:

    Regex linkParser = new Regex(@"\b(?:https?://|www\.)\S+\b", RegexOptions.Compiled | RegexOptions.IgnoreCase);
    string rawString = link;
    foreach (Match m in linkParser.Matches(rawString))
    {
        string links = m.Value;
    }

But what I get in the string links is:

http://rotter.net/cgi-bin/forum/dcboard.cgi?az=read_count&om=112190&forum=scoops1

and there is still a leftover > at the end.

Try changing \S+ to [^\\>]+

Final pattern: \b(?:https?:\/\/|www\.)[^\\>]+\b
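A minimal sketch of the question's loop with the suggested pattern plugged in. It assumes a hypothetical input, since the actual string from the question is not shown: two anchor tags, one holding the rotter.net URL and one holding a made-up www.example.com/page link. It also collects every match into a List<string> instead of overwriting a single local variable on each iteration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical input: the real string from the question is not shown,
// so this assumes the links sit inside href attributes of <a> tags.
string rawString =
    "<a href=\"http://rotter.net/cgi-bin/forum/dcboard.cgi?az=read_count&om=112190&forum=scoops1\"><b>title</b></a>" +
    "<br><a href=\"www.example.com/page\">more</a>";

// Suggested pattern: [^\\>]+ cannot run past a '>', and the final \b
// backtracks over the closing quote to the last letter or digit.
var linkParser = new Regex(@"\b(?:https?://|www\.)[^\\>]+\b",
    RegexOptions.Compiled | RegexOptions.IgnoreCase);

// Collect every match instead of overwriting one variable inside the loop.
var links = new List<string>();
foreach (Match m in linkParser.Matches(rawString))
{
    links.Add(m.Value);
}

foreach (string link in links)
{
    Console.WriteLine(link);
}
```

One caveat of this design: [^\\>]+ also matches spaces, so in plain text a link followed by more words (for example "see http://a.com and more") runs on to the end of the text. The pattern only behaves well when each link is closed off by markup, which is one reason the replies recommend an HTML parser.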

But this doesn't find only working links. If your link looks like this, it will find www.a

By the way: I think in ?:https?:

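The reason is that you are telling it to match all non-whitespace characters, and the trailing \b makes the match end at a word boundary, typically right after a letter or digit. Because the expression is greedy, it eats as many non-whitespace characters as it can, and characters such as " and > are not whitespace, whereas TAB is. [^\\>]+ tells it to take every character until it reaches a > (or a backslash); once it finds one, it stops.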
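To see the explanation above step by step, this sketch runs three patterns against the same hypothetical anchor tag (the question's real input is not shown): greedy \S+ on its own, the negated class [^\\>]+ on its own, and the negated class followed by \b. Only the last one stops exactly at the end of the URL, because the engine backtracks from the quote to the last word character.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical anchor tag; the question's real input is not shown.
const string url = "http://rotter.net/cgi-bin/forum/dcboard.cgi?az=read_count&om=112190&forum=scoops1";
string html = $"<a href=\"{url}\"><b>title</b></a>";

// Greedy \S+ with no trailing \b: '"', '>' and '<' are all non-whitespace,
// and this input has no whitespace after the URL, so it runs to the end.
string greedy = Regex.Match(html, @"https?://\S+").Value;

// Negated class without \b: stops just before the first '>',
// but still keeps the closing '"'.
string negated = Regex.Match(html, @"https?://[^\\>]+").Value;

// Negated class plus \b: there is no word boundary between '"' and '>',
// so the engine gives back the '"' and ends after "scoops1".
string trimmed = Regex.Match(html, @"https?://[^\\>]+\b").Value;

Console.WriteLine(greedy);
Console.WriteLine(negated);
Console.WriteLine(trimmed);
```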

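As some people have said, HTML is not a regular language, so I used an HTML parser (HtmlAgilityPack) instead. After downloading the source, and given that this is the only node in the source containing this text: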

http://rotter.net/cgi-bin/forum/dcboard.cgi?az=read_count&om=112190&forum=scoops1"><b

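    using System;
    using System.IO;

    var hp = new HtmlAgilityPack.HtmlDocument(); // from the HtmlAgilityPack NuGet package
    // The source can come from anywhere: a WebClient download, a local file, etc.
    string source = File.ReadAllText(@"C:\Users\Admin\Desktop\source.txt");
    hp.LoadHtml(source);
    // Select the <a> whose href contains the target URL; SelectSingleNode returns null if nothing matches
    var node = hp.DocumentNode.SelectSingleNode("//a[contains(@href, 'http://rotter.net/cgi-bin/forum/dcboard.cgi?az=read_count&om=112190&forum=scoops1')]");
    if (node != null)
        Console.WriteLine(node.Attributes["href"].Value);

You can get the source from anywhere: download it with WebClient, or read it from a local file. This returns:

http://rotter.net/cgi-bin/forum/dcboard.cgi?az=read_count&om=112190&forum=scoops1

Is the link in an anchor or in the text? If the former, use a proper parser such as Html Agility Pack.

Please check this for another similar question.

Please don't repost the same question again and again; if you really need to, it's better to edit it and get it reopened. As with the other question, the best solution is not a regular expression but a more suitable tool, e.g. …

Just because HTML is not a regular language, it may be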