C#: How to scrape a website using regular expressions


c# regex


When I click my button1, it runs this:

 MatchCollection matchCollection = new Regex(@"(?<=/&gt;)\d+").Matches(new StreamReader(((HttpWebResponse)((HttpWebRequest)WebRequest.Create("http://www.proxyserverlist24.top/feeds/posts/default")).GetResponse()).GetResponseStream()).ReadToEnd());

You don't need Regex for this. Your link returns XML, so you can use an XML parser to read the feed and an HTML parser (HtmlAgilityPack) to parse the text of each "content" tag. The final code is:
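// Using directives this snippet relies on (place them at the top of the file).
// HtmlAgilityPack is a NuGet package and is referenced by its full name below.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Net;
using System.Net.Http;
using System.Xml.Linq;

// The code below uses await, so it must run inside an async method.
// Scratch variables for the TryParse filters.
IPAddress tempip;
int port;
List<IPEndPoint> proxies = null;

using (var client = new HttpClient())
{
    var doc = new HtmlAgilityPack.HtmlDocument();
    XNamespace ns = "http://www.w3.org/2005/Atom";
    // The feed is Atom XML; each <entry> carries its post body as HTML in <content>.
    var xml = await client.GetStringAsync("http://www.proxyserverlist24.top/feeds/posts/default");
    var xDoc = XDocument.Parse(xml);
    proxies = xDoc.Descendants(ns + "entry")
        .Select(x => (string)x.Element(ns + "content"))
        .Where(x => x != null) // skip entries without a <content> element
        .SelectMany(x =>
        {
            doc.LoadHtml(x);
            // SelectNodes returns null when nothing matches, so fall back to an empty sequence.
            return ((IEnumerable<HtmlAgilityPack.HtmlNode>)doc.DocumentNode.SelectNodes("//span[not(span)]")
                            ?? Enumerable.Empty<HtmlAgilityPack.HtmlNode>())
                        .SelectMany(n => n.Descendants())
                        // Keep only text of the form "address:port".
                        .Select(n => n.InnerText.Split(":".ToCharArray(), StringSplitOptions.RemoveEmptyEntries))
                        .Where(n => n.Length == 2)
                        .Where(n => IPAddress.TryParse(n[0], out tempip))
                        .Where(n => int.TryParse(n[1], out port))
                        .Select(n => new IPEndPoint(IPAddress.Parse(n[0]), int.Parse(n[1])));
        })
        .ToList();
}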
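
The filtering step at the end of that pipeline (split on ":", then `IPAddress.TryParse` and `int.TryParse`) can be tried on plain strings without any network access. The following is a minimal sketch, assuming each candidate string looks like `address:port`. The names `ProxyText` and `TryParseEndPoint` are hypothetical and introduced only for illustration, and the port range check is an addition that the code above does not perform.

```csharp
using System;
using System.Net;

public static class ProxyText
{
    // Hypothetical helper: turns "10.0.0.1:8080" into an IPEndPoint.
    // Returns false for anything that is not exactly "address:port".
    public static bool TryParseEndPoint(string text, out IPEndPoint endPoint)
    {
        endPoint = null;
        if (string.IsNullOrWhiteSpace(text)) return false;

        // Same split as in the LINQ pipeline: exactly two non-empty parts are required.
        var parts = text.Trim().Split(new[] { ':' }, StringSplitOptions.RemoveEmptyEntries);
        if (parts.Length != 2) return false;

        if (!IPAddress.TryParse(parts[0], out var address)) return false;
        if (!int.TryParse(parts[1], out var port)) return false;

        // Added check (an assumption of this sketch): the IPEndPoint constructor
        // throws for ports outside 0..65535, so reject them up front.
        if (port < IPEndPoint.MinPort || port > IPEndPoint.MaxPort) return false;

        endPoint = new IPEndPoint(address, port);
        return true;
    }
}
```

Calling `ProxyText.TryParseEndPoint("10.0.0.1:8080", out var ep)` should succeed and give an endpoint with port 8080, while a string without a colon is rejected.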


In fact, a shorter regex solution is also possible, but parsing XML or HTML with regular expressions (as mentioned in the comments) is not a good idea.
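
For comparison, a regex-based variant might look like the sketch below. It assumes the feed text has already been downloaded into a string and that proxies appear in it as `a.b.c.d:port`. The pattern and the names `ProxyRegex` and `Extract` are illustrative assumptions, not code from the question or the answer.

```csharp
using System.Collections.Generic;
using System.Net;
using System.Text.RegularExpressions;

public static class ProxyRegex
{
    // Dotted-quad IPv4 address, a colon, then a 1-5 digit port.
    private static readonly Regex Pattern = new Regex(
        @"\b(?<ip>(?:\d{1,3}\.){3}\d{1,3}):(?<port>\d{1,5})\b",
        RegexOptions.Compiled);

    public static List<IPEndPoint> Extract(string text)
    {
        var result = new List<IPEndPoint>();
        foreach (Match m in Pattern.Matches(text))
        {
            // The regex only checks the shape of the text; TryParse rejects
            // values such as 999.1.1.1, and the port is range-checked here.
            if (IPAddress.TryParse(m.Groups["ip"].Value, out var ip)
                && int.TryParse(m.Groups["port"].Value, out var port)
                && port <= IPEndPoint.MaxPort)
            {
                result.Add(new IPEndPoint(ip, port));
            }
        }
        return result;
    }
}
```

This treats the feed as flat text and ignores its XML and HTML structure, so it will also pick up any address-like string that appears outside the proxy lists.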
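Comments:

Don't parse HTML with regex. You can't have done much research, because this is one of the highest-voted posts on this site.

My code used to work on another site. Since that site shut down, I had to change the regex values.

Try

No one is trying to parse HTML in this question. I wonder whether anyone even bothers to look up what parsing means, or at least reads the post they cite. Regex is great for searching text and retrieving or replacing based on a pattern.

Regex is a good way to solve this problem. I use regex in this manner: new Regex((?...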