C# regular expression to match URL-encoded links


c# regex


I have an XML file that contains some links:

<SupportingDocs>
<LinkedFile>http://llcorp/ll/lljomet.dll/open/864606</LinkedFile>
<LinkedFile>http://llcorp/ll/lljomet.dll/open/1860632</LinkedFile>
<LinkedFile>%20http%3A%2F%2Fllenglish%2Fll%2Fll.exe%2Fopen%2F927515</LinkedFile>
<LinkedFile>%20http%3A%2F%2Fllenglish%2Fll%2Fll.exe%2Fopen%2F973783</LinkedFile>
</SupportingDocs>



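I am using the regular expression "\(?:https?://www.)[^\]+\" from C#:

var matches = MyParser.Matches(FormXml);

But it matches only the first two links, not the encoded ones.
How can I match the URL-encoded links with a regular expression?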
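Here is a snippet that might help. I really doubt that you are using the best approach, so I have made some assumptions (maybe you just did not give enough detail).
I parse the XML into an XmlDocument so that it can be worked with in code. The relevant tags ("LinkedFile") are pulled out, and each tag's text is parsed as a Uri. If that fails, the text is unescaped and parsing is attempted again. The end result is a list of strings containing the URLs that parsed correctly. If you really need to, you can run your regular expression over this collection.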
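// this is for the interactive console
#r "System.Xml.Linq"
using System;
using System.Collections.Generic;
using System.Xml;
using System.Xml.Linq;

// sample data, as provided in the post
string rawXml = "<SupportingDocs><LinkedFile>http://llcorp/ll/lljomet.dll/open/864606</LinkedFile><LinkedFile>http://llcorp/ll/lljomet.dll/open/1860632</LinkedFile><LinkedFile>%20http%3A%2F%2Fllenglish%2Fll%2Fll.exe%2Fopen%2F927515</LinkedFile><LinkedFile>%20http%3A%2F%2Fllenglish%2Fll%2Fll.exe%2Fopen%2F973783</LinkedFile></SupportingDocs>";
var xdoc = new XmlDocument();
xdoc.LoadXml(rawXml);

// will store the URLs that parse correctly
var foundUrls = new List<string>();

// temporary object used while parsing URLs
Uri uriResult;

foreach (XmlElement node in xdoc.GetElementsByTagName("LinkedFile"))
{
    var text = node.InnerText;

    // first parse attempt
    var result = Uri.TryCreate(text, UriKind.Absolute, out uriResult);

    // any valid Uri will parse here, so limit to the http and https schemes
    // see https://stackoverflow.com/a/7581824/1462295
    // (&& short-circuits, so uriResult is not touched when parsing failed)
    if (result && (uriResult.Scheme == Uri.UriSchemeHttp || uriResult.Scheme == Uri.UriSchemeHttps))
    {
        foundUrls.Add(uriResult.ToString());
    }
    else
    {
        // the above did not parse, so check whether this is an encoded string;
        // decoding can expose leading/trailing whitespace (%20), so trim that too
        result = Uri.TryCreate(Uri.UnescapeDataString(text).Trim(), UriKind.Absolute, out uriResult);

        // see comments above
        if (result && (uriResult.Scheme == Uri.UriSchemeHttp || uriResult.Scheme == Uri.UriSchemeHttps))
        {
            foundUrls.Add(uriResult.ToString());
        }
    }
}

// interactive output:
> foundUrls
List<string>(4) { "http://llcorp/ll/lljomet.dll/open/864606", "http://llcorp/ll/lljomet.dll/open/1860632", "http://llenglish/ll/ll.exe/open/927515", "http://llenglish/ll/ll.exe/open/973783" }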
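You are matching two slashes after https. Those are present in the first two cases but not in the last two. There may be other problems, but that is the first one I see.

The XML file contains many kinds of URLs in different sections. The real code collects all URLs of the matching kinds and then processes each kind. But your answer gave me some options to think about. Thanks.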
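
Following up on the remark about the two slashes: in the encoded entries, `://` appears as `%3A%2F%2F`, so a pattern that requires a literal `//` cannot match them. The sketch below shows one possible way to let a regular expression accept both forms and then decode each match. The names `EncodedLinkFinder` and `FindLinks`, and the pattern itself, are hypothetical and not taken from the question; the pattern assumes that a link ends at whitespace or at `<`.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class EncodedLinkFinder
{
    // Hypothetical pattern (not the asker's original): accepts "http://" or
    // "https://" either literally or percent-encoded ("%3A%2F%2F"), then
    // consumes characters up to the next whitespace or '<'.
    private static readonly Regex LinkPattern = new Regex(
        @"https?(?:://|%3A%2F%2F)[^\s<]+",
        RegexOptions.IgnoreCase);

    public static List<string> FindLinks(string xmlText)
    {
        var links = new List<string>();
        foreach (Match match in LinkPattern.Matches(xmlText))
        {
            // Decode percent-escapes. Strings without any '%' pass through
            // unchanged; note that escapes inside a plain URL are decoded too.
            links.Add(Uri.UnescapeDataString(match.Value));
        }
        return links;
    }
}
```

Because the match starts at `http`, a leading `%20` is simply left out of the result. Parsing the XML first, as in the answer above, is likely the more robust route; this variant may only be worth considering when the regular expression has to run over the raw text.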