C#: How to parse a string from between a left string and a right string

c# parsing

I am currently trying to parse a web page to get a specific string:

This is the code I use to load the web page:

using (HttpClient http = new HttpClient())
{               
    var response = await http.GetStringAsync(pagelink);
    Console.WriteLine(response);
    HtmlDocument pageDocument = new HtmlDocument();
    pageDocument.LoadHtml(response);

    var token = pageDocument.DocumentNode.SelectSingleNode("").InnerText;
    Console.WriteLine(token);
}

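The problem is that I need to get only the token from the string mentioned above: 610f15bd-0e23-4ac5-90c3-c0829ad8024e

I assume there should be a way to do this, but I could not get it to work even with XPath. So I am wondering whether there is any way to parse it out from between the strings that frame it, for example:

Left string:

requestSecurityToken=

Right string: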

I don't think it's that hard:

var regex = @"\b[a-f0-9]{8}(?:-[a-f0-9]{4}){3}-[a-f0-9]{12}\b";
var m = Regex.Match(html, regex);
Console.WriteLine(m.Value);

If you only want to pull out the Guid that immediately follows requestSecurityToken=, you can do this:

var regex = @"requestSecurityToken=([a-f0-9]{8}(?:-[a-f0-9]{4}){3}-[a-f0-9]{12})";
var m = Regex.Match(html, regex);
Console.WriteLine(m.Groups[1].Value);
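
The left-string/right-string extraction the question asks about can also be sketched without a regex, using IndexOf and Substring. This is only a minimal sketch: StringExtraction and TryBetween are hypothetical names, not part of .NET, and since the right string is not shown in the question, the usage line assumes it is the closing double quote of the src attribute.

```csharp
using System;

public static class StringExtraction
{
    // Hypothetical helper (not part of .NET): finds the text between the first
    // occurrence of `left` and the next occurrence of `right` after it.
    // Returns false and an empty value when either delimiter is missing.
    public static bool TryBetween(string source, string left, string right, out string value)
    {
        value = string.Empty;

        int start = source.IndexOf(left, StringComparison.Ordinal);
        if (start < 0) return false;
        start += left.Length; // skip past the left delimiter itself

        int end = source.IndexOf(right, start, StringComparison.Ordinal);
        if (end < 0) return false;

        value = source.Substring(start, end - start);
        return true;
    }
}
```

Usage, assuming html holds the page source: StringExtraction.TryBetween(html, "requestSecurityToken=", "\"", out string token). Unlike the regex, this does not check that the extracted text actually has the shape of a Guid.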

Try it like this:

string html = @"<script type=""text/javascript"" src=""./interceptor/resource/org.apache.wicket.resource.JQueryResourceReference/jquery/jquery-3.4.1-ver-220AFD743D9E9643852E31A135A9F3AE.js?requestSecurityToken=610f15bd-0e23-4ac5-90c3-c0829ad8024e""></script>";

// use something to extract value of the src attribute
// I'll use XDocument, but it is not good for HTML documents
XDocument xdoc = XDocument.Parse( html );
string src = xdoc.Root.Attribute("src")?.Value;

if (src is null) throw new Exception();

string[] splitted = src.Split("?");
string queryString = splitted[1]; //"requestSecurityToken=610f15bd-0e23-4ac5-90c3-c0829ad8024e"

// using System.Collections.Specialized;
NameValueCollection parsed = HttpUtility.ParseQueryString( queryString );

Console.WriteLine(parsed["requestSecurityToken"]);

My take, without regex or string splitting:

// as already noted, XElement or XDocument may not be the best choice for handling Html
var xe = XElement.Parse(response);

// XPath will make sure you are looking at the right script element
var src = xe.XPathSelectElement("//script[contains(@src, 'requestSecurityToken')]").Attribute("src").Value;

// since relative uri don't support parsing its query, you need to stick in a pseudo base uri
Uri srcuri = new Uri(new Uri("http://localhost"), src);

// finally get the value by name
string token = System.Web.HttpUtility.ParseQueryString(srcuri.Query).Get("requestSecurityToken");
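
The query-string step above throws if the src value cannot be resolved, and silently yields null if the parameter is absent. A more defensive variant could look like the sketch below. TokenReader and TryGetQueryValue are hypothetical names introduced here for illustration; the http://localhost base address is the same placeholder used above and is never contacted.

```csharp
using System;
using System.Web;

public static class TokenReader
{
    // Hypothetical helper: reads one query parameter from a URL that may be
    // relative. Returns false when the URL cannot be resolved or the
    // parameter is not present.
    public static bool TryGetQueryValue(string url, string name, out string value)
    {
        value = string.Empty;

        // Relative URIs do not expose Query, so resolve against a placeholder base.
        if (!Uri.TryCreate(new Uri("http://localhost"), url, out Uri resolved))
            return false;

        string found = HttpUtility.ParseQueryString(resolved.Query).Get(name);
        if (found == null) return false;

        value = found;
        return true;
    }
}
```

Usage, assuming src holds the attribute value extracted earlier: TokenReader.TryGetQueryValue(src, "requestSecurityToken", out string token).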

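Essentially, though it may seem repetitive, I split the task into two parts: extract the value of the src attribute, then treat it as the Uri that it is.

I guess this is much easier than messing around with regex.

Ty @CaiusJard, but I don't understand the regex approach used here.

Like this? var token = pageDocument.DocumentNode.SelectSingleNode("/html/head/script[1]").GetDataAttribute("src")

@Filburt That would find any guid, but if you need to find one specific instance and can simply treat a Uri as what it is, why not do that?

You can treat the Uri as what it is, but you still have to pull it out of the whole html, and if you are going to do that (with a regex? :)) you might as well pull out the thing you actually want.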