C#: How to parse a string out from between a left string and a right string


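I am currently trying to parse a web page to get a specific string:

<script type="text/javascript" src="./interceptor/resource/org.apache.wicket.resource.JQueryResourceReference/jquery/jquery-3.4.1-ver-220AFD743D9E9643852E31A135A9F3AE.js?requestSecurityToken=610f15bd-0e23-4ac5-90c3-c0829ad8024e"></script>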

This is the code I use to load the page:

using (HttpClient http = new HttpClient())
{               
    var response = await http.GetStringAsync(pagelink);
    Console.WriteLine(response);
    HtmlDocument pageDocument = new HtmlDocument();
    pageDocument.LoadHtml(response);

    var token = pageDocument.DocumentNode.SelectSingleNode("").InnerText;
    Console.WriteLine(token);
}
The problem is that I need to get only the token from the string mentioned above: 610f15bd-0e23-4ac5-90c3-c0829ad8024e

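I assume there should be a way to do this, but I could not get it to work even with XPath. So I am wondering whether there is a way to parse it out of the string using the text that frames it, for example: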

Left string:
requestSecurityToken=
Right string:

"much easier than messing around with regex"

I don't think it's that hard:

var regex = @"\b[a-f0-9]{8}(?:-[a-f0-9]{4}){3}-[a-f0-9]{12}\b";
var m = Regex.Match(html, regex);
Console.WriteLine(m.Value);
If you only want to pull out the Guid that immediately follows requestSecurityToken=, you can:

var regex = @"requestSecurityToken=([a-f0-9]{8}(?:-[a-f0-9]{4}){3}-[a-f0-9]{12})";
var m = Regex.Match(html, regex);
Console.WriteLine(m.Groups[1].Value);
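
For readers who find the pattern above hard to follow, here is the same expression spread over several lines with a comment on each part. This is only an illustrative sketch: the class name TokenRegexDemo and the method ExtractToken are made up for this example, and RegexOptions.IgnorePatternWhitespace is used solely so that the pattern can carry whitespace and # comments.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original answer.
public static class TokenRegexDemo
{
    // Same pattern as in the answer, laid out with one comment per part.
    // IgnorePatternWhitespace makes the engine skip whitespace and # comments.
    private static readonly Regex TokenPattern = new Regex(@"
        requestSecurityToken=      # literal text that precedes the token
        (                          # start of capture group 1, the Guid
          [a-f0-9]{8}              # 8 lowercase hex digits
          (?:-[a-f0-9]{4}){3}      # three times: a hyphen and 4 hex digits
          -[a-f0-9]{12}            # a hyphen and 12 hex digits
        )                          # end of capture group 1
        ", RegexOptions.IgnorePatternWhitespace);

    // Returns the captured Guid text, or null when the pattern does not match.
    public static string ExtractToken(string html)
    {
        Match m = TokenPattern.Match(html);
        return m.Success ? m.Groups[1].Value : null;
    }
}
```

Calling TokenRegexDemo.ExtractToken(response) on the downloaded page would return the token text if the script tag is present. Checking m.Success avoids silently printing an empty string when nothing matches.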

Try this:

string html = @"<script type=""text/javascript"" src=""./interceptor/resource/org.apache.wicket.resource.JQueryResourceReference/jquery/jquery-3.4.1-ver-220AFD743D9E9643852E31A135A9F3AE.js?requestSecurityToken=610f15bd-0e23-4ac5-90c3-c0829ad8024e""></script>";

// use something to extract value of the src attribute
// I'll use XDocument, but it is not good for HTML documents
XDocument xdoc = XDocument.Parse( html );
string src = xdoc.Root.Attribute("src")?.Value;

if (src is null) throw new Exception();

string[] splitted = src.Split('?'); // char overload also compiles on .NET Framework
string queryString = splitted[1]; //"requestSecurityToken=610f15bd-0e23-4ac5-90c3-c0829ad8024e"

// using System.Collections.Specialized; using System.Web;
NameValueCollection parsed = HttpUtility.ParseQueryString( queryString );

Console.WriteLine(parsed["requestSecurityToken"]);

My take, without regex or string splitting:

// requires: using System.Xml.Linq; using System.Xml.XPath;
// as already noted, XElement or XDocument may not be the best choice for handling HTML
var xe = XElement.Parse(response);

// XPath will make sure you are looking at the right script element
var src = xe.XPathSelectElement("//script[contains(@src, 'requestSecurityToken')]").Attribute("src").Value;

// since a relative Uri does not support reading its query, you need to prepend a pseudo base Uri
Uri srcuri = new Uri(new Uri("http://localhost"), src);

// finally get the value by name
string token = System.Web.HttpUtility.ParseQueryString(srcuri.Query).Get("requestSecurityToken");
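
None of the answers shows the literal left string / right string approach the question asks about, so here is a minimal sketch of it using IndexOf and Substring. The class StringBetween and the method Between are hypothetical names introduced for this example. The right string is missing from the question, so the usage note assumes it is the double quote that closes the src attribute.

```csharp
using System;

// Hypothetical helper, not taken from any of the answers.
public static class StringBetween
{
    // Returns the text between the first occurrence of `left` and the next
    // occurrence of `right` after it, or null when either one is not found.
    public static string Between(string source, string left, string right)
    {
        int start = source.IndexOf(left, StringComparison.Ordinal);
        if (start < 0) return null;
        start += left.Length; // move past the left string itself

        int end = source.IndexOf(right, start, StringComparison.Ordinal);
        if (end < 0) return null;

        return source.Substring(start, end - start);
    }
}
```

Under that assumption, StringBetween.Between(response, "requestSecurityToken=", "\"") would give the token. Unlike the regex, this does not check that the extracted text actually looks like a Guid, so it may be worth validating the result with Guid.TryParse.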

Seems to be essentially a duplicate. I would split it into two parts: extracting the value of the src attribute, and handling it as a Uri (which it is). I guess that's much easier than messing around with regex.

Ty @CaiusJard, but I don't understand the regex approach used here.

Like this? var token = pageDocument.DocumentNode.SelectSingleNode("/html/head/script[1]").GetDataAttribute("src") @Filburt

This will find any guid, but if you need to find a specific instance and can simply handle a Uri as what it is, why not do that?

You can handle the Uri as what it is, but you still have to pull it out of the whole html, and if you're going to do that (with a regex? :)) you might as well pull out the thing you really want.