C#: How to parse a string between a left string and a right string
I am currently trying to parse a web page to get a specific string:
<script type="text/javascript" src="./interceptor/resource/org.apache.wicket.resource.JQueryResourceReference/jquery/jquery-3.4.1-ver-220AFD743D9E9643852E31A135A9F3AE.js?requestSecurityToken=610f15bd-0e23-4ac5-90c3-c0829ad8024e"></script>
This is the code I use to load the web page:
using (HttpClient http = new HttpClient())
{
    var response = await http.GetStringAsync(pagelink);
    Console.WriteLine(response);
    HtmlDocument pageDocument = new HtmlDocument();
    pageDocument.LoadHtml(response);
    var token = pageDocument.DocumentNode.SelectSingleNode("").InnerText;
    Console.WriteLine(token);
}
The problem is that I need to get only the token from the string mentioned above:
610f15bd-0e23-4ac5-90c3-c0829ad8024e
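I assume there should be a way to do this, but I could not get it to work even with XPath.
So I would like to know whether there is a way to parse it out of the surrounding string by its delimiters, for example:
Left string: requestSecurityToken=
Right string: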
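The left/right idea in the question can be sketched with plain `IndexOf` calls. This is only a minimal illustration, not something taken from the answers: the `StringBetween` class and its `Between` method are hypothetical names, and since the right string is not given in the question, the usage note below assumes it is the double quote that closes the `src` attribute.

```csharp
using System;

public static class StringBetween
{
    // Returns the text between the first occurrence of `left` and the next
    // occurrence of `right` after it, or null when either delimiter is missing.
    // An empty or null `right` means "take everything up to the end".
    public static string Between(string source, string left, string right)
    {
        if (source == null || string.IsNullOrEmpty(left)) return null;

        int start = source.IndexOf(left, StringComparison.Ordinal);
        if (start < 0) return null;
        start += left.Length; // skip past the left delimiter itself

        if (string.IsNullOrEmpty(right)) return source.Substring(start);

        int end = source.IndexOf(right, start, StringComparison.Ordinal);
        return end < 0 ? null : source.Substring(start, end - start);
    }
}
```

Under that assumption, `StringBetween.Between(response, "requestSecurityToken=", "\"")` would return the token for the script tag shown in the question. Note that it takes the first occurrence of the left string in the whole page, so it is only as specific as the delimiters are.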
"far easier than messing around with regex"
I don't think it is that hard:
var regex = @"\b[a-f0-9]{8}(?:-[a-f0-9]{4}){3}-[a-f0-9]{12}\b";
var m = Regex.Match(html, regex);
Console.WriteLine(m.Value);
If you only want to pull out the Guid that immediately follows requestSecurityToken=, you can:
var regex = @"requestSecurityToken=([a-f0-9]{8}(?:-[a-f0-9]{4}){3}-[a-f0-9]{12})";
var m = Regex.Match(html, regex);
Console.WriteLine(m.Groups[1].Value);
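One thing the snippets above leave out is a check for whether the match succeeded; when it does not, `m.Groups[1].Value` is simply an empty string. The sketch below wraps the same pattern in a helper that reports failure explicitly and hands back a `Guid`. The `TokenRegex` class and `TryGetToken` method are hypothetical names, and the case-insensitive option is an assumption about what input should be accepted.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TokenRegex
{
    // Same pattern as in the answer, with a named group. IgnoreCase also
    // accepts upper-case hex digits (and any casing of the parameter name).
    private static readonly Regex TokenPattern = new Regex(
        @"requestSecurityToken=(?<token>[a-f0-9]{8}(?:-[a-f0-9]{4}){3}-[a-f0-9]{12})",
        RegexOptions.IgnoreCase | RegexOptions.CultureInvariant);

    // Returns true and sets `token` when a well-formed GUID follows
    // "requestSecurityToken=" somewhere in `html`; otherwise returns false.
    public static bool TryGetToken(string html, out Guid token)
    {
        token = Guid.Empty;
        if (html == null) return false;

        Match m = TokenPattern.Match(html);
        // Check Success first: a failed match yields empty group values.
        return m.Success && Guid.TryParse(m.Groups["token"].Value, out token);
    }
}
```

A caller would then write something like `if (TokenRegex.TryGetToken(response, out Guid token)) Console.WriteLine(token);` and handle the `false` case however suits the application.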
Try this:
string html = @"<script type=""text/javascript"" src=""./interceptor/resource/org.apache.wicket.resource.JQueryResourceReference/jquery/jquery-3.4.1-ver-220AFD743D9E9643852E31A135A9F3AE.js?requestSecurityToken=610f15bd-0e23-4ac5-90c3-c0829ad8024e""></script>";
// use something to extract value of the src attribute
// I'll use XDocument, but it is not good for HTML documents
XDocument xdoc = XDocument.Parse( html );
string src = xdoc.Root.Attribute("src")?.Value;
if (src is null) throw new Exception();
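string[] splitted = src.Split('?'); // the char overload is available on every .NET version
if (splitted.Length < 2) throw new Exception("src has no query string");
string queryString = splitted[1]; //"requestSecurityToken=610f15bd-0e23-4ac5-90c3-c0829ad8024e"
// requires: using System.Collections.Specialized; using System.Web; using System.Xml.Linq;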
NameValueCollection parsed = HttpUtility.ParseQueryString( queryString );
Console.WriteLine(parsed["requestSecurityToken"]);
My take without regex or string splitting:
// requires: using System.Xml.Linq; using System.Xml.XPath;
// as already noted, XElement or XDocument may not be the best choice for handling HTML
var xe = XElement.Parse(response);
// XPath will make sure you are looking at the right script element
var src = xe.XPathSelectElement("//script[contains(@src, 'requestSecurityToken')]").Attribute("src").Value;
// a relative Uri does not expose its Query, so combine it with a placeholder base Uri
Uri srcuri = new Uri(new Uri("http://localhost"), src);
// finally get the value by name
string token = System.Web.HttpUtility.ParseQueryString(srcuri.Query).Get("requestSecurityToken");
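The last two steps above (placeholder base Uri, then lookup by name) can be isolated into a small helper that works on the `src` value alone, however it was obtained. This is a sketch only: `QueryValue` and `Get` are hypothetical names, and it assumes `System.Web.HttpUtility` is available to the project (on .NET Framework that means referencing the System.Web assembly).

```csharp
using System;
using System.Web;

public static class QueryValue
{
    // Placeholder base address; it exists only so that Uri.Query can be read.
    private static readonly Uri PlaceholderBase = new Uri("http://localhost");

    // Returns the value of query parameter `name` from an absolute or
    // relative URL, or null when the parameter is absent.
    public static string Get(string url, string name)
    {
        // For an absolute `url` the base is ignored; for a relative one the
        // two are combined, which makes the Query property usable.
        var absolute = new Uri(PlaceholderBase, url);
        return HttpUtility.ParseQueryString(absolute.Query).Get(name);
    }
}
```

With the `src` value from the question, `QueryValue.Get(src, "requestSecurityToken")` would yield the token, and the result is `null` rather than an exception when the parameter is not present.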
Essentially this seems to be a duplicate. I would split it into two parts: extract the value of the src attribute, then treat it as a Uri (which is what it is). I think that is much easier than messing around with regex.
Ty @CaiusJard, but I don't understand the regex approach used here.
Like this? var token = pageDocument.DocumentNode.SelectSingleNode("/html/head/script[1]").GetDataAttribute("src") @Filburt
That will find any guid, but if you need to find one specific instance and can simply treat a Uri as what it is, why not do that?
You can treat the Uri as what it is, but you still have to pull it out of the whole html, and if you are going to do that (with a regex? :)) you might as well pull out the thing you actually want.