String.IndexOf() returns an unexpected value: cannot extract the substring between two search strings

html powershell

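I am writing a script that manipulates some proper names in a web story so that my reading tool pronounces them correctly.

I fetch the page with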

$webpage = (Invoke-WebRequest -URI 'https://wanderinginn.com/2018/03/20/4-20-e/').Content

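This $webpage should be of type String.

Now the IndexOf() call returns an unexpected value, and I need an explanation of why, or of how I can find the error myself.

In theory, the script should cut out the "body" of the page, run it through the list of proper nouns I want to replace, and push the result into an htm file. All of that works, but the value of IndexOf("Prev…") does not.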

Edit: After calling Invoke-WebRequest, I can run

Set-Clipboard $webpage

and then paste the result into Notepad++, where I can find both '<div class="entry-content">' and 'Previous Chapter'. If I do something like

Set-Clipboard $webpage.substring(
     $webpage.IndexOf('<div class="entry-content">'),
     $webpage.IndexOf('PreviousChapter')
   )

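I expected PowerShell to correctly locate the first instance of each of those strings and cut between them. My clipboard should therefore contain exactly the content I want, but the extracted string runs well past the first instance of the second search string.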

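tl;dr

You have a misconception about how String.Substring() works: its second argument must be the length of the substring to extract, not the end index (character position); see below.
As an alternative, you can use a more concise (albeit more complex) regex operation with -replace to extract the substring of interest in a single operation; see below.
Overall, it is better to use an HTML parser to extract the desired information, because string processing is brittle (HTML allows variations in whitespace, quoting style, and so on).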

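As noted, you have a misconception about how String.Substring() works. Its arguments are a start index (a 0-based character position), from which a substring of the given length is returned.

Instead, you tried to pass another index as the length argument.
To fix this, subtract the lower index from the higher index to get the length of the substring to extract.
A simplified example: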

# Sample input from which to extract the substring
#   '>>this up to here'
# or, better,
#   'this up to here'.
$webpage = 'Return from >>this up to here<<'

# WRONG (your attempt):
# The *index* of the 2nd substring is mistakenly used as the *length* of the
# substring to extract, which in this case even *breaks*, because a length
# that exceeds the bounds of the string is specified.
$webpage.Substring(
  $webpage.IndexOf('>>'),
  $webpage.IndexOf('<<')
)

# OK, extracts '>>this up to here'
# The difference between the two indices is the correct length
# of the substring to extract.
$webpage.Substring(
  ($firstIndex = $webpage.IndexOf('>>')),
  $webpage.IndexOf('<<') - $firstIndex
)

# BETTER, extracts 'this up to here'
$startDelimiter = '>>'
$endDelimiter = '<<'
$webpage.Substring(
  ($firstIndex = $webpage.IndexOf($startDelimiter) + $startDelimiter.Length),
  $webpage.IndexOf($endDelimiter) - $firstIndex
)

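Note that Substring() throws an exception if you specify a start index larger than the length of the string, or a length whose endpoint would fall outside the bounds of the string (that is, if the start index plus the length is greater than the length of the string).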
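Given that caveat, and given that IndexOf() returns -1 when the search string is not found, it may be worth guarding the extraction. The following is only a minimal sketch: Get-TextBetween and its parameter names are hypothetical and are not part of PowerShell or of the answer above.

```powershell
# Hypothetical helper: returns the text between the first occurrence of
# $StartDelimiter and the next occurrence of $EndDelimiter,
# or $null if either delimiter is missing.
function Get-TextBetween {
    param(
        [Parameter(Mandatory)] [string] $Text,
        [Parameter(Mandatory)] [string] $StartDelimiter,
        [Parameter(Mandatory)] [string] $EndDelimiter
    )

    # Ordinal comparison keeps the search culture-independent.
    $startIndex = $Text.IndexOf($StartDelimiter, [StringComparison]::Ordinal)
    if ($startIndex -lt 0) { return $null }

    # Skip past the start delimiter, then look for the end delimiter
    # only from that position onward.
    $contentIndex = $startIndex + $StartDelimiter.Length
    $endIndex = $Text.IndexOf($EndDelimiter, $contentIndex, [StringComparison]::Ordinal)
    if ($endIndex -lt 0) { return $null }

    # The second argument is a length, not an end index.
    $Text.Substring($contentIndex, $endIndex - $contentIndex)
}

# Outputs 'this up to here'
Get-TextBetween -Text 'Return from >>this up to here<<' -StartDelimiter '>>' -EndDelimiter '<<'
```

Starting the second search at the end of the start delimiter also avoids picking up an end delimiter that happens to occur earlier in the text.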

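That said, you can extract the substring of interest with a single regex operation based on the -replace operator.

The inline option s, written as (?s), ensures that the metacharacter . also matches newlines (so that .* matches across lines), which it does not do by default.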
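To make the effect of that option concrete, here is a small sketch using a made-up two-line input (the sample text is an assumption, not taken from the page in question):

```powershell
# Hypothetical input; the delimiters '>>' and '<<' are on different lines.
$text = "head >>line 1`nline 2<< tail"

# Without (?s), '.' cannot cross the newline, so the pattern does not match
# and -replace returns the input unchanged.
$withoutS = $text -replace '^.*?>>(.*?)<<.*', '$1'

# With (?s), '.' also matches the newline and only the capture group remains.
$withS = $text -replace '(?s)^.*?>>(.*?)<<.*', '$1'

$withoutS -eq $text            # True
$withS -eq "line 1`nline 2"    # True
```

Because -replace silently returns the input unchanged when the pattern does not match, it can be worth testing with -match first if the delimiters might be missing.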

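Note that if your search strings happen to contain regex metacharacters (characters with special meaning in the context of a regex), you may have to escape the search strings before embedding them in the regex:

For an embedded literal string, \-escape characters as needed; e.g., escape .txt as \.txt

If the string to embed comes from a variable, apply [regex]::Escape() to its value first; e.g.: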

$var = '.txt'

# [regex]::Escape() yields '\.txt', which ensures
# that '.txt' doesn't also match '_txt'
'a_txt a.txt' -replace ('a' + [regex]::Escape($var)), 'a.csv'
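The same escaping applies when both delimiters come from variables. As a rough sketch, assuming made-up delimiters '[[' and ']]' and a made-up sample string, and using [regex]::Match() instead of -replace so that a failed match can be detected:

```powershell
# Hypothetical delimiters containing regex metacharacters.
$startDelimiter = '[['
$endDelimiter   = ']]'
$text = 'prefix [[wanted text]] suffix'

# Escape each delimiter before embedding it; (?s) lets '.' span lines.
$pattern = '(?s)' + [regex]::Escape($startDelimiter) + '(.*?)' + [regex]::Escape($endDelimiter)

$match = [regex]::Match($text, $pattern)
if ($match.Success) {
    # Capture group 1 holds the text between the delimiters.
    $match.Groups[1].Value   # 'wanted text'
}
```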

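What unexpected value did you get? What value did you expect, and why? This works fine for me:
(Invoke-WebRequest -URI 'https://wanderinginn.com/2018/03/20/4-20-e/').Content.IndexOf('Previous Chapter')
That gives me 87859. What is wrong with that? Are you treating line numbers as character numbers?

IndexOf() only returns the integer index of the requested string. You need to use that information to cut out the desired content. The .SubString() method takes either StartIndex alone or StartIndex, Length. You are giving it two start index numbers; you need to make the second number the difference between the two index values.

Oh my, I'm an idiot. Thank you very much. I think I made it harder for myself because the Find function in Notepad++ always showed different character counts than PowerShell.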
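To see why the original attempt returned too much text rather than failing, it can help to print the indices and lengths involved. A small sketch with a made-up stand-in for the page (the '<div>' marker and filler characters are assumptions chosen for illustration):

```powershell
# Hypothetical stand-in for the downloaded page: 10 filler characters,
# the start marker, the body, the end marker, and trailing content.
$webpage = ('x' * 10) + '<div>' + 'body' + 'Previous Chapter' + ('y' * 100)

$startIndex = $webpage.IndexOf('<div>')             # 10
$endIndex   = $webpage.IndexOf('Previous Chapter')  # 19

# End index passed as the length: 19 characters are returned instead of 9,
# so the result runs past the end marker.
$tooLong = $webpage.Substring($startIndex, $endIndex)

# Difference of the two indices passed as the length.
$correct = $webpage.Substring($startIndex, $endIndex - $startIndex)

$tooLong.Length   # 19
$correct          # <div>body
```

On a long page the oversized length still fits inside the string, so no exception is thrown and the only symptom is the extra text.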
'abc'.Substring(4) # ERROR "startIndex cannot be larger than length of string"

'abc'.Substring(1, 3) # ERROR "Index and length must refer to a location within the string"

$webpage = 'Return from >>this up to here<<'

# Outputs 'this up to here'
$webpage -replace '^.*?>>(.*?)<<.*', '$1'

$webpage -replace '(?s).*?<div class="entry-content">(.*?)Previous Chapter.*', '$1'
