Easiest way to remove HTML/XML <tags> from single-line output
I'm trying to clean up the output of grep, which looks like:
<words>Http://www.path.com/words</words>
I've tried using
sed 's/<.*>//'
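...to remove the tags, but that just wipes out the whole line. I can't figure out why that happens, because every "<" is closed by a ">" before it gets to the content.
What's the easiest way to do this?
Thanks!
Try the following for your sed expression: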
sed 's/<.*>\(.*\)<\/.*>/\1/'
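As a quick check, the sketch below runs both the attempt from the question and the expression above against the sample line. The variable name `line` is only for illustration, and the last command is a common alternative that is not part of the original answer: it replaces the greedy `.*` with `[^>]*`, so each match stops at the first closing `>`.

```shell
line='<words>Http://www.path.com/words</words>'

# Attempt from the question: .* is greedy, so <.*> stretches from the
# first < to the last > and the whole line is deleted.
printf '%s\n' "$line" | sed 's/<.*>//'                # prints an empty line

# Suggested expression: keep only the text saved between the two tags.
printf '%s\n' "$line" | sed 's/<.*>\(.*\)<\/.*>/\1/'  # Http://www.path.com/words

# Alternative (an assumption, not from the answer): [^>]* cannot cross a >,
# so every tag is removed on its own.
printf '%s\n' "$line" | sed 's/<[^>]*>//g'            # Http://www.path.com/words
```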
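Here is what each piece of the expression does:
<.*>    - Match the first tag
\(.*\)  - Match and save the text between the tags
<\/.*>  - Match the end tag, making sure to escape the / character
\1      - Output the result of the first saved match
          (the text that is matched between \( and \))
A second expression shows that a saved match can also be reused inside the pattern itself. Run it against a sample line that contains (parens) and, later on, the word parens again:
sed 's/.*(\(.*\)).*\1\(.*\)/\1 \2/'
This gives us:
parens like this
So what exactly is going on? Let's break the expression down and see.
Expression breakdown: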
sed s/ - This is the opening tag to a sed expression.
.* - Match any character to start (as well as nothing).
( - Match a literal left parenthesis character.
\(.*\) - Match any character and save as a back-reference. In this case it will match anything between the first open and last close parenthesis in the expression.
) - Match a literal right parenthesis character.
.* - Same as above.
\1 - Match the first saved back-reference. In the case of our sample this is filled in with `parens`
\(.*\) - Same as above.
\1 - Same as above.
/ - End of the match expression. Signals transition to the output expression.
\1 \2 - Print our two back-references.
/ - End of output expression.
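As we can see, the back-reference captured between the parentheses ( and ) is substituted back into the match expression so that it can match the string parens.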
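To try the back-reference expression, here is a minimal sketch. The input line is hypothetical: the original sample input is not shown here, so this one was made up to contain (parens) followed by a second occurrence of parens.

```shell
# Hypothetical input, chosen only to exercise the expression.
line='see (parens) for parens like this'

# \1 inside the pattern must match the same text that the first group saved
# ("parens"); the second group then saves everything after that repeat.
printf '%s\n' "$line" | sed 's/.*(\(.*\)).*\1\(.*\)/\1 \2/'
# prints: parens  like this
# (two spaces: one from the replacement, one from the start of \2)
```

If the second parens is removed from the input, the pattern no longer matches and sed prints the line unchanged.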
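sed is greedy by default, meaning <.*> will grab everything between the first < and the last >.
OK, that works. I don't really understand what it's doing, though. I can see it's escaping some characters, but can someone explain what's going on (especially the parens and the number 1)? Thanks a lot.
@user115360 I added an expression breakdown. Does that answer your question?
I like the added explanation. It answers a lot. Thanks. About the part \(.*\): you note that it saves the text. What does saving mean in practice? I thought it might be a grouping, but then I realized you escaped both parentheses. So what is it actually doing?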
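On that last question: in sed's default (basic) regular expression syntax a bare ( matches a literal parenthesis, while \( ... \) forms a group whose matched text is saved and can be recalled as \1. The sketch below illustrates the difference. It assumes a sed that accepts -E for extended syntax (GNU and BSD sed both do), and the input string is made up for illustration.

```shell
line='call(arg)'

# Basic syntax, bare parentheses: they match the literal characters ( and ).
printf '%s\n' "$line" | sed 's/(arg)/[arg]/'       # call[arg]

# Basic syntax, escaped parentheses: they group and save, so \1 is "arg".
printf '%s\n' "$line" | sed 's/(\(.*\))/<\1>/'     # call<arg>

# Extended syntax (-E): the roles flip, ( ) group and \( \) are literal.
printf '%s\n' "$line" | sed -E 's/\((.*)\)/<\1>/'  # call<arg>
```

So "saving" is grouping plus remembering: the escaped pair marks which part of the match should be available later as \1.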