Simplest way to remove HTML/XML <tags> from single-line output


I'm trying to clean up the output of grep, which looks like:

<words>Http://www.path.com/words</words>
and I want to reduce it to just:

Http://www.path.com/words

I tried using

sed 's/<.*>//' 
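...to remove the tags, but that just wipes out the whole line. I'm not sure why this happens, since every "<" has a ">" before it reaches the content.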
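A minimal reproduction of the problem, assuming any POSIX sed and the sample line from the question. The "[&]" trick is only a diagnostic: "&" in the replacement stands for the whole matched text, so wrapping it in brackets shows exactly what "<.*>" matched. The negated character class "[^>]*" at the end is not from the original post; it is included only for contrast, because it behaves the way the question expects.

```shell
#!/bin/sh
line='<words>Http://www.path.com/words</words>'

# The original attempt: the whole line is replaced, so an empty line is printed.
echo "$line" | sed 's/<.*>//'

# Show what '<.*>' matched. '.*' is greedy, so it runs past the first '>'
# and stops only at the last '>' on the line.
echo "$line" | sed 's/<.*>/[&]/'
# prints: [<words>Http://www.path.com/words</words>]

# Contrast: '[^>]*' cannot cross a '>', so each match ends at the first '>'
# after its '<', i.e. one tag at a time; 'g' repeats the substitution.
echo "$line" | sed 's/<[^>]*>//g'
# prints: Http://www.path.com/words
```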

What's the simplest way to do this?


Thanks

Try the following for your sed expression:

sed 's/<.*>\(.*\)<\/.*>/\1/'
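A sketch of the command above in use, assuming a POSIX sed and at most one <tag>text</tag> pair per line. The function name strip_tag is hypothetical; in practice you would pipe the grep output into it instead of the echo.

```shell
#!/bin/sh
# Hypothetical wrapper around the answer's expression: keep only the text
# captured between the opening and closing tag.
strip_tag() {
    sed 's/<.*>\(.*\)<\/.*>/\1/'
}

# Sample line from the question; prints: Http://www.path.com/words
echo '<words>Http://www.path.com/words</words>' | strip_tag
```

Lines that contain no closing tag do not match the pattern, so sed prints them unchanged.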
This gives us:

Http://www.path.com/words

So what's going on? Let's break the expression down.

Expression breakdown:

<.*>   - Match the first tag
\(.*\) - Match and save the text between the tags   
<\/.*> - Match the end tag making sure to escape the / character  
\1     - Output the result of the first saved match 
       -   (the text that is matched between \( and \))
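
A second example illustrates back-references. The expression sed 's/.*(\(.*\)).*\1\(.*\)\1/\1 \2/' prints "parens like this" on its sample input. Breakdown:

sed s/ - The start of a sed substitute (s) command.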
.*     - Match any character to start (as well as nothing).
(      - Match a literal left parenthesis character.
\(.*\) - Match any characters and save them as a back-reference. Here that is the text between the literal parentheses.
)      - Match a literal right parenthesis character.
.*     - Same as above.
\1     - Match the first saved back-reference. In the case of our sample this is filled in with `parens`
\(.*\) - Same as above.
\1     - Same as above.
/      - End of the match expression. Signals transition to the output expression.
\1 \2  - Print our two back-references.
/      - End of output expression.

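As we can see, the back-reference captured between the literal ( and ) is substituted back into the match expression, so that it can match the later occurrences of the string "parens".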
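A smaller sketch of the same idea, assuming a POSIX sed: a \1 inside the match expression must match exactly the text that the earlier \(...\) captured. The function name halve_doubled and the sample strings are hypothetical, chosen only to make the effect easy to see.

```shell
#!/bin/sh
# '^\(.*\)\1$' only matches a line made of the same text twice in a row,
# and replaces it with a single copy of that text.
halve_doubled() {
    sed 's/^\(.*\)\1$/\1/'
}

echo 'abcabc' | halve_doubled   # prints: abc
echo 'abcabd' | halve_doubled   # no match, printed unchanged: abcabd
```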

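Regular-expression matching is greedy by default (in grep and sed alike): .* grabs as much as it can, so <.* > spans everything from the first "<" to the last ">", which is why your original expression wiped out the whole line.

OK, this works. However, I don't really understand what it's doing. I can see it escaping some characters, but could someone explain what's going on (especially the parens and the number 1)? Thanks a lot.
@user115360 I've added an expression breakdown. Does that answer your question?
I like the added explanation; it answers a lot. Thanks. About this part, \(.*\), you note that it saves the text. What does saving mean in practice? I thought it might be a grouping, but then I realized you escaped both parentheses. So what is it actually doing?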
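A short sketch of why the escaped parentheses are a group, assuming a sed that supports the -E option (GNU and BSD sed both do). In sed's default basic regular expressions (BRE), \( and \) form a capture group and bare ( and ) are literal characters, which is why \(.*\) in the answer saves text instead of matching parentheses. With -E (extended regular expressions, ERE) the roles are reversed. The input a(b)c is a made-up example.

```shell
#!/bin/sh
# BRE (default): bare '(' and ')' are literal, '\(...\)' captures.
echo 'a(b)c' | sed 's/(\(.*\))/[\1]/'      # prints: a[b]c

# ERE (-E): bare '(...)' captures, '\(' and '\)' are literal.
echo 'a(b)c' | sed -E 's/\((.*)\)/[\1]/'   # prints: a[b]c
```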