如何使用sed删除html标记

如何使用sed删除html标记,sed,Sed,输入为: <h1>This is heading 1</h1> <h2>This is heading 2</h2> <h3>This is heading 3</h3> <h4>This is heading 4</h4> <h5>This is heading 5</h5> <h6>This is heading 6</h6> </body

输入为:

<h1>This is heading 1</h1>
<h2>This is heading 2</h2>
<h3>This is heading 3</h3>
<h4>This is heading 4</h4>
<h5>This is heading 5</h5>
<h6>This is heading 6</h6>

</body>
</html>
我尝试了sed-n的//].*>//gp'example.html 但在屏幕上什么也看不到,似乎正则表达式不正确

<h1>This is heading 1</h1>
<h2>This is heading 2</h2>
<h3>This is heading 3</h3>
<h4>This is heading 4</h4>
<h5>This is heading 5</h5>
<h6>This is heading 6</h6>

</body>
</html>
sed -n 's/<[^>]*>//gp' test.csv | sed '/^$/d'
你就快到了,小点。您使用的可以匹配>字符,因此将其从命令中删除

<h1>This is heading 1</h1>
<h2>This is heading 2</h2>
<h3>This is heading 3</h3>
<h4>This is heading 4</h4>
<h5>This is heading 5</h5>
<h6>This is heading 6</h6>

</body>
</html>
pipe后面的命令是清除所有空行,如果您的版本支持PCRE的-p选项,grep应该足够了

<h1>This is heading 1</h1>
<h2>This is heading 2</h2>
<h3>This is heading 3</h3>
<h4>This is heading 4</h4>
<h5>This is heading 5</h5>
<h6>This is heading 6</h6>

</body>
</html>
$ grep -oP '(?<=>)(.[^<]+)(?=<)' file
This is heading 1
This is heading 2
This is heading 3
This is heading 4
This is heading 5
This is heading 6
做你的样品

<h1>This is heading 1</h1>
<h2>This is heading 2</h2>
<h3>This is heading 3</h3>
<h4>This is heading 4</h4>
<h5>This is heading 5</h5>
<h6>This is heading 6</h6>

</body>
</html>
sed -n 's|</\{0,1\}h[0-9]>||gp' YourFile
更换任何和在线,如果有修改,打印该行

<h1>This is heading 1</h1>
<h2>This is heading 2</h2>
<h3>This is heading 3</h3>
<h4>This is heading 4</h4>
<h5>This is heading 5</h5>
<h6>This is heading 6</h6>

</body>
</html>
更确切地说,假设标签
sed -n 's|^[[:space:]]*<\(h[0-9]>\)\(.*\)</\1|\2|p' YourFile

可能重复的堆栈溢出不是代码编写服务。请出示你的密码。因为堆栈溢出对您隐藏了关闭的原因:寻求调试帮助的问题为什么这段代码不起作用?必须包括所需的行为、特定的问题或错误以及在问题本身中重现这些问题所需的最短代码。没有明确问题陈述的问题对其他读者没有用处。请参阅:。