How do I remove special characters from a string in Python 3?


python string


I want to convert this:

&lt;b&gt;&lt;i&gt;&lt;u&gt;Charming boutique selling trendy casual &amp;amp; dressy apparel for women, some plus sized items, swimwear, shoes &amp;amp; jewelry.&lt;/u&gt;&lt;/i&gt;&lt;/b&gt;

to this:

Charming boutique selling trendy casual dressy apparel for women, some plus sized items, swimwear, shoes jewelry.

I'm confused about how to remove not only the special characters but also some of the letters between them. Can anyone suggest a way to do this?

Try the following:

import re

string = '&lt;b&gt;&lt;i&gt;&lt;u&gt;Charming boutique selling trendy casual &amp;amp; dressy apparel for women, some plus sized items, swimwear, shoes &amp;amp; jewelry.&lt;/u&gt;&lt;/i&gt;&lt;/b&gt;'

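# Remove the escaped opening and closing tags, e.g. &lt;b&gt; and &lt;/u&gt;
string = re.sub('&lt;/?[a-z]+&gt;', '', string)
# Turn the double-escaped ampersand back into a literal '&'
string = string.replace('&amp;amp;', '&')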

print(string)  # prints 'Charming boutique selling trendy casual & dressy apparel for women, some plus sized items, swimwear, shoes & jewelry.'

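The string you want to change looks like HTML that has been escaped more than once, so my solution only works for input of that kind.

I used a regex to replace the escaped tags with an empty string, and replaced the escaped ampersand with a literal one.

Hope this is what you were looking for. Let me know if you have any questions.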
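If the input might be escaped a different number of times, a more general variant is to keep unescaping until the text stops changing and only then strip the tags. This is a minimal sketch, not a full HTML sanitizer: `unescape_fully`, `strip_markup`, and the `max_rounds` limit are hypothetical names introduced here, and the tag regex is deliberately simple. Be aware that it also decodes entities that were meant to stay literal.

```python
import re
from html import unescape

s = '&lt;b&gt;&lt;i&gt;&lt;u&gt;Charming boutique selling trendy casual &amp;amp; dressy apparel for women, some plus sized items, swimwear, shoes &amp;amp; jewelry.&lt;/u&gt;&lt;/i&gt;&lt;/b&gt;'


def unescape_fully(text, max_rounds=5):
    """Apply html.unescape repeatedly until the text no longer changes.

    Hypothetical helper; max_rounds is an arbitrary safety limit.
    """
    for _ in range(max_rounds):
        decoded = unescape(text)
        if decoded == text:
            break
        text = decoded
    return text


def strip_markup(text):
    """Decode all escaping levels, then drop anything that looks like a tag."""
    decoded = unescape_fully(text)
    return re.sub(r'</?[a-zA-Z][^>]*>', '', decoded)


result = strip_markup(s)
print(result)
```

For the string in the question this gives the same text as the two-step version above, with the literal `&` characters kept.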

You can use the html module and BeautifulSoup to get the text without the escaped tags:

s = "&lt;b&gt;&lt;i&gt;&lt;u&gt;Charming boutique selling trendy casual &amp;amp; dressy apparel for women, some plus sized items, swimwear, shoes &amp;amp; jewelry.&lt;/u&gt;&lt;/i&gt;&lt;/b&gt;"

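from bs4 import BeautifulSoup
from html import unescape

# unescape() decodes one level of entities: &lt;b&gt; becomes <b>, &amp;amp; becomes &amp;
# BeautifulSoup then parses the tags and decodes the remaining &amp; to '&'
soup = BeautifulSoup(unescape(s), 'lxml')  # the 'lxml' parser requires the lxml package
print(soup.text)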

Prints:

Charming boutique selling trendy casual & dressy apparel for women, some plus sized items, swimwear, shoes & jewelry.
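If installing BeautifulSoup and lxml is not an option, roughly the same result can be had with the standard library alone. The sketch below assumes the same input string; `TextExtractor` and `extract_text` are hypothetical names introduced here, not part of the answer above.

```python
from html import unescape
from html.parser import HTMLParser

s = "&lt;b&gt;&lt;i&gt;&lt;u&gt;Charming boutique selling trendy casual &amp;amp; dressy apparel for women, some plus sized items, swimwear, shoes &amp;amp; jewelry.&lt;/u&gt;&lt;/i&gt;&lt;/b&gt;"


class TextExtractor(HTMLParser):
    """Collects only the text nodes of an HTML fragment (hypothetical helper)."""

    def __init__(self):
        # convert_charrefs=True makes the parser decode entities such as &amp;
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def extract_text(escaped):
    parser = TextExtractor()
    # Decode one level of escaping first, so the parser sees real tags
    parser.feed(unescape(escaped))
    parser.close()  # flush any text still buffered by the parser
    return ''.join(parser.parts)


text = extract_text(s)
print(text)
```

The tags are dropped because only `handle_data` is overridden; the default handlers for start and end tags do nothing.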

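When I view the page source, the string shows up with escaped characters, but when I print it on the command line it looks like this:

Charming boutique selling trendy casual &amp; dressy apparel for women, some plus sized items, swimwear, shoes &amp; jewelry.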

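Your solution removes all the tags, but &amp; is not removed. I tried using the replace function, but it didn't work well. FYI, I'm trying to put the string into a meta tag in HTML.

@Jay You can remove the '&' by calling soup.text.replace('&', '') in my code.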
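One thing to watch for: replacing '&' with an empty string leaves two spaces where each ampersand used to be. A minimal sketch of one way to tidy that up, assuming the text has already been extracted (for example from soup.text); `drop_ampersands` is a hypothetical helper name.

```python
text = "Charming boutique selling trendy casual & dressy apparel for women, some plus sized items, swimwear, shoes & jewelry."


def drop_ampersands(text):
    # Remove every '&', then collapse the runs of whitespace left behind
    return ' '.join(text.replace('&', '').split())


cleaned = drop_ampersands(text)
print(cleaned)
```

This matches the target string in the question. Since the result is going into a meta tag, it may also be worth passing it through html.escape(cleaned, quote=True) before writing it into the attribute.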