Python: remove all escape sequences inside specific code blocks
Tags: python, regex
I have an HTML snippet that looks like this:
<code class="inline">\n object.__getattribute__\n </code>\n and\n <code class="inline">\n super.__getattribute__\n </code>\n peek\nin the\n <code class="inline">\n __dict__\n </code>\n of classes on the MRO for a class when looking for\nan attribute. This PEP adds an optional\n <code class="inline">\n __getdescriptor__\n </code>\n method to\na metaclass that replaces this behavior and gives more control over attribute\nlookup, especially when using a\n \n super\n </a>\n\n \n </a>\n object.\n </p>\n<p>\n That is, the MRO walking loop in\n
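Question
How can I remove the \n sequences only inside the <code> tags?
What I have tried
I tried using re.sub(), but it keeps replacing every \n in the text, not just the ones inside the <code> tags.

Since the input is HTML, why not use a dedicated tool, an HTML parser?
The BeautifulSoup example in this answer finds all code tags and replaces \n inside them with an empty string.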
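Your question is very imprecise. What should all the escape sequences be replaced with? Sample output would help.
Sorry, I meant remove, not replace.
Do you want to remove only <code> and </code>, or everything between them as well?
Note: the OP is asking how to remove the \n inside <code>..</code>.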
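# Answer 1: parse the HTML so that only the <code> elements are modified.
from bs4 import BeautifulSoup  # third-party package: beautifulsoup4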
data = """<code class="inline">\n object.__getattribute__\n </code>\n and\n <code class="inline">\n super.__getattribute__\n </code>\n peek\nin the\n <code class="inline">\n __dict__\n </code>\n of classes on the MRO for a class when looking for\nan attribute. This PEP adds an optional\n <code class="inline">\n __getdescriptor__\n </code>\n method to\na metaclass that replaces this behavior and gives more control over attribute\nlookup, especially when using a\n \n super\n </a>\n\n \n </a>\n object.\n </p>\n<p>\n That is, the MRO walking loop in\n"""
soup = BeautifulSoup(data, "html.parser")
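for code in soup("code"):  # shorthand for soup.find_all("code")
    # .string is None when a tag has more than one child node, so guard against it
    if code.string is not None:
        code.string = code.string.replace("\n", "")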
print(soup)
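For comparison with the re.sub() attempt mentioned in the question, here is a minimal regex-only sketch. It assumes the <code> elements are not nested and that their markup is regular enough for a pattern to match; the names CODE_BLOCK and strip_newlines_in_code_regex are illustrative and do not come from the original answers. The key idea is to pass a function to re.sub() so that only the text captured between the opening and closing tag is changed. An HTML parser is still the more robust choice for arbitrary markup.

```python
import re

# Matches one <code ...>...</code> element; DOTALL lets "." span newlines,
# and the non-greedy ".*?" stops at the first closing tag.
CODE_BLOCK = re.compile(r"(<code\b[^>]*>)(.*?)(</code>)", re.DOTALL)


def strip_newlines_in_code_regex(html):
    """Remove newline characters only between <code ...> and </code>."""

    def _clean(match):
        opening, body, closing = match.groups()
        # Only the element body is modified; the tags are put back unchanged.
        return opening + body.replace("\n", "") + closing

    return CODE_BLOCK.sub(_clean, html)


sample = '<code class="inline">\n object.__getattribute__\n </code>\n and\n'
print(repr(strip_newlines_in_code_regex(sample)))
```

The newlines after </code> are kept, which is the difference from calling replace() or re.sub() on the whole string.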
text = '<code class="inline">\n object.__getattribute__\n </code>\n and\n <code class="inline">\n super.__getattribute__\n </code>\n peek\nin the\n <code class="inline">\n __dict__\n </code>\n of classes on the MRO for a class when looking for\nan attribute. This PEP adds an optional\n <code class="inline">\n __getdescriptor__\n </code>\n method to\na metaclass that replaces this behavior and gives more control over attribute\nlookup, especially when using a\n \n super\n </a>\n\n \n </a>\n object.\n </p>\n<p>\n That is, the MRO walking loop in\n '
# Note: str.replace() removes every \n in the text, not only those inside <code> tags.
print(text.replace('\n', ''))
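If adding BeautifulSoup as a dependency is not an option, the same parser-based idea can be sketched with the standard library's html.parser module. This is only a rough sketch under a few assumptions: the class name CodeNewlineStripper and the function strip_newlines_in_code_stdlib are made up for illustration, end tags are re-emitted in normalized lowercase form, and markup such as doctype declarations or processing instructions is not handled.

```python
from html.parser import HTMLParser


class CodeNewlineStripper(HTMLParser):
    """Re-emit the HTML it is fed, dropping newlines found inside <code>."""

    def __init__(self):
        # convert_charrefs=False keeps entities such as &amp; untouched.
        super().__init__(convert_charrefs=False)
        self.parts = []
        self.code_depth = 0  # > 0 while inside at least one <code> element

    def handle_starttag(self, tag, attrs):
        if tag == "code":
            self.code_depth += 1
        # get_starttag_text() returns the start tag exactly as written.
        self.parts.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        self.parts.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag == "code" and self.code_depth > 0:
            self.code_depth -= 1
        self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        if self.code_depth > 0:
            data = data.replace("\n", "")
        self.parts.append(data)

    def handle_entityref(self, name):
        self.parts.append(f"&{name};")

    def handle_charref(self, name):
        self.parts.append(f"&#{name};")

    def handle_comment(self, data):
        self.parts.append(f"<!--{data}-->")


def strip_newlines_in_code_stdlib(html):
    parser = CodeNewlineStripper()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return "".join(parser.parts)


print(repr(strip_newlines_in_code_stdlib('<code class="inline">\n x\n </code>\n and\n')))
```

Because the depth counter stays positive inside child elements, text in tags nested within <code> is cleaned as well, which the .string-based loop does not cover.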