Python BeautifulSoup4 can't find an "a" tag by searching for its text

Sample HTML:
<a class="accordion-item__link" href="/identity-checking/individual"><!-- react-text: 178 -->Australia<!-- /react-text --></a>
soup.find("a", text="Australia")

returns nothing.

If I run

soup.find("a", href="/identity-checking/individual")

it finds the tag, and calling .text on that result
also returns "Australia".
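The mismatch can be reproduced end to end; a minimal sketch of the setup described above (using `html.parser` so it runs without lxml installed):

```python
from bs4 import BeautifulSoup

html = ('<a class="accordion-item__link" href="/identity-checking/individual">'
        '<!-- react-text: 178 -->Australia<!-- /react-text --></a>')
soup = BeautifulSoup(html, "html.parser")

# searching by text fails...
print(soup.find("a", text="Australia"))  # None
# ...while searching by href finds the tag
print(soup.find("a", href="/identity-checking/individual"))
```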
Is it something to do with the comments?

Try extracting the text after finding the tag, i.e.:
result = ""
for tag in soup.find_all('a'):
    if tag.text == "Australia":
        result = tag
For some reason, matching on a tag's text gets tripped up when there are XML comments present. You can use this as a workaround:
[ele for ele in soup('a') if ele.text == 'Australia']
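A plausible explanation for the tripped-up matching (my reading of the Beautiful Soup docs, not something stated in this thread): when `text`/`string` is combined with a tag name, the match is made against the tag's `.string`, and `.string` is defined to be `None` whenever a tag has more than one child — which is exactly what the comment nodes cause:

```python
from bs4 import BeautifulSoup

html = ('<a href="/one"><!-- react-text: 178 -->Australia<!-- /react-text --></a>'
        '<a href="/two">Australia</a>')
soup = BeautifulSoup(html, "html.parser")
with_comments, plain = soup.find_all("a")

# three children (comment, text, comment) -> .string is None
print(with_comments.string)  # None
# a single text child -> .string is that text
print(plain.string)          # Australia
```

The workaround above sidesteps this because `.text` concatenates the tag's text content instead of relying on `.string`.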
I was trying to find a way to stick with the
find
method, since it is the most convenient and adaptable one. The problem here is that the HTML comments trip up the engine. Manually removing the comments helps:
from bs4 import BeautifulSoup, Comment

bs = BeautifulSoup(
    """
    <a class="accordion-item__link" href="/identity-checking/individual"><!-- react-text: 178 -->Australia<!-- /react-text --></a>
    """,
    "lxml"
)

# find all HTML comments and remove them
comments = bs.find_all(text=lambda text: isinstance(text, Comment))
for comment in comments:
    comment.extract()

r = bs.find('a', text='Australia')
print(r)
# <a class="accordion-item__link" href="/identity-checking/individual">Australia</a>
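If you'd rather not mutate the tree at all, another option (my addition, not part of the answer above) is to match the text node itself and climb to its parent: `string=` on its own matches NavigableStrings directly, so the surrounding comments don't interfere:

```python
from bs4 import BeautifulSoup

html = ('<a class="accordion-item__link" href="/identity-checking/individual">'
        '<!-- react-text: 178 -->Australia<!-- /react-text --></a>')
soup = BeautifulSoup(html, "html.parser")

# match the text node, then walk up to the enclosing <a>
node = soup.find(string="Australia")
link = node.find_parent("a")
print(link["href"])  # /identity-checking/individual
# the comments are still in the tree
print("react-text" in str(soup))  # True
```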
The comment-removal approach comes from here.
If you want to keep the comments, you can make a copy of the soup first:
import copy

from bs4 import BeautifulSoup, Comment

bs = BeautifulSoup(
    """
    <a class="accordion-item__link" href="/identity-checking/individual"><!-- react-text: 178 -->Australia<!-- /react-text --></a>
    """,
    "lxml"
)

# strip the comments from a copy, so the original soup keeps them
bs_copy = copy.copy(bs)
comments = bs_copy.find_all(text=lambda text: isinstance(text, Comment))
for comment in comments:
    comment.extract()

r = bs_copy.find('a', text='Australia')
print(r)
# <a class="accordion-item__link" href="/identity-checking/individual">Australia</a>