Python: in beautifulsoup4, how do I return multiple results when scraping a website based purely on an element and the text inside it?

python html web-scraping

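I'm currently working on a frustrating project, and this is my first time posting here for help. In short, using beautifulsoup4, I'm trying to scrape a website that has no reliable HTML classes or IDs. All I have to go on are anchor elements, such as the examples I've provided below, and I'm trying to grab the phrase "Where the Red Fern Grows" using only the lowercase text "red fern". In other words, I want to find and collect/print the text of every anchor element without a usable class or ID that contains the phrase "Where the Red Fern Grows", without typing out the whole string and while staying case-insensitive.

So far I've tried a lot of things, and my best attempt is only half a success: I can collect the first anchor element containing "WTRFG". Unfortunately, despite my best efforts, that's as far as I've gotten. I've used find and find_all, tried re.search and regex, and tried a few other things I found in other Stack Overflow answers. No dice. Here's what I have right now: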

import bs4
import requests
import re
import pretty_errors

url = "http://fake.site/search.php?req=where+the+red+fern+grows&lg_topic=fakesite&open=0&view=simple&res=25&phrase=1&column=def"
page = requests.get(url)
fernSoup = bs4.BeautifulSoup(page.content, "html.parser")
redFern = "red fern"

print(type(fernSoup))
print(type(redFern))

anchor = fernSoup.find_all("a", class_=False, text=lambda text: text and redFern in text.lower())

print(anchor)

Which outputs:

<class 'bs4.BeautifulSoup'>
<class 'str'>
[<a href="book/index.php?md5=82C10FF9DA122C4B1061F83555F3800E" id="796869" title="">Where The Red Fern Grows</a>]

# This is only the first of three different results, but the only one I can access usually. The other two contain the exact same structure, minus differences in the href url and ID number.



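Any suggestions would be greatly appreciated, and thank you for taking the time to read my post.

Edit: the three anchors I'm trying to access, copy-pasted directly from the output of print(fernSoup):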



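Can you share the URL? The text may be injected via JavaScript, in which case BeautifulSoup doesn't see it. Do print(soup) and check whether the tags are really there.

I can't share the URL, but I can share the search results in the post. (Too long for a comment.)

Thank you so much, Andrej, this is a perfect solution!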

<td width="500"><a href="book/index.php?md5=82C10FF9DA122C4B1061F83555F3800E" id="796869" title="">Where The Red Fern Grows</a></td>

<td width="500"><a href="book/index.php?md5=3C96145628CC4759595FB3C1A767673A" id="1157998" title="">Where the Red Fern Grows<br/> <font color="green" face="Times"><i>0553274295</i></font></a></td>

<td width="500"><a href="book/index.php?md5=9DD3079644E043E530682DA95C95B999" id="2413155" title="">Where the Red Fern Grows: The Story of Two Dogs and a Boy<br/> <font color="green" face="Times"><i>978-0-307-78156-7, 0307781569, 0553274295, 9780440412670</i></

from bs4 import BeautifulSoup

html_doc = """
 <td width="500"><a href="book/index.php?md5=82C10FF9DA122C4B1061F83555F3800E" id="796869" title="">Where The Red Fern Grows</a></td> <td width="500"><a href="book/index.php?md5=3C96145628CC4759595FB3C1A767673A" id="1157998" title="">Where the Red Fern Grows<br/> <font color="green" face="Times"><i>0553274295</i></font></a></td> 
"""

fernSoup = BeautifulSoup(html_doc, "html.parser")
redFern = "red fern"

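# Filter on the whole tag and test tag.text (the text of all descendants joined).
# A text= filter compares against tag.string, which is None for an <a> that
# also contains <br/> and <font> children, so only the first anchor matched.
anchor = fernSoup.find_all(
    lambda tag: tag.name == "a" and redFern in tag.text.lower()
)

print(anchor)

# Alternative: a CSS selector. Note that :-soup-contains() is case-sensitive.
print(fernSoup.select('a:-soup-contains("Red Fern")'))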
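To see why the original text= filter only ever returned the first anchor, and to pull out just the titles, here is a minimal sketch built on the two anchors from the snippet above. The names soup, needle, matches and titles are only illustrative, and taking the first non-empty text node as the title assumes the title always comes before the ISBN block, as it does in the sample HTML.

```python
from bs4 import BeautifulSoup

html_doc = """
<td width="500"><a href="book/index.php?md5=82C10FF9DA122C4B1061F83555F3800E" id="796869" title="">Where The Red Fern Grows</a></td>
<td width="500"><a href="book/index.php?md5=3C96145628CC4759595FB3C1A767673A" id="1157998" title="">Where the Red Fern Grows<br/> <font color="green" face="Times"><i>0553274295</i></font></a></td>
"""

soup = BeautifulSoup(html_doc, "html.parser")
needle = "red fern"  # lowercase search phrase (illustrative name)

# .string is only set when a tag has a single child; the second anchor has
# several children (text, <br/>, <font>), so its .string is None.
for a in soup.find_all("a"):
    print(a["id"], repr(a.string))

# Case-insensitive match against the full descendant text of each anchor.
matches = soup.find_all(
    lambda tag: tag.name == "a" and needle in tag.get_text().lower()
)

# Assumption: the title is the first non-empty text node inside the anchor.
titles = [next(a.stripped_strings) for a in matches]
print(titles)
```

If the ISBN text should be kept as well, a.get_text(" ", strip=True) returns everything inside the anchor joined with spaces.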