In BeautifulSoup, how do I search for an element inside another element?

python django python-3.x

I'm using Django 2, Python 3.7 and BeautifulSoup 4. I have the code below, which is supposed to find an element inside another element:

req = urllib2.Request(fullurl, headers=settings.HDR)
html = urllib2.urlopen(req, timeout=settings.SOCKET_TIMEOUT_IN_SECONDS).read()
bs = BeautifulSoup(html, features="lxml")
pattern = re.compile(r'^submitted ')
posted_elt = bs.find(text=pattern)
author_elt = posted_elt.find("span", class_="author") if posted_elt is not None else None

However, this line

author_elt = posted_elt.find("span", class_="author") if posted_elt is not None else None

throws the error "TypeError: find() takes no keyword arguments". What is the correct way to search for an element inside another element?
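A minimal, self-contained reproduction of the problem might look like this. It assumes a small hypothetical HTML snippet in place of the downloaded page and uses the standard-library "html.parser" instead of lxml, so only bs4 is needed. It also uses string=, the newer name for the text= argument. The point it shows: a text search returns a NavigableString, which is a str subclass rather than a Tag, so .find() on it is plain str.find(), and that method takes no keyword arguments.

```python
import re

from bs4 import BeautifulSoup, NavigableString, Tag

# Hypothetical markup standing in for the page the question downloads.
HTML = '<p class="tagline">submitted 3 hours ago by <span class="author">alice</span></p>'

bs = BeautifulSoup(HTML, "html.parser")
pattern = re.compile(r"^submitted ")

# string= (formerly text=) matches text nodes, so the result is a NavigableString.
posted_elt = bs.find(string=pattern)
print(type(posted_elt))  # <class 'bs4.element.NavigableString'>

# A NavigableString is a str, not a Tag, so .find() here is str.find().
print(isinstance(posted_elt, NavigableString))  # True
print(isinstance(posted_elt, str), isinstance(posted_elt, Tag))  # True False

try:
    posted_elt.find("span", class_="author")
except TypeError as exc:
    # str.find() rejects the class_= keyword argument.
    print("TypeError:", exc)
```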

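Use the .find() method to search for the HTML tag. Once you have found the tag, use the .text attribute of the .find() result to convert it to a string. Then run a regular-expression search on that string.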

Here is an example:

from bs4 import BeautifulSoup
import requests
import re

res = requests.get("https://en.wikipedia.org/wiki/Dog")
soup = BeautifulSoup(res.content,"html.parser")
reg = re.compile("-")
s = soup.find("title").text
print(re.search(reg,s).group(0))

# If you want to find all html tags and search each of them use find_all()

all_res = soup.find_all("p")
reg = re.compile("dog")
for i in all_res:
    s = i.text
    match = re.search(reg,s)
    if match:
        print(match.group(0))

The second example finds all <p> tags, converts each one to a string, and searches it for "dog".
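Applied to the question's case, the same idea (match on a tag's text instead of on a bare text node, then search inside that tag) could be sketched as below. The markup is hypothetical, and restricting the match to <p> tags is an assumption: without it, an enclosing ancestor whose text happens to start with "submitted" could match first. It relies on find() accepting a function, which BeautifulSoup calls with each Tag.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real page.
HTML = """
<div>
  <p class="tagline">submitted 3 hours ago by <span class="author">alice</span></p>
  <p class="tagline">edited by <span class="author">bob</span></p>
</div>
"""

soup = BeautifulSoup(HTML, "html.parser")
pattern = re.compile(r"^submitted ")

# find() accepts a function: it is called with each Tag and should return True
# for the tag we want. Here: a <p> whose full text starts with "submitted ".
tagline = soup.find(
    lambda tag: tag.name == "p" and pattern.search(tag.get_text()) is not None
)

# tagline is a Tag, so .find() with keyword arguments works on it.
author_elt = tagline.find("span", class_="author") if tagline is not None else None
print(author_elt.get_text() if author_elt is not None else None)  # alice
```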

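When you search for text in BeautifulSoup, you get a bs4.element.NavigableString object, which behaves much like a normal Python str. Fortunately, it also has the "navigable" part: navigableString.parent refers to the parent element, which you can use for the next find. You are not trying to find a child element of the text node, because text nodes have no children. You are trying to find the element that contains this text node, and continue the search from there: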

req = urllib2.Request(fullurl, headers=settings.HDR)
html = urllib2.urlopen(req, timeout=settings.SOCKET_TIMEOUT_IN_SECONDS).read()
bs = BeautifulSoup(html, features="lxml")
pattern = re.compile(r'^submitted ')
posted_elt = bs.find(text=pattern)
author_elt = posted_elt.parent.find("span", class_="author") if posted_elt is not None else None
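A self-contained version of the fix above, runnable without Django settings or a network request, might look like this. It assumes hypothetical markup in which the "submitted" text node and the author <span> share a parent, and it uses "html.parser" and string= (the newer name for text=). It also shows find_next(), a PageElement method that searches forward in document order from the text node, which may help when the <span> is not inside the same parent.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical markup: the "submitted" text node and the author <span>
# share the same parent <p>.
HTML = '<p class="tagline">submitted 3 hours ago by <span class="author">alice</span></p>'

bs = BeautifulSoup(HTML, "html.parser")
pattern = re.compile(r"^submitted ")

posted_elt = bs.find(string=pattern)  # a NavigableString (text node)

# Step up to the Tag that contains the text node, then search inside it.
author_elt = posted_elt.parent.find("span", class_="author") if posted_elt is not None else None
print(author_elt.get_text())  # alice

# Alternative: find_next() searches forward in document order from the text
# node, so it also works when the <span> is not inside the same parent.
next_author = posted_elt.find_next("span", class_="author") if posted_elt is not None else None
print(next_author.get_text())  # alice
```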

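Always put the full error message (starting at the word "Traceback") in the question (not in a comment) as text (not a screenshot). It contains other useful information. If you check type(posted_elt), it will be some string type, not an element: you have selected the text inside the node, so find is str.find. Not sure how to make this selection…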

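find(text=pattern) gives bs4.element.NavigableString, which is a string.

@furas, I think the OP wants to find the text he is looking for underneath any element. In XPath I think it would be something like "/*[starts-with(text(), 'submitted')]/span[@class='author']", but I haven't tested it.

@furas - yes, I think if you import lxml and then use its BeautifulSoup, you can get XPath. But I don't remember exactly.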
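For the XPath route discussed in these comments, a sketch using lxml's own HTML parser (lxml.html, swapped in for BeautifulSoup) might look like this. The page is hypothetical, and the expression is the commenters' idea written with starts-with() and the descendant axis //* so that it matches at any depth.

```python
from lxml import html

# Hypothetical page standing in for the downloaded HTML.
PAGE = """<html><body>
<p class="tagline">submitted 3 hours ago by <span class="author">alice</span></p>
<p class="tagline">edited by <span class="author">bob</span></p>
</body></html>"""

doc = html.fromstring(PAGE)

# Any element whose first text node starts with "submitted", then its
# direct <span class="author"> child.
authors = doc.xpath("//*[starts-with(text(), 'submitted')]/span[@class='author']")
print([a.text for a in authors])  # ['alice']
```

Note that text() inside starts-with() converts to the element's first text node only, so this expression relies on "submitted" being the element's leading text.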