Python BeautifulSoup: find a tag by string, not including child text

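I'm using Python 3 and BeautifulSoup 4.4.0 to extract data from a website. I'm interested in the tables inside div tags, but to know what data a table holds I have to read the text of an h4 tag and then get the table that is its sibling. The problem is that one of the h4 tags contains a span, and BeautifulSoup returns None for a tag's string value when the tag has another tag inside it.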

import re

def get_table_items(self, soup, header_title):
    # Find the h4 whose string matches the title as a whole word, ignoring case
    header = soup.find('h4', string=re.compile(r'\b{}\b'.format(header_title), re.I))
    header_table = header.find_next_sibling('table')
    items = header_table.find_all('td')
    return items

The code above works for every h4 except "Unique Title 2 ()".

....
Unique Title 1
...
Unique Title 2 ()
...
Unique Title 3
...
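
The behaviour described in the question can be reproduced with a small sketch. The markup here is hypothetical (the original page's HTML is not shown), with a span placed inside the second heading only to trigger the issue:

```python
import re
from bs4 import BeautifulSoup

# Hypothetical markup: the second heading wraps part of its text in a <span>.
html = """
<div>
  <h4>Unique Title 1</h4>
  <table><tr><td>a</td></tr></table>
  <h4>Unique Title 2 (<span>note</span>)</h4>
  <table><tr><td>b</td></tr></table>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
plain, nested = soup.find_all("h4")

# A tag whose only child is a text node exposes that text as .string.
plain_string = plain.string

# A tag with several children (text, <span>, text) has no single string.
nested_string = nested.string

# get_text() concatenates the text of all descendants instead.
nested_text = nested.get_text()

# string= is matched against .string, so the nested heading is not found.
match = soup.find("h4", string=re.compile(r"\bUnique Title 2\b", re.I))

print(repr(plain_string), repr(nested_string), repr(nested_text), match)
```

Because the filter sees None for the nested heading, the regular expression never gets a chance to match it.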

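Rather than relying on the string argument with a regular expression, you may need to do the search manually, looping over the h4 tags and checking each tag's full text: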

from bs4 import BeautifulSoup

# html is assumed to hold the page source
soup = BeautifulSoup(html, "html.parser")
header_title = "Unique Title 2"

for h4 in soup.find_all('h4'):
    # .text includes the text of child tags such as the span
    if header_title in h4.text:
        ...
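
As a minimal sketch of how that loop might be folded back into the question's function: the version below keeps the whole-word, case-insensitive regular expression but applies it to get_text() rather than to the string filter. The sample markup, the re.escape call, and the empty-list fallback are assumptions added for illustration, not part of the original code.

```python
import re
from bs4 import BeautifulSoup


def get_table_items(soup, header_title):
    """Return the <td> cells of the table that follows the matching <h4>."""
    # re.escape guards against regex metacharacters in the title (an addition).
    pattern = re.compile(r"\b{}\b".format(re.escape(header_title)), re.I)
    for h4 in soup.find_all("h4"):
        # get_text() includes text inside child tags such as <span>.
        if pattern.search(h4.get_text()):
            table = h4.find_next_sibling("table")
            return table.find_all("td") if table else []
    return []  # no heading matched


# Hypothetical sample markup, standing in for the real page.
html = """
<div>
  <h4>Unique Title 1</h4>
  <table><tr><td>a1</td><td>a2</td></tr></table>
  <h4>Unique Title 2 (<span>note</span>)</h4>
  <table><tr><td>b1</td><td>b2</td></tr></table>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
cells = [td.get_text() for td in get_table_items(soup, "Unique Title 2")]
missing = get_table_items(soup, "No Such Title")
print(cells, missing)
```

Returning an empty list when nothing matches avoids the AttributeError the original would raise on a None header; raising an exception instead would be an equally reasonable choice.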
