Python: how to find a string in BeautifulSoup

How can I reach a sibling element in bs4 when I only know half of the heading text next to it?

import re
from urllib import request

import pandas as pd
from bs4 import BeautifulSoup as BS

html = request.urlopen('https://en.wikipedia.org/wiki/Charles_Ehresmann')
bs = BS(html.read(), 'html.parser')

data = pd.DataFrame({'known for': []})

# find() returns None when nothing matches, so .text raises AttributeError
try:
    name = bs.find('h1').text
except AttributeError:
    name = ''

# This is the line in question: a plain string is compared for equality,
# so 'Known.*' is not treated as a pattern and nothing is found
try:
    known = bs.find('th', string='Known.*').next_element.text  # ?
except AttributeError:
    known = ''

Thanks for any ideas.

You should bring the re module into this. I assume you mean searching by the text content of the th.

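import re
# ...other code
known = bs.find('th', string=re.compile(r'Known.+'))

This finds any th whose text contains Known followed by at least one more character. Note that the dot is left unescaped so that it works as a wildcard; an escaped \. would require a literal period after Known, which the "Known for" header does not have.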
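
To check the behaviour without hitting the network, here is a minimal sketch. The HTML fragment is a hypothetical stand-in for the Wikipedia infobox, and stepping to the data cell with find_next_sibling('td') is my own choice rather than part of the answer above.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical infobox fragment standing in for the real page.
HTML = """
<table>
  <tr><th>Born</th><td>19 April 1905</td></tr>
  <tr><th>Known for</th><td>Ehresmann connection</td></tr>
</table>
"""

soup = BeautifulSoup(HTML, "html.parser")

# A plain str passed to string= must equal the tag's text exactly.
exact = soup.find("th", string="Known.*")

# A compiled pattern is applied with search(), so a partial match is enough.
header = soup.find("th", string=re.compile(r"Known.+"))

# Step from the matched header to the data cell in the same row.
known = header.find_next_sibling("td").get_text(strip=True)

print(exact)  # None
print(known)  # Ehresmann connection
```

The first lookup returns None because the string is not interpreted as a regex, which is why the code in the question falls through to the except branch.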

If the match needs to cover every character, including newlines, pass the re.DOTALL flag to re.compile, like this:

known = bs.find('th', string=re.compile(r'Known.{0,10}', re.DOTALL))

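That way the pattern can match text that spans several lines. In that case, though, you should limit the number of trailing characters after Known, as the example above does: it matches at most 10 characters after the string Known.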
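
The effect of re.DOTALL can be seen with the re module alone. This is a small sketch; the sample text is made up to imitate a header whose label is broken across two lines.

```python
import re

# Made-up label that is split across two lines.
text = "Known\nfor"

plain = re.compile(r"Known.{0,10}")
dotall = re.compile(r"Known.{0,10}", re.DOTALL)

# Without DOTALL the dot stops at the newline, so zero characters follow Known.
print(repr(plain.search(text).group()))   # 'Known'

# With DOTALL the dot also consumes the newline, up to the 10-character cap.
print(repr(dotall.search(text).group()))  # 'Known\nfor'

# A pattern that demands at least one character fails without DOTALL.
print(re.search(r"Known.+", text))        # None
```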

You can use :contains and next_sibling

from bs4 import BeautifulSoup as bs
import requests

r = requests.get('https://en.wikipedia.org/wiki/Charles_Ehresmann')
soup = bs(r.text, 'lxml')
print(soup.select_one('th:contains("Known")').next_sibling.get_text('\n').split('\n'))

Without splitting into a list:

print(soup.select_one('th:contains("Known")').next_sibling.get_text('\n'))
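
A couple of caveats, sketched on a hypothetical fragment rather than the live page. Recent soupsieve releases (2.1 and later, as far as I know) spell the selector :-soup-contains and warn about the old :contains name. Also, next_sibling returns whatever node comes next, which is a whitespace string when the markup is indented, so find_next_sibling('td') may be the safer step.

```python
from bs4 import BeautifulSoup

# Hypothetical, indented infobox row; the real page may be laid out differently.
HTML = """
<table>
  <tr>
    <th>Known for</th>
    <td>Ehresmann connection<br>Jet bundle</td>
  </tr>
</table>
"""

soup = BeautifulSoup(HTML, "html.parser")

# Assumes soupsieve >= 2.1, where :-soup-contains replaces :contains.
header = soup.select_one('th:-soup-contains("Known")')

# With indented markup the immediate sibling is only whitespace.
print(repr(header.next_sibling))

# find_next_sibling skips text nodes and returns the next <td> tag.
cell = header.find_next_sibling("td")
items = cell.get_text("\n", strip=True).split("\n")
print(items)  # ['Ehresmann connection', 'Jet bundle']
```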