Python: extracting certain values with bs4


python web-scraping


HTML:

jj picks up every value from the site I am scraping, but I only need a few of them. For example, from GENERAL I only need the values for number of seats and last year of production, and from TRANSMISSION only the value for 1st gear.

for item2 in soup2.find_all(attrs={'class':'col-7'}):
    jj=item2.text

The result should be:

5, available, 5,00:1

Have you made any attempt to actually solve the problem at hand? Your code doesn't seem particularly relevant.

@Physicist, it is relevant. I have already done the other parts, and this is the last part I need to solve. Thanks.

The information you need is simply the item that follows each of the titles "number of seats", "last year of production" and "1st gear", so you can use zip to pair every item with the one after it:

all_items = soup.find_all(attrs={'class':'col-6'})
titles = [
    "number of seats",
    "last year of production",
    "1st gear"
]
d = {title: [] for title in titles}

# Pair each item with the one that follows it
for item, next_item in zip(all_items, all_items[1:]):
    for title in titles:
        if title in item.text:
            d[title].append(next_item.text)
            break

Then d will contain all the information you need.

Another option: change the find_values tuple to get the values you need from the HTML text.


from bs4 import BeautifulSoup

soup = BeautifulSoup(html, 'html.parser')

# Labels to look for; str.startswith accepts a tuple of prefixes
find_values = ('number of seats', 'last year of production', '1st gear')

for i in soup.find_all(attrs={'class': 'row box'}):
    for j in i.find_all('dt'):
        text = j.get_text().lower().strip()
        if text.startswith(find_values):
            print(text, j.find_next_sibling('dd').get_text())
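
Since the page's HTML is not shown in the question, here is a minimal self-contained sketch of the dt/dd lookup run against made-up markup. The SAMPLE_HTML structure, the extra "Doors" and "2nd gear" rows, and the extract_values helper are all assumptions for illustration, not the real site or part of bs4; only the three expected values come from the question.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the real page's HTML is not shown, so this
# dt/dd layout inside "row box" containers is only an assumption.
SAMPLE_HTML = """
<div class="row box">
  <h3>GENERAL</h3>
  <dl>
    <dt class="col-6">Number of seats</dt><dd class="col-6">5</dd>
    <dt class="col-6">Doors</dt><dd class="col-6">4</dd>
    <dt class="col-6">Last year of production</dt><dd class="col-6">available</dd>
  </dl>
</div>
<div class="row box">
  <h3>TRANSMISSION</h3>
  <dl>
    <dt class="col-6">1st gear</dt><dd class="col-6">5,00:1</dd>
    <dt class="col-6">2nd gear</dt><dd class="col-6">2,60:1</dd>
  </dl>
</div>
"""


def extract_values(html, wanted):
    """Return {label: value} for every <dt> whose text starts with a wanted label.

    Hypothetical helper: it collects the values in a dict instead of printing them.
    """
    prefixes = tuple(wanted)  # str.startswith needs a str or a tuple of str
    soup = BeautifulSoup(html, "html.parser")
    result = {}
    for dt in soup.find_all("dt"):
        label = dt.get_text().strip().lower()
        if label.startswith(prefixes):
            dd = dt.find_next_sibling("dd")
            if dd is not None:  # skip labels that have no value element
                result[label] = dd.get_text().strip()
    return result


wanted = ("number of seats", "last year of production", "1st gear")
values = extract_values(SAMPLE_HTML, wanted)
print(values)
```

Returning a dict rather than printing makes it easy to pick out one value afterwards, for example values["1st gear"]. If the real page uses different tags or class names, the find_all and find_next_sibling arguments would need to change accordingly.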