Python: How to extract column data from a page using BeautifulSoup
I'm trying to capture the data shown in the bullet points on the linked page. I need to extract it using XPath. The data to extract:
4 Door Sedan
4 Cylinder, 1.8 Litre
Constantly Variable Transmission, Front Wheel Drive
Petrol - Unleaded ULP
6.4 L/100km
I tried this:
import requests
import lxml.html as lh
import pandas as pd
from lxml import html
from bs4 import BeautifulSoup

cars = []
urls = ['https://www.redbook.com.au/cars/details/2019-honda-civic-50-years-edition-auto-my19/SPOT-ITM-524208/']
for url in urls:
    car_data = {}
    headers = {'User-Agent': 'Mozilla/5.0'}
    page = requests.get(url, headers=headers)
    tree = html.fromstring(page.content)
    # Note: indexing with [0] stores the matched lxml element itself, not its text.
    if tree.xpath('/html/body/div[1]/div[2]/div/div[1]/div[1]/div[4]/div/div'):
        car_data["namings"] = tree.xpath('/html/body/div[1]/div[2]/div/div[1]/div[1]/div[4]/div/div')[0]
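Since the question asks for XPath specifically, here is a minimal sketch of a class-based expression, which tends to be less brittle than an absolute `/html/body/div[1]/...` path. The HTML fragment is a made-up stand-in for the real page, and the `micro-spec` class name is borrowed from the answer in this thread rather than verified against the live site.

```python
from lxml import html

# Hypothetical fragment standing in for the real page; the class name is
# an assumption taken from the selectors used elsewhere in this thread.
SAMPLE = """
<div class="micro-spec">
  <dl>
    <dd> 4 Door Sedan </dd>
    <dd> 4 Cylinder, 1.8 Litre </dd>
    <dd></dd>
    <dd> 6.4 L/100km </dd>
  </dl>
</div>
"""


def extract_specs(markup):
    """Return the non-empty, stripped text of every <dd> under the spec block."""
    tree = html.fromstring(markup)
    # Match on the class attribute instead of counting div positions.
    nodes = tree.xpath('//div[contains(@class, "micro-spec")]//dd')
    # text_content() gives the element's text; strip() trims the whitespace.
    texts = [node.text_content().strip() for node in nodes]
    return [t for t in texts if t]


specs = extract_specs(SAMPLE)
print(specs)
```

With the real page, `page.content` from `requests.get` would take the place of `SAMPLE`, assuming the markup matches.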
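- find_all() - returns a collection of elements
- strip() - Python's built-in string method that removes all leading and trailing whitespace from a string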
import requests
from bs4 import BeautifulSoup

cars = []
urls = ['https://www.redbook.com.au/cars/details/2019-honda-civic-50-years-edition-auto-my19/SPOT-ITM-524208/']
for url in urls:
    car_data = []
    headers = {'User-Agent': 'Mozilla/5.0'}
    page = requests.get(url, headers=headers)
    soup = BeautifulSoup(page.content, 'lxml')
    # Walk down the nested containers, then collect every <dd> element.
    car_obj = soup.find("div", {'class': 'r-center-pane'}).find("div",
        {'class': 'micro-spec'}).find("div", {'class': 'columns'}).find_all("dd")
    for x in car_obj:
        text = x.text.strip()
        # Skip <dd> elements that contain only whitespace.
        if text != "":
            car_data.append(text)
    cars.append(car_data)
print(cars)
Output:
[['4 Door Sedan', '4 Cylinder, 1.8 Litre', 'Constantly Variable Transmission, Front Wheel Drive', 'Petrol - Unleaded ULP', '6.4 L/100km']]
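One caveat with chained `.find()` calls: `find` returns `None` when nothing matches, so the next call in the chain raises `AttributeError` if the page layout differs. A minimal defensive sketch follows; the HTML fragment is a hypothetical stand-in for the real page, and `html.parser` is used here only so the example has no lxml dependency.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment that mimics the nesting assumed by the answer.
SAMPLE = """
<div class="r-center-pane">
  <div class="micro-spec">
    <div class="columns">
      <dl><dd> 4 Door Sedan </dd><dd>   </dd><dd>Petrol - Unleaded ULP</dd></dl>
    </div>
  </div>
</div>
"""


def spec_values(markup):
    """Return stripped <dd> texts, or an empty list if a container is missing."""
    soup = BeautifulSoup(markup, "html.parser")
    node = soup
    for cls in ("r-center-pane", "micro-spec", "columns"):
        node = node.find("div", class_=cls)
        if node is None:
            # A container was not found, so stop instead of raising.
            return []
    texts = [dd.get_text(strip=True) for dd in node.find_all("dd")]
    return [t for t in texts if t]


values = spec_values(SAMPLE)
print(values)
```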
You have already imported BeautifulSoup, so why not use a CSS class selector?
import requests
from bs4 import BeautifulSoup as bs
r = requests.get('https://www.redbook.com.au/cars/details/2019-honda-civic-50-years-edition-auto-my19/SPOT-ITM-524208/', headers = {'User-Agent':'Mozilla/5.0'})
soup = bs(r.content, 'lxml')
info = [i.text.strip() for i in soup.select('.dgi-')]
You can also print the values like this:
for i in soup.select('.dgi-'):
    print(i.text.strip())
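What if I need to split the output into 5 parts, e.g. door = 4 Door Sedan, body = 4 Cylinder, 1.8 Litre, and so on?
The output is a list. I added an edit so you can see how to print without a list comprehension.
How do I assign each element of the list to a variable? I tried: info = [i.text.strip() for i in soup.select('.dgi-')] car_data['0'] = info.split("")[0] car_data['1'] = info.split("")[1] car_data['2'] = info.split("")[2] car_data['3'] = info.split("")[3]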
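One possible way to address the last comment: because `info` is already a list, `split` is not needed (lists have no `split` method), and the values can be paired with labels by position. This is a minimal sketch, assuming the five values always arrive in the order shown in the question; the label names are hypothetical and chosen only for illustration.

```python
# Values copied from the output shown in the thread, so this runs offline.
info = [
    "4 Door Sedan",
    "4 Cylinder, 1.8 Litre",
    "Constantly Variable Transmission, Front Wheel Drive",
    "Petrol - Unleaded ULP",
    "6.4 L/100km",
]

# Hypothetical label names; the page may call these fields something else.
labels = ["body", "engine", "drivetrain", "fuel", "consumption"]

# zip pairs each label with the value at the same position.
car_data = dict(zip(labels, info))

# Plain unpacking also works when the list is known to hold exactly five items.
body, engine, drivetrain, fuel, consumption = info

print(car_data["body"])
print(car_data["engine"])
```

If the page sometimes omits a field, positional pairing would mislabel the rest, so matching each value to its own label on the page would be safer in that case.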