BeautifulSoup: extracting a value without a class in Python
I want to extract data in Python using BeautifulSoup. My file:
<div class="listing-item" data-id="309531" data-score="0">
<div class="thumb">
<a href="https://res.cloudinary.com/">
<div style="background-image:url(https://res.cloudinary.com/dubizzle-com/image/upload/co_rgb:242424,l_text:oswald_140_bold_letter_spacing_5:2292,y_50/co_rgb:FFFFFF,l_text:oswald_100_bold_letter_spacing_5:01,y_-107/c_fit,w_200/abu-dhabi-plate_private-car_classic);"></div>
</a>
</div>
</div>
From this element, I want to get the background-image URL:
<div style="background-image:url(https://res.cloudinary.com/dubizzle-com/image/upload/co_rgb:242424,l_text:oswald_140_bold_letter_spacing_5:2292,y_50/co_rgb:FFFFFF,l_text:oswald_100_bold_letter_spacing_5:01,y_-107/c_fit,w_200/abu-dhabi-plate_private-car_classic);"></div>
My code:
from textwrap import shorten
from bs4 import BeautifulSoup
from urllib.parse import parse_qsl, urljoin, urlparse
import requests

url = 'https://uae.dubizzle.com/motors/number-plates/?page={}'

print('{:^50} {:^15} {:^25} '.format('Title', 'Price', 'Date'))
for page in range(0, 40):  # <--- Increase to the number of pages you want
    response = requests.get(url.format(page))
    soup = BeautifulSoup(response.text, 'lxml')
    for title, price, date, thumb in zip(soup.select('.listing-item .title'),
                                         soup.select('.listing-item .price'),
                                         soup.select('.listing-item .date'),
                                         soup.select('.listing-item .thumb')):
        print('{:50} {:<25} {:<15}'.format(shorten(title.get_text().strip(), 50),
                                           price.get_text().strip(),
                                           thumb.get_text().strip()))
You can access the URL by searching for the value inside the thumb. You can try the following:

Code:
from textwrap import shorten
from bs4 import BeautifulSoup
from urllib.parse import parse_qsl, urljoin, urlparse
import requests

url = 'https://uae.dubizzle.com/motors/number-plates/?page={}'

print('{:^50} {:^15} {:^25} '.format('Title', 'Price', 'Date'))
for page in range(0, 1):  # <--- Increase to the number of pages you want
    response = requests.get(url.format(page))
    soup = BeautifulSoup(response.text, 'lxml')
    for title, price, date, thumb in zip(soup.select('.listing-item .title'),
                                         soup.select('.listing-item .price'),
                                         soup.select('.listing-item .date'),
                                         soup.select('.listing-item .thumb')):
        # image_url = thumb.find('div').get('style').split('url(')[1].split(');')[0]
        print('{:50} {:<25} {:<15}'.format(shorten(title.get_text().strip(), 50),
                                           price.get_text().strip(),
                                           thumb.find('div').get('style').split('url(')[1].split(');')[0]))
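To see why the split works, here it is applied to just the snippet from the question (a minimal sketch; `html.parser` is used instead of `lxml` so no extra parser install is assumed):

```python
from bs4 import BeautifulSoup

# The markup posted in the question.
html = '''
<div class="listing-item" data-id="309531" data-score="0">
  <div class="thumb">
    <a href="https://res.cloudinary.com/">
      <div style="background-image:url(https://res.cloudinary.com/dubizzle-com/image/upload/co_rgb:242424,l_text:oswald_140_bold_letter_spacing_5:2292,y_50/co_rgb:FFFFFF,l_text:oswald_100_bold_letter_spacing_5:01,y_-107/c_fit,w_200/abu-dhabi-plate_private-car_classic);"></div>
    </a>
  </div>
</div>
'''

soup = BeautifulSoup(html, 'html.parser')
thumb = soup.select_one('.listing-item .thumb')
# Same extraction as above: take everything between "url(" and ");".
image_url = thumb.find('div').get('style').split('url(')[1].split(');')[0]
print(image_url)
```

This prints the Cloudinary URL from the `style` attribute of the inner `div`.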
You need to get the div element using find_next('div') and then read its style attribute; use a regular expression to extract the image URL. Try the code below:
from textwrap import shorten
from bs4 import BeautifulSoup
import requests
import re

url = 'https://uae.dubizzle.com/motors/number-plates/?page={}'

print('{:^50} {:^15} {:^25} '.format('Title', 'Price', 'Date'))
for page in range(0, 40):  # <--- Increase to the number of pages you want
    response = requests.get(url.format(page))
    soup = BeautifulSoup(response.text, 'lxml')
    for title, price, date, thumb in zip(soup.select('.listing-item .title'),
                                         soup.select('.listing-item .price'),
                                         soup.select('.listing-item .date'),
                                         soup.select('.listing-item .thumb')):
        print('{:50} {:<25} {:<15}'.format(shorten(title.get_text().strip(), 50),
                                           price.get_text().strip(),
                                           re.search(r"https?://[^\s]+[^);]", thumb.find_next("div")['style']).group(0)))
You can use thumb.find('a')['href'] instead of thumb.find('div').get('style').split('url(')[1].split(');')[0].

@Shijith I think he wants the image link. In that case the href links to another page, while the image is stored in the background-image.
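Concretely, the image URL lives only inside the style attribute, not in the anchor's href. A minimal sketch extracting it from the style string shown in the question (the regex pattern here is my own choice, not taken from the answers above):

```python
import re

# Style value copied from the question's markup.
style = ("background-image:url(https://res.cloudinary.com/dubizzle-com/image/upload/"
         "co_rgb:242424,l_text:oswald_140_bold_letter_spacing_5:2292,y_50/"
         "co_rgb:FFFFFF,l_text:oswald_100_bold_letter_spacing_5:01,y_-107/"
         "c_fit,w_200/abu-dhabi-plate_private-car_classic);")

# Non-greedy match of everything between "url(" and the first ")".
image_url = re.search(r"url\((.*?)\)", style).group(1)
print(image_url)
```

Unlike splitting on fixed substrings, this also works if the style attribute carries other declarations before or after background-image.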
G91911 - Excellent for PORSCHE AED 59,000 https://res.cloudinary.com/dubizzle-com/image/upload/co_rgb:242424,l_text:oswald_140_bold_letter_spacing_5:88887,x_100,y_-50/co_rgb:242424,l_text:oswald_140_bold_letter_spacing_5:J,x_-240,y_-50/c_fit,w_200/dubai-plate_private-car_new
R 199 AED 49,000 https://res.cloudinary.com/dubizzle-com/image/upload/co_rgb:242424,l_text:oswald_140_bold_letter_spacing_5:2122,x_100,y_-50/co_rgb:242424,l_text:oswald_140_bold_letter_spacing_5:M,x_-240,y_-50/c_fit,w_200/dubai-plate_private-car_new
88887 J AED 49,000 https://res.cloudinary.com/dubizzle-com/image/upload/co_rgb:242424,l_text:oswald_140_bold_letter_spacing_5:2212,x_100,y_-50/co_rgb:242424,l_text:oswald_140_bold_letter_spacing_5:S,x_-240,y_-50/c_fit,w_200/dubai-plate_private-car_new
M2122 AED 52,000 https://res.cloudinary.com/dubizzle-com/image/upload/co_rgb:242424,l_text:oswald_140_bold_letter_spacing_5:22022,x_100,y_-50/co_rgb:242424,l_text:oswald_140_bold_letter_spacing_5:J,x_-240,y_-50/c_fit,w_200/dubai-plate_private-car_new
S 2212 AED 309,000 https://res.cloudinary.com/dubizzle-com/image/upload/co_rgb:242424,l_text:oswald_140_bold_letter_spacing_5:5000,x_100,y_-50/co_rgb:242424,l_text:oswald_140_bold_letter_spacing_5:L,x_-240,y_-50/c_fit,w_200/dubai-plate_private-car_new
22022 J AED 9,500 https://res.cloudinary.com/dubizzle-com/image/upload/co_rgb:242424,l_text:oswald_140_bold_letter_spacing_4:5945,x_100,y_-50/co_rgb:242424,l_text:oswald_140_bold_letter_spacing_4:H,x_-240,y_-50/c_fit,w_200/dubai-plate_private-car_classic
5000 L AED 2,800,000 https://res.cloudinary.com/dubizzle-com/image/upload/co_rgb:242424,l_text:oswald_140_bold_letter_spacing_4:90,x_100,y_-50/co_rgb:242424,l_text:oswald_140_bold_letter_spacing_4:Z,x_-240,y_-50/c_fit,w_200/dubai-plate_private-car_classic
Dubai AED 760,000 https://res.cloudinary.com/dubizzle-com/image/upload/co_rgb:242424,l_text:oswald_140_bold_letter_spacing_4:10000,x_100,y_-50/co_rgb:242424,l_text:oswald_140_bold_letter_spacing_4:H,x_-240,y_-50/c_fit,w_200/dubai-plate_private-car_classic