Extracting a value that has no class with BeautifulSoup in Python

I want to extract data using BeautifulSoup in Python.

My HTML:

<div class="listing-item" data-id="309531" data-score="0">

  <div class="thumb">
    <a href="https://res.cloudinary.com/">

      <div style="background-image:url(https://res.cloudinary.com/dubizzle-com/image/upload/co_rgb:242424,l_text:oswald_140_bold_letter_spacing_5:2292,y_50/co_rgb:FFFFFF,l_text:oswald_100_bold_letter_spacing_5:01,y_-107/c_fit,w_200/abu-dhabi-plate_private-car_classic);"></div>
    </a>
  </div>
</div>

From this, I want to get the background image URL out of the inner div:

<div style="background-image:url(https://res.cloudinary.com/dubizzle-com/image/upload/co_rgb:242424,l_text:oswald_140_bold_letter_spacing_5:2292,y_50/co_rgb:FFFFFF,l_text:oswald_100_bold_letter_spacing_5:01,y_-107/c_fit,w_200/abu-dhabi-plate_private-car_classic);"></div>

My code:

from textwrap import shorten
from bs4 import BeautifulSoup
from urllib.parse import parse_qsl, urljoin, urlparse
import requests

url = 'https://uae.dubizzle.com/motors/number-plates/?page={}'

print('{:^50} {:^15} {:^25} '.format('Title', 'Price', 'Date'))

for page in range(0, 40):   # <--- Increase to number pages you want
    response = requests.get(url.format(page))
    soup = BeautifulSoup(response.text, 'lxml')

    for title, price, date, thumb  in zip(soup.select('.listing-item .title'),
                            soup.select('.listing-item .price'),
                            soup.select('.listing-item .date'),
                            soup.select('.listing-item .thumb')):

        print('{:50} {:<25} {:<15}'.format(shorten(title.get_text().strip(), 50), price.get_text().strip(), thumb.get_text().strip()))

You can get the URL by looking up the style value of the div inside thumb.

You can try the following:

Code:

from textwrap import shorten
from bs4 import BeautifulSoup
import requests

url = 'https://uae.dubizzle.com/motors/number-plates/?page={}'

# The third column holds the image URL, so label it accordingly
print('{:50} {:<25} {:<15}'.format('Title', 'Price', 'Image URL'))

for page in range(0, 1):   # <--- Increase to the number of pages you want
    response = requests.get(url.format(page))
    soup = BeautifulSoup(response.text, 'lxml')

    for title, price, date, thumb in zip(soup.select('.listing-item .title'),
                                         soup.select('.listing-item .price'),
                                         soup.select('.listing-item .date'),
                                         soup.select('.listing-item .thumb')):
        # The image URL sits in the inline style of the first <div> inside .thumb
        image_url = thumb.find('div').get('style').split('url(')[1].split(');')[0]
        print('{:50} {:<25} {:<15}'.format(shorten(title.get_text().strip(), 50), price.get_text().strip(), image_url))
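
To check the string-splitting step without hitting the site, here is a minimal offline sketch that runs it against the HTML snippet from the question. It assumes the built-in 'html.parser' instead of 'lxml' (only so nothing extra has to be installed) and that the style attribute always ends with ");" as it does in the question.

```python
from bs4 import BeautifulSoup

# HTML copied from the question
html = '''
<div class="listing-item" data-id="309531" data-score="0">
  <div class="thumb">
    <a href="https://res.cloudinary.com/">
      <div style="background-image:url(https://res.cloudinary.com/dubizzle-com/image/upload/co_rgb:242424,l_text:oswald_140_bold_letter_spacing_5:2292,y_50/co_rgb:FFFFFF,l_text:oswald_100_bold_letter_spacing_5:01,y_-107/c_fit,w_200/abu-dhabi-plate_private-car_classic);"></div>
    </a>
  </div>
</div>
'''

# Assumption: 'html.parser' is used here only to avoid the lxml dependency
soup = BeautifulSoup(html, 'html.parser')
thumb = soup.select_one('.listing-item .thumb')

# The first <div> inside .thumb is the one carrying the inline style
style = thumb.find('div').get('style')

# Keep what lies between "url(" and the closing ");"
image_url = style.split('url(')[1].split(');')[0]
print(image_url)
```

If the site ever quotes the URL or drops the trailing semicolon, the split would need adjusting.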

You need to use find_next('div') to get the div element and then read its style attribute. Use a regular expression to get the image URL.

Try the code below.

from textwrap import shorten
from bs4 import BeautifulSoup
import requests
import re

url = 'https://uae.dubizzle.com/motors/number-plates/?page={}'

# The third column holds the image URL, so label it accordingly
print('{:50} {:<25} {:<15}'.format('Title', 'Price', 'Image URL'))

for page in range(0, 40):   # <--- Increase to the number of pages you want
    response = requests.get(url.format(page))
    soup = BeautifulSoup(response.text, 'lxml')

    for title, price, date, thumb  in zip(soup.select('.listing-item .title'),
                            soup.select('.listing-item .price'),
                            soup.select('.listing-item .date'),
                            soup.select('.listing-item .thumb')):


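        # Raw string avoids invalid-escape warnings; the pattern stops before the trailing ");"
        image_url = re.search(r"https?://[^\s]+[^);]", thumb.find_next("div")['style']).group(0)
        print('{:50} {:<25} {:<15}'.format(shorten(title.get_text().strip(), 50), price.get_text().strip(), image_url))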
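
As a variation on the regular-expression idea, the sketch below wraps the extraction in a small helper that returns None instead of raising when a tag has no style or no url(...) in it. The helper name background_image_url, the pattern, and the example.com markup are illustrative assumptions, not part of the original answer.

```python
import re
from bs4 import BeautifulSoup

# Capture whatever sits inside url(...), tolerating optional quotes and spaces
URL_IN_STYLE = re.compile(r"url\(\s*['\"]?(.*?)['\"]?\s*\)")

def background_image_url(tag):
    """Return the URL from an inline background-image style, or None."""
    style = tag.get('style', '') if tag is not None else ''
    match = URL_IN_STYLE.search(style)
    return match.group(1) if match else None

# Hypothetical markup covering unquoted, quoted, and missing styles
html = '''
<div class="thumb"><div style="background-image:url(https://example.com/a.png);"></div></div>
<div class="thumb"><div style="background-image: url('https://example.com/b.png')"></div></div>
<div class="thumb"><div></div></div>
'''

soup = BeautifulSoup(html, 'html.parser')
urls = [background_image_url(thumb.find('div')) for thumb in soup.select('.thumb')]
print(urls)
```

Returning None makes it easy to skip listings without an image rather than crashing halfway through a page.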

You can use thumb.find('a')['href'] instead of thumb.find('div').get('style').split('url(')[1].split(');')[0]
@Shijith I think he wants the image link. In this case, the href links to another page, and the image is stored in the background-image.
G91911 - Excellent for PORSCHE                     AED 59,000                https://res.cloudinary.com/dubizzle-com/image/upload/co_rgb:242424,l_text:oswald_140_bold_letter_spacing_5:88887,x_100,y_-50/co_rgb:242424,l_text:oswald_140_bold_letter_spacing_5:J,x_-240,y_-50/c_fit,w_200/dubai-plate_private-car_new
R 199                                              AED 49,000                https://res.cloudinary.com/dubizzle-com/image/upload/co_rgb:242424,l_text:oswald_140_bold_letter_spacing_5:2122,x_100,y_-50/co_rgb:242424,l_text:oswald_140_bold_letter_spacing_5:M,x_-240,y_-50/c_fit,w_200/dubai-plate_private-car_new
88887 J                                            AED 49,000                https://res.cloudinary.com/dubizzle-com/image/upload/co_rgb:242424,l_text:oswald_140_bold_letter_spacing_5:2212,x_100,y_-50/co_rgb:242424,l_text:oswald_140_bold_letter_spacing_5:S,x_-240,y_-50/c_fit,w_200/dubai-plate_private-car_new
M2122                                              AED 52,000                https://res.cloudinary.com/dubizzle-com/image/upload/co_rgb:242424,l_text:oswald_140_bold_letter_spacing_5:22022,x_100,y_-50/co_rgb:242424,l_text:oswald_140_bold_letter_spacing_5:J,x_-240,y_-50/c_fit,w_200/dubai-plate_private-car_new
S 2212                                             AED 309,000               https://res.cloudinary.com/dubizzle-com/image/upload/co_rgb:242424,l_text:oswald_140_bold_letter_spacing_5:5000,x_100,y_-50/co_rgb:242424,l_text:oswald_140_bold_letter_spacing_5:L,x_-240,y_-50/c_fit,w_200/dubai-plate_private-car_new
22022 J                                            AED 9,500                 https://res.cloudinary.com/dubizzle-com/image/upload/co_rgb:242424,l_text:oswald_140_bold_letter_spacing_4:5945,x_100,y_-50/co_rgb:242424,l_text:oswald_140_bold_letter_spacing_4:H,x_-240,y_-50/c_fit,w_200/dubai-plate_private-car_classic
5000 L                                             AED 2,800,000             https://res.cloudinary.com/dubizzle-com/image/upload/co_rgb:242424,l_text:oswald_140_bold_letter_spacing_4:90,x_100,y_-50/co_rgb:242424,l_text:oswald_140_bold_letter_spacing_4:Z,x_-240,y_-50/c_fit,w_200/dubai-plate_private-car_classic
Dubai                                              AED 760,000               https://res.cloudinary.com/dubizzle-com/image/upload/co_rgb:242424,l_text:oswald_140_bold_letter_spacing_4:10000,x_100,y_-50/co_rgb:242424,l_text:oswald_140_bold_letter_spacing_4:H,x_-240,y_-50/c_fit,w_200/dubai-plate_private-car_classic
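
In the sample output above, the image URLs look shifted relative to the titles: the first row is titled "G91911" but its URL encodes 88887 and J, which is the title two rows further down. One possible cause (an assumption, since the full page markup isn't shown) is that some listings have no .thumb block, so zip() ends up pairing lists of different lengths. A sketch that avoids this by walking each .listing-item and looking up its fields inside that item, using hypothetical markup and helper names:

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the first listing has no .thumb block
html = '''
<div class="listing-item"><span class="title">A 1</span><span class="price">AED 10</span></div>
<div class="listing-item"><span class="title">B 2</span><span class="price">AED 20</span>
  <div class="thumb"><a href="#"><div style="background-image:url(https://example.com/b2.png);"></div></a></div>
</div>
'''

def text_of(parent, selector):
    """Text of the first match inside parent, or '' if there is none."""
    node = parent.select_one(selector)
    return node.get_text(strip=True) if node else ''

def image_of(item):
    """Background-image URL of the listing's thumbnail, or '' if missing."""
    styled = item.select_one('.thumb div[style]')
    if styled is None or 'url(' not in styled['style']:
        return ''
    return styled['style'].split('url(')[1].split(')')[0]

soup = BeautifulSoup(html, 'html.parser')

# One row per listing, so a missing field cannot shift the other columns
rows = [(text_of(item, '.title'), text_of(item, '.price'), image_of(item))
        for item in soup.select('.listing-item')]

for row in rows:
    print('{:20} {:<15} {}'.format(*row))
```

With this layout a listing without a thumbnail simply gets an empty third column instead of borrowing the next listing's image.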