Python web scraper returns duplicate listings per page and does not iterate through search result page numbers?

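I built the web scraper below, but when I run it, it duplicates the listings on the search results page. I also can't work out how to iterate over each search results page (SRP) without getting exactly the same results as the first SRP. Any idea what is going wrong here?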

url = '''https://www.cargurus.com/Cars/inventorylisting/viewDetailsFilterViewInventoryListing.action?zip=32805&inventorySearchWidgetType=PRICE&maxPrice=42500&maxMileage=50000&showNegotiable=false&sortDir=DESC&sourceContext=carGurusHomePageModel&distance=100&minPrice=0&sortType=PRICE&minMileage=0&sellerTypes=PRIVATE'''
listing_detail_url = 'https://www.cargurus.com/Cars/detailListingJson.action?inventoryListing={}&searchZip=&searchDistance=500&inclusionType=DEFAULT'

import json
import requests
from bs4 import BeautifulSoup

soup = BeautifulSoup(requests.get(url).text, 'html.parser')

data = []
for a in soup.select('a[href^="#listing"]'):  # get all listings on the page
    listing_id = a['href'].split('=')[-1]
    json_data = requests.get(listing_detail_url.format(listing_id)).json()   
    listing_title = json_data['listing']['listingTitle']
    vehicle_id = json_data['listing']['id']
    price = json_data['listing']['price']
    make_name = json_data['listing']['makeName']
    model_name = json_data['listing']['modelName']
    mileage = json_data['listing']['mileage']
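    # vin_id = json_data['listing']['vin']  # left disabled, so vin_id is not stored below
    # ... other data

    # append() takes a single argument, so the fields are grouped into one tuple
    data.append((listing_title, vehicle_id, price, make_name, model_name, mileage))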
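One possible cause of the repeated rows, offered here as an assumption rather than something confirmed for this site, is that several anchors on the page (for example an image link and a title link) point at the same listing, so the selector matches each listing more than once. A minimal sketch of de-duplicating the IDs before requesting details could look like this; the helper name unique_listing_ids and the sample hrefs are hypothetical:

```python
def unique_listing_ids(hrefs):
    """Return listing IDs in first-seen order, dropping repeats."""
    # Same ID extraction as in the loop above: take the text after the last '='
    ids = (href.split('=')[-1] for href in hrefs)
    # dict.fromkeys keeps insertion order (Python 3.7+) and discards duplicates
    return list(dict.fromkeys(ids))


# Hypothetical hrefs, as if two anchors pointed at the same listing
sample_hrefs = ['#listing=111', '#listing=111', '#listing=222', '#listing=111']
listing_ids = unique_listing_ids(sample_hrefs)
```

In the scraper, the hrefs would come from the anchors matched by soup.select, and the detail request would then be made once per ID in the returned list.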

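You can also try this script, which fetches the car information from the other result pages by increasing the offset parameter: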

import requests


page_url = 'https://www.cargurus.com/Cars/searchResults.action?zip=32805&offset={}&maxResults=15&distance=500'

data = []
offset = 0
while True:
    print('Offset {}...'.format(offset))
    json_data = requests.get(page_url.format(offset)).json()

    for listing in json_data:
        listing_title = listing['listingTitle']
        vehicle_id = listing['id']
        price = listing['price']
        make_name = listing['makeName']
        model_name = listing['modelName']
        mileage = listing['mileage']
        # ... other data

        print((listing_title, vehicle_id, price, make_name, model_name, mileage))
        data.append( (listing_title, vehicle_id, price, make_name, model_name, mileage) )

    if len(json_data) != 15:
        break

    offset += 15
Prints:

...

('2018 Honda CR-V EX AWD', 273663888, 20875.0, 'Honda', 'CR-V', 36870)
('2019 Ford Ranger Lariat SuperCrew RWD', 277554768, 29995.0, 'Ford', 'Ranger', 4546)
('2015 Ford Edge SEL', 273107810, 9999.0, 'Ford', 'Edge', 99336)
('2020 RAM 1500 Limited Crew Cab 4WD', 279568758, 54895.0, 'RAM', '1500', 1903)
('2014 Volkswagen Passat TDI SE', 268214566, 9498.0, 'Volkswagen', 'Passat', 45235)
Offset 105...
('2017 Chevrolet Silverado 1500 High Country Crew Cab RWD', 273586618, 36500.0, 'Chevrolet', 'Silverado 1500', 27936)
('2017 Volkswagen Tiguan S', 273485901, 12495.0, 'Volkswagen', 'Tiguan', 24824)
('2019 Ford Explorer Limited', 277039894, 30400.0, 'Ford', 'Explorer', 26328)
('2014 Dodge Challenger SXT RWD', 274612168, 10750.0, 'Dodge', 'Challenger', 105362)
('2012 Volkswagen GTI 2.0T 4-Door FWD with Sunroof and Navigation', 277629553, 7500.0, 'Volkswagen', 'GTI', 106911)
('2013 Buick LaCrosse Premium II FWD', 279206632, 4991.0, 'Buick', 'LaCrosse', 169886)
('2017 Toyota RAV4 XLE', 273207166, 17500.0, 'Toyota', 'RAV4', 27197)
('2017 Ford Explorer XLT', 273452570, 21899.0, 'Ford', 'Explorer', 26523)

...
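The offset loop can also be wrapped in a generator that skips any listing whose id has already been seen, which guards against a listing showing up on two pages. This is only a sketch: the names iter_listings and fake_fetch are hypothetical, and the fake data stands in for the HTTP call so the example runs without network access.

```python
def iter_listings(fetch_page, page_size=15):
    """Yield each listing once, walking the result pages by offset."""
    seen_ids = set()
    offset = 0
    while True:
        page = fetch_page(offset)
        for listing in page:
            if listing['id'] not in seen_ids:
                seen_ids.add(listing['id'])
                yield listing
        # A short (or empty) page means there are no further results
        if len(page) < page_size:
            break
        offset += page_size


# Hypothetical stand-in for the real request: 40 rows, the last 10 repeat earlier ids
FAKE_RESULTS = [{'id': n % 30} for n in range(40)]


def fake_fetch(offset, page_size=15):
    return FAKE_RESULTS[offset:offset + page_size]


listings = list(iter_listings(fake_fetch))
```

With the real endpoint, fetch_page could be a small function that calls requests.get(page_url.format(offset)).json(), assuming the response is a JSON list of listings as in the script above.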

@Greg: You can't append multiple values at once. If it is a tuple, you can.
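To illustrate the point in the comment, here is a small sketch with made-up values: list.append accepts exactly one argument, so passing several values raises a TypeError, while wrapping them in a tuple works.

```python
data = []
error = None

try:
    # Two separate arguments: append() rejects this
    data.append('2018 Honda CR-V EX AWD', 20875.0)
except TypeError as exc:
    error = exc

# One argument that happens to be a tuple: this is accepted
data.append(('2018 Honda CR-V EX AWD', 20875.0))
```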