Python - Web scraping Target

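I am trying to get the chip names from this Target page, and I want to collect all 28 chips on the first page automatically. I wrote the code below. It opens the link, scrolls down (so the names and images load), and then tries to get the names.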

import time
from selenium import webdriver
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.common.keys import Keys
from webdriver_manager.chrome import ChromeDriverManager as CM

options = webdriver.ChromeOptions()
options.add_argument("--log-level=3")

mobile_emulation = {
    "userAgent": 'Mozilla/5.0 (Linux; Android 4.0.3; HTC One X Build/IML74K) AppleWebKit/535.19 (KHTML, like Gecko) Chrome/83.0.1025.133 Mobile Safari/535.19'
}
options.add_experimental_option("mobileEmulation", mobile_emulation)

bot = webdriver.Chrome(executable_path=CM().install(), options=options)

bot.get('https://www.target.com/c/chips-snacks-grocery/-/N-5xsy7')
bot.set_window_size(500, 950)
time.sleep(5)

for i in range(0,3):
    ActionChains(bot).send_keys(Keys.END).perform()
    time.sleep(1)

product_names = bot.find_elements_by_class_name('Link-sc-1khjl8b-0 styles__StyledTitleLink-mkgs8k-5 kdCHb inccCG h-display-block h-text-bold h-text-bs flex-grow-one')

hrefList = []
for e in product_names:
    hrefList.append(e.get_attribute('href'))

for href in hrefList:
    print(href)
When I inspect the names in the browser, the element common to all the chips has the class name
Link-sc-1khjl8b-0 styles__StyledTitleLink-mkgs8k-5 kdCHb inccCG h-display-block h-text-bold h-text-bs flex-grow-one
As you can see, I added the line
find_elements_by_class_name('Link-sc-1khjl8b-0 styles__StyledTitleLink-mkgs8k-5 kdCHb inccCG h-display-block h-text-bold h-text-bs flex-grow-one')
but it returns an empty result. What is wrong? Can you help me? The solution can use selenium or bs4; either is fine.

Try:

# Each class, including the first one, needs a leading dot in a CSS selector
product_names = bot.find_elements_by_css_selector('.Link-sc-1khjl8b-0.styles__StyledTitleLink-mkgs8k-5.kdCHb.inccCG.h-display-block.h-text-bold.h-text-bs.flex-grow-one')
find_elements_by_class_name() expects a single class name, so the spaces in a compound class name are not handled correctly.
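
The conversion described above, from a space-separated class attribute to a compound CSS selector, can be sketched as a small helper. The function name classes_to_css_selector is hypothetical and not part of Selenium, and the sketch assumes the class names contain no characters that need CSS escaping.

```python
def classes_to_css_selector(class_attr, tag=""):
    """Turn a space-separated class attribute into a compound CSS selector.

    Hypothetical helper for illustration; assumes no class name needs escaping.
    """
    class_names = class_attr.split()
    if not class_names:
        raise ValueError("class_attr must contain at least one class name")
    # Every class gets its own leading dot, with no spaces in between.
    return tag + "".join("." + name for name in class_names)


selector = classes_to_css_selector(
    "Link-sc-1khjl8b-0 styles__StyledTitleLink-mkgs8k-5 kdCHb inccCG"
)
print(selector)
# → .Link-sc-1khjl8b-0.styles__StyledTitleLink-mkgs8k-5.kdCHb.inccCG

# With a live driver (not run here) the selector would be passed as:
#   bot.find_elements_by_css_selector(selector)     # Selenium 3
#   bot.find_elements(By.CSS_SELECTOR, selector)    # Selenium 4
```

Generated class names such as kdCHb or inccCG tend to change between site builds, so a selector with fewer, more stable classes is likely to last longer.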


This works, except that the selector did not match anything for me. I had to use
'.Link-sc-1khjl8b-0.ItemLink-sc-1eyz3ng-0.kdCHb.dtKueh'

You can get all the data directly from the API, as long as you pass the correct key:

import requests


url = 'https://redsky.target.com/redsky_aggregations/v1/web/plp_search_v1'
payload = {
'key': 'ff457966e64d5e877fdbad070f276d18ecec4a01',
'category': '5xsy7',
'channel': 'WEB',
'count': '28',
'default_purchasability_filter': 'true',
'include_sponsored': 'true',
'offset': '0',
'page': '/c/5xsy7',
'platform': 'desktop',
'pricing_store_id': '1771',
'scheduled_delivery_store_id': '1771',
'store_ids': '1771,1768,1113,3374,1792',
'useragent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.212 Safari/537.36',
'visitor_id': '0179C80AE1090201B5D5C1D895ADEA6C'}

jsonData = requests.get(url, params=payload).json()    


for each in jsonData['data']['search']['products']:
    title = each['item']['product_description']['title']
    buy_url = each['item']['enrichment']['buy_url']
    image_url = each['item']['enrichment']['images']['primary_image_url']        
    print(title)
Output:

Ruffles Cheddar & Sour Cream Potato Chips - 2.5oz
Doritos 3D Crunch Chili Cheese Nacho - 6oz
Hippeas Vegan White Cheddar Organic Chickpea Puffs - 5oz
PopCorners Spicy Queso - 7oz
Doritos 3D Crunch Spicy Ranch - 6oz
Pringles Snack Stacks Variety Pack Potato Crisps Chips - 12.9oz/18ct
Frito-Lay Variety Pack Flavor Mix - 18ct
Doritos Nacho Cheese Chips - 9.75oz
Hippeas Nacho Vibes Organic Chickpea Puffs - 5oz
Tostitos Scoops Tortilla Chips -10oz
Ripple Potato Chips Party Size - 13.5oz - Market Pantry™
Ritz Crisp & Thins Cream Cheese & Onion Potato And Wheat Chips - 7.1oz
Pringles Sour Cream & Onion Potato Crisps Chips - 5.5oz
Original Potato Chips Party Size - 15.25oz - Market Pantry™
Organic White Corn Tortilla Chips - 12oz - Good & Gather™
Sensible Portions Sea Salt Garden Veggie Straws - 7oz
Traditional Kettle Chips - 8oz - Good & Gather™
Lay's Classic Potato Chips - 8oz
Cheetos Crunchy Flamin Hot - 8.5oz
Sweet Potato Kettle Chips - 7oz - Good & Gather™
SunChips Harvest Cheddar Flavored Wholegrain Snacks - 7oz
Frito-Lay Variety Pack Classic Mix - 18ct
Doritos Cool Ranch Chips - 10.5oz
Lay's Wavy Original Potato Chips - 7.75oz
Frito-Lay Variety Pack Family Fun Mix - 18ct
Cheetos Jumbo Puffs - 8.5oz
Frito-Lay Fun Times Mix Variety Pack - 28ct
Doritos Nacho Cheese Flavored Tortilla Chips - 15.5oz
Lay's Barbecue Flavored Potato Chips - 7.75oz
SunChips Garden Salsa Flavored Wholegrain Snacks - 7oz
Pringles Snack Stacks Variety Pack Potato Crisps Chips - 12.9oz/18ct
Frito-Lay Variety Pack Doritos & Cheetos Mix - 18ct
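
The loop above indexes straight into the JSON, so a product without an enrichment or images entry would raise a KeyError. Below is a minimal sketch of a more tolerant extraction, assuming the response keeps the data/search/products shape used above. The function name extract_products and the sample record (including its URLs) are hypothetical and only there for illustration.

```python
def extract_products(json_data):
    """Collect title, buy_url and image_url per product; missing fields become None."""
    products = json_data.get("data", {}).get("search", {}).get("products", [])
    rows = []
    for product in products:
        item = product.get("item", {})
        enrichment = item.get("enrichment", {})
        rows.append({
            "title": item.get("product_description", {}).get("title"),
            "buy_url": enrichment.get("buy_url"),
            "image_url": enrichment.get("images", {}).get("primary_image_url"),
        })
    return rows


# Hypothetical sample in the assumed response shape (not real API output).
sample = {
    "data": {"search": {"products": [
        {"item": {
            "product_description": {"title": "Lay's Classic Potato Chips - 8oz"},
            "enrichment": {
                "buy_url": "https://example.com/p/placeholder",
                "images": {"primary_image_url": "https://example.com/img.png"},
            },
        }},
        # A record with no enrichment data still yields a row.
        {"item": {"product_description": {"title": "Cheetos Jumbo Puffs - 8.5oz"}}},
    ]}}
}

for row in extract_products(sample):
    print(row["title"], row["buy_url"])
```

With the real response, extract_products(jsonData) would replace the sample call.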
This also works:

product_names = bot.find_elements_by_xpath("//li[@data-test='list-entry-product-card']")

hrefList = []
for e in product_names:
    print(e.find_element_by_css_selector("a").get_attribute("href"))

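Thanks for your answer. My question is, does Walmart have its own API? Is that why you use
https://redsky.target.com/redsky_aggregations/v1/web/plp_search_v1
?

It looks like it does:
https://www.walmart.com/search/api/wpa

Sorry, I mixed it up with "Target" :) I meant Target.

Yes, Target has an API. And yes, that is why I use the redsky address.

One question: if I want to get the chips on the second page, what should I change in the payload?

Thanks for your answer, but I wanted to ask how to get the chip names. With
e.find_element_by_css_selector("a.kdCHb").get_attribute('text')
this shows the item URLs. If I use it with '.Link-sc-1khjl8b-0.ItemLink-sc-1eyz3ng-0.kdCHb.dtKueh', it shows strange links. I also need the names.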
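
On the second-page question, the thread does not give a confirmed answer. One plausible sketch, assuming the offset parameter is a zero-based item index that advances by count per page (an unverified assumption about this API), is to copy the payload and change only offset. The helper name page_payload is hypothetical.

```python
def page_payload(base_payload, page_number, page_size=28):
    """Return a copy of the payload for a 1-based page number.

    Assumes 'offset' is a zero-based item index; this is not confirmed
    by the API and should be checked against real responses.
    """
    if page_number < 1:
        raise ValueError("page_number starts at 1")
    payload = dict(base_payload)  # leave the caller's dict untouched
    payload["count"] = str(page_size)
    payload["offset"] = str((page_number - 1) * page_size)
    return payload


# Trimmed-down payload; the real one carries the key, store ids, and so on.
base = {"category": "5xsy7", "count": "28", "offset": "0"}
second_page = page_payload(base, 2)
print(second_page["offset"])
# → 28
```

The result would then be passed as params= to requests.get in the same way as the original payload.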