Web scraping: why does Puppeteer seem to randomize the data?
I am trying to scrape a website, but the scraper seems to randomize the data I get back. Sometimes it returns everything I ask for, and sometimes it doesn't. In my price evaluation, it sometimes returns the correct data and sometimes returns undefined.
import puppeteer from "puppeteer"
import useAddFirestore from "../hooks/useAddFirestore.js"

export default async function nikeScraper(date){
  const browser = await puppeteer.launch({
    headless: false
  });
  const page = await browser.newPage();
  await page.setDefaultNavigationTimeout(0);
  await page.goto("https://www.nike.com/w/sale-shoes-3yaepzy7ok");
  const nikeData = []
  const titles = await page.evaluate(() => {
    const titles = document.querySelectorAll(".product-card__title")
    const titleList = [...titles]
    const text = titleList.map(title => title.innerText)
    return text
  })
  titles.forEach((el, i) => {
    nikeData[i] = {}
    nikeData[i].title = el
    nikeData[i].date = date
    nikeData[i].brand = "Nike"
  })
  const links = await page.evaluate(() => {
    const links = document.querySelectorAll(".product-card__img-link-overlay")
    const linksList = [...links]
    const href = linksList.map(link => link.href)
    return href
  })
  links.forEach((el, i) => {
    nikeData[i].link = el
  })
  const prices = await page.evaluate(() => {
    const prices = document.querySelectorAll(".product-price__wrapper")
    const priceList = [...prices]
    const text = priceList.map(price => price.innerText)
    return text
  })
  prices.forEach((el, i) => {
    const splitEl = el.split("\n")
    nikeData[i].sale = splitEl[0]
    nikeData[i].retail = splitEl[1]
  })
  const images = await page.evaluate(() => {
    const images = document.querySelectorAll("img")
    const imageList = [...images]
    const src = imageList.map(img => img.src).filter(src => src.includes("static.nike.com"))
    return src
  })
  images.forEach((el, i) => {
    nikeData[i].image = el
  })
  await browser.close();
  for(let entry of nikeData){
    useAddFirestore(entry)
  }
}
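I built an almost identical scraper for another website and it works every time, so I don't know why this one doesn't.
Example of the data returned: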
{
  title: 'ZX 2K 4D SHOES',
  brand: 'Adidas',
  image: 'https://assets.adidas.com/images/w_385,h_385,f_auto,q_auto:sensitive,fl_lossy/d071967e4a624b11a32eabb300e7a801_9366/zx-2k-4d-shoes.jpg',
  sale: '',
  retail: undefined
}
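
For reference, the `sale: ''` / `retail: undefined` pair above is exactly what `split("\n")` produces when the price element's text is an empty string. A possible explanation, which is only a guess and not something confirmed here, is that the price text has not been rendered yet when `page.evaluate` runs. The sketch below shows that behavior, plus one way to make missing values explicit and to read each field from the same product card instead of matching four separate lists by index. The helper names `parsePriceText` and `extractCard` are hypothetical and do not come from the scraper.

```javascript
// What the original split does when the price text is still empty:
// raw is [""], so raw[0] is "" and raw[1] is undefined.
const raw = "".split("\n");
console.log(raw.length, raw[1]);

// Hypothetical helper: split one price wrapper's text into sale and retail.
// Missing pieces become null so gaps are explicit rather than undefined.
function parsePriceText(text) {
  const lines = (text ?? "")
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.length > 0);
  return { sale: lines[0] ?? null, retail: lines[1] ?? null };
}

// Hypothetical helper: build one record from one product card, so the
// title, link, and price always come from the same element.
// `card` only needs a querySelector() method (a DOM element has one).
function extractCard(card) {
  const title = card.querySelector(".product-card__title");
  const link = card.querySelector(".product-card__img-link-overlay");
  const price = card.querySelector(".product-price__wrapper");
  return {
    title: title ? title.innerText : null,
    link: link ? link.href : null,
    ...parsePriceText(price ? price.innerText : ""),
  };
}

console.log(parsePriceText("$80.97\n$115"));
```

In Puppeteer this logic would have to live inside the callback passed to `page.$$eval` (functions defined in Node are not visible in the page), applied to a card container selector such as `.product-card`, which is an assumption about the page's markup. Calling `page.waitForSelector(".product-price__wrapper")` first may help if the prices really are rendered late.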