Python Web Scraping Aliexpress - Lazy Loading


I am trying to web scrape Aliexpress using Selenium and Python. I followed a YouTube tutorial step by step, but I can't seem to get it to work.

I tried using requests and also BeautifulSoup, but it seems that Aliexpress uses a lazy loader for its product listings. I tried a window-scroll script, but it didn't work; the content only seems to load when I scroll the page manually.

This is the URL of the page I want to scrape.

This is the code I currently have. It returns nothing in the output. I think that's because it tries to go through the product listings but can't find any, since they haven't loaded yet.

Any suggestions/help would be greatly appreciated. Apologies for any formatting and code errors.

Thank you all!
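For reference, a common way to deal with lazy loading is to scroll in small increments instead of jumping straight to the bottom, so each batch of items has time to render. A minimal sketch (the step and pause values are assumptions to tune per site):

```python
import time

def scroll_incrementally(driver, step=400, pause=0.5):
    """Scroll in small steps so lazy-loaded items have time to render.

    `driver` is any object with an execute_script method (e.g. a Selenium
    WebDriver). `step` and `pause` are guesses to tune for the target site.
    """
    pos = 0
    height = driver.execute_script("return document.body.scrollHeight")
    while pos < height:
        pos += step
        driver.execute_script(f"window.scrollTo(0, {pos});")
        time.sleep(pause)
        # re-read the height: the page grows as new items load
        height = driver.execute_script("return document.body.scrollHeight")
```

Because the page height is re-read on each iteration, the loop keeps going as the lazy loader appends new products below the current viewport.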

"""
To do
HOT PRODUCT FINDER Enter: Keyword, to generate a url

Product Name
Product Image
Product Link
Sales Number
Price
Create an excel file that contains these data
Sort the list by top selling orders
Develop an algorithm for the velocity of the product (total sales increased / time?)
Scrape site every day """

import csv  # kept for the planned CSV/Excel export
import time
from bs4 import BeautifulSoup
from selenium import webdriver

# start the web driver
driver = webdriver.Chrome()

# grab keywords
search_term = input('Keywords: ')

# url generator
def get_url(search_term):
    """Generate a search URL from the search term provided."""
    url_template = 'https://www.aliexpress.com/wholesale?trafficChannel=main&d=y&CatId=0&SearchText={}&ltype=wholesale&SortType=default&g=n'
    search_term = search_term.replace(" ", "+")
    return url_template.format(search_term)

url = get_url(search_term)  # pass the variable, not the literal string 'search_term'
driver.get(url)

# scroll down to the end of the page so lazy-loaded products render
time.sleep(2)
driver.execute_script('window.scrollTo(0, document.body.scrollHeight);')

# parse the page Selenium rendered; requests.get(url) would only return the
# initial HTML, before the lazily loaded product cards exist
soup = BeautifulSoup(driver.page_source, 'lxml')
productlist = soup.select('div.list.product-card')
print(productlist)


Thank you very much, I tried the code. It doesn't seem to work. I think I just picked a really hard website for my first project. The site uses a lazy loader and I don't know how to get around it. I've updated the code again; please check the element names it returns in the item list. Hi, thanks for trying again. I tried the code and it gave me this error: selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: element is not attached to the page document (Session info: chrome=89.0.4389.114)
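A StaleElementReferenceException means a stored WebElement was invalidated because the DOM refreshed after it was located, which is exactly what a lazy loader does while you scroll. One common workaround is to re-find the element each time and read its `.text` immediately, retrying on staleness. A sketch (the fallback class only exists so the snippet runs without Selenium installed):

```python
try:
    from selenium.common.exceptions import StaleElementReferenceException
except ImportError:  # allow running the sketch without selenium installed
    class StaleElementReferenceException(Exception):
        pass

def text_with_retry(find, attempts=3):
    """Return `find().text`, retrying when the element goes stale.

    `find` must be a function that re-queries the DOM (e.g. a lambda around
    driver.find_element), so each retry picks up the refreshed element.
    """
    for _ in range(attempts):
        try:
            return find().text
        except StaleElementReferenceException:
            continue  # the DOM changed under us; locate the element again
    raise StaleElementReferenceException('element kept going stale')
```

Usage would look like `text_with_retry(lambda: driver.find_element(By.CLASS_NAME, 'item-title-wrap'))`; the key point is passing the *locator*, not an already-found element.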
import time
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument("start-maximized")
chrome_options.add_argument("disable-infobars")
chrome_options.add_argument("--disable-extensions")
chrome_options.add_argument("--disable-gpu")
chrome_options.add_argument("--disable-dev-shm-usage")
chrome_options.add_argument("--no-sandbox")
chrome_options.add_argument('--disable-blink-features=AutomationControlled')

# Selenium 4.6+ resolves chromedriver automatically; executable_path is no longer needed
driver = webdriver.Chrome(options=chrome_options)

# grab keywords
search_term = input('Keywords: ')

# search through the site's own search box instead of building a URL
driver.get('https://www.aliexpress.com')
driver.implicitly_wait(10)

search_box = driver.find_element(By.NAME, 'SearchText')
search_box.send_keys(search_term)
search_box.send_keys(Keys.ENTER)

# wait for the product list to exist before scrolling
LIST_XPATH = '//*[@id="root"]/div/div/div[2]/div[2]/div/div[2]/ul'
WebDriverWait(driver, 10).until(
    EC.presence_of_element_located((By.XPATH, LIST_XPATH)))

# scroll in small steps so the lazy loader has time to render each batch
height = driver.execute_script("return document.body.scrollHeight")
for scrol in range(100, height - 1800, 100):
    driver.execute_script(f"window.scrollTo(0,{scrol})")
    time.sleep(0.5)

# Re-find every element after scrolling and read .text right away.
# Stashing WebElements in lists and reading them after the DOM has
# refreshed is what raises StaleElementReferenceException.
productlist = []
for z in range(1, 16):
    # f-string index, not str([z]); str([z]) only worked by accident ('[1]' -> 'div[1]')
    for row in driver.find_elements(By.XPATH, f'{LIST_XPATH}/div[{z}]'):
        for item in row.find_elements(By.CLASS_NAME, 'list-item'):
            title_wrap = item.find_element(By.CLASS_NAME, 'item-title-wrap')
            productlist.append(title_wrap.find_element(By.TAG_NAME, 'a').text)
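The to-do list at the top mentions creating an Excel file from the scraped data. Once each product is collected as a dict of name, link, orders, and price (the field names here are assumptions, not something the scraper above produces yet), a minimal CSV export that Excel can open might look like this:

```python
import csv

def export_products(rows, path='products.csv'):
    """Write scraped product dicts to a CSV file Excel can open.

    `rows` is a list of dicts; the field names below are assumed
    placeholders matching the to-do list (name, link, orders, price).
    """
    fieldnames = ['name', 'link', 'orders', 'price']
    with open(path, 'w', newline='', encoding='utf-8') as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()  # header row so the columns are labeled
        writer.writerows(rows)
```

Sorting by top-selling orders, also on the to-do list, would then just be `sorted(rows, key=lambda r: r['orders'], reverse=True)` before exporting.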