Scraping AliExpress with Python: login required

python web-scraping web-crawler


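I am trying to write a web scraper in Python that walks through all the products of an AliExpress seller. My problem is that when I am not logged in, I end up being redirected to the login page. I added a login step to my code, but it did not help. I would appreciate any suggestions.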

My code:

import requests
from bs4 import BeautifulSoup
import re
import sys
from lxml import html


def go_through_paginator(link):
    source_code = requests.get(link, data=payload,  headers = dict(referer = link))
    plain_text = source_code.text
    soup = BeautifulSoup(plain_text)
    print(soup)
    for page in soup.findAll ('div', {'class' : 'ui-pagination-navi util-left'}):
        for next_page in page.findAll ('a', {'class' : 'ui-pagination-next'}):
            next_page_link="https:" + next_page.get('href')
            print (next_page_link)
            gather_all_products (next_page_link)

def gather_all_products (url):
    source_code = requests.get(url)
    plain_text = source_code.text
    soup = BeautifulSoup(plain_text)
    for item in soup.findAll ('a', {'class' : 'pic-rind'}):
        product_link=item.get('href')
    go_through_paginator(url)


payload = {
    "loginId": "EMAIL", 
    "password": "LOGIN",
}

LOGIN_URL='https://login.aliexpress.com/buyer.htm?spm=2114.12010608.1000002.4.EihgQ5&return=https%3A%2F%2Fwww.aliexpress.com%2Fstore%2F1816376%3Fspm%3D2114.10010108.0.0.fs2frD&random=CAB39130D12E432D4F5D75ED04DC0A84'

session_requests = requests.session()
source_code = session_requests.get(LOGIN_URL)
source_code = session_requests.post(LOGIN_URL, data = payload)


URL='https://www.aliexpress.com/store/1816376?spm=2114.10010108.0.0.fs2frD'

source_code = requests.get(URL, data=payload,  headers = dict(referer = URL))
plain_text = source_code.text
soup = BeautifulSoup(plain_text)

for L1 in soup.findAll ('li', {'id' : 'product-nav'}):
    for L1_link in L1.findAll('a', {'class' : 'nav-link'}):
        link = "https:" + L1_link.get('href') 
        gather_all_products(link)

This is the AliExpress login URL:

Try setting the cookie values from xman_t and intl_common_forever in the response cookies.

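I tried fetching all the product information directly. Before I set xman_t and intl_common_forever, AliExpress only let me fetch 7 products. After I set xman_t and intl_common_forever, I successfully got 50 products.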

Hope this helps you scrape their products.
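A minimal sketch of what this answer describes, under some assumptions. Only the cookie names xman_t and intl_common_forever come from the answer; the helper names copy_cookies and fetch_with_cookies are hypothetical, and whether these two cookies are enough for any given page is not confirmed here. Note also that the question's code creates session_requests but then calls requests.get directly, so any cookies collected during login are not sent with the later requests. Routing every request through one Session avoids that.

```python
import requests

# Cookie names taken from the answer above; everything else is illustrative.
COOKIE_NAMES = ("xman_t", "intl_common_forever")


def copy_cookies(response_cookies, session, names=COOKIE_NAMES):
    """Copy the named cookies from a response cookie jar into the session.

    Returns the list of cookie names that were found and copied.
    """
    copied = []
    for name in names:
        value = response_cookies.get(name)
        if value is not None:
            session.cookies.set(name, value)
            copied.append(name)
    return copied


def fetch_with_cookies(session, first_url, target_url):
    """Request first_url to collect cookies, then request target_url.

    Both requests go through the same session, so cookies set by the
    first response are sent along with the second request.
    """
    first = session.get(first_url, timeout=30)
    # A Session normally stores response cookies by itself; the explicit
    # copy only makes the step from the answer visible.
    copy_cookies(first.cookies, session)
    return session.get(target_url, headers={"Referer": first_url}, timeout=30)
```

With the question's variables this could be called as fetch_with_cookies(requests.Session(), LOGIN_URL, URL), and the returned response's text can then be passed to BeautifulSoup as before.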


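Are you doing anything with the cookies they send back? They are most likely validating them. Your cookies may need to be in the headers, but it looks like your headers contain only the URL? I would probably use some tool to diff the headers of a logged-in request against those of a logged-out request, and then set them according to what the site requires.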
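The header comparison suggested in this comment could be sketched as follows. This is only an illustration: the function name diff_headers is hypothetical, and it assumes the two sets of request headers have already been captured as plain dictionaries, for example by copying them out of the browser's developer tools.

```python
def diff_headers(logged_in, logged_out):
    """Compare two header mappings, ignoring case in header names.

    Returns a dict that maps each differing header name (lower-cased) to a
    (logged_in_value, logged_out_value) tuple. A missing header is None.
    """
    a = {key.lower(): value for key, value in logged_in.items()}
    b = {key.lower(): value for key, value in logged_out.items()}
    return {
        name: (a.get(name), b.get(name))
        for name in sorted(a.keys() | b.keys())
        if a.get(name) != b.get(name)
    }
```

Headers that show up only on the logged-in side, typically Cookie, are the candidates to reproduce in the scraper's own requests.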