Why isn't my web scraper working? Python 3 - requests, BeautifulSoup

I have been following along for a while and built a web crawler similar to the one in the video.

import requests
from bs4 import BeautifulSoup

def spider(max_pages):
    page = 1
    while page <= max_pages:
        url = 'https://www.aliexpress.com/category/7/computer-office.html?trafficChannel=main&catName=computer-office&CatId=7&ltype=wholesale&SortType=default&g=n&page=' + str(page)
        source_code = requests.get(url)
        plain_text = source_code.text
        soup = BeautifulSoup(plain_text,  'html.parser')
        for link in soup.findAll('a', {'class':'item-title'}):
            href = link.get('href')
            title = link.string
            print(href)
        page += 1

spider(1)
It prints nothing. What can I do?

Before this, I was getting an error with this line:

soup = BeautifulSoup(plain_text)
I changed it to

soup = BeautifulSoup(plain_text,  'html.parser')
and the error went away.

The error I was getting was:

d:/development/Python/TheNewBoston/Python/one/web scrawler.py:10: GuessedAtParserWarning: No parser was explicitly specified, so I'm using the best available HTML parser for this system ("lxml"). This usually isn't a problem, but if you run this code on another system, or in a different virtual environment, it may use a different parser and behave differently.

The code that caused this warning is on line 10 of the file d:/development/Python/TheNewBoston/Python/one/web scrawler.py. To get rid of this warning, pass the additional argument 'features="lxml"' to the BeautifulSoup constructor.

  soup = BeautifulSoup(plain_text)

Any help is much appreciated, thanks.

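You get no results because the target class does not exist until the page has been rendered, and requests does not render pages (it does not run JavaScript).

The data is loaded dynamically from a script tag. You can use a regex to pull out the JavaScript object that holds the data and parse it with json to get that information.

The error you showed (it is actually a warning) was caused by not specifying a parser at first; you have already fixed that.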

import re, json, requests
import pandas as pd

r = requests.get('https://www.aliexpress.com/category/7/computer-office.html?trafficChannel=main&catName=computer-office&CatId=7&ltype=wholesale&SortType=default&g=n&page=1')
data = json.loads(re.search(r'window\.runParams = (\{".*?\});', r.text, re.S).group(1))
df = pd.DataFrame([(item['title'], 'https:' + item['productDetailUrl']) for item in data['items']])
print(df)
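
To try the same idea without hitting the site, here is a minimal offline sketch. It assumes the page embeds its listing as `window.runParams = {...};` inside a script tag, as in the code above; `SAMPLE_HTML`, the example.com URLs, and the helper name `extract_items` are made up for illustration, and the real page layout may differ or change. Unlike the one-liner above, it returns an empty list when the object is missing instead of raising on `None`.

```python
import json
import re

# Hypothetical stand-in for r.text: the listing lives in a script tag,
# so there are no <a class="item-title"> elements in the raw HTML.
SAMPLE_HTML = """
<html><body>
<div id="root"></div>
<script>
window.runParams = {"items": [{"title": "USB hub", "productDetailUrl": "//www.example.com/item/1.html"}, {"title": "Laptop stand", "productDetailUrl": "//www.example.com/item/2.html"}]};
</script>
</body></html>
"""

# Same pattern as in the answer: capture the object assigned to window.runParams.
RUN_PARAMS = re.compile(r'window\.runParams = (\{".*?\});', re.S)


def extract_items(html):
    """Return (title, url) pairs from the embedded object, or [] if it is absent."""
    match = RUN_PARAMS.search(html)
    if match is None:
        # The script tag was not found (layout changed, blocked request, etc.).
        return []
    data = json.loads(match.group(1))
    # The URLs are protocol-relative, so prepend the scheme.
    return [(item['title'], 'https:' + item['productDetailUrl'])
            for item in data.get('items', [])]


items = extract_items(SAMPLE_HTML)
for title, url in items:
    print(title, url)
```

With a live page you would pass `r.text` to `extract_items` instead of `SAMPLE_HTML`; an empty result then tells you the embedded object was not found, rather than crashing.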

It needed a parser to be specified, and you have now provided one. What is the current problem?