Why isn't my web scraper working? Python 3 - requests, BeautifulSoup


python python-3.x web-scraping


I have been following a tutorial for a while, and I made a web crawler similar to the one in the video.

Language: Python

import requests
from bs4 import BeautifulSoup

def spider(max_pages):
    page = 1
    while page <= max_pages:
        url = 'https://www.aliexpress.com/category/7/computer-office.html?trafficChannel=main&catName=computer-office&CatId=7&ltype=wholesale&SortType=default&g=n&page=' + str(page)
        source_code = requests.get(url)
        plain_text = source_code.text
        soup = BeautifulSoup(plain_text,  'html.parser')
        for link in soup.findAll('a', {'class':'item-title'}):
            href = link.get('href')
            title = link.string
            print(href)
        page += 1

spider(1)

What can I do?

Before this, I had an error. The code was:

soup = BeautifulSoup(plain_text)

I changed this to:

soup = BeautifulSoup(plain_text,  'html.parser')

and the error went away.

The error I was getting was:

d:/development/Python/TheNewBoston/Python/one/web scrawler.py:10: GuessedAtParserWarning: No parser was explicitly specified, so I'm using the best available HTML parser for this system ("lxml"). This usually isn't a problem, but if you run this code on another system, or in a different virtual environment, it may use a different parser and behave differently.

The code that caused this warning is on line 10 of the file d:/development/Python/TheNewBoston/Python/one/web scrawler.py. To get rid of this warning, pass the additional argument 'features="lxml"' to the BeautifulSoup constructor.

  soup = BeautifulSoup(plain_text)

Any help is greatly appreciated, thanks.

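There are no results because the class you are targeting does not exist until the page is rendered, and requests does not render pages.

The data is retrieved dynamically from a script tag. You can use a regular expression to extract the JavaScript object that holds the data, then parse it with json to get that information.

The error shown (strictly a GuessedAtParserWarning) was caused by not specifying a parser originally, which you have already corrected.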

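import re, json, requests
import pandas as pd

# Fetch the raw HTML; requests does not run the page's JavaScript
r = requests.get('https://www.aliexpress.com/category/7/computer-office.html?trafficChannel=main&catName=computer-office&CatId=7&ltype=wholesale&SortType=default&g=n&page=1')
# Pull the object assigned to window.runParams out of the script tag and parse it as JSON
data = json.loads(re.search(r'window\.runParams = (\{".*?\});', r.text, re.S).group(1))
# Build a table of (title, product URL) pairs, prepending 'https:' to each URL
df = pd.DataFrame([(item['title'], 'https:' + item['productDetailUrl']) for item in data['items']])
print(df)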
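
To make the regex-plus-json step easier to try without hitting the live site, here is a minimal offline sketch that uses only the standard library. The SAMPLE_HTML string and the extract_items helper are hypothetical, made up for illustration; the real page's markup may differ or change over time. The regular expression is the one from the snippet above, and the helper returns an empty list instead of raising when the pattern is not found.

```python
import json
import re

# Hypothetical stand-in for the page source (an assumption, not real site data).
SAMPLE_HTML = """
<html><body>
<script>
window.runParams = {"items": [{"title": "USB hub", "productDetailUrl": "//www.example.com/item/1.html"}, {"title": "Laptop stand", "productDetailUrl": "//www.example.com/item/2.html"}]};
</script>
</body></html>
"""

# Same pattern as in the answer: capture the object assigned to window.runParams.
RUN_PARAMS_RE = re.compile(r'window\.runParams = (\{".*?\});', re.S)


def extract_items(html):
    """Return (title, url) pairs from the embedded object, or [] if it is missing."""
    match = RUN_PARAMS_RE.search(html)
    if match is None:
        # The object was not found, e.g. the page layout changed or the request was blocked.
        return []
    data = json.loads(match.group(1))
    return [
        (item["title"], "https:" + item["productDetailUrl"])
        for item in data.get("items", [])
    ]


items = extract_items(SAMPLE_HTML)
for title, url in items:
    print(title, url)
```

Checking for a missing match before calling group(1) avoids an AttributeError when the page does not contain the expected object, which can make a failed scrape easier to diagnose.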


It needs a parser to be specified, and you have now provided one. What is the current problem?