Why isn't my web scraper working? Python 3 - requests, BeautifulSoup
I have been following along for a while, and I made a web crawler similar to the one in the video:
import requests
from bs4 import BeautifulSoup

def spider(max_pages):
    page = 1
    while page <= max_pages:
        url = 'https://www.aliexpress.com/category/7/computer-office.html?trafficChannel=main&catName=computer-office&CatId=7&ltype=wholesale&SortType=default&g=n&page=' + str(page)
        source_code = requests.get(url)
        plain_text = source_code.text
        soup = BeautifulSoup(plain_text, 'html.parser')
        for link in soup.findAll('a', {'class': 'item-title'}):
            href = link.get('href')
            title = link.string
            print(href)
        page += 1

spider(1)
What can I do?
Before this, I was getting an error when the code was:
soup = BeautifulSoup(plain_text)
I changed this to:
soup = BeautifulSoup(plain_text, 'html.parser')
and the error went away.
The error I was getting was:
d:/development/Python/TheNewBoston/Python/one/web scrawler.py:10: GuessedAtParserWarning: No parser was explicitly specified, so I'm using the best available HTML parser for this system ("lxml"). This usually isn't a problem, but if you run this code on another system, or in a different virtual environment, it may use a different parser and behave differently.
The code that caused this warning is on line 10 of the file d:/development/Python/TheNewBoston/Python/one/web scrawler.py. To get rid of this warning, pass the additional argument 'features="lxml"' to the BeautifulSoup constructor.
soup = BeautifulSoup(plain_text)
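Any help is much appreciated, thank you.

You get no results because the target class does not exist until the page is rendered, and rendering does not happen with requests. The data is retrieved dynamically from a script tag. You can regex out the JavaScript object that holds the data and parse it with json to get that information.
The warning shown was caused by no parser being specified originally; you have already corrected that.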
import re, json, requests
import pandas as pd

r = requests.get('https://www.aliexpress.com/category/7/computer-office.html?trafficChannel=main&catName=computer-office&CatId=7&ltype=wholesale&SortType=default&g=n&page=1')
# Pull the JavaScript object assigned to window.runParams out of the page source and parse it as JSON
data = json.loads(re.search(r'window\.runParams = (\{".*?\});', r.text, re.S).group(1))
# Build a table of (title, product URL) rows, prepending the scheme to each URL
df = pd.DataFrame([(item['title'], 'https:' + item['productDetailUrl']) for item in data['items']])
print(df)
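
The extraction step above can be tried offline before pointing it at the live site. The following is a minimal sketch, assuming the page embeds a `window.runParams` object shaped like the one the answer relies on. `SAMPLE_HTML`, `RUN_PARAMS_RE`, and `extract_items` are hypothetical names made up for illustration, and the sample markup is not real AliExpress output. It also guards against the case where the pattern is not found, which would otherwise raise an `AttributeError` on `.group(1)`.

```python
import json
import re

# Hypothetical stand-in for the page source; the real page may be structured differently.
SAMPLE_HTML = """
<html><body>
<script>
window.runParams = {"items": [{"title": "USB hub", "productDetailUrl": "//example.com/item/1.html"}]};
</script>
</body></html>
"""

# Same pattern as in the answer: capture the object literal assigned to window.runParams.
RUN_PARAMS_RE = re.compile(r'window\.runParams = (\{".*?\});', re.S)


def extract_items(html):
    """Return (title, url) pairs, or an empty list if the object is not present."""
    match = RUN_PARAMS_RE.search(html)
    if match is None:
        # The page layout changed or the request was served a different page.
        return []
    data = json.loads(match.group(1))
    return [
        (item["title"], "https:" + item["productDetailUrl"])
        for item in data.get("items", [])
    ]


print(extract_items(SAMPLE_HTML))
print(extract_items("<html></html>"))
```

Note that the lazy `.*?` stops at the first `};` it meets, so this approach is fragile if that sequence ever appears inside a string value in the data.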
It needs a parser to be specified, and you have now provided one. What is the current problem?
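
To see why the loop prints nothing even with the parser specified, it may help to compare what BeautifulSoup finds in unrendered versus rendered markup. This is only a sketch: `RAW_HTML`, `RENDERED_HTML`, and `find_item_links` are hypothetical names, and both HTML strings are made-up stand-ins, not real AliExpress markup.

```python
from bs4 import BeautifulSoup

# Hypothetical page source as returned by requests: the data sits in a script tag only.
RAW_HTML = '<html><body><script>window.runParams = {"items": []};</script></body></html>'

# Hypothetical markup after a browser has run the page's JavaScript.
RENDERED_HTML = (
    '<html><body>'
    '<a class="item-title" href="//example.com/item/1.html">Item 1</a>'
    '</body></html>'
)


def find_item_links(html, class_name="item-title"):
    """Return the href of every <a> tag carrying the given class."""
    # Naming the parser explicitly avoids GuessedAtParserWarning.
    soup = BeautifulSoup(html, "html.parser")
    return [a.get("href") for a in soup.find_all("a", class_=class_name)]


print(find_item_links(RAW_HTML))       # no matching anchors in the unrendered source
print(find_item_links(RENDERED_HTML))  # the anchor exists only after rendering
```

If the first call returns an empty list for the text that `requests.get(url).text` gives you, the selector is fine and the content is simply not in the static HTML.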