Python: why are some pages not crawled?
I am scraping reviews from Amazon, and something is bothering me: sometimes a page simply cannot be scraped. What is going wrong?

Do I have to sleep for a longer time between requests? Could photos in the reviews be the reason a page isn't crawled? Sorry, I may be misusing some of the terms around scraping and crawling; I don't know the phrasing well, so please correct me if you don't mind. Is there another reason?
import re, requests, csv
from bs4 import BeautifulSoup
from time import sleep

## CMD : chcp 65001

def reviews_info(div):
    review_text = div.find("div", "a-row review-data").span.text
    review_author = div.find("a", "a-size-base a-link-normal author").text
    review_stars = div.find("div", "a-row").a.text
    on_review_date = div.find('span', 'a-size-base a-color-secondary review-date').text
    review_date = [x.strip() for x in re.sub("on ", "", on_review_date).split(",")]
    return {
        "review_text": review_text,
        "review_author": review_author,
        "review_stars": review_stars,
        "review_date": review_date
    }

base_url = "https://www.amazon.com/GRACE-KARIN-BoatNeck-Sleeveless-Vintage/product-reviews/B016XUCYZO/ref=cm_cr_dp_d_show_all_top?ie=UTF8&showViewpoints=1&sortBy=helpful&pageNumber="

reviews = []
NUM_PAGES = 472

for page_num in range(1, NUM_PAGES + 1):
    print("souping page", page_num, ",", len(reviews), " data")
    url = base_url + str(page_num)
    soup = BeautifulSoup(requests.get(url).text, 'lxml')
    for div in soup('div', 'a-section review'):
        reviews.append(reviews_info(div))
    sleep(30)

###################################################
# Save dict data
keys = reviews[0].keys()
with open('amazon_GRACE KARIN BoatNeck Sleeveless Vintage Tea Dress with Belt_review.csv', 'w', encoding="utf-8") as f:
    dict_writer = csv.DictWriter(f, delimiter=',', lineterminator='\n', fieldnames=keys)
    dict_writer.writeheader()
    dict_writer.writerows(reviews)
Have you tried fixing the indentation of the `reviews.append` line?

@nutmag64 I don't understand what you mean. Do you mean this line: `for div in soup('div', 'a-section review'): reviews.append(reviews_info(div)) sleep(30)`?
`reviews.append` needs to be indented one level further, so that it runs inside the inner `for` loop and appends every review div on the page, not just the last one.
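Beyond indentation, a common reason for a page to come back "empty" is that Amazon answers scraper-like requests with HTTP 503 or a captcha ("Robot Check") page instead of the review HTML, so BeautifulSoup finds zero `a-section review` divs even though `requests.get()` itself succeeded. A minimal sketch of detecting that case before parsing, so the loop can retry or slow down instead of silently skipping the page (the marker strings below are assumptions based on typical Amazon block pages, not a guaranteed format):

```python
def looks_blocked(status_code, html):
    """Return True when a response is likely a block page rather than reviews.

    A non-200 status (Amazon often uses 503) or captcha-page markers in the
    body both indicate the page was not really served. The marker strings
    are assumptions, not an official contract.
    """
    if status_code != 200:  # e.g. 503 Service Unavailable
        return True
    lowered = html.lower()
    return "robot check" in lowered or "api-services-support@amazon.com" in lowered

# A 503 answer and a captcha page are both treated as blocked;
# an ordinary HTML body is not.
print(looks_blocked(503, "<html></html>"))                # True
print(looks_blocked(200, "<html>Robot Check</html>"))     # True
print(looks_blocked(200, "<html>review content</html>"))  # False
```

When `looks_blocked` returns True, sleeping longer before retrying that same page number (rather than moving on) usually recovers the missing pages; sending a browser-like `User-Agent` header with the request also tends to reduce how often the block page appears.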