When I scrape with Python (blog scraping), it only repeats results 1 to 10

I want to crawl Naver blog posts with the code below, but it only crawls articles 1 to 10 from the first page. How can I edit it so the crawl continues through 11-20, 21-30, and so on?
import sys
from bs4 import BeautifulSoup
import requests
import csv

BASE_URL = "https://search.naver.com/search.naver?where=post&sm=tab_pge&query=%ED%99%94%EC%A0%95%EC%B2%9C&st=sim&date_option=8&date_from=20160101&date_to=20161231&dup_remove=1&post_blogurl=&post_blogurl_without=&srchby=all&nso=p%3Afrom20160101to20161231&ie=utf8&start="

f = open("park01.csv", 'w', newline='')
wr = csv.writer(f)

for i in range(100):
    URL_with_page_num = BASE_URL + str(1 + i*10)
    response = requests.get(BASE_URL)
    response.status_code
    print(response.status_code)
    dom = BeautifulSoup(response.content, "html.parser")
    post_elements = dom.select("li.sh_blog_top")
    for post_element in post_elements:
        title_element = post_element.select_one("a.sh_blog_title")
        passage_element = post_element.select_one("dd.sh_blog_passage")
        title = title_element.text
        url = title_element.get("href")
        passage = passage_element.text
        data = [title, url, passage]
        wr.writerow(data)
f.close()
I suspect the problem is in the code below:
for i in range(100):
    URL_with_page_num = BASE_URL + str(1 + i*10)
    response = requests.get(BASE_URL)
In the last line of the code above, you build URL_with_page_num but then request the bare BASE_URL, so every iteration fetches the first page. Replace BASE_URL with URL_with_page_num:
response = requests.get(URL_with_page_num)
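To double-check that the start parameter actually advances from page to page, the pagination logic can be isolated and inspected on its own. This is a minimal sketch; page_urls is a helper name introduced here for illustration, and the shortened query string is a placeholder, not the original URL:

```python
# Naver blog search paginates via the "start" query parameter:
# page 1 starts at result 1, page 2 at result 11, and so on.
BASE_URL = "https://search.naver.com/search.naver?where=post&query=test&start="

def page_urls(base_url, num_pages, per_page=10):
    """Build one URL per result page by advancing the start index."""
    return [base_url + str(1 + i * per_page) for i in range(num_pages)]

for url in page_urls(BASE_URL, 3):
    print(url)
# The three URLs end with start=1, start=11, start=21; these are the
# values that must be passed to requests.get(), not the bare BASE_URL.
```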
@윤승용 If this solved your problem, please accept it as the answer.