Python: How to limit the number of rows populated in a DataFrame from a for loop
I have written the following function to scrape multiple pages from a website. I only want to get the first 20 or so pages. How can I limit the number of rows populated in the data frame?
import re
import sys
from urllib.parse import urljoin
from urllib.request import urlopen

from bs4 import BeautifulSoup

def scrape_page(poi, page_name):
    base_url = "https://www.fake_website.org/"
    report_url = base_url + poi
    page = urlopen(report_url)
    experiences = BeautifulSoup(page, "html.parser")
    empty_list = []
    # Outer loop: every index page whose link ends in "<page_name>.shtml"
    for link in experiences.findAll('a', attrs={'href': re.compile(page_name + ".shtml$")}):
        url = urljoin(base_url, link.get("href"))
        subpage = urlopen(url)
        expages = BeautifulSoup(subpage, "html.parser")
        # Inner loop: every report linked from that index page
        for report in expages.findAll('a', attrs={'href': re.compile("^/experiences/exp")}):
            url = urljoin(base_url, report.get("href"))
            reporturlopen = urlopen(url)
            reporturl = BeautifulSoup(reporturlopen, "html.parser")
            book_title = reporturl.findAll("div", attrs={'class': 'title'})
            for i in book_title:
                title = i.get_text()
            book_genre = reporturl.findAll("div", attrs={'class': 'genre'})
            for i in book_genre:
                genre = i.get_text()
            book_author = reporturl.findAll("div", attrs={'class': 'author'})
            for i in book_author:
                author = i.get_text()
                author = re.sub("by", "", author)
            # One row per report
            empty_list.append({'title': title, 'genre': genre, 'author': author})
    # Expose the collected rows as a module-level variable named "<poi>_<page_name>_df"
    setattr(sys.modules[__name__], '{}_df'.format(poi + "_" + page_name), empty_list)
For example, you can add a while loop:
i = 0
while i < 20:
    < insert your code >
    i += 1
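The while loop above counts iterations by hand. When the loop already runs over an iterable, as the `for link in ...` loop in the question does, the same cap can be applied with `itertools.islice`. The following is only a minimal sketch: `first_n` and the `links` list are hypothetical stand-ins for the results of `findAll(...)`, not part of the original code.

```python
from itertools import islice

def first_n(items, limit=20):
    """Return at most `limit` items from any iterable, stopping early."""
    return list(islice(items, limit))

# Hypothetical stand-in for the links that findAll(...) would return
links = [f"/experiences/exp{n}.shtml" for n in range(100)]

# Only the first 20 links are visited, so only 20 rows are built
rows = [{"url": link} for link in first_n(links, 20)]
print(len(rows))
```

In the question's function this would amount to wrapping the `findAll(...)` call of whichever loop should be capped in `islice(..., 20)`, so the remaining pages are never requested.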
Add a counter to the loop?
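A counter in nested loops needs some care, because a plain `break` only leaves the inner loop. One possible approach, sketched here under the assumption that the row limit applies to the whole result, is to check the number of collected rows and return as soon as the limit is reached. `collect_rows` and the `pages` data are hypothetical; they mimic the index-page and report loops without making any network requests.

```python
def collect_rows(pages, max_rows=20):
    """Collect at most `max_rows` rows from nested page/report loops."""
    rows = []
    for page in pages:          # outer loop: index pages
        for report in page:     # inner loop: reports on that page
            if len(rows) >= max_rows:
                return rows     # leaves both loops at once
            rows.append({"title": report})
    return rows

# Hypothetical data: 5 pages with 8 reports each
pages = [[f"book {p}-{r}" for r in range(8)] for p in range(5)]
rows = collect_rows(pages, max_rows=20)
print(len(rows))
```

Using `len(rows)` as the counter avoids keeping a separate variable in sync with the list.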