Python BeautifulSoup: scraping multiple pages


python web-scraping


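I want to scrape the members' names from each page, then move on to the next page and do the same thing. My code only works for one page. I'm very new to this, so any advice would be appreciated. Thanks, everyone.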

    import requests
    from bs4 import BeautifulSoup

    r = requests.get("https://www.bodia.com/spa-members/page/1")
    soup = BeautifulSoup(r.text,"html.parser")
    lights = soup.findAll("span",{"class":"light"})

    lights_list = []
    for l in lights[0:]:
        result = l.text.strip()
        lights_list.append(result)

    print (lights_list)

I tried this, but it only gave me the members from page 3:

    for i in range (1,4): #to scrape names of page 1 to 3
        r = requests.get("https://www.bodia.com/spa-members/page/"+ format(i))
    soup = BeautifulSoup(r.text,"html.parser")
    lights = soup.findAll("span",{"class":"light"})

    lights_list = []
    for l in lights[0:]:
        result = l.text.strip()
        lights_list.append(result)

    print (lights_list)

Then I tried this:

    i = 1
    while i<5:
        r = requests.get("https://www.bodia.com/spa-members/page/"+str(i))
    i+=1

    soup = BeautifulSoup(r.text,"html.parser")
    lights = soup.findAll("span",{"class":"light"})

    lights_list = []
    for l in lights[0:]:
        result = l.text.strip()
    lights_list.append(result)

    print (lights_list)

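Only two changes are needed to make it scrape every page.

First, the line

    r = requests.get("https://www.bodia.com/spa-members/page/"+ format(i))

is better written as

    r = requests.get("https://www.bodia.com/spa-members/page/{}".format(i))

That is the usual way to use format. (The built-in format(i) happens to return the same string here, so this change is about style; it is not what limited you to one page.)

Second, you were not looping over all of the code. Only the request was inside the for loop, so the parsing and printing ran once, after the loop had finished, and only ever saw the last response. Indenting everything under the for loop fixes that.

With both changes applied (full code further down), the script prints a list of names for each page it scrapes, about one list every 3 seconds.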
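To see the effect of the indentation in isolation, here is a minimal sketch that needs no network access. `fake_fetch` is a hypothetical stand-in for `requests.get(...).text`, and the strings it returns are made up for illustration; only the loop structure mirrors the question.

```python
def fake_fetch(page):
    """Hypothetical stand-in for requests.get(...).text: one label per page."""
    return "names from page {}".format(page)


# Shape of the attempt in the question: only the fetch is inside the loop.
for i in range(1, 4):
    r = fake_fetch(i)
# This line runs once, after the loop, so it only sees the last response.
processed_outside = [r]

# Fixed shape: the processing is indented under the loop as well.
processed_inside = []
for i in range(1, 4):
    r = fake_fetch(i)
    processed_inside.append(r)

print(processed_outside)  # ['names from page 3']
print(processed_inside)   # ['names from page 1', 'names from page 2', 'names from page 3']
```

The first result matches the symptom in the question: three requests are made, but only page 3 is ever processed.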

    import requests
    from bs4 import BeautifulSoup

    for i in range (1,4): #to scrape names of page 1 to 3
        r = requests.get("https://www.bodia.com/spa-members/page/{}".format(i))
        soup = BeautifulSoup(r.text,"html.parser")
        lights = soup.findAll("span",{"class":"light"})
        lights_list = []
        for l in lights[0:]:
            result = l.text.strip()
            lights_list.append(result)

        print(lights_list)

Sample output:

    ['Seng Putheary (Nana)']
    ['Marco Julia']
    ['Simon']
    ['Ms Anne Guerineau']

Comments:

Is there a way to know how many pages there are? On the site I see 1, 2, 3, but when I click 3 it changes to 3, 4, 5. Is there a way to find out how many pages there are, so that I know what range to set?

@taga It depends on how the site is structured. You could also just keep scraping page by page until you reach one that has no results.
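For the question in the comments about not knowing the page count in advance, one possible approach is to keep requesting pages until one comes back with no names. The sketch below is an assumption-laden illustration, not a tested scraper for this site: `scrape_until_empty`, `fetch_names_from_site`, the `max_pages` safety cap, and the guess that pages past the end return a non-200 status or no `span.light` elements are all hypothetical. The demonstration at the bottom uses a made-up three-page site so the loop can be run offline.

```python
def scrape_until_empty(fetch_names, max_pages=50):
    """Collect names page by page until a page yields none or max_pages is hit."""
    all_names = []
    for page in range(1, max_pages + 1):
        names = fetch_names(page)
        if not names:  # an empty page is taken to mean we ran past the last one
            break
        all_names.extend(names)
    return all_names


def fetch_names_from_site(page):
    """Fetch one listing page and return the stripped text of each span.light."""
    # Imported here so the loop above can be tried without these packages installed.
    import requests
    from bs4 import BeautifulSoup

    url = "https://www.bodia.com/spa-members/page/{}".format(page)
    r = requests.get(url, timeout=10)
    if r.status_code != 200:  # assumption: pages past the end return an error status
        return []
    soup = BeautifulSoup(r.text, "html.parser")
    return [span.text.strip() for span in soup.find_all("span", {"class": "light"})]


# Offline demonstration with a hypothetical three-page site (made-up names).
fake_site = {1: ["Ann"], 2: ["Bob", "Cy"], 3: ["Dee"]}
demo = scrape_until_empty(lambda page: fake_site.get(page, []))
print(demo)  # ['Ann', 'Bob', 'Cy', 'Dee']
```

Against the real site, the call would be `scrape_until_empty(fetch_names_from_site)`. Passing the fetch function in as an argument keeps the stopping logic separate from the network code, and the `max_pages` cap guards against a site that never returns an empty page.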