How can Python scrape all the data at once from a website that only loads results after scrolling?
I am trying to scrape the names and addresses of colleges from a website, but the problem is that I only get the data for the first 11 colleges in the list and none of the others. I have tried everything I know, but nothing has worked. My code is:
from selenium import webdriver
import bs4
from bs4 import BeautifulSoup
import requests
import pandas as pd
from time import sleep
driver=webdriver.Chrome('C:/Users/acer/Downloads/chromedriver.exe')
driver.get('https://www.collegenp.com/2-science-colleges/')
driver.refresh()
sleep(20)
page=requests.get("https://www.collegenp.com/2-science-colleges/")
college = []
location=[]
soup= BeautifulSoup(page.content,'html.parser')
for a in soup.find_all('div', attrs={'class': 'media'}):
    name = a.find('h3', attrs={'class': 'college-name'})
    college.append(name.text)
    loc = a.find('span', attrs={'class': 'college-address'})
    location.append(loc.text)
df = pd.DataFrame({'College name': college, 'Locations': location})
df.to_csv('hell.csv', index=False, encoding='utf-8')
Is there any way I can scrape all of the data?

You can use this code to get the information from the next pages:
import requests
from bs4 import BeautifulSoup
import pandas as pd
url = "https://www.collegenp.com/2-science-colleges/"
headers = {"X-Requested-With": "XMLHttpRequest"}
data = {"state": "on", "action": "filter", "count": "0"}
all_data = []
for page in range(0, 5):  # <-- increase number of pages here
    print("Getting page {}..".format(page))
    data["count"] = page * 10
    soup = BeautifulSoup(
        requests.post(url, data=data, headers=headers).content,
        "html.parser",
    )
    for c in soup.select(".college-name"):
        all_data.append(
            {
                "College name": c.get_text(strip=True),
                "Location": c.find_next(class_="college-address").get_text(
                    strip=True
                ),
            }
        )
df = pd.DataFrame(all_data)
print(df)
df.to_csv("data.csv", index=False)
and saves data.csv (screenshot from LibreOffice):
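As a variant of the loop above, instead of hard-coding the number of pages you can keep requesting until the server returns an empty batch. This is a minimal sketch assuming the same AJAX endpoint and `count` offset behavior; the fetch function is factored out so the stopping rule is easy to test on its own.

```python
import requests
from bs4 import BeautifulSoup

URL = "https://www.collegenp.com/2-science-colleges/"
HEADERS = {"X-Requested-With": "XMLHttpRequest"}


def parse_colleges(html):
    """Extract (name, address) pairs from one chunk of returned HTML."""
    soup = BeautifulSoup(html, "html.parser")
    return [
        (
            c.get_text(strip=True),
            c.find_next(class_="college-address").get_text(strip=True),
        )
        for c in soup.select(".college-name")
    ]


def scrape_all(fetch, page_size=10, max_pages=100):
    """Request page after page; stop when a page comes back empty."""
    rows = []
    for page in range(max_pages):
        batch = parse_colleges(fetch(page * page_size))
        if not batch:
            break  # no more results, stop paging
        rows.extend(batch)
    return rows


def fetch_live(count):
    """POST to the site's filter endpoint with the given offset."""
    data = {"state": "on", "action": "filter", "count": str(count)}
    return requests.post(URL, data=data, headers=HEADERS).content


# rows = scrape_all(fetch_live)
```

Passing `fetch_live` into `scrape_all` reproduces the answer above without guessing the page count; be aware the site's `count` parameter and page size of 10 are assumptions taken from the code in the answer.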