Python: How do I scrape all the data at once from a website that only shows results after scrolling?

Tags: python, web-scraping, beautifulsoup, python-requests, python-beautifultable

I'm trying to scrape the names and addresses of the colleges listed at https://www.collegenp.com/2-science-colleges/, but the problem is that I only get data for the first 11 colleges in the list and nothing for the rest. I've tried everything I know, and none of it has worked.

My code is:

from selenium import webdriver
from bs4 import BeautifulSoup
import requests
import pandas as pd
from time import sleep

driver = webdriver.Chrome('C:/Users/acer/Downloads/chromedriver.exe')
driver.get('https://www.collegenp.com/2-science-colleges/')

driver.refresh()
sleep(20)

# Download the page again with requests and parse that copy
page = requests.get("https://www.collegenp.com/2-science-colleges/")

college = []
location = []

soup = BeautifulSoup(page.content, 'html.parser')

# Each college entry sits in a div with class "media"
for a in soup.find_all('div', attrs={'class': 'media'}):
    name = a.find('h3', attrs={'class': 'college-name'})
    college.append(name.text)
    loc = a.find('span', attrs={'class': 'college-address'})
    location.append(loc.text)

df = pd.DataFrame({'College name': college, 'Locations': location})
df.to_csv('hell.csv', index=False, encoding='utf-8')

Is there any way I can scrape all of the data?

The remaining colleges are loaded by JavaScript as you scroll, so your requests.get call only ever receives the initial HTML with the first batch of results. The page fetches the rest with an AJAX POST request, and you can replicate that request directly. You can use this code to get the information from the next pages:

import requests
from bs4 import BeautifulSoup
import pandas as pd

url = "https://www.collegenp.com/2-science-colleges/"

# The header marks the request as AJAX; the form data mimics the
# request the page itself sends when you scroll.
headers = {"X-Requested-With": "XMLHttpRequest"}
data = {"state": "on", "action": "filter", "count": "0"}

all_data = []
for page in range(0, 5):  # <-- increase number of pages here
    print("Getting page {}..".format(page))

    data["count"] = page * 10  # offset into the result list
    soup = BeautifulSoup(
        requests.post(url, data=data, headers=headers).content,
        "html.parser",
    )

    for c in soup.select(".college-name"):
        all_data.append(
            {
                "College name": c.get_text(strip=True),
                "Location": c.find_next(class_="college-address").get_text(
                    strip=True
                ),
            }
        )

df = pd.DataFrame(all_data)
print(df)
df.to_csv("data.csv", index=False)
and saves data.csv:

[screenshot of data.csv opened in LibreOffice]
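If you would rather stay with Selenium, another option is to keep scrolling until no new entries load and only then hand the rendered HTML to BeautifulSoup. Here is a minimal sketch, assuming the listing simply appends more "media" entries as you scroll; the 3-second wait and the stop condition (page height no longer growing) are illustrative assumptions, not values tuned for this site:

from selenium import webdriver
from bs4 import BeautifulSoup
import pandas as pd
from time import sleep

driver = webdriver.Chrome('C:/Users/acer/Downloads/chromedriver.exe')
driver.get('https://www.collegenp.com/2-science-colleges/')

# Scroll to the bottom repeatedly until the page height stops growing,
# i.e. no more colleges are being appended.
last_height = driver.execute_script("return document.body.scrollHeight")
while True:
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    sleep(3)  # assumed wait: give the AJAX call time to finish
    new_height = driver.execute_script("return document.body.scrollHeight")
    if new_height == last_height:
        break
    last_height = new_height

# Parse the fully rendered page source, not a fresh requests.get()
soup = BeautifulSoup(driver.page_source, 'html.parser')
driver.quit()

all_data = []
for a in soup.find_all('div', attrs={'class': 'media'}):
    name = a.find('h3', attrs={'class': 'college-name'})
    loc = a.find('span', attrs={'class': 'college-address'})
    if name and loc:
        all_data.append({'College name': name.text.strip(),
                         'Locations': loc.text.strip()})

pd.DataFrame(all_data).to_csv('data.csv', index=False, encoding='utf-8')

The key difference from your original script is that it parses driver.page_source after scrolling instead of downloading the page again with requests, so the scroll-loaded entries are actually present in the HTML it parses.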



FYI, this is scraping, not scrapping.