Web scraping: IMPORTXML shows an error when scraping data from a website


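I am trying to scrape the list of 100 universities from this website

using
=IMPORTXML("https://www.topuniversities.com/university-rankings/usa-rankings/2021", "//*[@id='ranking-data-load']/div[1]/div/div/div/div/div[2]")

It shows the error:
Imported content is empty


How can I get the required data using XPath?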

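I found this XHR request in the browser's developer tools:

https://www.topuniversities.com/sites/default/files/qs-rankings-data/en/3738856.txt?1622189434?v=1622361479157

XPath will not work unless the page's JavaScript is rendered. IMPORTXML fetches the raw HTML and does not run scripts, so the ranking rows are not there for the XPath to match.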
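A minimal sketch of why the import comes back empty. Both markup snippets below are hypothetical simplifications, not the site's real HTML, and the XPath is shortened for the illustration. The same expression matches nothing in the unrendered markup and matches the rows only once they have been filled in.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup: what a client that does not run
# JavaScript receives. The container exists but has no rows yet.
STATIC_HTML = """
<html><body>
  <div id="ranking-data-load"></div>
</body></html>
"""

# Hypothetical markup after a browser has run the page's JavaScript.
RENDERED_HTML = """
<html><body>
  <div id="ranking-data-load">
    <div><a class="uni-link">Harvard University</a></div>
    <div><a class="uni-link">Stanford University</a></div>
  </div>
</body></html>
"""

# ElementTree supports a subset of XPath, which is enough here.
XPATH = ".//*[@id='ranking-data-load']/div/a"


def extract_names(markup):
    """Return the text of every node that XPATH matches in the markup."""
    root = ET.fromstring(markup.strip())
    return [node.text for node in root.findall(XPATH)]


print(extract_names(STATIC_HTML))    # empty container, nothing matches
print(extract_names(RENDERED_HTML))  # rows exist only after rendering
```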

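To get the data, you have two options:

  • Selenium with a real browser such as Chrome or Firefox (requires a WebDriver)

  • Collect the appropriate headers and data so that you can send the request yourself with the requests module

Code: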

import re

import requests

# XHR endpoint found in the browser's developer tools.
URL = 'https://www.topuniversities.com/sites/default/files/qs-rankings-data/en/3738856.txt?1622189434?v=1622361479157'

# Headers copied from the browser's request.
headers = {
    "Host": "www.topuniversities.com",
    "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux armv8l; rv:88.0) Gecko/20100101 Firefox/88.0",
    "Accept": "application/json, text/javascript, */*; q=0.01",
    "Accept-Language": "en-US,en;q=0.5",
    "Accept-Encoding": "gzip, deflate",
    "Referer": "https://www.topuniversities.com/university-rankings/usa-rankings/2021",
    "X-Requested-With": "XMLHttpRequest",
    "via": "1.1 google"
}

# The response is JSON; each entry's 'title' field holds an HTML snippet.
datas = requests.get(URL, headers=headers).json()

for i in datas['data']:
    # Non-greedy match so only the link text up to the first </a> is captured.
    for j in re.findall(r'class="uni-link">(.*?)</a>', i['title']):
        print(j)

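Output:

Harvard University
Stanford University
Massachusetts Institute of Technology (MIT)
University of California, Berkeley (UCB)
University of California, Los Angeles (UCLA)
Yale University

Comments:

@rene Can you tell me how you found the developer tools and this URL?

@vish I did not write the answer; I only edited it to make it a bit clearer. I don't know how this user arrived at their answer.

@Sheshanandh Can you tell me how you found the XHR request in the developer tools, and this URL?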
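As a possible variation on the script in the answer, the link text can be pulled out with the standard library's HTML parser instead of a regular expression, which tends to cope better with attribute order or nested markup. This is only a sketch: the `sample` payload is a hypothetical stand-in shaped like the response the answer's code reads (`data` entries with an HTML `title`), and `UniLinkParser` and `names_from_payload` are names made up for this example.

```python
from html.parser import HTMLParser


class UniLinkParser(HTMLParser):
    """Collect the text of every <a> element whose class includes uni-link."""

    def __init__(self):
        super().__init__()
        self.names = []
        self._in_link = False

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "a" and "uni-link" in classes:
            self._in_link = True
            self.names.append("")

    def handle_data(self, data):
        # Text may arrive in several chunks, so append to the current name.
        if self._in_link:
            self.names[-1] += data

    def handle_endtag(self, tag):
        if tag == "a":
            self._in_link = False


def names_from_payload(payload):
    """Return university names from a payload shaped like the XHR response."""
    parser = UniLinkParser()
    for row in payload["data"]:
        parser.feed(row["title"])
    parser.close()
    return [name.strip() for name in parser.names]


# Hypothetical sample; the real entries are assumed to carry more fields.
sample = {
    "data": [
        {"title": '<div><a href="#" class="uni-link">Harvard University</a></div>'},
        {"title": '<div><a href="#" class="uni-link">Stanford University</a></div>'},
    ]
}

print(names_from_payload(sample))
```

With the live response, `names_from_payload(datas)` would take the place of the regex loop, assuming the payload keeps the same shape.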