Python 3.x: Using Scrapy to scrape a food aggregator (such as Grubhub) for a personal data science project


python-3.x scrapy


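I have been trying to find a way to scrape data from the site using Splash, but with no results. I need your help finding a way to do this.

Edit:

Link: https://www.grubhub.com/search?location=10001 (link edited for brevity)

The same applies to other ZIP codes. The data I need is restaurant names, menus, ratings, and any other available data, for restaurants in different states.

I tried reverse-engineering their API. You will probably have to adapt it to your needs (and possibly optimize it):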

We have to obtain an authentication bearer token in order to use their API. To get the token, we first need a client_id:
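The client id lookup in the code at the end of this answer relies on the pattern `beta_[a-zA-Z0-9]+`. As a minimal offline sketch of that one step, the helper below applies the same pattern to a string. The name `extract_client_id` and the sample text are made up for illustration; the real page content may look different.

```python
import re

# Same pattern the answer's code uses to find the client id
CLIENT_ID_PATTERN = re.compile(r"beta_[a-zA-Z0-9]+")


def extract_client_id(page_text):
    """Return the first 'beta_...' token in page_text, or None if there is none."""
    match = CLIENT_ID_PATTERN.search(page_text)
    return match.group(0) if match else None


# Hypothetical sample text, not real Grubhub output
sample = 'window.config = {"clientId": "beta_AbC123xyz", "other": 1};'
print(extract_client_id(sample))
```

Returning `None` when nothing matches avoids the `IndexError` that `client[0]` would raise if the page layout changes.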

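The steps, in order: the libraries I'm using, getting the client id, getting the authentication bearer, and making your request.

Here's the part that differs from your request: their API uses a third-party service to get the longitude and latitude for a ZIP code (i.e. -73.99916077, 40.75368499 for your New York ZIP code). There might even be an option to change that:

location=POINT(-73.99916077%2040.75368499)

It looks like it accepts other options as well.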
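If you want to query other coordinates, the `location` value can be built rather than typed by hand. This is only a sketch: the function name `point_location` is my own, and it assumes the API keeps accepting the `POINT(longitude latitude)` form shown above with the space percent-encoded.

```python
from urllib.parse import quote


def point_location(lon, lat):
    """Build the percent-encoded value for the 'location' query parameter."""
    # Keep the parentheses literal and encode the space as %20,
    # matching the form seen in the request URL
    return quote(f"POINT({lon} {lat})", safe="()")


print(point_location(-73.99916077, 40.75368499))
```

Note the order: longitude first, then latitude. How you get the coordinates for a ZIP code is a separate problem that this sketch does not cover.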


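Use the requests library to do this.

Please provide a link to the website. What do you want to collect?

Why use Scrapy? Why don't you just use their API?

@Gregor I couldn't find an API for this site, and nobody else offers what I need!! :(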

# libraries I'm using
import json
import re

import requests
from bs4 import BeautifulSoup

session = requests.Session()

# get the client id from the static content page
static = 'https://www.grubhub.com/eat/static-content-unauth?contentOnly=1'
soup = BeautifulSoup(session.get(static).text, 'html.parser')
client = re.findall("beta_[a-zA-Z0-9]+", soup.find('script', {'type': 'text/javascript'}).text)
# print(client)

# define and add a proper header
headers = {
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36',
    'authorization': 'Bearer',
    'content-type': 'application/json;charset=UTF-8'
}
session.headers.update(headers)

# get the authentication bearer
# straight from networking tools. Device ID appears to accept any 10-digit value
data = json.dumps({"brand": "GRUBHUB", "client_id": client[0], "device_id": 1234567890, "scope": "anonymous"})
resp = session.post('https://api-gtm.grubhub.com/auth', data=data)
resp.raise_for_status()  # stop here if the auth request was rejected

# refresh = json.loads(resp.text)['session_handle']['refresh_token']
access = json.loads(resp.text)['session_handle']['access_token']

# update header with new token
session.headers.update({'authorization': 'Bearer ' + access})

# make your request
grub = session.get('https://api-gtm.grubhub.com/restaurants/search/search_listing?orderMethod=delivery&locationMode=DELIVERY&facetSet=umamiV2&pageSize=20&hideHateos=true&searchMetrics=true&location=POINT(-73.99916077%2040.75368499)&facet=promos%3Atrue&facet=open_now%3Atrue&variationId=promosSponsoredRandom&sortSetId=umamiv3&countOmittingTimes=true')
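The search URL above is one long hard-coded string, which makes it awkward to change the coordinates or page size. A possible way to assemble it from parts is sketched below. The helper name `build_search_url` and its arguments are my own; the parameter names and values are copied as-is from the URL above, and I am not making any claim about what each one does on the server side.

```python
from urllib.parse import quote, urlencode

BASE_URL = "https://api-gtm.grubhub.com/restaurants/search/search_listing"


def build_search_url(lon, lat, page_size=20):
    """Assemble the search URL from the query parameters seen in the request."""
    # A list of pairs (not a dict) because 'facet' appears twice
    params = [
        ("orderMethod", "delivery"),
        ("locationMode", "DELIVERY"),
        ("facetSet", "umamiV2"),
        ("pageSize", page_size),
        ("hideHateos", "true"),
        ("searchMetrics", "true"),
        ("location", f"POINT({lon} {lat})"),
        ("facet", "promos:true"),
        ("facet", "open_now:true"),
        ("variationId", "promosSponsoredRandom"),
        ("sortSetId", "umamiv3"),
        ("countOmittingTimes", "true"),
    ]
    # quote_via=quote encodes spaces as %20; safe="()" keeps the parentheses literal
    return BASE_URL + "?" + urlencode(params, safe="()", quote_via=quote)


print(build_search_url(-73.99916077, 40.75368499))
```

With the New York coordinates this reproduces the hard-coded URL, so it can be passed to `session.get(...)` in its place.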