Python: how to specify parameters in a request with Scrapy
How do I pass parameters to a request on a URL like the following:
site.com/search/?action=search&description=My Search here&e_author=
How do I put the parameters into the structure of the spider's request, like in this example:
req = Request(url="site.com/",parameters={x=1,y=2,z=3})
Pass the GET parameters inside the URL itself:
return Request(url="https://yoursite.com/search/?action=search&description=MySearchhere&e_author=")
You should probably define the parameters in a dictionary and build the URL from it:
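A minimal sketch of that idea (the host and parameter names are just the ones from the question), using `urllib.parse.urlencode` to build the query string:

```python
from urllib.parse import urlencode

params = {
    "action": "search",
    "description": "My Search here",
    "e_author": "",
}
# Build the full URL first, then hand it to scrapy.Request(url=url) as usual
url = "https://site.com/search/?" + urlencode(params)
print(url)
# → https://site.com/search/?action=search&description=My+Search+here&e_author=
```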
Scrapy doesn't provide this directly. What you have to do is construct the URL yourself from whatever parameters you have; you can use a standard-library module for that (Python 3 or later):
import urllib.parse

params = {
    'key': self.access_key,
    'part': 'snippet,replies',
    'videoId': self.video_id,
    'maxResults': 100
}
url = f'https://www.googleapis.com/youtube/v3/commentThreads/?{urllib.parse.urlencode(params)}'
request = scrapy.Request(url, callback=self.parse)
yield request
Here is a full Python 3+ example: it fetches all the comments on a YouTube video through the official YouTube API, which returns them paginated. Note how the URL is constructed from params before each call:
import scrapy
import urllib.parse
import json
import datetime
from youtube_scrapy.items import YoutubeItem

class YoutubeSpider(scrapy.Spider):
    name = 'youtube'
    BASE_URL = 'https://www.googleapis.com/youtube/v3'

    def __init__(self):
        self.access_key = 'your_youtube_api_access_key'
        self.video_id = 'any_youtube_video_id'

    def start_requests(self):
        params = {
            'key': self.access_key,
            'part': 'snippet,replies',
            'videoId': self.video_id,
            'maxResults': 100
        }
        url = f'{self.BASE_URL}/commentThreads/?{urllib.parse.urlencode(params)}'
        request = scrapy.Request(url, callback=self.parse)
        request.meta['params'] = params
        return [request]

    def parse(self, response):
        data = json.loads(response.body)

        # let's collect comments and replies
        items = data.get('items', [])
        for item in items:
            created_date = item['snippet']['topLevelComment']['snippet']['publishedAt']
            _created_date = datetime.datetime.strptime(created_date, '%Y-%m-%dT%H:%M:%S.000Z')
            id = item['snippet']['topLevelComment']['id']
            record = {
                'created_date': _created_date,
                'body': item['snippet']['topLevelComment']['snippet']['textOriginal'],
                'creator_name': item['snippet']['topLevelComment']['snippet'].get('authorDisplayName', ''),
                'id': id,
                'url': f'https://www.youtube.com/watch?v={self.video_id}&lc={id}',
            }
            yield YoutubeItem(**record)

        # paginate if a next page of comments is available
        next_page_token = data.get('nextPageToken', None)
        if next_page_token:
            params = response.meta['params']
            params['pageToken'] = next_page_token
            url = f'{self.BASE_URL}/commentThreads/?{urllib.parse.urlencode(params)}'
            request = scrapy.Request(url, callback=self.parse)
            request.meta['params'] = params
            yield request
You can use add_or_replace_parameters from w3lib:
from scrapy import Request
from w3lib.url import add_or_replace_parameters

def abc(self, response):
    url = "https://yoursite.com/search/"  # can be response.url or any other URL
    params = {
        "action": "search",
        "description": "My search here",
        "e_author": ""
    }
    return Request(url=add_or_replace_parameters(url, params))
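If you'd rather not depend on w3lib, the same behavior can be sketched with the standard library alone (the helper name `add_or_replace_params` here is just illustrative, not part of any library):

```python
from urllib.parse import parse_qsl, urlencode, urlparse, urlunparse

def add_or_replace_params(url, params):
    """Merge params into url's query string, replacing keys that already exist."""
    parts = urlparse(url)
    query = dict(parse_qsl(parts.query, keep_blank_values=True))
    query.update(params)
    return urlunparse(parts._replace(query=urlencode(query)))

print(add_or_replace_params(
    "https://yoursite.com/search/?action=browse",
    {"action": "search", "e_author": ""},
))
# → https://yoursite.com/search/?action=search&e_author=
```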
To create a GET request with parameters in Scrapy, you can use FormRequest with method='GET':
yield scrapy.FormRequest(
    url=url,
    method='GET',
    formdata=params,
    callback=self.parse_result
)
where params is a dict with your parameters. It even supports setting the same variable multiple times, either with a list value like
{'a': ['val1', 'val2']}
or, equivalently, as a sequence of tuples such as (('a', 'val1'), ('a', 'val2')).
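For what it's worth, a GET FormRequest encodes formdata into the query string much like `urllib.parse.urlencode` with `doseq=True` does (my understanding of Scrapy's behavior, not a guarantee from its docs), so the repeated-value encoding is easy to inspect on its own:

```python
from urllib.parse import urlencode

# doseq=True expands list values into repeated query parameters
query = urlencode({"a": ["val1", "val2"], "b": "3"}, doseq=True)
print(query)
# → a=val1&a=val2&b=3
```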