When making a FormRequest in scrapy, what exactly should we pass as the response?
Using the code below in the scrapy shell, I can log in to Stack Overflow. However, I want to perform this activity from a script rather than as a command-line session, so I tried to log in by running the same commands through subprocess:
from scrapy import FormRequest

url = "https://stackoverflow.com/users/login"
fetch(url)
req = FormRequest.from_response(
    response,
    formid='login-form',
    formdata={'email': 'test@test.com',
              'password': 'testpw'},
    clickdata={'id': 'submit-button'},
)
fetch(req)
But it gives me this error:

TypeError: argument of type 'FormRequest' is not iterable
I also tried saving the response to an HTML file and reading that file back in as the response, and got the same error message as above.
import subprocess
import scrapy
from scrapy import FormRequest
from subprocess import run
from bs4 import BeautifulSoup

class QuoteSpider(scrapy.Spider):
    name = 'stackover'
    start_urls = ['https://stackoverflow.com/users/login']
    run(["scrapy", "fetch", start_urls[0]], capture_output=True, text=True)

    def parse(self, response):
        req = FormRequest.from_response(
            response,
            formid='login-form',
            formdata={'email': 'test@test.com',
                      'password': 'testpw'},
            clickdata={'id': 'submit-button'},
        )
        run(["scrapy", "fetch", req], shell=True)
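One reason this attempt fails: every element of a subprocess argument list must be a string (or bytes / os.PathLike), so a FormRequest object cannot be passed as an argument to `scrapy fetch`, which in any case only accepts a URL on its command line. A stdlib-only sketch of that constraint:

```python
import subprocess
import sys

# All elements of the argument list are str, so this runs normally.
result = subprocess.run(
    [sys.executable, "-c", "print('fetched')"],
    capture_output=True,
    text=True,
)
print(result.stdout.strip())  # fetched

# Passing a non-string object (like a FormRequest) as an argv element
# raises TypeError before the command ever starts.
try:
    subprocess.run(["echo", object()])
    raised = False
except TypeError:
    raised = True
print(raised)  # True
```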
I also tried getting the response as text, and again got the error message above:
with open("output.html", "w") as f:
    response = call(["scrapy", "fetch", url], stdout=f, shell=True)

with open("output.html", encoding="utf-8") as f:
    data = f.read()
response = BeautifulSoup(data, 'lxml')
I also tried making the FormRequest before the parse function is called, like:
r = run(["scrapy", "fetch", start_urls[0]], capture_output=True)
response = r.stdout.decode()
and got yet another error:

AttributeError: 'str' object has no attribute 'encoding'
So, how can I run scrapy shell commands through subprocess to log in to Stack Overflow? And what exactly does FormRequest.from_response in scrapy take as its response input?
I am learning scrapy and trying various ways of logging in to Stack Overflow to practice web scraping.
class QuoteSpider(scrapy.Spider):
    name = 'stackover'
    start_urls = ['https://stackoverflow.com/users/login']
    r = run(["scrapy", "fetch", start_urls[0]], capture_output=True)
    response = r.stdout.decode()
    req = FormRequest.from_response(
        response,
        formid='login-form',
        formdata={'email': 'test@test.com',
                  'password': 'testpw'},
        clickdata={'id': 'submit-button'},
    )
    run(["scrapy", "fetch", req], shell=True)

    def parse(self, response):
        print(response)
You can run it with scrapy crawl stack_spider:
from scrapy import FormRequest
from scrapy import Spider

class StackSpider(Spider):
    name = 'stack_spider'

    # List of urls for initial requests. Can be one or many.
    # Default method parse() is called for start responses.
    start_urls = ["https://stackoverflow.com/users/login"]

    # Parsing users/login page. Getting form and moving on.
    def parse(self, response):
        yield FormRequest.from_response(
            response,
            formid='login-form',
            formdata={'email': 'test@test.com',
                      'password': 'testpw'},
            clickdata={'id': 'submit-button'},
            callback=self.parse_login
        )

    # Parsing login result
    def parse_login(self, response):
        print('Checking logging in here.')