Splash + Scrapoxy: x-cache-proxyname header missing (Python, Scrapy, Scrapy Splash, Splash Js Render) - Fatal编程技术网

I am using the following infrastructure to scrape a website:

Scrapy <--> Splash <--> Scrapoxy <--> web site
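I want to detect bans and remove the banned proxies. According to the Scrapoxy documentation:

Scrapoxy adds an HTTP header x-cache-proxyname to the response.

But I don't see this header in response.headers. The only headers are: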

{b'Content-Type': b'text/html; charset=utf-8',
 b'Date': b'Wed, 18 Apr 2018 19:02:21 GMT',
 b'Server': b'TwistedWeb/16.1.1'}
What am I doing wrong? Should I add something to the Lua script so that the headers are returned correctly?


UPDATE: Actually, this does not seem to be a Splash problem. Even with HTTPie, Scrapoxy does not return x-cache-proxyname:

http -v --proxy=https:http://<user>:<password>@<scrapoxy-server>:8888 https://<site>

GET / HTTP/1.1
User-Agent: HTTPie/0.9.9
Accept-Encoding: gzip, deflate
Accept: */*
Connection: keep-alive
Host: <site>


HTTP/1.1 200 OK
Server: nginx
Date: Thu, 28 Jun 2018 08:14:26 GMT
Content-Type: text/html; charset=utf-8
Transfer-Encoding: chunked
Connection: keep-alive
Vary: Accept-Encoding
Set-Cookie: <...>
X-Powered-By: Express
ETag: W/"5a31b-faPJ7bjKH24S/3EvHU/8IoJHyxw"
Vary: Cookie, User-Agent
Content-Security-Policy: default-src https:; child-src https:; connect-src https: wss:; form-action https:; frame-ancestors https: http://webvisor.com; media-src https:; object-src https:; img-src https: data: blob:; script-src https: data: 'unsafe-inline' 'unsafe-eval'; style-src https: 'unsafe-inline'; font-src https: data:; report-uri /ajax/csp-report/
Content-Encoding: gzip
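The same check can be scripted from Python with only the standard library. This is a minimal sketch, not part of the original post: fetch_proxy_name and proxy_name_from_raw are hypothetical helpers, proxy_url is assumed to have the same form as in the HTTPie command (http://<user>:<password>@<scrapoxy-server>:8888), and proxy_name_from_raw expects only the header lines, without the "HTTP/1.1 200 OK" status line.

```python
import http.client
import io
import urllib.request
from typing import Optional

PROXY_HEADER = "x-cache-proxyname"


def fetch_proxy_name(url: str, proxy_url: str, timeout: float = 30.0) -> Optional[str]:
    """Fetch url through the proxy and return the x-cache-proxyname header,
    or None if the response does not carry it."""
    # Send both http:// and https:// targets through the proxy, like HTTPie's
    # --proxy option. user:password in proxy_url becomes a Proxy-Authorization header.
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    opener = urllib.request.build_opener(handler)
    with opener.open(url, timeout=timeout) as resp:
        return resp.headers.get(PROXY_HEADER)


def proxy_name_from_raw(raw_headers: bytes) -> Optional[str]:
    """Look up the header in a raw header block such as the one printed by
    http -v (header lines only, no status line)."""
    msg = http.client.parse_headers(io.BytesIO(raw_headers))
    # HTTPMessage lookups are case-insensitive.
    return msg.get(PROXY_HEADER)
```

Non-2xx responses make opener.open raise urllib.error.HTTPError, whose headers attribute can be inspected the same way.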

I managed to get x-cache-proxyname with this Lua script:

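function main(splash)
  local host = "..."
  local port = "..."
  local username = "..."
  local password = "..."
  local proxy = ""
  -- Route every request Splash makes through Scrapoxy.
  splash:on_request(function(request)
    request:set_proxy{host, port, username=username, password=password}
  end)
  -- Remember the proxy name reported by Scrapoxy; keep the previous value
  -- when a response has no such header (the lookup returns nil).
  splash:on_response_headers(function(response)
    proxy = response.headers["x-cache-proxyname"] or proxy
  end)
  splash.images_enabled = false
  -- Load the page once, so the returned HTML matches the reported proxy.
  assert(splash:go(splash.args.url))
  -- Expose the proxy name as a header of Splash's own HTTP response.
  splash:set_result_header("x-cache-proxyname", proxy)
  return splash:html()
end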
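To consume that header from Python, the script is posted to Splash's /execute endpoint, and the value set by splash:set_result_header comes back as a header of Splash's own HTTP response. Below is a minimal standard-library sketch, assuming Splash listens on localhost:8050 (its default port); build_execute_request and render_and_get_proxy are hypothetical names. With scrapy-splash the equivalent is usually SplashRequest(url, endpoint="execute", args={"lua_source": script}), after which the value should appear in response.headers.

```python
import json
import urllib.request
from typing import Optional, Tuple

# Assumed Splash address; 8050 is Splash's default HTTP port.
SPLASH_EXECUTE_URL = "http://localhost:8050/execute"


def build_execute_request(lua_source: str, target_url: str,
                          splash_url: str = SPLASH_EXECUTE_URL) -> urllib.request.Request:
    """Build a POST to Splash's /execute endpoint. Extra JSON keys such as
    "url" are available to the script as splash.args."""
    body = json.dumps({"lua_source": lua_source, "url": target_url}).encode("utf-8")
    return urllib.request.Request(
        splash_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def render_and_get_proxy(lua_source: str, target_url: str,
                         timeout: float = 90.0) -> Tuple[str, Optional[str]]:
    """Run the script and return (html, proxy name). The proxy name may be
    None or empty when Scrapoxy did not supply the header."""
    req = build_execute_request(lua_source, target_url)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        html = resp.read().decode("utf-8", errors="replace")
        return html, resp.headers.get("x-cache-proxyname")
```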
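UPDATE: When HTTPS is used, Scrapoxy cannot edit the headers and add x-cache-proxyname to the response, because HTTPS traffic is only tunneled through the proxy (CONNECT) and stays encrypted end to end.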
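For the original goal of detecting bans and dropping the offending proxy, the header can drive a small decision function. This is a sketch under stated assumptions: the set of "ban" status codes is a hypothetical choice, the helper names are illustrative, and actually removing the instance from Scrapoxy is left to the caller. Header keys may be bytes, as in the Scrapy output in the question.

```python
from typing import Mapping, Optional, Union
from urllib.parse import urlsplit

# Hypothetical set of status codes treated as a ban; tune per target site.
BAN_STATUSES = frozenset({403, 429, 503})


def _to_str(value: Union[str, bytes]) -> str:
    # Header bytes are decoded as Latin-1, the traditional HTTP header charset.
    return value.decode("latin-1") if isinstance(value, bytes) else value


def get_header(headers: Mapping, name: str) -> Optional[str]:
    """Case-insensitive header lookup accepting str or bytes keys and values."""
    wanted = name.lower()
    for key, value in headers.items():
        if _to_str(key).lower() == wanted:
            # Some header containers return a list of values per key;
            # use the last one in that case.
            if isinstance(value, list):
                value = value[-1] if value else b""
            return _to_str(value)
    return None


def banned_proxy_name(url: str, status: int, headers: Mapping) -> Optional[str]:
    """Return the Scrapoxy instance name to remove if the response looks
    like a ban, otherwise None."""
    if status not in BAN_STATUSES:
        return None
    if urlsplit(url).scheme == "https":
        # HTTPS traffic is tunneled, so Scrapoxy cannot inject the header
        # and the banned instance cannot be identified from the response.
        return None
    return get_header(headers, "x-cache-proxyname")
```

Over HTTPS the function returns None even for a ban, which makes the limitation explicit instead of silently looking for a header that cannot be there.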
