Python: how to use Scrapy with a text list

Tags: python, web-scraping, scrapy, python-requests

Hi everyone, I've been working on a new project that uses Scrapy to resolve IPs to domain names.

I can't figure out how to feed a text file of IPs (ip.txt) into my start URLs, replacing (+ip) with each entry from the text list.

Example:

start_urls = [
    "https://api.hackertarget.com/reverseiplookup/?q= + ip"]
----------------------------------- My code -----------------------------

# -*- coding: utf-8 -*-
import scrapy

lists = open(raw_input('IP list file name: '), 'r').read().split('\n')

class jeffbullasSpider(scrapy.Spider):
    name = "iptohost"
    allowed_domains = ["api.hackertarget.com"]
    start_urls = [
        "https://api.hackertarget.com/reverseiplookup/?q=" + str(lists) ]  # concatenates the repr of the whole list into one URL

    def parse(self, response):
        print response.xpath('//body//text()').get()
(I'm new to Python. Thank you very much.)

Try this:

Edit: also strip the IP before sending the request.

import scrapy

# Read the IP list file into a list of lines
# (raw_input is Python 2; on Python 3 use input())
lists = open(raw_input('IP list file name: '), 'r').read().split('\n')

class jeffbullasSpider(scrapy.Spider):
    name = "iptohost"
    allowed_domains = ["api.hackertarget.com"]
    url = "https://api.hackertarget.com/reverseiplookup/?q={}"

    def start_requests(self):
        # Build one request per IP instead of one URL from the whole list
        for ip in lists:
            yield scrapy.Request(url=self.url.format(ip.strip()), callback=self.parse)

    def parse(self, response):
        print(response.xpath('//body//text()').get())
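The core of `start_requests` above is building one URL per non-empty, stripped line. That logic can be sketched on its own, without Scrapy (the function name `build_urls` and the sample IPs are illustrative, not from the thread):

```python
# Turn raw lines from an IP list file into lookup URLs,
# stripping whitespace and skipping blank lines.
def build_urls(lines, template="https://api.hackertarget.com/reverseiplookup/?q={}"):
    return [template.format(ip.strip()) for ip in lines if ip.strip()]

# Lines as they might come back from read().split('\n'):
# stray spaces and a trailing blank entry.
sample = ["1.2.3.4", "  5.6.7.8 ", ""]
print(build_urls(sample))
# → ['https://api.hackertarget.com/reverseiplookup/?q=1.2.3.4',
#    'https://api.hackertarget.com/reverseiplookup/?q=5.6.7.8']
```

Filtering out blank lines also avoids sending an empty query when the file ends with a newline.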

I have another question. I added proxy rotation to Scrapy, and after saving with this command:

scrapy crawl iptohost -o some.json -t json &> some.text
my output contains not only my domain results but also the proxy log messages mixed in.

My output:

2019-11-10 10:39:50 [rotating_proxies.expire] DEBUG: Proxy <http://197.157.219.25:8080> is DEAD
2019-11-10 10:39:50 [rotating_proxies.middlewares] DEBUG: Retrying <GET https://api.hackertarget.com/reverseiplookup/?q=61.112.2.178> with another proxy (failed 4 times, max retries: 5)
2019-11-10 10:39:50 [rotating_proxies.expire] DEBUG: Proxy <http://139.59.99.119:8080> is DEAD
2019-11-10 10:39:50 [rotating_proxies.middlewares] DEBUG: Retrying <GET https://api.hackertarget.com/reverseiplookup/?q=195.11.184.130> with another proxy (failed 5 times, max retries: 5)
2019-11-10 10:39:50 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://api.hackertarget.com/reverseiplookup/?q=195.11.184.130> (referer: None)
2019-11-10 10:39:50 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://api.hackertarget.com/reverseiplookup/?q=185.179.235.40> (referer: None)
[u'capitalinstant.com']
[u'142.92.242.192']
[u'API count exceeded - Increase Quota with Membership']
[u'API count exceeded - Increase Quota with Membership']
[u'API count exceeded - Increase Quota with Membership']

How can I remove the proxy log output so that only my domain results are saved? Thank you very much =)
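One likely cause (not confirmed in the thread): Scrapy writes its log to stderr, while `print()` writes the scraped text to stdout, and the `&>` redirection merges both streams into `some.text`. A minimal demonstration of the two streams (the file names `out.txt` and `err.txt` are illustrative):

```shell
# Simulate a program that prints a result to stdout and a log line to stderr,
# then redirect each stream to its own file.
python3 -c 'import sys; print("result"); print("log line", file=sys.stderr)' > out.txt 2> err.txt
# out.txt now holds only stdout; err.txt holds only stderr.
cat out.txt
```

Applied to the crawl, `scrapy crawl iptohost > domains.txt 2> crawl.log` would keep the printed domains in `domains.txt` and the rotating-proxies log in `crawl.log`. Alternatively, yield items from `parse` instead of printing and export them with `-o`, so the results never pass through stdout at all.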

You're welcome! If the answer is what you wanted, please accept it :) You can check out Scrapy's tutorial.