Python: Setting Scrapy start_urls from a script


python python-2.7 wxpython web-scraping scrapy


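I have a working Scrapy spider, and I can run it from a separate script. I have also built a wxPython GUI for that script: it contains only a multi-line TextCtrl, where the user enters a list of URLs to scrape, and a submit button. At the moment the start URLs are hard-coded in my spider. How can I pass the URLs entered in the TextCtrl into the spider's start_urls list? Thanks in advance for your help.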

Just set start_urls on your Spider instance:

spider = FollowAllSpider(domain=domain)
spider.start_urls = ['http://google.com']

That answer did not work for me. My solution works with Scrapy==1.0.3:

from scrapy.crawler import CrawlerProcess
from tutorial.spiders.some_spider import SomeSpider

process = CrawlerProcess()

process.crawl(SomeSpider, start_urls=["http://www.example.com"])
process.start()

It may help someone in the future.

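Thanks, this works, but only for a single URL. It does not seem to parse multiple URLs entered on separate lines into the right format for start_urls. I get results when I enter one URL, but not when I enter several. Any suggestions? This is my current approach: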

spider.start_urls = [self.tc2.GetValue()]

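How are you setting multiple URLs? What is self.tc2?

Sorry, self.tc2 is my multi-line TextCtrl. For example, when I use

savefile = open('urls.txt', 'w')
savefile.write(self.tc2.GetValue())

it creates a text file with multiple lines, exactly as they were entered in the TextCtrl. I think what I need to know is how to parse those lines into a list, with each line becoming a separate comma-separated element. Does that make sense?

@user994585 Try:

spider.start_urls = self.tc2.GetValue().splitlines()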
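As a follow-up to the splitlines() suggestion, here is a minimal sketch of how the text from the TextCtrl could be cleaned up before it is handed to the spider. The helper name parse_start_urls and the sample URLs are made up for illustration and are not part of Scrapy or wxPython. It assumes the user types one URL per line and may leave blank lines or stray spaces.

```python
def parse_start_urls(text):
    """Turn the raw contents of a multi-line text box into a list of URLs.

    Hypothetical helper (not a Scrapy or wxPython API): it splits on line
    breaks, trims surrounding whitespace, and skips empty lines.
    """
    urls = []
    for line in text.splitlines():  # handles both \n and \r\n line endings
        url = line.strip()
        if url:  # ignore blank lines
            urls.append(url)
    return urls


if __name__ == "__main__":
    # Sample input standing in for self.tc2.GetValue()
    raw = "http://www.example.com\n\n  http://www.example.org  \r\nhttp://www.example.net\n"
    print(parse_start_urls(raw))
```

With the names used in this thread, the call would presumably look like spider.start_urls = parse_start_urls(self.tc2.GetValue()). Filtering out blank lines matters because a trailing newline or an empty line in the text box would otherwise end up as an empty string in start_urls.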