Python: why does my Scrapy project stop scraping, but still crawl the site fine?

Tags: python, web-scraping, scrapy

I am using Scrapy to collect data from stox.vn. I have a url.txt with about 800 URLs, and I pass all of them to my spider. At first it crawls and scrapes fine, but then it stops scraping and only crawls:

2013-06-27 03:24:28+0700 [stox] DEBUG: Crawled (200) <GET http://companyaz.stox.vn/Financial/PV_Index?filter=1&unit=1000000&ticker=AAA> (referer: http://companyaz.stox.vn/Financial?cId=746&iId=150&iIdL=147&eId=1&tId=2status=1&id=-1&cat=&ticker=AAA)
2013-06-27 03:24:28+0700 [stox] DEBUG: Scraped from <200 http://companyaz.stox.vn/Financial/PV_Index?filter=1&unit=1000000&ticker=AAA>

    {'chi_phi_ban_hang': u'-7453.41',
     'chi_phi_khau_hao_TSCD': u'11890.11',
     'chi_phi_quan_ly': u'-5913.60',
     'chi_phi_tai_chinh': u'-10677.99',
     'chi_phi_tien_lai_vay': u'-5672.17',
     'doanh_thu_thuan': u'122008.75',
     'gia_von_hang_ban': u'-90790.07',
     'lai_co_dong_ct_me': u'11885.60',
     'lai_gop': u'31218.69',
     'lai_sau_thue': u'11885.60',
     'lai_tu_hdkd': u'11376.31',
     'loi_ich_CDTS': u'11885.60',
     'qtime': u'20101',
     'thu_nhap_tai_chinh': u'4202.63',
     'thue_TNDN_hl': u'509.29',
     'thue_TNDN_ht': u'0',
     'ticker': 'AAA'}
.....
2013-06-27 03:24:31+0700 [stox] DEBUG: Crawled (200) <GET http://companyaz.stox.vn/Financial?cId=446&iId=292&iIdL=280&eId=3&tId=3status=1&id=-1&cat=&ticker=ABI> (referer: None)
2013-06-27 03:24:33+0700 [stox] DEBUG: Crawled (200) <GET http://companyaz.stox.vn/Financial?cId=1&iId=217&iIdL=202&eId=0&tId=2status=1&id=-1&cat=&ticker=ABT> (referer: None)
2013-06-27 03:24:36+0700 [stox] DEBUG: Crawled (200) <GET http://companyaz.stox.vn/Financial?cId=164&iId=289&iIdL=279&eId=1&tId=0status=1&id=-1&cat=&ticker=ACB> (referer: None)
2013-06-27 03:24:38+0700 [stox] DEBUG: Crawled (200) <GET http://companyaz.stox.vn/Financial?cId=522&iId=180&iIdL=170&eId=0&tId=2status=1&id=-1&cat=&ticker=ACC> (referer: None)
2013-06-27 03:24:40+0700 [stox] DEBUG: Crawled (200) <GET http://companyaz.stox.vn/Financial?cId=486&iId=180&iIdL=170&eId=3&tId=2status=1&id=-1&cat=&ticker=ACE> (referer: None)
2013-06-27 03:24:42+0700 [stox] DEBUG: Crawled (200) <GET http://companyaz.stox.vn/Financial?cId=2&iId=217&iIdL=202&eId=0&tId=2status=1&id=-1&cat=&ticker=ACL> (referer: None)
2013-06-27 03:24:44+0700 [stox] DEBUG: Crawled (200) <GET http://companyaz.stox.vn/Financial?cId=858&iId=256&iIdL=241&eId=1&tId=2status=1&id=-1&cat=&ticker=ADC> (referer: None)
2013-06-27 03:24:47+0700 [stox] DEBUG: Crawled (200) <GET http://companyaz.stox.vn/Financial?cId=556&iId=180&iIdL=170&eId=3&tId=2status=1&id=-1&cat=&ticker=ADP> (referer: None)
My settings.py:

BOT_NAME = 'stox'

SPIDER_MODULES = ['stox.spiders']
NEWSPIDER_MODULE = 'stox.spiders'

# Crawl responsibly by identifying yourself (and your website) on the user-agent
#USER_AGENT = 'stox (+http://www.yourdomain.com)'
#ITEM_PIPELINES = ['stox.pipelines.StoxPipeline']
DOWNLOAD_DELAY = 2
#DOWNLOAD_TIMEOUT = 180
#CONCURRENT_REQUESTS = 2
I noticed that when I change the CONCURRENT_REQUESTS setting, the spider stops scraping after exactly CONCURRENT_REQUESTS items and then only crawls. I suspect there is a problem with the concurrent processing (maybe the request slots are never freed?).

Update: contents of url.txt

http://companyaz.stox.vn/Financial?cId=746&iId=150&iIdL=147&eId=1&tId=2status=1&id=-1&cat=&ticker=AAA
http://companyaz.stox.vn/Financial?cId=446&iId=292&iIdL=280&eId=3&tId=3status=1&id=-1&cat=&ticker=ABI
http://companyaz.stox.vn/Financial?cId=1&iId=217&iIdL=202&eId=0&tId=2status=1&id=-1&cat=&ticker=ABT
.....
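The question does not show the spider itself, so as a point of reference, here is a minimal sketch of how a file like url.txt (one URL per line) is typically loaded into a spider's start_urls; `load_start_urls` is a hypothetical helper, not the asker's actual code:

```python
def load_start_urls(path):
    """Read one URL per line from url.txt, skipping blank lines."""
    with open(path) as f:
        return [line.strip() for line in f if line.strip()]
```

The resulting list can be assigned to `start_urls` in the spider's `__init__`, or each URL can be yielded as a `Request` from `start_requests`.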
Any help is greatly appreciated! Thank you.


PS: I am new to Scrapy, so sorry for my poor English.

Answer: For each of your 800 URLs, you write a file named after the ticker. Are the ticker names distinct across all the URLs? If they are not, you may be overwriting those files. You could use Scrapy's export options instead of writing the files yourself.
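If overwriting is the suspicion, a quick way to check is to extract the ticker query parameter from every URL in url.txt and look for repeats. This is a standard-library sketch; `ticker_from_url` and `find_duplicate_tickers` are hypothetical helpers, not part of the asker's code:

```python
from urllib.parse import urlparse, parse_qs

def ticker_from_url(url):
    """Pull the 'ticker' query parameter, which the output files are named after."""
    params = parse_qs(urlparse(url).query)
    return params.get("ticker", [""])[0]

def find_duplicate_tickers(urls):
    """Return tickers that appear more than once, i.e. files that would be overwritten."""
    seen, dupes = set(), set()
    for url in urls:
        ticker = ticker_from_url(url)
        if ticker in seen:
            dupes.add(ticker)
        seen.add(ticker)
    return dupes
```

Running `find_duplicate_tickers` over the 800 lines of url.txt would confirm or rule out the overwriting hypothesis.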

You can read the following thread to learn more about exporting data.
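For example, the built-in feed exporter can be enabled from the command line (`scrapy crawl stox -o items.jl`) or from settings.py. This is a sketch; `FEED_URI` and `FEED_FORMAT` were the relevant setting names in Scrapy releases of that era:

```python
# settings.py: export all scraped items to one JSON-lines file
# via Scrapy's built-in feed exporter, instead of writing files
# per ticker in the spider.
FEED_URI = 'items.jl'
FEED_FORMAT = 'jsonlines'
```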

What are the contents of url.txt? @alecxe: Sorry for the late reply; I have updated the question above with the URLs. It is a list of 800 URLs:
http://companyaz.stox.vn/Financial?cId=746&iId=150&iIdL=147&eId=1&tId=2status=1&id=-1&cat=&ticker=AAA
http://companyaz.stox.vn/Financial?cId=446&iId=292&iIdL=280&eId=3&tId=3status=1&id=-1&cat=&ticker=ABI
http://companyaz.stox.vn/Financial?cId=1&iId=217&iIdL=202&eId=0&tId=2status=1&id=-1&cat=&ticker=ABT
.....