Python: why does my Scrapy project stop scraping, but still crawl the site fine?

Tags: python, web-scraping, scrapy

I am using Scrapy to collect data from stox.vn. I have a url.txt with about 800 URLs, and I pass all of them to my spider. At first it crawls and scrapes fine, but then it stops scraping and only crawls:

2013-06-27 03:24:28+0700 [stox] DEBUG: Crawled (200) <GET http://companyaz.stox.vn/Financial/PV_Index?filter=1&unit=1000000&ticker=AAA> (referer: http://companyaz.stox.vn/Financial?cId=746&iId=150&iIdL=147&eId=1&tId=2status=1&id=-1&cat=&ticker=AAA)
2013-06-27 03:24:28+0700 [stox] DEBUG: Scraped from <200 http://companyaz.stox.vn/Financial/PV_Index?filter=1&unit=1000000&ticker=AAA>

    {'chi_phi_ban_hang': u'-7453.41',
     'chi_phi_khau_hao_TSCD': u'11890.11',
     'chi_phi_quan_ly': u'-5913.60',
     'chi_phi_tai_chinh': u'-10677.99',
     'chi_phi_tien_lai_vay': u'-5672.17',
     'doanh_thu_thuan': u'122008.75',
     'gia_von_hang_ban': u'-90790.07',
     'lai_co_dong_ct_me': u'11885.60',
     'lai_gop': u'31218.69',
     'lai_sau_thue': u'11885.60',
     'lai_tu_hdkd': u'11376.31',
     'loi_ich_CDTS': u'11885.60',
     'qtime': u'20101',
     'thu_nhap_tai_chinh': u'4202.63',
     'thue_TNDN_hl': u'509.29',
     'thue_TNDN_ht': u'0',
     'ticker': 'AAA'}
.....
2013-06-27 03:24:31+0700 [stox] DEBUG: Crawled (200) <GET http://companyaz.stox.vn/Financial?cId=446&iId=292&iIdL=280&eId=3&tId=3status=1&id=-1&cat=&ticker=ABI> (referer: None)
2013-06-27 03:24:33+0700 [stox] DEBUG: Crawled (200) <GET http://companyaz.stox.vn/Financial?cId=1&iId=217&iIdL=202&eId=0&tId=2status=1&id=-1&cat=&ticker=ABT> (referer: None)
2013-06-27 03:24:36+0700 [stox] DEBUG: Crawled (200) <GET http://companyaz.stox.vn/Financial?cId=164&iId=289&iIdL=279&eId=1&tId=0status=1&id=-1&cat=&ticker=ACB> (referer: None)
2013-06-27 03:24:38+0700 [stox] DEBUG: Crawled (200) <GET http://companyaz.stox.vn/Financial?cId=522&iId=180&iIdL=170&eId=0&tId=2status=1&id=-1&cat=&ticker=ACC> (referer: None)
2013-06-27 03:24:40+0700 [stox] DEBUG: Crawled (200) <GET http://companyaz.stox.vn/Financial?cId=486&iId=180&iIdL=170&eId=3&tId=2status=1&id=-1&cat=&ticker=ACE> (referer: None)
2013-06-27 03:24:42+0700 [stox] DEBUG: Crawled (200) <GET http://companyaz.stox.vn/Financial?cId=2&iId=217&iIdL=202&eId=0&tId=2status=1&id=-1&cat=&ticker=ACL> (referer: None)
2013-06-27 03:24:44+0700 [stox] DEBUG: Crawled (200) <GET http://companyaz.stox.vn/Financial?cId=858&iId=256&iIdL=241&eId=1&tId=2status=1&id=-1&cat=&ticker=ADC> (referer: None)
2013-06-27 03:24:47+0700 [stox] DEBUG: Crawled (200) <GET http://companyaz.stox.vn/Financial?cId=556&iId=180&iIdL=170&eId=3&tId=2status=1&id=-1&cat=&ticker=ADP> (referer: None)
My settings.py:

BOT_NAME = 'stox'

SPIDER_MODULES = ['stox.spiders']
NEWSPIDER_MODULE = 'stox.spiders'

# Crawl responsibly by identifying yourself (and your website) on the user-agent
#USER_AGENT = 'stox (+http://www.yourdomain.com)'
#ITEM_PIPELINES = ['stox.pipelines.StoxPipeline']
DOWNLOAD_DELAY = 2
#DOWNLOAD_TIMEOUT = 180
#CONCURRENT_REQUESTS = 2
I noticed that when I change the CONCURRENT_REQUESTS setting, the spider stops scraping after exactly CONCURRENT_REQUESTS items and then only crawls. I suspect there is a problem with the concurrent processing (maybe the request slots are never freed?).

Update: contents of url.txt

http://companyaz.stox.vn/Financial?cId=746&iId=150&iIdL=147&eId=1&tId=2status=1&id=-1&cat=&ticker=AAA
http://companyaz.stox.vn/Financial?cId=446&iId=292&iIdL=280&eId=3&tId=3status=1&id=-1&cat=&ticker=ABI
http://companyaz.stox.vn/Financial?cId=1&iId=217&iIdL=202&eId=0&tId=2status=1&id=-1&cat=&ticker=ABT
.....
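The question does not show the spider itself, so as a point of reference, here is a minimal sketch of how a file like url.txt (one URL per line) is typically loaded into a spider's start_urls; `load_start_urls` is a hypothetical helper, not the asker's actual code:

```python
def load_start_urls(path):
    """Read one URL per line from url.txt, skipping blank lines."""
    with open(path) as f:
        return [line.strip() for line in f if line.strip()]
```

The resulting list can be assigned to `start_urls` in the spider's `__init__`, or each URL can be yielded as a `Request` from `start_requests`.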
Any help is greatly appreciated! Thank you.


PS: I am new to Scrapy, so sorry for my poor English.

Answer: For each of your 800 URLs, you write a file named after the ticker. Are the ticker names distinct across all the URLs? If they are not, you may be overwriting those files. You could use Scrapy's export options instead of writing the files yourself.
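If overwriting is the suspicion, a quick way to check is to extract the ticker query parameter from every URL in url.txt and look for repeats. This is a standard-library sketch; `ticker_from_url` and `find_duplicate_tickers` are hypothetical helpers, not part of the asker's code:

```python
from urllib.parse import urlparse, parse_qs

def ticker_from_url(url):
    """Pull the 'ticker' query parameter, which the output files are named after."""
    params = parse_qs(urlparse(url).query)
    return params.get("ticker", [""])[0]

def find_duplicate_tickers(urls):
    """Return tickers that appear more than once, i.e. files that would be overwritten."""
    seen, dupes = set(), set()
    for url in urls:
        ticker = ticker_from_url(url)
        if ticker in seen:
            dupes.add(ticker)
        seen.add(ticker)
    return dupes
```

Running `find_duplicate_tickers` over the 800 lines of url.txt would confirm or rule out the overwriting hypothesis.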

You can read the following thread to learn more about exporting data.
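For example, the built-in feed exporter can be enabled from the command line (`scrapy crawl stox -o items.jl`) or from settings.py. This is a sketch; `FEED_URI` and `FEED_FORMAT` were the relevant setting names in Scrapy releases of that era:

```python
# settings.py: export all scraped items to one JSON-lines file
# via Scrapy's built-in feed exporter, instead of writing files
# per ticker in the spider.
FEED_URI = 'items.jl'
FEED_FORMAT = 'jsonlines'
```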

What are the contents of url.txt? @alecxe: Sorry for the late reply; I have updated the question above with the URLs. It is a list of 800 URLs:
http://companyaz.stox.vn/Financial?cId=746&iId=150&iIdL=147&eId=1&tId=2status=1&id=-1&cat=&ticker=AAA
http://companyaz.stox.vn/Financial?cId=446&iId=292&iIdL=280&eId=3&tId=3status=1&id=-1&cat=&ticker=ABI
http://companyaz.stox.vn/Financial?cId=1&iId=217&iIdL=202&eId=0&tId=2status=1&id=-1&cat=&ticker=ABT
.....