Python: why does my scraper keep reporting "TCP connection timed out" on Scrapinghub when it runs fine on my local machine?
I get the error below on app.scrapinghub.com, but the same spider works fine on my local machine.

Note: I send the requests with the `requests` module from inside a Scrapy spider (Python) and parse the responses with BeautifulSoup.
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/twisted/internet/defer.py", line 1297, in _inlineCallbacks
    result = result.throwExceptionIntoGenerator(g)
  File "/usr/local/lib/python3.6/site-packages/twisted/python/failure.py", line 389, in throwExceptionIntoGenerator
    return g.throw(self.type, self.value, self.tb)
  File "/usr/local/lib/python3.6/site-packages/scrapy/core/downloader/middleware.py", line 43, in process_request
    defer.returnValue((yield download_func(request=request, spider=spider)))
twisted.internet.error.TCPTimedOutError: TCP connection timed out: 110: Connection timed out.
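A timeout at this layer usually means the TCP connection itself never completes from the Scrapy Cloud container (firewall, IP blocking, or an unreachable host), not anything in the parsing code. One way to narrow that down is a raw socket probe run from each environment; `can_connect` below is a hypothetical stdlib-only helper, not part of the original spider:

```python
import socket

def can_connect(host, port, timeout=5):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections, and DNS failures alike.
        return False

# Example: probe the target site's HTTP port from both environments.
# can_connect("www.example.com", 80)
```

If this returns True locally but False from the Scrapinghub shell, the target (or a firewall in between) is blocking the cloud platform's IP range, and no change to the spider code will help.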
Sample code:
from scrapy.spiders import Spider  # 'scrapy.spider' is the old path, removed in newer Scrapy
import requests
from bs4 import BeautifulSoup
from datetime import datetime, timedelta
from Scrapy_Project.pipelines import MySQLPipeline

class exampleSpider(Spider):
    name = 'test'
    start_urls = ['http://www.example.com']
    custom_settings = {
        'ITEM_PIPELINES': {
            'Scrapy_Project.pipelines.MySQLPipeline': None
        }
    }

    def parse(self, response):
        current_date = datetime.today()
        today = current_date.strftime('%m/%d/%Y')
        tomorrow_date = current_date + timedelta(days=1)
        tomorrow = tomorrow_date.strftime('%m/%d/%Y')
        date_list = [today, tomorrow]
        for date in date_list:
            url = 'http://www.example.com?id=123'
            page = requests.get(url)  # blocking call, bypasses Scrapy's downloader
            soup = BeautifulSoup(page.content, 'html.parser')
            movie_list = soup.find_all('exampe_info')
            # The original looped over the always-empty data_lists, so nothing
            # was ever yielded; iterate the parsed results instead.
            for item in movie_list:
                name = item.find('title').get_text()
                x_time = item.find('starttime').get_text()
                result = {'name': name, 'show_date': date, 'show_time': x_time}
                yield result
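Whatever the root cause turns out to be, one low-risk hardening step is to give the blocking fetch an explicit timeout and catch connection errors, so a blocked host surfaces as a logged per-URL failure instead of a reactor-level TCPTimedOutError. A minimal sketch with a hypothetical `fetch_page` helper, written against the stdlib `urllib` so it runs without extra packages (the original spider's `requests.get` accepts the same `timeout=` keyword):

```python
import urllib.request

def fetch_page(url, timeout=10):
    """Fetch a URL with an explicit timeout; return the body bytes, or None on any network failure."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.read()
    except OSError as exc:
        # URLError subclasses OSError, so this also catches timeouts,
        # refused connections, and DNS errors.
        print(f"fetch failed for {url}: {exc}")
        return None

# In the spider, a None result can then be skipped instead of crashing the crawl:
# page = fetch_page(url)
# if page is None:
#     continue
```

With `requests` the equivalent is `requests.get(url, timeout=10)` inside a `try/except requests.exceptions.RequestException`.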
Please provide a...
@Pitto I edited my question with the example above... does that help?
Is there a firewall between your machine and example.com? Or is there really a request to http://www.example.com?id=123 ?
@SergeBallesta I only put that URL in as an example; my code uses a different, real URL. It works in Postman and on my local machine — the problem only occurs on app.scrapinghub.com.
@Krishnajoshi I do assume example.com is a placeholder URL. My question should rather be: is there a firewall, or a login page, in front of your real URL?