Web scraping: retrieving all tweets from a hashtag. Problem resuming from the rate limit with tweepy

Tags: web-scraping, twitter, tweepy, ratelimit

I am trying to scrape all tweets with the hashtag '#nationaldoughnutday', but I keep failing because of the rate limit.

Referring to the code below, I tried putting the code in a while loop so that when the rate limit resets, I can resume scraping from the last crawled date (until_date).

However, I keep getting the error below over and over, and my crawler does not seem to resume crawling after sleeping for a long time:

TweepError Failed to send request: ('Connection aborted.', error (10054, 'An existing connection was forcibly closed by the remote host'))
Sleeping...
TweepError Failed to send request: ('Connection aborted.', error (10054, 'An existing connection was forcibly closed by the remote host'))
Sleeping...
TweepError Failed to send request: ('Connection aborted.', error (10054, 'An existing connection was forcibly closed by the remote host'))
I have tried removing the inner try/except, but that did not help either.

import datetime
import time
import tweepy

auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth, wait_on_rate_limit=True, wait_on_rate_limit_notify=True)
query = '#nationaldoughnutday'
until_date = '2019-07-01'  # Twitter's `until` parameter expects YYYY-MM-DD

while True:
    try:  # outer try/except
        tweets = tweepy.Cursor(api.search, q=query + ' -filter:retweets',
                               count=100, lang='en', tweet_mode='extended',
                               until=until_date).items()
        for tweet in tweets:
            try:  # inner try/except
                print("tweet : ", tweet.created_at)
                # this is so that if I reconnect with the cursor, I will
                # start with the date before the last crawled tweet
                until_date = tweet.created_at.date() - datetime.timedelta(days=1)
            except tweepy.TweepError as e:
                print('Inner TweepError', e)
                time.sleep(17 * 60)
                break
    except tweepy.TweepError as e:
        print('Outer TweepError', e)
        print("sleeping ....")
        time.sleep(17 * 60)
        continue
    except StopIteration:
        break
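
I suspect the exception is raised by the for statement itself when tweepy fetches the next page, so my inner try/except never sees it. Here is a rough sketch of the direction I am considering, wrapping cursor.next() directly (adapted from the rate-limit handling pattern in tweepy's documentation; the limit_handled name is only illustrative):

import time
import tweepy

def limit_handled(cursor):
    # Yield items from a tweepy Cursor, sleeping through errors that
    # are raised while the cursor fetches the next page of results.
    while True:
        try:
            yield cursor.next()
        except StopIteration:
            return  # cursor exhausted: end the generator cleanly
        except tweepy.TweepError as e:
            print('TweepError while paging:', e)
            time.sleep(15 * 60)  # wait out the 15-minute rate-limit window

# Usage: paging errors are now caught inside the generator.
# for tweet in limit_handled(tweepy.Cursor(api.search, q=query).items()):
#     print(tweet.created_at)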

Thank you in advance.

Try adding this:

wait_on_rate_limit=True

This will not solve the problem itself, since the rate limit is imposed by the Twitter API, but it still helps stop the errors from being shown.
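
For reference, a minimal sketch of constructing the client with both flags (tweepy 3.x; the credential variables are placeholders):

import tweepy

# placeholder credentials
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)

# wait_on_rate_limit=True makes tweepy sleep automatically whenever the
# rate limit is reached; wait_on_rate_limit_notify=True prints a notice
# when that happens, so no rate-limit errors are surfaced to the caller
# (connection errors like 10054 still need their own handling).
api = tweepy.API(auth,
                 wait_on_rate_limit=True,
                 wait_on_rate_limit_notify=True)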