Python 处理环路的连接中断、错误行为
我有以下For循环,它使用Python 处理环路的连接中断、错误行为,python,function,for-loop,while-loop,tweepy,Python,Function,For Loop,While Loop,Tweepy,我有以下For循环,它使用Tweepy为一系列用户获取追随者ID: def download_followers(user, api): all_followers = [] try: for page in tweepy.Cursor(api.followers_ids, screen_name=user).pages(): all_followers.extend(map(str, page)) return all_f
Tweepy
为一系列用户获取追随者ID:
def download_followers(user, api):
all_followers = []
try:
for page in tweepy.Cursor(api.followers_ids, screen_name=user).pages():
all_followers.extend(map(str, page))
return all_followers
except tweepy.TweepError:
print('Could not access user {}. Skipping...'.format(user))
main_api = tweepy.API(auth, wait_on_rate_limit=True, wait_on_rate_limit_notify=True)
函数的调用方式如下:
for username in lookup_users:
user_followers = download_followers(username, main_api)
if user_followers:
new_followers = pd.DataFrame({
"Handles": username,
"Follower_ID": user_followers,
"Start_Date": today})
new_followers_df = new_followers_df.append(new_followers)
print('Finished outputting: {} at {}'.format(username, datetime.now().strftime('%Y/%m/%d %H:%M:%S')))
Rate limit reached. Sleeping for: 895
'Could not access user @barackobama. Trying again...'
Rate limit reached. Sleeping for: 895
Finished outputting: @barackobama at 2017/07/01 10:36:07
Rate limit reached. Sleeping for: 895
Rate limit reached. Sleeping for: 895
Rate limit reached. Sleeping for: 895
Finished outputting: @donaldtrump at 2017/07/01 10:36:07
Finished outputting: @barackobama at 2017/07/01 10:36:07
Finished outputting: @donaldtrump at 2017/07/01 10:36:07
Finished outputting: @georgebush at 2017/07/01 10:36:07
Rate limit reached. Sleeping for: 895
Finished outputting: @richardnixon at 2017/07/01 10:41:08
根据每个用户
可能拥有的追随者数量,Twitter的API
可能需要调用两到三次才能获取所有用户的追随者
因此,在对api进行另一次调用之前还有15分钟的休息时间。通过将以下参数添加到Tweepy,可以解决此问题:
def download_followers(user, api):
all_followers = []
try:
for page in tweepy.Cursor(api.followers_ids, screen_name=user).pages():
all_followers.extend(map(str, page))
return all_followers
except tweepy.TweepError:
print('Could not access user {}. Skipping...'.format(user))
main_api = tweepy.API(auth, wait_on_rate_limit=True, wait_on_rate_limit_notify=True)
结果是这样的:
Rate limit reached. Sleeping for: 895
Rate limit reached. Sleeping for: 895
Finished outputting: @barackobama at 2017/07/01 10:36:07
在此情况下,API
两次达到其极限。每次等待15分钟,然后抓住所有@barackobama
的追随者
但是,有时循环的失败。打印信息:
'Could not access user @barackobama. Skipping...'
这主要是由于连接问题、twitter api没有发送正确的请求,或者一个拥有大量追随者的帐户和Tweepy的软件包无法相应地处理它
为了解释可能的连接失败,我尝试用While True
参数包装api,如下所示:
def download_followers(user, api):
all_followers = []
while True:
try:
for page in tweepy.Cursor(api.followers_ids, screen_name=user).pages():
all_followers.extend(map(str, page))
return all_followers
except tweepy.TweepError:
print('Could not access user {}. Trying Again...'.format(user))
continue
break
但是,通过这种方式包装函数,for循环无法正常工作只对每个用户
迭代一次,而不是抓住其所有追随者,然后转到“查找”用户列表中的下一个用户
例如,改为
,其行为方式如下:
for username in lookup_users:
user_followers = download_followers(username, main_api)
if user_followers:
new_followers = pd.DataFrame({
"Handles": username,
"Follower_ID": user_followers,
"Start_Date": today})
new_followers_df = new_followers_df.append(new_followers)
print('Finished outputting: {} at {}'.format(username, datetime.now().strftime('%Y/%m/%d %H:%M:%S')))
Rate limit reached. Sleeping for: 895
'Could not access user @barackobama. Trying again...'
Rate limit reached. Sleeping for: 895
Finished outputting: @barackobama at 2017/07/01 10:36:07
Rate limit reached. Sleeping for: 895
Rate limit reached. Sleeping for: 895
Rate limit reached. Sleeping for: 895
Finished outputting: @donaldtrump at 2017/07/01 10:36:07
Finished outputting: @barackobama at 2017/07/01 10:36:07
Finished outputting: @donaldtrump at 2017/07/01 10:36:07
Finished outputting: @georgebush at 2017/07/01 10:36:07
Rate limit reached. Sleeping for: 895
Finished outputting: @richardnixon at 2017/07/01 10:41:08
其作用方式如下:
for username in lookup_users:
user_followers = download_followers(username, main_api)
if user_followers:
new_followers = pd.DataFrame({
"Handles": username,
"Follower_ID": user_followers,
"Start_Date": today})
new_followers_df = new_followers_df.append(new_followers)
print('Finished outputting: {} at {}'.format(username, datetime.now().strftime('%Y/%m/%d %H:%M:%S')))
Rate limit reached. Sleeping for: 895
'Could not access user @barackobama. Trying again...'
Rate limit reached. Sleeping for: 895
Finished outputting: @barackobama at 2017/07/01 10:36:07
Rate limit reached. Sleeping for: 895
Rate limit reached. Sleeping for: 895
Rate limit reached. Sleeping for: 895
Finished outputting: @donaldtrump at 2017/07/01 10:36:07
Finished outputting: @barackobama at 2017/07/01 10:36:07
Finished outputting: @donaldtrump at 2017/07/01 10:36:07
Finished outputting: @georgebush at 2017/07/01 10:36:07
Rate limit reached. Sleeping for: 895
Finished outputting: @richardnixon at 2017/07/01 10:41:08
因此,只在每个用户上迭代一次
有什么我做错了吗?返回语句在for
循环中,因此程序在第一次迭代后退出for
循环