Lambda Python Pool.map and urllib2.urlopen: retry only the failing processes, log only errors

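I have an AWS Lambda function that calls a set of URLs using pool.map. The problem is that if one of the URLs returns anything other than a 200, the Lambda function fails and is immediately retried, and the retry reruns the ENTIRE Lambda function. I would like it to retry only the failed URLs and, if they still fail after a second try, call a fixed URL to log an error.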

This is the code as it currently stands (with some details removed), which works only when every URL returns 200:

from __future__ import print_function
import urllib2 
from multiprocessing.dummy import Pool as ThreadPool 

import hashlib
import datetime
import json

print('Loading function')

def lambda_handler(event, context):

  f = urllib2.urlopen("https://example.com/geturls/?action=something");
  data = json.loads(f.read());

  urls = [];
  for d in data:
      urls.append("https://"+d+".example.com/path/to/action");

  # Make the Pool of workers
  pool = ThreadPool(4);

  # Open the urls in their own threads
  # and return the results
  results = pool.map(urllib2.urlopen, urls);

  #close the pool and wait for the work to finish 
  pool.close();
  return pool.join();

I tried reading the documentation, but it seems a bit lacking in its explanation of the map function, in particular its return value.
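
For reference, here is a minimal sketch of how map on a thread pool behaves. The worker square is a hypothetical stand-in for urllib2.urlopen, chosen so the example runs without network access. It illustrates two points: on success, map returns a list of results in input order, and if any single call raises, map re-raises that exception in the caller without returning the other results.

```python
from multiprocessing.dummy import Pool as ThreadPool


def square(n):
    # Hypothetical worker standing in for urllib2.urlopen.
    if n < 0:
        raise ValueError("negative input: %d" % n)
    return n * n


pool = ThreadPool(4)

# On success, map returns one result per input, in input order.
ok_results = pool.map(square, [1, 2, 3])

# If any call raises, map re-raises that exception in the caller,
# and the results of the other calls are not returned.
try:
    pool.map(square, [1, -2, 3])
    failure = None
except ValueError as exc:
    failure = str(exc)

pool.close()
pool.join()  # join() itself returns None
```

This is why a try/except around pool.map cannot tell which URL failed: by the time the exception reaches the caller, the per-URL results are gone.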

Using the documentation, I tried modifying the code to the following:

from __future__ import print_function
import urllib2 
from multiprocessing.dummy import Pool as ThreadPool 

import hashlib
import datetime
import json

print('Loading function')

def lambda_handler(event, context):

  f = urllib2.urlopen("https://example.com/geturls/?action=something");
  data = json.loads(f.read());

  urls = [];
  for d in data:
      urls.append("https://"+d+".example.com/path/to/action");

  # Make the Pool of workers
  pool = ThreadPool(4);

  # Open the urls in their own threads
  # and return the results
  try:
     results = pool.map(urllib2.urlopen, urls);
  except URLError:
     try:                              # try once more before logging error
        urllib2.urlopen(URLError.url); # TODO: figure out which URL errored
     except URLError:                  # log error
        urllib2.urlopen("https://example.com/error/?url="+URLError.url);

  #close the pool and wait for the work to finish 
  pool.close();
  return true; # always return true so we never duplicate successful calls

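I'm not sure whether this is the right way to do it, or even whether I'm using Python's exception syntax correctly. Again, my goal is to retry only the failed URLs and, if they still fail after a second try, call a fixed URL to log an error.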

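The answer is to write my own wrapper around urllib2.urlopen, because each thread needs its own try/except rather than a single one around the whole pool.map call. The function looks like this: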

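def my_urlopen(url):
    # Each worker thread handles its own failure, so one bad URL
    # cannot abort the whole pool.map call.
    try:
        return urllib2.urlopen(url)
    except urllib2.URLError:
        # Report the failing URL to the logging endpoint.
        urllib2.urlopen("https://example.com/log_error/?url="+url)
        return None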

I placed it above the def lambda_handler function declaration, and could then replace the entire try/except inside it, going from this:

try:
   results = pool.map(urllib2.urlopen, urls);
except URLError:
   try:                              # try once more before logging error
      urllib2.urlopen(URLError.url);
   except URLError:                  # log error
      urllib2.urlopen("https://example.com/error/?url="+URLError.url);

to this:

results = pool.map(my_urlopen, urls);
Q.E.D
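
Note that the my_urlopen wrapper reports a URL on its first failure, whereas the question also asked for one retry before logging. A minimal sketch of that variant follows. The names open_with_retry, fetch_all and ERROR_URL, the opener parameter (there so the function can be exercised without network access), and the Python 3 import fallback are all assumptions made for illustration, not part of the original answer.

```python
try:
    import urllib2 as urlrequest             # Python 2, as in the question
    from urllib2 import URLError
    from urllib import quote
except ImportError:
    import urllib.request as urlrequest      # Python 3 fallback (assumption)
    from urllib.error import URLError
    from urllib.parse import quote

from multiprocessing.dummy import Pool as ThreadPool

# Logging endpoint taken from the answer's wrapper.
ERROR_URL = "https://example.com/log_error/?url="


def open_with_retry(url, opener=None, attempts=2):
    """Try url up to `attempts` times; report it to ERROR_URL if all fail."""
    opener = opener or urlrequest.urlopen
    for _ in range(attempts):
        try:
            return opener(url)
        except URLError:
            # HTTPError subclasses URLError, so 4xx/5xx responses land here too.
            continue
    try:
        # Percent-encode the failed URL so it survives as a query value.
        opener(ERROR_URL + quote(url, safe=""))
    except URLError:
        pass  # the logging call itself failed; nothing more to do here
    return None


def fetch_all(urls, opener=None, workers=4):
    """Open every URL in a thread pool; failed URLs yield None."""
    pool = ThreadPool(workers)
    try:
        return pool.map(lambda u: open_with_retry(u, opener), urls)
    finally:
        pool.close()
        pool.join()
```

Inside lambda_handler, results = fetch_all(urls) would then take the place of the pool.map call. Because no exception escapes the handler, Lambda has no reason to rerun the whole function, and only the URLs that failed are retried.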