Python Google App Engine: how to parallelize downloads using TaskQueue or Async Urlfetch?


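My GAE application retrieves JSON data from a third-party site. Given an ID representing the item to download, the item's data on this site is organized into multiple pages, so my code has to download chunks of data page after page until the data of the last available page has been retrieved.
My simplified code looks like this: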

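import json

from google.appengine.api import urlfetch
from google.appengine.ext import webapp

class FetchData(webapp.RequestHandler):
  def get(self):
    ...
    data_list = []
    page = 1
    while True:
      response = urlfetch.fetch('http://www.foo.com/getdata?id=xxx&result=JSON&page=%s' % page)
      # urlfetch.fetch returns a response object; the JSON body is in .content
      fetched_data = json.loads(response.content)
      data_list = data_list + fetched_data["data"]
      # Stop once the last available page has been fetched
      if page >= int(fetched_data["total_pages"]):
         break
      page = page + 1
    ...
    doRender('dataview.htm',{'data_list':data_list} )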

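The resulting data_list is an ordered list in which the first item holds the data of page 1 and the last item holds the data of the latest page; once retrieved, this data_list is rendered in a view.

This approach works 99% of the time, but sometimes, because of the 30-second limit imposed by Google App Engine, I get the dreaded DeadlineExceededError on items with many pages. I would like to know whether, by using TaskQueue or asynchronous Urlfetch, I could improve this algorithm by somehow running the N urlfetch calls in parallel.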

Use asynchronous urlfetch requests. It is as simple as this:

def handle_result(rpc):
    result = rpc.get_result()
    # ... Do something with result...

# Use a helper function to define the scope of the callback.
def create_callback(rpc):
    return lambda: handle_result(rpc)

rpcs = []
for url in urls:
    rpc = urlfetch.create_rpc()
    rpc.callback = create_callback(rpc)
    urlfetch.make_fetch_call(rpc, url)
    rpcs.append(rpc)

# ...

# Finish all RPCs, and let callbacks process the results.
for rpc in rpcs:
    rpc.wait()
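
The helper function in the snippet above matters because a Python lambda looks up the variables it uses when it is called, not when it is defined. The following is a minimal sketch of that difference, independent of App Engine; the names make_callback and demo are illustrative and do not come from the urlfetch API.

```python
def make_callback(value):
    # Each call creates a new scope, so the returned lambda keeps its own value.
    return lambda: value


def demo():
    names = ["rpc1", "rpc2", "rpc3"]
    shared = []
    bound = []
    for name in names:
        # This lambda reads the loop variable itself, which keeps changing.
        shared.append(lambda: name)
        # This lambda reads the value captured when make_callback was called.
        bound.append(make_callback(name))
    # All callbacks run only after the loop has finished.
    return [cb() for cb in shared], [cb() for cb in bound]
```

Calling demo() shows that every lambda created directly in the loop reports the last name, while the ones built through the helper each report their own. Without create_callback, every RPC callback would end up handling the last rpc created in the loop.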

I solved it like this:

import json

from google.appengine.api import urlfetch

chunks_dict = {}

def handle_result(rpc, page):
    result = rpc.get_result()
    # The JSON payload is in the response body, so decode it first.
    chunks_dict[page] = json.loads(result.content)["data"]

def create_callback(rpc, page):
    return lambda: handle_result(rpc, page)

# total_pages is assumed to be known here, e.g. read from the response for page 1.
rpcs = []
for page in range(1, total_pages + 1):
    rpc = urlfetch.create_rpc(deadline = 10)
    rpc.callback = create_callback(rpc, page)
    urlfetch.make_fetch_call(rpc, 'http://www.foo.com/getdata?id=xxx&result=JSON&page=%s' % page)
    rpcs.append(rpc)
for rpc in rpcs:
    rpc.wait()

# Reassemble the chunks in page order.
data_list = []
for key in sorted(chunks_dict):
    data_list = data_list + chunks_dict[key]

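Let me know whether this solution works for you and how it goes. Just replace the while part with the code above and modify it as needed. No globals are needed.

Although it was not very detailed, your answer helped me focus on the asynchronous solution.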
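
As the comments point out, the module-level dictionary is not strictly required. The sketch below is one possible way to do without it, under some assumptions: the urlfetch module is passed in as a parameter (so the function can be tried with a stand-in object), every page's JSON carries a "data" list and a "total_pages" field as in the question, and the function name fetch_all_pages and its parameters are hypothetical. Page 1 is fetched synchronously to learn the page count, and the remaining pages are requested in parallel.

```python
import json


def fetch_all_pages(urlfetch, url_template, deadline=10):
    """Return the data of every page, in page order.

    urlfetch: the google.appengine.api.urlfetch module (assumed; passed in
        so that no import of the App Engine SDK is needed here).
    url_template: URL containing one %s placeholder for the page number.
    """
    # Page 1 is fetched synchronously because it tells us how many pages exist.
    first_response = urlfetch.fetch(url_template % 1, deadline=deadline)
    first = json.loads(first_response.content)
    total_pages = int(first["total_pages"])

    # Start every remaining fetch before waiting on any of them.
    rpcs = []
    for page in range(2, total_pages + 1):
        rpc = urlfetch.create_rpc(deadline=deadline)
        urlfetch.make_fetch_call(rpc, url_template % page)
        rpcs.append(rpc)

    # rpcs is already in page order, so no sorting or shared dict is needed.
    data_list = list(first["data"])
    for rpc in rpcs:
        # get_result() blocks until that particular fetch has completed.
        data_list.extend(json.loads(rpc.get_result().content)["data"])
    return data_list
```

Inside a request handler this might be called as fetch_all_pages(urlfetch, 'http://www.foo.com/getdata?id=xxx&result=JSON&page=%s'). Because the RPCs are kept in the order they were created, the callbacks and the sort step become unnecessary. The overall request deadline still applies, so a very large number of pages may need to be split across several requests.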