Multiprocessing when fetching URLs in Python 3.2


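I've written a script that fetches inventory data from the Steam API, and I'm not very happy with its speed. So I read a bit about multiprocessing in Python, but I can't make sense of it. The program works like this: it takes a SteamID from a list, fetches that player's inventory, and then adds the SteamID and the inventory to a dictionary, with the ID as the key and the inventory contents as the value.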

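I've also read that using a counter with multiprocessing causes some problems. That is a bit of an issue for me, because I want to be able to resume the program from the last inventory it fetched instead of starting over from the beginning.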

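Anyway, what I'd like is a concrete example of how to use multiprocessing when opening the URLs that contain the inventory data, so that the program can fetch several inventories at a time instead of one.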

The code:

import json
import socket
import urllib.error
import urllib.request

with open("index_to_name.json", "r", encoding=("utf-8")) as fp:
    index_to_name=json.load(fp)

with open("index_to_quality.json", "r", encoding=("utf-8")) as fp:
    index_to_quality=json.load(fp)

with open("index_to_name_no_the.json", "r", encoding=("utf-8")) as fp:
    index_to_name_no_the=json.load(fp)

with open("steamprofiler.json", "r", encoding=("utf-8")) as fp:
    steamprofiler=json.load(fp)

with open("itemdb.json", "r", encoding=("utf-8")) as fp:
    players=json.load(fp)

error=list()
playerinventories=dict()
c=127480

while c<len(steamprofiler):
    inventory=dict()
    items=list()
    try:
        url=urllib.request.urlopen("http://api.steampowered.com/IEconItems_440/GetPlayerItems/v0001/?key=DD5180808208B830FCA60D0BDFD27E27&steamid="+steamprofiler[c]+"&format=json")
        inv=json.loads(url.read().decode("utf-8"))
        url.close()
    except (urllib.error.HTTPError, urllib.error.URLError, socket.error, UnicodeDecodeError):
        print("HTTP-error, continuing")
        error.append(c)  # record the index that failed, then move on
        c+=1
        continue
    try:
        for r in inv["result"]["items"]:
            inventory[r["id"]]=r["quality"], r["defindex"]
    except KeyError:
        error.append(c)  # no item list in the response for this player
        c+=1
        continue
    for key in inventory:
        try:
            if index_to_quality[str(inventory[key][0])]=="":
                items.append(
                    index_to_quality[str(inventory[key][0])]
                    +""+
                    index_to_name[str(inventory[key][1])]
                    )
            else:
                items.append(
                    index_to_quality[str(inventory[key][0])]
                    +" "+
                    index_to_name_no_the[str(inventory[key][1])]
                    )
        except KeyError:
            print("keyerror, update def_to_index")
            error.append(c)  # unknown quality/defindex: skip this item only
            continue
    playerinventories[int(steamprofiler[c])]=items
    c+=1
    if c % 10==0:
        print(c, "inventories downloaded")

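The outermost while loop seems like it could be distributed across several processes (or tasks).
When you break the loop into tasks, note that you will be sharing the playerinventories and error objects between processes. For that sharing problem you need multiprocessing.Manager.

I suggest you start modifying your code from …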
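A minimal sketch of that idea, assuming a hypothetical fetch_items() that stands in for the urllib request (it fabricates data and fails for IDs ending in "3" so the sketch runs offline). It also assumes a POSIX system: it asks for the "fork" start method through multiprocessing.get_context(), which needs Python 3.4+; on 3.2 you would call multiprocessing.Manager() and multiprocessing.Pool() directly.

```python
import multiprocessing
from functools import partial

# Hypothetical stand-in for the real Steam request: it fabricates an item
# list (and fails for IDs ending in "3") so the sketch runs offline.
def fetch_items(steamid):
    if steamid.endswith("3"):
        raise IOError("simulated HTTP error")
    return ["item-for-" + steamid]

def process_item(c, steamids, inventories, errors):
    """Worker: fetch inventory number c and record it in the shared objects."""
    steamid = steamids[c]
    try:
        items = fetch_items(steamid)
    except IOError:
        errors.append(c)              # remember which index failed
        return
    inventories[int(steamid)] = items

def run(steamids, start=0, processes=4):
    # Assumption: POSIX "fork" start method; get_context needs Python 3.4+.
    ctx = multiprocessing.get_context("fork")
    manager = ctx.Manager()
    try:
        inventories = manager.dict()  # proxies: safe to update from any process
        errors = manager.list()
        worker = partial(process_item, steamids=steamids,
                         inventories=inventories, errors=errors)
        pool = ctx.Pool(processes)
        try:
            pool.map(worker, range(start, len(steamids)))
        finally:
            pool.close()
            pool.join()
        # Copy the data out of the manager before shutting it down.
        return inventories.copy(), sorted(errors[:])
    finally:
        manager.shutdown()

if __name__ == "__main__":
    ids = ["7656119800000000" + str(i) for i in range(6)]
    inventories, errors = run(ids)
    print(len(inventories), errors)
```

The Manager proxies are what let every worker write into the same dict and list. An alternative that avoids shared state altogether is to have the worker return its result and let the parent build playerinventories from the list that pool.map returns.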
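So you think fetching the URLs is what is slowing your program down? You should verify that assumption first, but if it really is the case, the multiprocessing module is massive overkill: for an I/O bottleneck, threads are much simpler and may even be slightly faster (spawning another Python interpreter takes more time than spawning a thread).
Looking at your code, you could put most of the body of the while loop in a function that takes c as a parameter, and then start one thread per value of c, for example: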
import threading

def process_item(c):
    # The work goes here
    # Replace all those 'continue' statements with 'return'
    ...

# steamprofiler is the list loaded in the question's code
for c in range(127480, len(steamprofiler)):
    thread = threading.Thread(name="inventory {0}".format(c), target=process_item, args=[c])
    thread.start()
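The loop above starts the threads but never waits for them, so the results must not be read until every thread has finished. A small sketch of that missing step, where process_item() is a hypothetical stand-in that only stores a fake result:

```python
import threading

playerinventories = {}

def process_item(c):
    # Hypothetical stand-in for the real work on inventory number c.
    playerinventories[c] = ["item-{0}".format(c)]

threads = []
for c in range(10):
    thread = threading.Thread(name="inventory {0}".format(c),
                              target=process_item, args=[c])
    thread.start()
    threads.append(thread)

# Block until every thread has finished before reading playerinventories.
for thread in threads:
    thread.join()

print(len(playerinventories))
```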

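The real problem is that there is no limit on the number of threads spawned, which could break the program. Also, the people at Steam may not be amused by your script and may decide to unfriend you.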
A better approach is to fill a collections.deque object with the list of c values and then start a limited set of threads to do the work:
import collections
import threading

def process_item(c):
    # The work goes here
    # Replace all those 'continue' statements with 'return'
    ...

def process():
    while True:
        process_item(work.popleft())

# steamprofiler is the list loaded in the question's code
work = collections.deque(range(127480, len(steamprofiler)))

threads = [threading.Thread(name="worker {0}".format(n), target=process)
           for n in range(6)]
for worker in threads:
    worker.start()

Note that I'm counting on work.popleft() to raise an IndexError when there is no work left, which kills the thread. That's a bit sneaky, so consider using try ... except.
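A sketch of the same worker pool with the explicit try ... except, so each worker exits quietly once the deque is empty. process_item() here is a hypothetical stand-in that records c * c; collections.deque documents append and popleft as thread-safe, which is what makes sharing one deque between workers acceptable.

```python
import collections
import threading

results = {}

def process_item(c):
    # Hypothetical stand-in for the real per-inventory work.
    results[c] = c * c

def process(work):
    while True:
        try:
            c = work.popleft()    # thread-safe pop from the shared deque
        except IndexError:
            return                # no work left: let this worker exit cleanly
        process_item(c)

work = collections.deque(range(100))
threads = [threading.Thread(name="worker {0}".format(n), target=process, args=(work,))
           for n in range(6)]
for worker in threads:
    worker.start()
for worker in threads:
    worker.join()                 # wait until the deque has been drained
```

Passing the deque as an argument instead of relying on a global also makes the worker easier to reuse.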
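Two more things:
Consider using the excellent requests library instead of urllib (API-wise, it's the worst module I've used in the entire Python standard library).
For requests there is an add-on called … that lets you make fully asynchronous HTTP requests. That would make the code much simpler.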
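If adding third-party libraries is not an option, the standard library's concurrent.futures module (new in Python 3.2) offers a similar bounded pool. This is a swapped-in technique, not the one suggested above. A minimal sketch, where fetch_inventory() is a hypothetical stand-in for the urllib request plus json.loads (it fails for the ID "bad" to show error handling):

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

# Hypothetical stand-in for the urllib request + json.loads for one SteamID.
def fetch_inventory(steamid):
    if steamid == "bad":
        raise IOError("simulated HTTP error")
    return {"result": {"items": [{"id": 1, "quality": 6, "defindex": 42}]}}

def download(steamids, max_workers=6):
    """Fetch concurrently in a bounded thread pool; process results in this thread."""
    playerinventories = {}
    error = []
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # Submit every request; at most max_workers run at the same time.
        futures = {executor.submit(fetch_inventory, s): s for s in steamids}
        for future in as_completed(futures):
            steamid = futures[future]
            try:
                inv = future.result()  # re-raises the worker's exception
            except IOError:
                error.append(steamid)
                continue
            # Translation and dict updates happen only here, in one thread,
            # so no locking or Manager is needed.
            playerinventories[steamid] = [
                (r["quality"], r["defindex"]) for r in inv["result"]["items"]]
    return playerinventories, error
```

Only the slow network step runs in parallel; the translation into item names stays single-threaded. For a very long ID list you may prefer to submit the requests in batches rather than all at once.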
I hope this helps, but keep in mind that this is all untested code.
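It's basically split into: open URL -> get data from URL -> translate data -> add it to the dictionary. The bottleneck is opening the URL and getting the data. Could you give some more specific help with my code?

Think of a function that takes the counter as input (c in the existing code) and updates playerinventories and error. That function can run in several processes at the same time to make better use of the network. Sorry for my clumsy wording "separated into"; please read it as "distributed".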
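A sketch of such a function, built from the question's loop body with every continue turned into a return. The lookup tables below are hypothetical sample data (the real script loads them from the JSON files), YOUR_KEY is a placeholder for your own API key, and the fetch parameter is an addition so the function can be exercised without network access.

```python
import json
import socket
import urllib.error
import urllib.request

# Hypothetical sample data; the real script loads these from the JSON files.
steamprofiler = ["76561197960265728"]
index_to_quality = {"6": "", "11": "Strange"}
index_to_name = {"42": "The Sandvich"}
index_to_name_no_the = {"42": "Sandvich"}

playerinventories = {}
error = []

# YOUR_KEY is a placeholder for your own Steam Web API key.
API_URL = ("http://api.steampowered.com/IEconItems_440/GetPlayerItems/v0001/"
           "?key=YOUR_KEY&steamid={0}&format=json")

def fetch_json(steamid):
    """Download and decode one inventory: the slow, I/O-bound step."""
    response = urllib.request.urlopen(API_URL.format(steamid))
    try:
        return json.loads(response.read().decode("utf-8"))
    finally:
        response.close()

def process_item(c, fetch=fetch_json):
    """Loop body from the question; each 'continue' has become 'return'."""
    steamid = steamprofiler[c]
    try:
        raw_items = fetch(steamid)["result"]["items"]
    except (urllib.error.URLError, socket.error, UnicodeDecodeError, KeyError):
        error.append(c)       # network failure or no item list in the response
        return
    items = []
    for r in raw_items:
        try:
            quality = index_to_quality[str(r["quality"])]
            if quality == "":
                items.append(index_to_name[str(r["defindex"])])
            else:
                items.append(quality + " " + index_to_name_no_the[str(r["defindex"])])
        except KeyError:
            error.append(c)   # unknown quality/defindex: skip just this item
    playerinventories[int(steamid)] = items
```

Either thread-based scheme above can call process_item(c) directly, since list.append and dict item assignment are atomic in CPython. With processes, playerinventories and error would have to become Manager proxies instead of plain module-level objects.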