Python: speeding up code that rebuilds lists

python performance list

I have been trying to restructure some lists, for example:

[42351, 4253, 1264, 5311, 3651]  # The first number in a list is an ID
[42352, 4254, 1244, 1246, 5311, 1264, 3651]
[42353, 1254, 1264]

into the following format:

# ID \t 1 \t the_second_number_in_a_list \t ID \t 2 \t the_third_number_in_a_list \t ID \t 3 \t the_fourth_number_in_a_list ...
42352   1   4254    42352   2   1244    42352   3   1246    42352   4   5311    42352   5   1264    42352   6   3651
42353   1   1254    42353   2   1264
42351   1   4253    42351   2   1264    42351   3   5311    42351   4   3651

My idea is to build an intermediate dictionary that already has the desired layout:

list_dic = {42352: [42352, 1, 4254, 42352, 2, 1244, 42352, 3, 1246, 42352, 4, 5311, 42352, 5, 1264, 42352, 6, 3651], 42353: [42353, 1, 1254, 42353, 2, 1264], 42351: [42351, 1, 4253, 42351, 2, 1264, 42351, 3, 5311, 42351, 4, 3651]}

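and then save it to a tab-delimited txt file.

However, I realized that in practice I may have hundreds of thousands of lists, and my approach will be slow and computationally expensive. I am looking for suggestions to speed up the code and reduce the memory the whole process needs. Thanks.

Here is my code: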

seq1 = [42351, 4253, 1264, 5311, 3651]
seq2 = [42352, 4254, 1244, 1246, 5311, 1264, 3651]
seq3 = [42353, 1254, 1264]

# First, group all information into a single list
seq_list = [seq1, seq2, seq3]

# Second, construct a dictionary to store all information
list_dic = {} 
for each_seq in seq_list:
    j = 1
    list_dic[each_seq[0]] = []
    for each_item in each_seq[1:]:
        list_dic[each_seq[0]].append(each_seq[0])
        list_dic[each_seq[0]].append(j)
        list_dic[each_seq[0]].append(each_item)
        j += 1

# Third, save the information into a txt file   
text_file = open("Output.txt", "w")
for each_id in list_dic:
    line = '\t'.join(str(each_num) for each_num in list_dic[each_id])
    text_file.write(line+'\n')
text_file.close()

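As far as I can tell, there is really no reason to create the intermediate dictionary.

(That said, your existing solution also seems perfectly workable, though possibly a bit slower.)

For @sirpasselot:

>>> from itertools import chain, count, cycle
>>> seq1 = [42351, 4253, 1264, 5311, 3651]
>>> seq2 = [42352, 4254, 1244, 1246, 5311, 1264, 3651]
>>> seq3 = [42353, 1254, 1264]
>>> alllists = [seq1, seq2, seq3]
>>> for eachlist in alllists:
...     # pair the ID with a running counter and each remaining item
...     merged = zip(cycle([eachlist[0]]), count(1), eachlist[1:])
...     print("\t".join(map(str, chain.from_iterable(merged))))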
...
42351   1       4253    42351   2       1264    42351   3       5311    42351    4       3651
42352   1       4254    42352   2       1244    42352   3       1246    42352    4       5311    42352   5       1264    42352   6       3651
42353   1       1254    42353   2       1264
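The session above only prints the rows. As a minimal sketch of the same idea writing to a file, assuming Python 3, the block below adds the newline explicitly after each row. The helper names `format_row` and `write_rows` and the temporary output path are illustrative assumptions, not part of the answer, and `repeat(x)` stands in for `cycle([x])`.

```python
import os
import tempfile
from itertools import chain, count, repeat


def format_row(seq):
    """Return 'ID<TAB>1<TAB>second<TAB>ID<TAB>2<TAB>third...' for one list."""
    # zip stops at the shortest input, so the infinite repeat/count are safe
    triples = zip(repeat(seq[0]), count(1), seq[1:])
    return "\t".join(map(str, chain.from_iterable(triples)))


def write_rows(sequences, path):
    """Write one formatted line per list; the newline is added explicitly."""
    with open(path, "w") as f:
        for seq in sequences:
            f.write(format_row(seq) + "\n")


all_lists = [
    [42351, 4253, 1264, 5311, 3651],
    [42352, 4254, 1244, 1246, 5311, 1264, 3651],
    [42353, 1254, 1264],
]

# Hypothetical output location; any writable path would do.
out_path = os.path.join(tempfile.gettempdir(), "Output.txt")
write_rows(all_lists, out_path)
```

Because each row is built and written on its own, nothing beyond the current list has to be kept in memory.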

I am assuming you will never have two or more lists with the same ID, so here is my code:

seq1 = [42351, 4253, 1264, 5311, 3651]
seq2 = [42352, 4254, 1244, 1246, 5311, 1264, 3651]
seq3 = [42353, 1254, 1264]

# First, group all information into a single list
seq_list = [seq1, seq2, seq3]

# Second, put lists directly into text with desired format
text_file = open("Output.txt", "w")
for i in seq_list:
    for j in range(1,len(i)): #skip the first element and go to the end of the list
        text_file.write(str(i[0]) + '\t' + str(j) + '\t' + str(i[j]) + '\t')
    text_file.write('\n')
text_file.close()

Instead of building an intermediate dictionary, it writes the lists straight to the text file in the format you described.
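One detail of the loop above is that it also writes a tab after the last field of every line. A possible variant, sketched here under the assumption that the trailing tab is unwanted, joins the fields instead and accepts any iterable of lists, so the input could be produced lazily. The names `rows` and `write_rows_streaming` are made up for illustration, and `io.StringIO` stands in for a real file.

```python
import io


def rows(sequences):
    """Yield one tab-separated line (no trailing tab) per input list."""
    for seq in sequences:
        seq_id = str(seq[0])
        fields = []
        # enumerate(..., 1) supplies the running position starting at 1
        for position, value in enumerate(seq[1:], 1):
            fields.extend((seq_id, str(position), str(value)))
        yield "\t".join(fields)


def write_rows_streaming(sequences, out):
    """Write rows to a file-like object, holding only one line at a time."""
    for line in rows(sequences):
        out.write(line + "\n")


seq_list = [
    [42351, 4253, 1264, 5311, 3651],
    [42352, 4254, 1244, 1246, 5311, 1264, 3651],
    [42353, 1254, 1264],
]

buffer = io.StringIO()  # stands in for open("Output.txt", "w")
write_rows_streaming(iter(seq_list), buffer)
```

With a real file, `write_rows_streaming(seq_list, f)` inside a `with open("Output.txt", "w") as f:` block would behave the same way.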

Here is a solution that does not use itertools:

sqs = [
    [42351, 4253, 1264, 5311, 3651],
    [42352, 4254, 1244, 1246, 5311, 1264, 3651],
    [42353, 1254, 1264]
]

for sq in sqs:
    gen = ((sq[0], i, v) for i, v in enumerate(sq[1:], 1))
    print('\t'.join(str(x) for sub in gen for x in sub))  # tab-separated, as the question asks

What do you mean by slow? Have you actually profiled it? Dict lookups and list appends are pretty fast operations. I can see it being a bit expensive memory-wise, but I doubt you will use enough RAM to actually cause any problems.

Use enumerate and join the items as a string with tabs.

I assumed that, since there is a for loop inside a for loop, it might be slow, and that storing everything in a dictionary would take too much memory. But honestly, I have not tested it on real data yet because I do not have it right now. Thanks, Malik Brahimi! I will try enumerate!

This is a textbook case of premature optimization.

You are right, the IDs in my data are unique; sorry I forgot to mention that. Your answer is very helpful and I really appreciate it. Thanks!

I am not getting the desired output. It puts all the lists on one line. I am using alllists=[seq1, seq2, seq3] to combine my lists. Is that not right?

I think that is right... I added my example to my answer, where I just print the result.

That is strange. I get the correct output with print/sys.stdout.write, but when I write to a file it does not add the newlines.

Are you sure you have f.write("\n")? Are you viewing the file in some rudimentary text editor (such as notepad.exe) that might not render the newlines? (Note that if you open the file in non-binary mode, i.e. with w rather than wb, Python will write the appropriate line ending for you, which is \r\n on Windows...)
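The question of whether the dictionary version is really slower can be checked before optimizing anything. The following is a rough sketch using the standard `timeit` module on synthetic data; the generator `make_sequences`, the list sizes, and the two function names are assumptions chosen for illustration, and the timings will differ from machine to machine.

```python
import random
import timeit


def make_sequences(n_lists, max_len=10, seed=0):
    """Build synthetic lists: a unique ID followed by 1..max_len numbers."""
    rng = random.Random(seed)
    return [
        [i] + [rng.randrange(10000) for _ in range(rng.randint(1, max_len))]
        for i in range(n_lists)
    ]


def via_dict(sequences):
    """Approach from the question: fill an intermediate dict, then format."""
    list_dic = {}
    for seq in sequences:
        row = list_dic[seq[0]] = []
        for j, item in enumerate(seq[1:], 1):
            row.extend((seq[0], j, item))
    return ["\t".join(map(str, row)) for row in list_dic.values()]


def direct(sequences):
    """Format each list directly, with no intermediate dict."""
    return [
        "\t".join(f"{seq[0]}\t{j}\t{item}" for j, item in enumerate(seq[1:], 1))
        for seq in sequences
    ]


data = make_sequences(10_000)
assert via_dict(data) == direct(data)  # both produce the same lines

for fn in (via_dict, direct):
    seconds = timeit.timeit(lambda: fn(data), number=5)
    print(f"{fn.__name__}: {seconds:.3f} s for 5 runs")
```

The standard `tracemalloc` module could be used in a similar way if memory, rather than time, is the concern.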