Building an array from a list and a dictionary with Python

python arrays list dictionary matrix


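I am trying to build a matrix from a list and then fill it with the values of a dict. It works on small data, but with large data the computer crashes (not enough RAM). My script is obviously too heavy, but I don't know how to improve it (this is my first time programming). Thanks.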

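Thank you very much for your answers. Using "in dico" or a list comprehension did improve the speed of the script, which is very helpful.
However, my problem seems to be caused by the following function: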
def clustering(matrix, liste_globale_occurences, output2):
    most_common_groups = []
    Y = scipy.spatial.distance.pdist(matrix)
    Z = scipy.cluster.hierarchy.linkage(Y,'average', 'euclidean')
    scipy.cluster.hierarchy.dendrogram(Z)
    clust_h = scipy.cluster.hierarchy.fcluster(Z, t = 15, criterion='distance')
    print(clust_h)
    print(len(clust_h))
    most_common = collections.Counter(clust_h).most_common(3)
    group1 = most_common[0][0]
    group2 = most_common[1][0]
    group3 = most_common[2][0]
    most_common_groups.append(group1)
    most_common_groups.append(group2)
    most_common_groups.append(group3)
    with open(output2, 'w') as results: # here is the beginning of the problem
        for group in most_common_groups: 
            for i, val in enumerate(clust_h):
                if group == val:
                    mise_en_page = "{0:36s} groupe co-occurences = {1:5s} \n"
                    results.write(mise_en_page.format(str(liste_globale_occurences[i]),str(val)))

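With a small file I get the correct result, for example:
contact a = group 2
contact b = group 2
contact c = group 2
contact d = group 2
contact e = group 3
contact f = group 3
With a heavy file, however, I only get one example per group:
contact a = group 2
contact a = group 2
contact a = group 2
contact a = group 2
contact e = group 3
contact e = group 3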
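You can create a matrix mat of zeros with shape len(liste) x len(liste), then iterate over dico and split each key: the part before "/" gives the row and the part after "/" gives the column. That way you don't need the has_key lookup at all.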
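The zero-matrix approach described above might look roughly like the following. This is only a sketch under assumptions: every key in dico is taken to have the form "x/y" with both parts present in liste, labels are assumed to be unique and free of "/", and the name fill_matrix and the index lookup are illustrative rather than taken from the original post.

```python
import numpy as np

def fill_matrix(liste, dico):
    # Map each label to its row/column position (assumes unique labels).
    index = {label: pos for pos, label in enumerate(liste)}
    mat = np.zeros((len(liste), len(liste)), dtype=int)
    for key, value in dico.items():
        # Assumes the labels themselves never contain "/".
        row_label, col_label = key.split("/")
        i, j = index[row_label], index[col_label]
        # Mirror the value so that "x/y" and "y/x" are treated alike.
        mat[i, j] = value
        mat[j, i] = value
    return mat

liste = ["a", "b", "c"]
dico = {"a/b": 2, "c/a": 5}
print(fill_matrix(liste, dico))
```

The loop runs once per entry of dico instead of doing a dictionary lookup for every pair in liste, although the matrix itself still needs len(liste) squared cells.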
Your problem looks like O(n^2), because you need every combination of items from the list. So you do have to have an inner loop.
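To get a feel for what quadratic growth means for RAM, here is a rough back-of-the-envelope estimate. The list size of 50,000 is a hypothetical figure, not one from the question, and the helper names are made up; the 8-byte item size corresponds to 64-bit integers or floats, and the condensed size is the n(n-1)/2 distances that scipy.spatial.distance.pdist returns.

```python
def dense_matrix_bytes(n, itemsize=8):
    # A full n x n matrix stores n * n items.
    return n * n * itemsize

def condensed_distance_bytes(n, itemsize=8):
    # pdist returns one distance per unordered pair: n * (n - 1) / 2 items.
    return n * (n - 1) // 2 * itemsize

n = 50000  # hypothetical list length
print(dense_matrix_bytes(n) / 1e9)        # gigabytes for the matrix
print(condensed_distance_bytes(n) / 1e9)  # gigabytes for the pdist output
```

Under these assumptions the matrix alone needs about 20 GB and the distance vector about 10 GB more, so a smaller dtype or fewer rows matters more than micro-optimising the loops.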
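You could try writing each row to a file and then, in a separate process afterwards, building the matrix from that file. The new process will use less memory because it does not have to hold the large liste and dico inputs. For example: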
def make_array(liste, dico):
    # The with block closes the file even if an error occurs while writing.
    with open('/temp/matrix.txt', 'w') as f:
        for i in liste:
            for j in liste:
                # Short-circuit evaluation of logical or: it yields the first value that is not empty
                f.write('%s ' % (dico.get(i+"/"+j) or dico.get(j+"/"+i) or 0))
            f.write('\n')

Once that has run, you can call
print(np.loadtxt('/temp/matrix.txt', dtype=int))

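I used short-circuit evaluation to cut down the number of lines that the if statements would take. In fact, if you use a list comprehension, you can reduce the make_array function to: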
def make_array(liste,dico):
    return np.array([[dico.get(i+"/"+j) or dico.get(j+"/"+i) or 0 for j in liste] for i in liste])
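As a quick check of the one-line version, here it is run on a small made-up input (the labels and counts are invented for illustration). One consequence of chaining with or is worth noting: a value of 0 stored under "i/j" is treated as missing, so the lookup falls through to "j/i".

```python
import numpy as np

def make_array(liste, dico):
    # Try "i/j" first, then "j/i", and fall back to 0 when neither is set.
    return np.array([[dico.get(i + "/" + j) or dico.get(j + "/" + i) or 0
                      for j in liste] for i in liste])

liste = ["a", "b", "c"]
dico = {"a/b": 3, "c/b": 1}
result = make_array(liste, dico)
print(result)
# [[0 3 0]
#  [3 0 1]
#  [0 1 0]]
```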

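Can you explain more about building a matrix from a list and then filling it with the values of a dict? Maybe show a minimal example!
Don't use has_key: it is deprecated in 2.7 and removed in 3. Use "in dico" instead.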
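The membership test suggested in that comment could be used like this. The helper name lookup and the sample data are hypothetical; only the in operator itself comes from the comment.

```python
def lookup(dico, i, j):
    # Check both key orders with the "in" operator instead of has_key.
    key = i + "/" + j
    if key in dico:
        return dico[key]
    reverse = j + "/" + i
    if reverse in dico:
        return dico[reverse]
    return 0

dico = {"a/b": 3}
print(lookup(dico, "a", "b"))
print(lookup(dico, "b", "a"))
print(lookup(dico, "a", "c"))
```

Unlike the or chain, this version returns a stored 0 as is instead of falling through to the reversed key.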