How can I use Python to optimize grouping based on correlation statistics?


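I have a Python list of about 300 items that appear in a pandas correlation matrix. For each item, I want to programmatically group it with the 10 items that are least correlated with it, and store the result in a dictionary.

I'm not sure how to do this, because correlations range from 1 to -1. If I simply take the smallest numbers, I get the most strongly negatively correlated items rather than the least correlated ones.

I'm sure this involves some kind of iteration, but I'm not sure how to implement it. Here is a smaller example correlation matrix with 10 items: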

import pandas as pd

# Named `data` rather than `dict` so the built-in dict type is not shadowed.
data = {'index':['XBI','SDOW','IYG','DRIP','SCHV','TNA','SIL','IEMG','GUSH','USL'],
'XBI':[1.000,-0.605,0.546,-0.424,0.610,0.716,0.215,0.485,0.453,0.265],
'SDOW':[-0.605,1.000,-0.890,0.554,-0.965,-0.871,-0.256,-0.772,-0.595,-0.429],
'IYG':[0.546,-0.890,1.000,-0.567,0.918,0.838,0.197,0.701,0.603,0.325],
'DRIP':[-0.424,0.554,-0.567,1.000,-0.583,-0.609,-0.265,-0.530,-0.972,-0.686],
'SCHV':[0.610,-0.965,0.918,-0.583,1.000,0.893,0.276,0.768,0.624,0.431],
'TNA':[0.716,-0.871,0.838,-0.609,0.893,1.000,0.302,0.714,0.648,0.421],
'SIL':[0.215,-0.256,0.197,-0.265,0.276,0.302,1.000,0.317,0.227,0.308],
'IEMG':[0.485,-0.772,0.701,-0.530,0.768,0.714,0.317,1.000,0.567,0.399],
'GUSH':[0.453,-0.595,0.603,-0.972,0.624,0.648,0.227,0.567,1.000,0.675],
'USL':[0.265,-0.429,0.325,-0.686,0.431,0.421,0.308,0.399,0.675,1.000]}

# Build the frame, then use the symbol column as the row index.
matrix = pd.DataFrame.from_dict(data)
matrix = matrix.set_index('index')
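Using this small example, how can we build a dictionary in which each symbol in the index is a key and the value is a list of the 3 least correlated items?

The end result for the first two symbols would look like this:

{'XBI': ['SIL', 'USL', 'DRIP'], 'SDOW': ['SIL', 'USL', 'DRIP']}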


In this case the first two lists happen to be identical.

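You can use abs + nsmallest in a dict comprehension. Taking the absolute value first means that the correlations closest to 0 become the smallest values, so strongly negative correlations are no longer picked: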

dct = {c:matrix[c].abs().nsmallest(3).index.tolist() for c in matrix}

{'XBI': ['SIL', 'USL', 'DRIP'],
 'SDOW': ['SIL', 'USL', 'DRIP'],
 'IYG': ['SIL', 'USL', 'XBI'],
 'DRIP': ['SIL', 'XBI', 'IEMG'],
 'SCHV': ['SIL', 'USL', 'DRIP'],
 'TNA': ['SIL', 'USL', 'DRIP'],
 'SIL': ['IYG', 'XBI', 'GUSH'],
 'IEMG': ['SIL', 'USL', 'XBI'],
 'GUSH': ['SIL', 'XBI', 'IEMG'],
 'USL': ['XBI', 'SIL', 'IYG']}
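
For the full case of about 300 items with 10 per group, the same idea can be wrapped in a small helper. This is only a sketch: the function name `least_correlated` and the tiny `demo` matrix are hypothetical, and it assumes the row index and the column labels of the correlation matrix are the same symbols. It also drops each item's own entry explicitly, so the diagonal can never be selected even when `n` is close to the number of items.

```python
import pandas as pd

def least_correlated(corr: pd.DataFrame, n: int = 10) -> dict:
    """Map each column label to the n labels with the smallest |correlation|."""
    groups = {}
    for col in corr.columns:
        # Remove the item's own entry (the diagonal) before ranking.
        others = corr[col].drop(labels=col)
        # Absolute value: correlations near 0 rank lowest, regardless of sign.
        groups[col] = others.abs().nsmallest(n).index.tolist()
    return groups

# Tiny made-up matrix, used only to show the call.
demo = pd.DataFrame(
    [[1.0, -0.6, 0.2],
     [-0.6, 1.0, -0.3],
     [0.2, -0.3, 1.0]],
    index=["A", "B", "C"],
    columns=["A", "B", "C"],
)
result = least_correlated(demo, n=1)
print(result)
```

With the question's data, `least_correlated(matrix, n=3)` should give the same groups as the dict comprehension above, since the diagonal value of 1.0 is never among the three smallest absolute values there.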