How can I optimize groupings based on correlation statistics using Python?
I have a Python list of about 300 items in a pandas correlation matrix. For each item, I want to programmatically group it with the 10 items it is least correlated with, and store the result in a dictionary. I don't know how to do this, because correlations range from 1 to -1: if I take the smallest numbers, I get the most negatively correlated items rather than the least correlated ones. I'm sure this needs some kind of iterator, but I'm not sure how to implement it. Here is a smaller example correlation matrix with 10 items:
import pandas as pd
# Named `data` rather than `dict` so the built-in dict type is not shadowed
data = {'index':['XBI','SDOW','IYG','DRIP','SCHV','TNA','SIL','IEMG','GUSH','USL'],
'XBI':[1.000,-0.605,0.546,-0.424,0.610,0.716,0.215,0.485,0.453,0.265],
'SDOW':[-0.605,1.000,-0.890,0.554,-0.965,-0.871,-0.256,-0.772,-0.595,-0.429],
'IYG':[0.546,-0.890,1.000,-0.567,0.918,0.838,0.197,0.701,0.603,0.325],
'DRIP':[-0.424,0.554,-0.567,1.000,-0.583,-0.609,-0.265,-0.530,-0.972,-0.686],
'SCHV':[0.610,-0.965,0.918,-0.583,1.000,0.893,0.276,0.768,0.624,0.431],
'TNA':[0.716,-0.871,0.838,-0.609,0.893,1.000,0.302,0.714,0.648,0.421],
'SIL':[0.215,-0.256,0.197,-0.265,0.276,0.302,1.000,0.317,0.227,0.308],
'IEMG':[0.485,-0.772,0.701,-0.530,0.768,0.714,0.317,1.000,0.567,0.399],
'GUSH':[0.453,-0.595,0.603,-0.972,0.624,0.648,0.227,0.567,1.000,0.675],
'USL':[0.265,-0.429,0.325,-0.686,0.431,0.421,0.308,0.399,0.675,1.000]}
# Build the DataFrame and use the symbol column as the row index
matrix = pd.DataFrame.from_dict(data)
matrix = matrix.set_index('index')
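Taking this small example, how can we produce a dictionary in which each symbol in the index is a key and the value is a list of the 3 items least correlated with it?
The end result for the first two symbols would look like this:
{'XBI': ['SIL', 'USL', 'DRIP'], 'SDOW': ['SIL', 'USL', 'DRIP']}
In this case the first two lists happen to be identical.

You can use abs + nsmallest in a dict comprehension. Taking the absolute value first ranks each correlation by its distance from zero, so strongly negative values are no longer treated as the smallest: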
dct = {c:matrix[c].abs().nsmallest(3).index.tolist() for c in matrix}
{'XBI': ['SIL', 'USL', 'DRIP'],
'SDOW': ['SIL', 'USL', 'DRIP'],
'IYG': ['SIL', 'USL', 'XBI'],
'DRIP': ['SIL', 'XBI', 'IEMG'],
'SCHV': ['SIL', 'USL', 'DRIP'],
'TNA': ['SIL', 'USL', 'DRIP'],
'SIL': ['IYG', 'XBI', 'GUSH'],
'IEMG': ['SIL', 'USL', 'XBI'],
'GUSH': ['SIL', 'XBI', 'IEMG'],
'USL': ['XBI', 'SIL', 'IYG']}
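
For the full 300-item matrix the same idea applies with n=10. The following is only a minimal sketch, not part of the original answer: the helper name least_correlated and the tiny three-item matrix are hypothetical, chosen for illustration. It drops each item's own entry before ranking, so the self-correlation of 1.0 can never be selected even when n is close to the number of items, and it contrasts the result with a plain nsmallest call to show the problem described in the question.

```python
import pandas as pd


def least_correlated(corr: pd.DataFrame, n: int = 10) -> dict:
    """Map each column label to the n labels with the smallest |correlation|.

    Hypothetical helper: assumes `corr` is a square correlation matrix
    whose row index and column labels are the same symbols.
    """
    groups = {}
    for label in corr.columns:
        # Remove the item's own row so its self-correlation (1.0) is never picked
        others = corr[label].drop(labels=label, errors="ignore")
        # Rank by distance from zero, not by signed value
        groups[label] = others.abs().nsmallest(n).index.tolist()
    return groups


# Tiny made-up matrix, for illustration only
corr = pd.DataFrame(
    [[1.0, -0.9, 0.1],
     [-0.9, 1.0, 0.3],
     [0.1, 0.3, 1.0]],
    index=list("ABC"),
    columns=list("ABC"),
)

result = least_correlated(corr, n=1)

# Without abs(), the most negatively correlated item is returned instead
raw_smallest = corr["A"].nsmallest(1).index.tolist()

print(result)
print(raw_smallest)
```

With the example matrix from the question, calling least_correlated(matrix, n=3) should give the same groups as the dict comprehension above, since the self-correlation is always the largest absolute value in each column there.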