Python: Using R-squared to find the correlation between rows in a numpy array
I am trying to find the correlation between rows in a numpy array and, if the correlation is greater than or equal to 0.85, delete the row with the lowest index. Example of the numpy array:
array =([[-0.90068117, 1.01900435, -1.34022653, -1.3154443 ],
[-1.14301691, -0.13197948, -1.34022653, -1.3154443 ],
[-1.38535265, 0.32841405, -1.39706395, -1.3154443 ],
[-1.50652052, 0.09821729, -1.2833891 , -1.3154443 ],
[-1.02184904, 1.24920112, -1.34022653, -1.3154443 ],
[-0.53717756, 1.93979142, -1.16971425, -1.05217993],
[-1.50652052, 0.78880759, -1.34022653, -1.18381211],
[-1.02184904, 0.78880759, -1.2833891 , -1.3154443 ],
[-1.74885626, -0.36217625, -1.34022653, -1.3154443 ],
[-1.14301691, 0.09821729, -1.2833891 , -1.44707648],
[-0.53717756, 1.47939788, -1.2833891 , -1.3154443 ],
[-1.26418478, 0.78880759, -1.22655167, -1.3154443 ],
[-1.26418478, -0.13197948, -1.34022653, -1.44707648],
[-1.87002413, -0.13197948, -1.51073881, -1.44707648],
[-0.05250608, 2.16998818, -1.45390138, -1.3154443 ],
[-0.17367395, 2.9 , -1.2833891 , -1.05217993],
[-0.53717756, 1.93979142, -1.39706395, -1.05217993],
[-0.90068117, 1.01900435, -1.34022653, -1.18381211],
[-0.17367395, 1.70959465, -1.16971425, -1.18381211],
[-0.90068117, 1.70959465, -1.2833891 , -1.18381211]])
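So I want to check the correlation between rows 1->2, rows 2->3, and rows 3->4, and if it is >= 0.85, delete the row with the lowest index. I wrote the following code: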
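import numpy as np
from scipy import stats

raise2 = lambda element: element**2

def check_corr(array):
    # Rotate a quarter turn so that the columns of the data become rows
    array = np.rot90(array)
    r_value_list = []
    for i in range(len(array)):
        # Hard-coded guard for 4 rows: keeps array[i+1] inside the array
        if i < 3:
            a = stats.linregress(array[i], array[i+1])
            r_value_list.append(a.rvalue)
            i += 1
    r_squared_list = list(map(raise2, r_value_list))
    for i in r_squared_list:
        if i >= 0.85:
            # Position of this R-squared value = index of the first row of the pair
            b = r_squared_list.index(i)
            array = np.delete(array, b, 0)
    # Three more quarter turns restore the original orientation
    array = np.rot90(array)
    array = np.rot90(array)
    array = np.rot90(array)
    return array

# no_outliers_DATA is the array shown above
clean_DATA = check_corr(no_outliers_DATA)
print(clean_DATA)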
Example of the output I want to obtain:
array = [[-0.90068117, 1.01900435, -1.3154443 ],
[-1.14301691, -0.13197948, -1.3154443 ],
[-1.38535265, 0.32841405, -1.3154443 ],
[-1.50652052, 0.09821729, -1.3154443 ],
[-1.02184904, 1.24920112, -1.3154443 ],
[-0.53717756, 1.93979142, -1.05217993],
[-1.50652052, 0.78880759, -1.18381211],
[-1.02184904, 0.78880759, -1.3154443 ],
[-1.74885626, -0.36217625, -1.3154443 ],
[-1.14301691, 0.09821729, -1.44707648],
[-0.53717756, 1.47939788, -1.3154443 ],
[-1.26418478, 0.78880759, -1.3154443 ],
[-1.26418478, -0.13197948, -1.44707648],
[-1.87002413, -0.13197948, -1.44707648],
[-0.05250608, 2.16998818, -1.3154443 ],
[-0.17367395, 2.9 , -1.05217993],
[-0.53717756, 1.93979142, -1.05217993],
[-0.90068117, 1.01900435, -1.18381211],
[-0.17367395, 1.70959465, -1.18381211],
[-0.90068117, 1.70959465, -1.18381211]]
Here row 2 has been deleted because it is correlated with row 3.
Also, I would like the function to work for arrays with more than 4 rows.
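Thanks for your help.

Please provide the expected minimal, reproducible example (MRE). We should be able to copy and paste a contiguous block of your code, execute that file, and reproduce your problem along with tracing output for the problem points. This lets us test our suggestions against your test data and desired output. Show where the intermediate results differ from what you expected. We expect you to perform basic diagnosis and include it in your post. At the very least, print the suspect values at the point of error and trace them back to their sources. In many cases, doing this basic diagnosis will show you what the problem is, and you won't need Stack Overflow at all.

I think all of your problems are in the indexing and in the i += 1 line. Delete i += 1, because Python advances i automatically on each pass of the for loop. Also, the for loop will end at i = 3, which is the maximum index of the array, so the line a = stats.linregress(array[i], array[i+1]) will fail. It is better to use for i in range(len(array) - 1):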
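The two changes suggested in the comment above can be sketched as follows. This is a minimal illustration, not the asker's final code: the helper name adjacent_r_squared and the small demo array are made up for the example, and the loop bound len(rows) - 1 is what allows it to handle any number of rows without a hard-coded guard.

```python
import numpy as np
from scipy import stats


def adjacent_r_squared(rows):
    """Return R^2 for each pair of neighbouring rows (i, i + 1)."""
    r_squared = []
    # Stop one short of the end so rows[i + 1] is always a valid index.
    # No manual "i += 1": the for loop assigns the next value itself.
    for i in range(len(rows) - 1):
        result = stats.linregress(rows[i], rows[i + 1])
        r_squared.append(result.rvalue ** 2)
    return r_squared


# Made-up demo data: row 1 is an exact multiple of row 0,
# while row 2 is only weakly related to row 1.
demo = np.array([
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.0],
    [1.0, -1.0, 2.0, 0.5],
])
print(adjacent_r_squared(demo))
```

For n rows the function returns n - 1 values, one per neighbouring pair, so the same loop works unchanged for arrays with more than 4 rows.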
@Prune Sorry, this is all brand new to me. I edited it differently.

@OliverMohrBonometti That did work! Thank you.
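For arrays with more than 4 rows, one possible variant computes all neighbouring R-squared values at once with np.corrcoef and deletes the flagged rows in a single call. This is a sketch under stated assumptions rather than the method used in the thread: the function name drop_correlated_rows is hypothetical, the input is assumed to have at least two rows, all comparisons are made on the original array before anything is deleted, and the Pearson coefficient from np.corrcoef stands in for the rvalue of stats.linregress.

```python
import numpy as np


def drop_correlated_rows(rows, threshold=0.85):
    """Drop row i whenever R^2 between row i and row i + 1 meets the threshold."""
    rows = np.asarray(rows, dtype=float)
    # Full matrix of Pearson correlation coefficients between rows.
    corr = np.corrcoef(rows)
    # Entries just above the main diagonal pair each row with its next neighbour.
    r_squared = np.diagonal(corr, offset=1) ** 2
    # Indices of the lower-index row in every pair that meets the threshold.
    to_drop = np.flatnonzero(r_squared >= threshold)
    # Delete them in one call so earlier deletions cannot shift later indices.
    return np.delete(rows, to_drop, axis=0)


# Made-up demo data: only rows 0 and 1 are strongly correlated.
demo = np.array([
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.0],
    [1.0, -1.0, 2.0, 0.5],
    [4.0, 0.0, 1.0, 3.0],
])
print(drop_correlated_rows(demo))
```

Deleting in one call avoids the index shifting that can occur when rows are removed one at a time inside a loop. If the variables to compare are the columns of the data, as with the np.rot90 call in the question, passing the transpose and transposing the result back is one option. Note that a transpose keeps the column order whereas np.rot90 reverses it, so which member of a pair counts as the lowest index differs between the two.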