Python: how do I locate the index of a given value?
I want to compute the correlation coefficient each time a single row is removed from the dataframe. Once I have all of those coefficients, I need to delete the row whose removal produces the largest increase in the correlation coefficient.
Tags: python, pandas, indexing
The code below shows my attempt:
import pandas as pd
import numpy as np

# Access the data
file = 'tc_yolanda2.csv'
df = pd.read_csv(file)
x = df['dist']
y = df['mps']

# Compute the correlation coefficient
def correlation_coefficient_4u(a, b):
    correl_mat = np.corrcoef(a, b)
    correlation = correl_mat[0, 1]
    return correlation

c = correlation_coefficient_4u(x, y)
print('Correlation coefficient is:', c)

# Let us try this one
lenght = len(df)
print(lenght)
a = 0
while lenght != 0:
    df.drop([a], inplace=True)
    c = correlation_coefficient_4u(df.dist, df.mps)
    a += 1
    print(round(c, 4))
It does produce 50 correlation coefficients, but it also produces a number of warnings and errors, such as:
RuntimeWarning: Degrees of freedom <= 0 for slice
RuntimeWarning: divide by zero encountered in double_scalars
RuntimeWarning: invalid value encountered in multiply
RuntimeWarning: Mean of empty slice.
RuntimeWarning: invalid value encountered in true_divide
ValueError: labels [50] not contained in axis
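These messages are consistent with the loop removing rows cumulatively: `inplace=True` shrinks `df` on every pass, `lenght` is never updated, and a 50-row frame has no label 50. The following is a minimal sketch of a leave-one-out pass that leaves `df` untouched. The sample values and the helper name `leave_one_out_cc` are assumptions made for illustration; they are not taken from tc_yolanda2.csv.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the contents of tc_yolanda2.csv
df = pd.DataFrame({
    "dist": [10.0, 20.0, 30.0, 40.0, 50.0, 60.0],
    "mps": [55.0, 48.0, 50.0, 30.0, 41.0, 20.0],
})

def correlation_coefficient_4u(a, b):
    # Pearson correlation taken from the 2x2 correlation matrix
    return np.corrcoef(a, b)[0, 1]

def leave_one_out_cc(frame):
    # For every row label, compute the correlation with only that row removed.
    # drop() without inplace=True returns a new frame, so `frame` is unchanged.
    result = {}
    for label in frame.index:
        reduced = frame.drop(index=label)
        result[label] = correlation_coefficient_4u(reduced["dist"], reduced["mps"])
    return pd.Series(result)

loo = leave_one_out_cc(df)
print(loo.round(4))
print("rows still in df:", len(df))
```

Because every trial starts from the full frame, no trial ever runs on fewer than `len(df) - 1` rows, which avoids the empty-slice and degrees-of-freedom warnings.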
You can use the following code to find and delete the row that causes the largest increase in the correlation coefficient:
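length = len(df)

def dropcc(df):
    # Try removing each row in turn and remember the row whose removal
    # gives the lowest (most negative) correlation coefficient.
    df_temp = df.copy()
    idxmax = 0
    c = 0
    for i, v in df_temp.iterrows():
        df_temp.drop([i], inplace=True)
        c_temp = correlation_coefficient_4u(df_temp.dist, df_temp.mps)
        if c > c_temp:
            idxmax = i
            c = c_temp
        df_temp = df.copy()  # restore the full frame before the next trial
        # print(round(c_temp, 4))
    df.drop([idxmax], inplace=True)
    return df

# Keep dropping rows until the correlation coefficient falls below -0.9
for i in range(0, length - 1):
    cc = correlation_coefficient_4u(df.dist, df.mps)
    if cc < -0.9:
        break
    else:
        df = dropcc(df)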
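The code above recomputes the correlation from scratch for every candidate row. As a possible alternative, all leave-one-out coefficients can be obtained in one pass from running sums. This is a sketch of a swapped-in technique, not the answer's method; the function name `leave_one_out_corr` and the sample arrays are hypothetical.

```python
import numpy as np

def leave_one_out_corr(x, y):
    # Pearson r with each point removed in turn, computed from sums:
    # r = (m*Sxy - Sx*Sy) / sqrt((m*Sxx - Sx^2) * (m*Syy - Sy^2)),
    # where every sum excludes the removed point and m = n - 1.
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    m = x.size - 1
    sx = x.sum() - x
    sy = y.sum() - y
    sxx = (x * x).sum() - x * x
    syy = (y * y).sum() - y * y
    sxy = (x * y).sum() - x * y
    num = m * sxy - sx * sy
    den = np.sqrt((m * sxx - sx ** 2) * (m * syy - sy ** 2))
    return num / den

# Hypothetical sample values standing in for the dist and mps columns
x = np.array([10.0, 20.0, 30.0, 40.0, 50.0, 60.0])
y = np.array([55.0, 48.0, 50.0, 30.0, 41.0, 20.0])

fast = leave_one_out_corr(x, y)
# Reference values: recompute with np.corrcoef after deleting each point
slow = np.array([
    np.corrcoef(np.delete(x, i), np.delete(y, i))[0, 1]
    for i in range(x.size)
])
print(np.allclose(fast, slow))
```

The position of the most negative value, `int(np.argmin(fast))`, identifies the row to drop. The sum-of-squares form can lose precision when the values are very large relative to their spread, so the direct recomputation may be the safer choice in that case.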
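Hi. Please take a moment to read this post and the guidance on how to provide an answer, and revise your question accordingly. These tips may also be useful.
You did a great job. My next question is how to loop the code. Deleting the row that causes the largest change in the correlation coefficient (cc) increases the cc between dist and mps, so I need the loop to stop once the cc reaches -0.90.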
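The stopping rule described in the comment can also be written as a single loop that does not modify the caller's frame. The sketch below is one possible arrangement under stated assumptions: the names `cc` and `trim_until`, the `min_rows` guard, and the sample data are all hypothetical and do not come from the original post.

```python
import numpy as np
import pandas as pd

def cc(frame):
    # Pearson correlation between the dist and mps columns
    return np.corrcoef(frame["dist"], frame["mps"])[0, 1]

def trim_until(frame, target=-0.9, min_rows=3):
    # Repeatedly drop the row whose removal gives the most negative
    # correlation, until cc falls below `target` or only `min_rows` remain.
    frame = frame.copy()  # work on a copy; the caller's frame is untouched
    dropped = []
    while len(frame) > min_rows and cc(frame) >= target:
        loo = pd.Series({i: cc(frame.drop(index=i)) for i in frame.index})
        worst = loo.idxmin()
        frame = frame.drop(index=worst)
        dropped.append(worst)
    return frame, dropped

# Hypothetical sample data standing in for tc_yolanda2.csv
df = pd.DataFrame({
    "dist": [10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0, 80.0],
    "mps": [60.0, 52.0, 58.0, 41.0, 20.0, 30.0, 35.0, 12.0],
})

trimmed, dropped = trim_until(df)
print("dropped labels:", dropped)
print("final cc:", round(cc(trimmed), 4))
```

The `min_rows` guard is there so the loop cannot shrink the frame to the point where the correlation is undefined, which is what triggered the warnings in the question.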