Python: How do I locate the index of a given value?


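I want to compute the correlation coefficient each time a row is removed from the DataFrame. Then, once I have all of the correlation coefficients, I need to drop the row whose removal causes the largest increase in the correlation coefficient.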

The code below shows my solution so far:

import pandas as pd
import numpy as np

#Access the data

file='tc_yolanda2.csv'
df = pd.read_csv(file)

x = df['dist']
y = df['mps']

#compute the correlation coefficient

def correlation_coefficient_4u(a,b):
    correl_mat = np.corrcoef(a,b)
    correlation = correl_mat[0,1]
    return correlation

c = correlation_coefficient_4u(x,y)
print('Correlation coefficient is:',c)

#Let us try this one

lenght = len(df)
print(lenght)
a = 0
while lenght != 0:
    df.drop([a], inplace=True)
    c = correlation_coefficient_4u(df.dist,df.mps)
    a += 1
    print(round(c,4))
It successfully produced 50 correlation coefficients, but it also raised a number of warnings and errors, for example:

RuntimeWarning: Degrees of freedom <= 0 for slice

RuntimeWarning: divide by zero encountered in double_scalars

RuntimeWarning: invalid value encountered in multiply

RuntimeWarning: Mean of empty slice.

RuntimeWarning: invalid value encountered in true_divide

ValueError: labels [50] not contained in axis
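
The messages above are consistent with the loop never ending on its own: `lenght` is never updated, so rows keep being dropped in place until fewer than two remain (which would explain the degrees-of-freedom and empty-slice warnings), and the loop then asks for label 50, which apparently does not exist in the frame. A minimal sketch of a leave-one-out pass that does not modify the frame is given here. The toy data and the name `leave_one_out_corr` are illustrative assumptions, not part of the original post, and the sketch assumes unique index labels.

```python
import numpy as np
import pandas as pd

def leave_one_out_corr(df, x_col="dist", y_col="mps"):
    """Map each row label to the correlation of x_col and y_col
    computed with that single row left out."""
    results = {}
    for label in df.index:
        reduced = df.drop(index=label)  # returns a new frame; df is untouched
        results[label] = np.corrcoef(reduced[x_col], reduced[y_col])[0, 1]
    return pd.Series(results)

# Hypothetical toy data standing in for tc_yolanda2.csv
toy = pd.DataFrame({"dist": [1.0, 2.0, 3.0, 4.0, 5.0],
                    "mps":  [9.0, 7.0, 8.0, 3.0, 1.0]})

loo = leave_one_out_corr(toy)
print(loo.round(4))
# Label of the row whose removal gives the most negative correlation
print("row to drop:", loo.idxmin())
```

Because `DataFrame.drop` returns a new object when `inplace` is not set, the original frame keeps all of its rows and every label stays valid throughout the loop.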

You can use the following code to find and drop the row whose removal causes the largest increase in the correlation coefficient:

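length = len(df)

def dropcc(df):
    # Find the row whose removal gives the lowest (most negative) correlation
    idxmax = None
    c = float('inf')

    for i in df.index:
        df_temp = df.drop([i])  # new frame without row i; df itself is unchanged
        c_temp = correlation_coefficient_4u(df_temp.dist, df_temp.mps)
        if c_temp < c:
            idxmax = i
            c = c_temp

    # Drop that row and return the reduced frame
    return df.drop([idxmax])

# Drop one row per pass until the correlation falls below -0.9,
# stopping while at least three rows remain so the correlation stays defined
for i in range(0, length-2):
    cc = correlation_coefficient_4u(df.dist, df.mps)
    if cc < -0.9:
        break
    else:
        df = dropcc(df)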
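
As an optional variation (not part of the original answer), the per-row search can be done without rebuilding the frame for every candidate. The sketch below computes all leave-one-out Pearson correlations at once from running sums, using the identity r = (n·Σxy − Σx·Σy) / sqrt((n·Σx² − (Σx)²)·(n·Σy² − (Σy)²)). The function name `loo_corr_vectorized` and the toy arrays are assumptions made for illustration, and the sum-based formula can lose precision when the values are very large.

```python
import numpy as np

def loo_corr_vectorized(x, y):
    """Return an array whose i-th entry is the Pearson correlation
    of x and y with observation i removed."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size - 1                      # sample size after removing one row
    sx = x.sum() - x                    # leave-one-out sum of x
    sy = y.sum() - y                    # leave-one-out sum of y
    sxx = (x * x).sum() - x * x         # leave-one-out sum of x squared
    syy = (y * y).sum() - y * y         # leave-one-out sum of y squared
    sxy = (x * y).sum() - x * y         # leave-one-out sum of x*y
    cov = n * sxy - sx * sy
    var_x = n * sxx - sx ** 2
    var_y = n * syy - sy ** 2
    return cov / np.sqrt(var_x * var_y)

# Hypothetical toy values for dist and mps
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [9.0, 7.0, 8.0, 3.0, 1.0]
r = loo_corr_vectorized(x, y)
print(np.round(r, 4))
print("position to drop:", int(np.argmin(r)))
```

Note that `np.argmin` returns a position, not an index label, so with a DataFrame it would need to be mapped back through `df.index`.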

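Hi. Please take the time to read this post, as well as the one on how to provide an answer, and revise your question accordingly. These tips may also be useful.

You did great. My next question is how to loop the code. Dropping the row that causes the largest change in the correlation coefficient (cc) increases the cc between dist and mps, so I need to stop once the cc reaches -0.90.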
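
The stopping rule described in the comment above can be sketched as a loop that checks the current correlation before each removal. This is only an illustration under stated assumptions: the function name `trim_until`, the `min_rows` guard, and the toy data are hypothetical, index labels are assumed unique, and the columns are assumed non-constant so that no correlation is NaN.

```python
import numpy as np
import pandas as pd

def trim_until(df, threshold=-0.9, x_col="dist", y_col="mps", min_rows=3):
    """Drop one row per pass until corr(x_col, y_col) < threshold.
    Returns the trimmed copy and the list of dropped labels."""
    trimmed = df.copy()
    dropped = []
    while len(trimmed) > min_rows:
        cc = np.corrcoef(trimmed[x_col], trimmed[y_col])[0, 1]
        if cc < threshold:
            break  # target correlation reached
        # Correlation obtained by leaving out each remaining row in turn
        candidates = {}
        for label in trimmed.index:
            xs = trimmed[x_col].drop(label)
            ys = trimmed[y_col].drop(label)
            candidates[label] = np.corrcoef(xs, ys)[0, 1]
        # Remove the row whose absence gives the most negative correlation
        worst = min(candidates, key=candidates.get)
        trimmed = trimmed.drop(index=worst)
        dropped.append(worst)
    return trimmed, dropped

# Hypothetical toy data: five points on a line plus one outlier (label 5)
toy = pd.DataFrame({"dist": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
                    "mps":  [9.0, 7.0, 5.0, 3.0, 1.0, 9.0]})

trimmed, dropped = trim_until(toy)
print("dropped labels:", dropped)
print("rows kept:", len(trimmed))
```

Working on a copy keeps the caller's frame intact, and the `min_rows` guard prevents the loop from reaching a frame too small for the correlation to be defined.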