Python: return the indices of the n smallest values per column using pandas

I have the following (simplified) DataFrame:

import pandas as pd

df = pd.DataFrame({'X': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
                   'Y': [10, 20, 30, 40, 50, -10, -20, -30, -40, -50],
                   'Z': [20, 18, 16, 14, 12, 10, 8, 6, 4, 2]},
                  index=list('ABCDEFGHIJ'))

which gives:

    X   Y   Z
A   1  10  20
B   2  20  18
C   3  30  16
D   4  40  14
E   5  50  12
F   6 -10  10
G   7 -20   8
H   8 -30   6
I   9 -40   4
J  10 -50   2

I want to create a new DataFrame that returns, for each column, the index labels of the n smallest values.

Desired output (e.g., for the 3 smallest values):

   X  Y  Z
0  A  J  J
1  B  I  I
2  C  H  H

What is the best way to do this?

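You can use DataFrame.apply together with Series.nsmallest, i.e. the expression df.apply(lambda x: pd.Series(x.nsmallest(N).index)) timed below.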
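
As a minimal sketch of the apply/nsmallest approach on the question's data: the name N and the variable result are illustrative, and the lambda mirrors the expression used in the timings.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "X": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
        "Y": [10, 20, 30, 40, 50, -10, -20, -30, -40, -50],
        "Z": [20, 18, 16, 14, 12, 10, 8, 6, 4, 2],
    },
    index=list("ABCDEFGHIJ"),
)

N = 3  # illustrative: number of smallest values to keep per column

# For each column, nsmallest(N) returns the N smallest values in ascending order.
# Wrapping the resulting index in a Series relabels the rows 0..N-1, so the
# per-column results line up side by side in the output DataFrame.
result = df.apply(lambda col: pd.Series(col.nsmallest(N).index))
print(result)
```

Each column is handled independently, so the rows of the result are ranks (0 = smallest) rather than labels from the original index.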


A faster NumPy solution uses numpy.argsort; the expression appears in the timings below.
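
The argsort idea can be sketched as follows. This is a simplified variant, not the exact timed expression: it sorts ascending and takes the first N rows instead of negating and reverse-slicing, and it looks up labels through df.index.to_numpy() on the assumption that indexing a pandas Index directly with a 2-D array may not be supported in recent pandas versions. The names N, order, labels and result are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "X": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
        "Y": [10, 20, 30, 40, 50, -10, -20, -30, -40, -50],
        "Z": [20, 18, 16, 14, 12, 10, 8, 6, 4, 2],
    },
    index=list("ABCDEFGHIJ"),
)

N = 3  # illustrative: number of smallest values to keep per column

# argsort along axis 0 gives, for every column, the row positions that would
# sort that column ascending; the first N rows are the N smallest values.
# A stable sort keeps tied values in their original row order.
order = np.argsort(df.to_numpy(), axis=0, kind="stable")[:N]

# Translate row positions into index labels with NumPy fancy indexing.
labels = df.index.to_numpy()[order]
result = pd.DataFrame(labels, columns=df.columns)
print(result)
```

Because the whole frame is sorted as one array, this avoids the per-column Python call that apply makes, which is the likely reason for the speed difference in the timings.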

Timings

In [111]: %timeit (pd.DataFrame(df.index[np.argsort(-df.values, axis=0)[-1:-1-N:-1]], columns=df.columns))
159 µs ± 1.37 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [112]: %timeit (df.apply(lambda x: pd.Series(x.nsmallest(N).index)))
3.52 ms ± 49.7 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

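First, sort the input DataFrame by each column, then collect the list of index labels for each sorted result, build a DataFrame from those lists, and finally return the first n rows of that DataFrame.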

def topN(df, n):
    # first, sort dataframe per column
    sort_x = df.sort_values(by=['X'], ascending=True)
    sort_y = df.sort_values(by=['Y'], ascending=True)
    sort_z = df.sort_values(by=['Z'], ascending=True)
    # now get a list of the indices of each sorted df
    index_list_x = sort_x.index.values.tolist()
    index_list_y = sort_y.index.values.tolist()
    index_list_z = sort_z.index.values.tolist()
    # create dataframe from lists
    sorted_df = pd.DataFrame(
        {'sorted_x': index_list_x,
         'sorted_y': index_list_y,
         'sorted_z': index_list_z
        })
    # return the top n from the sorted dataframe
    return sorted_df.iloc[0:n]

topN(df, 3)
returns:

  sorted_x sorted_y sorted_z
0        A        J        J
1        B        I        I
2        C        H        H
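
The same sort-then-slice steps can be written without hard-coding the column names. The sketch below is one possible generalization, not the answer author's code: the function name top_n_index is hypothetical, and it keeps the original column names instead of the sorted_x style names.

```python
import pandas as pd


def top_n_index(df: pd.DataFrame, n: int) -> pd.DataFrame:
    """Return the index labels of the n smallest values in each column.

    Hypothetical helper: sorts every column separately (mergesort is stable,
    so ties keep their original order) and keeps the first n labels.
    """
    return pd.DataFrame(
        {
            col: df[col].sort_values(kind="mergesort").index[:n].tolist()
            for col in df.columns
        }
    )


df = pd.DataFrame(
    {
        "X": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
        "Y": [10, 20, 30, 40, 50, -10, -20, -30, -40, -50],
        "Z": [20, 18, 16, 14, 12, 10, 8, 6, 4, 2],
    },
    index=list("ABCDEFGHIJ"),
)

result = top_n_index(df, 3)
print(result)
```

Sorting each column as a Series avoids re-sorting the whole DataFrame once per column, and the dictionary comprehension means the function works for any number of columns.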