Python using NaNs; assigning column names as values based on a condition


python pandas dataframe numpy


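In each row I want to pick the two largest values, sort them, and put their column names in as the values. All other values should be dropped from the dataframe.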

import numpy as np
import pandas as pd
d = {'col1': [1, 2, np.nan], 'col2': [2,3,3], 'col3': [3,6,5], 'col4': [4,9,10], 'col5': [5,1, np.nan], 'col6': [7,np.nan,2], 'col7': [np.nan, 5,6]}
df = pd.DataFrame(data=d)

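I can already get the two largest values in each row, but reshaping the dataframe based on the column values is a separate task. The code below leaves the remaining values as NaN: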

lasttwo = df.stack().sort_values(ascending=True).groupby(level=0).tail(2).unstack()
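A minimal sketch of how the stack-based attempt above could be finished, assuming pandas is fine for the reshape step: the long (row, column) Series is sorted, trimmed to two entries per row, numbered with cumcount() and pivoted back to wide format, so the cells hold column names instead of values. The intermediate names s, top, frame, row, col and rank are illustrative choices, not taken from the original thread.

```python
import numpy as np
import pandas as pd

d = {'col1': [1, 2, np.nan], 'col2': [2, 3, 3], 'col3': [3, 6, 5],
     'col4': [4, 9, 10], 'col5': [5, 1, np.nan], 'col6': [7, np.nan, 2],
     'col7': [np.nan, 5, 6]}
df = pd.DataFrame(data=d)

# Long format: one entry per (row label, column name). Drop NaN explicitly,
# because whether stack() drops it by default depends on the pandas version.
s = df.stack().dropna()

# Largest values first, then keep at most two entries per original row.
top = s.sort_values(ascending=False, kind='stable').groupby(level=0).head(2)

# Turn the MultiIndex into ordinary columns and number the entries per row
# (0 = largest, 1 = second largest) so they can be pivoted back to wide format.
frame = top.reset_index()
frame.columns = ['row', 'col', 'value']
frame['rank'] = frame.groupby('row').cumcount()

# Wide format again: one row per original row, cells hold column names.
lasttwo = frame.pivot(index='row', columns='rank', values='col').rename_axis(index=None)
lasttwo.columns = ['Last', 'Second last']
print(lasttwo)
```

Rows with fewer than two non-NaN values would simply get NaN in the 'Second last' cell, because pivot leaves missing (row, rank) combinations empty.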

Here is an example from another thread (code below). It almost works, except that it does not handle NaN values:

last = pd.DataFrame(df.apply(lambda x:list(df.columns[np.array(x).argsort()[::-1][:2]]), axis=1).to_list(),  columns=['Last', 'Second last'])
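The argsort-based code fails because np.argsort places NaN after every real number, so reversing the order with [::-1] puts the NaN column first. The sketch below demonstrates that, then shows one possible workaround, which is an assumption on my part rather than part of the original thread: fill NaN with -np.inf before sorting so missing values drop to the bottom. The names naive and filled are illustrative.

```python
import numpy as np
import pandas as pd

d = {'col1': [1, 2, np.nan], 'col2': [2, 3, 3], 'col3': [3, 6, 5],
     'col4': [4, 9, 10], 'col5': [5, 1, np.nan], 'col6': [7, np.nan, 2],
     'col7': [np.nan, 5, 6]}
df = pd.DataFrame(data=d)

# Row 0 is [1, 2, 3, 4, 5, 7, NaN]; argsort puts the NaN index last,
# so the reversed order starts with the NaN column.
row0 = df.iloc[0].to_numpy()
naive = df.columns[row0.argsort()[::-1][:2]].tolist()
print(naive)  # ['col7', 'col6']: the NaN column is picked first

# Workaround (assumption): replace NaN with -inf so it sorts to the bottom,
# then run the same argsort-and-reverse lambda as in the original thread.
filled = df.fillna(-np.inf)
last = pd.DataFrame(
    filled.apply(lambda x: list(df.columns[np.array(x).argsort()[::-1][:2]]), axis=1).to_list(),
    columns=['Last', 'Second last'])
print(last)
```

A row with fewer than two real values would still report a column that held -inf (that is, a NaN column), so this only works when every row has at least two non-NaN entries.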

How can I handle this?

For example:

     col1  col2  col3  col4  col5  col6  col7
A     1.0   2.0   3.0   4.0   5.0   7.0   NaN
B     2.0   3.0   6.0   9.0   1.0   NaN   5.0
C     NaN   3.0   5.0  10.0   NaN   2.0   6.0

You can use this alternative solution:

lasttwo = df.apply(lambda x: pd.Series(x.nlargest(2).index[:2]), axis=1)
lasttwo.columns = ['Last',  'Second last']
print (lasttwo)
   Last Second last
0  col6        col5
1  col4        col3
2  col4        col7

Or, if performance matters, you can use a NumPy masked array:

a = df.to_numpy()
mask = np.isnan(a)
ma = np.ma.masked_array(a, mask=mask)
print (ma)
[[1.0 2.0 3.0 4.0 5.0 7.0 --]
 [2.0 3.0 6.0 9.0 1.0 -- 5.0]
 [-- 3.0 5.0 10.0 -- 2.0 6.0]]

arr = df.columns.to_numpy()[ma.argsort(endwith=False, axis=1)[:, ::-1][:, :2]]
lasttwo = pd.DataFrame(arr,  columns=['Last', 'Second last'])
print (lasttwo)
   Last Second last
0  col6        col5
1  col4        col3
2  col4        col7
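The key detail in the masked-array version is endwith=False. By default MaskedArray.argsort sorts masked entries to the end of the ascending order; with endwith=False they go to the start, so after [:, ::-1] they land at the end and are never selected. A small illustration on a hypothetical two-row array:

```python
import numpy as np

a = np.array([[1.0, np.nan, 3.0],
              [np.nan, 2.0, 0.5]])
ma = np.ma.masked_array(a, mask=np.isnan(a))

# Default (endwith=True): masked entries come last in ascending order
default_order = ma.argsort(axis=1)

# endwith=False: masked entries come first in ascending order
masked_first = ma.argsort(axis=1, endwith=False)

# Reverse to descending and keep two: masked entries are never chosen
top2 = masked_first[:, ::-1][:, :2]
print(default_order)
print(masked_first)
print(top2)
```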

I really like the first solution. It's compact. Thanks a lot. Am I right in reading this solution as: x.nlargest(2).index[:2]?

@Eraseri - you are completely right.
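To confirm that reading, here is x.nlargest(2).index[:2] taken apart step by step on one row of the question's data (row 2, written out by hand as a Series). nlargest(2) returns the two largest non-NaN values labelled by column name, .index extracts those names, and the [:2] slice is redundant because nlargest(2) never returns more than two entries.

```python
import numpy as np
import pandas as pd

# Row 2 of the question's DataFrame, as a Series indexed by column name
x = pd.Series({'col1': np.nan, 'col2': 3, 'col3': 5, 'col4': 10,
               'col5': np.nan, 'col6': 2, 'col7': 6})

largest = x.nlargest(2)         # the values 10.0 and 6.0; NaN entries are skipped
names = largest.index           # the column labels of those values: col4, col7
print(names[:2].equals(names))  # True: slicing to two changes nothing
```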
