Python: how to get the most frequent row in a table

How can I get the most frequent row in a DataFrame? For example, given the following table:

   col_1  col_2 col_3
0      1      1     A
1      1      0     A
2      0      1     A
3      1      1     A
4      1      0     B
5      1      0     C

Expected result:

   col_1  col_2 col_3
0      1      1     A

Edit: I need the most frequent row (as a single unit), not the most frequent value of each column as computed by the mode() method.

Check out groupby:

df.groupby(df.columns.tolist()).size().sort_values().tail(1).reset_index().drop(columns=0)
   col_1  col_2 col_3  
0      1      1     A

You can do this with groupby and size:

counts = df.groupby(df.columns.tolist(), as_index=False).size()
result = counts.loc[[counts["size"].idxmax()]].drop(columns="size")
result.reset_index(drop=True)  # this is just to reset the index
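
The groupby approach can be sketched end to end on the sample data from the question. The helper name most_frequent_row and the count column name "n" are assumptions made for illustration, not part of pandas. Naming the count column explicitly avoids depending on the "size" column that as_index=False only produces in newer pandas versions.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "col_1": [1, 1, 0, 1, 1, 1],
    "col_2": [1, 0, 1, 1, 0, 0],
    "col_3": ["A", "A", "A", "A", "B", "C"],
})


def most_frequent_row(frame: pd.DataFrame) -> pd.DataFrame:
    """Return the most frequent row as a one-row DataFrame (hypothetical helper)."""
    cols = frame.columns.tolist()
    # Count each distinct row; the counts land in a column named "n".
    counts = frame.groupby(cols).size().reset_index(name="n")
    # idxmax returns the label of the first maximum, so ties resolve to the
    # first group in sorted key order.
    top = counts.loc[[counts["n"].idxmax()]]
    return top.drop(columns="n").reset_index(drop=True)


result = most_frequent_row(df)
print(result)
```

Note that groupby drops rows whose key columns contain NaN by default, so this sketch assumes the table has no missing values.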

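Using NumPy:

If you are after performance, convert the string column to numeric codes and then use np.unique:

a = np.c_[df.col_1, df.col_2, pd.factorize(df.col_3)[0]]
u,idx,c = np.unique(a, axis=0, return_index=True, return_counts=True)
df.iloc[[idx[c.argmax()]]]  # first occurrence of the row with the highest count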
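
A self-contained version of the np.unique approach, run on the sample data from the question, might look like the sketch below. The final lookup through idx[c.argmax()] is an assumption about how the counts are meant to be mapped back to the original DataFrame; it selects the first occurrence of the most frequent row.

```python
import numpy as np
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "col_1": [1, 1, 0, 1, 1, 1],
    "col_2": [1, 0, 1, 1, 0, 0],
    "col_3": ["A", "A", "A", "A", "B", "C"],
})

# Encode the string column as integer codes so the whole table is numeric.
codes, _ = pd.factorize(df["col_3"])
a = np.c_[df["col_1"].to_numpy(), df["col_2"].to_numpy(), codes]

# u: unique rows, idx: position of each unique row's first occurrence in `a`,
# c: how many times each unique row appears.
u, idx, c = np.unique(a, axis=0, return_index=True, return_counts=True)

# Map the most frequent unique row back to its first occurrence in df.
result = df.iloc[[idx[c.argmax()]]]
print(result)
```

Because the row is taken from df itself, the original index label and the original string value in col_3 are preserved.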

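The numpy_indexed library helps with "groupby"-style operations in less code, with performance similar to NumPy. So here is another approach, very similar to @Divakar's np.unique()-based solution:

import numpy_indexed as npi

arr = df.values.astype(str)
counts = npi.multiplicity(arr)  # number of occurrences of each row
output = df.iloc[[counts.argmax()]]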

As of pandas 1.1.0, the DataFrame.value_counts method can be used to count the unique rows in a DataFrame:

df.value_counts()

Output:

col_1  col_2  col_3
1      1      A        2
       0      C        1
              B        1
              A        1
0      1      A        1

This method can be used to find the most frequent row:

df.value_counts().head(1).index.to_frame(index=False)

Output:

   col_1  col_2 col_3
0      1      1     A
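
Taking head(1) picks a single row even when several rows share the highest count. As a hedged sketch, the helper below returns every row tied for the maximum; the name most_frequent_rows is hypothetical and not a pandas API, and the code assumes pandas 1.1.0 or later so that DataFrame.value_counts is available.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "col_1": [1, 1, 0, 1, 1, 1],
    "col_2": [1, 0, 1, 1, 0, 0],
    "col_3": ["A", "A", "A", "A", "B", "C"],
})


def most_frequent_rows(frame: pd.DataFrame) -> pd.DataFrame:
    """Return all rows tied for the highest count (hypothetical helper)."""
    counts = frame.value_counts()  # one entry per distinct row
    top = counts[counts == counts.max()]  # keep every row that reaches the maximum
    return top.index.to_frame(index=False)


top_rows = most_frequent_rows(df)
# idxmax gives the same row as a plain tuple of column values.
top_key = df.value_counts().idxmax()
print(top_rows)
print(top_key)
```

For the sample table only one row reaches the maximum count, so top_rows has a single row.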

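You should check your code. How do you get the 'size' column?

You're right; I added as_index=False. I somehow left it out when writing this up. Thanks!

Alternative:

df.groupby(df.columns.tolist(), as_index=False).size().sort_values('size').tail(1).drop(columns='size')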