Python: How do I count category occurrences in one DataFrame and map them to another DataFrame?

Thanks for any help you can provide.

I have two DataFrames:

df1
+-----+----------+
| key | category |
+-----+----------+
|   1 | B        |
|   1 | A        |
|   1 | A        |
|   2 | C        |
|   2 | B        |
|   3 | C        |
|   3 | B        |
|   3 | C        |
|   4 | B        |
|   4 | B        |
+-----+----------+

df2
+-----+----------+
| key | is_thing |
+-----+----------+
|   1 | yes      |
|   2 | yes      |
|   3 | yes      |
|   4 | no       |
+-----+----------+

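I need to count how many times each category occurs for each key in df1, and map the category with the highest count for each key onto df2, so that a key with no majority category results in NaN. The desired output is: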

+-----+----------+----------+
| key | is_thing | category |
+-----+----------+----------+
|   1 | yes      | A        |
|   2 | yes      | NaN      |
|   3 | yes      | C        |
|   4 | no       | B        |
+-----+----------+----------+

How can I achieve this with Python and pandas? A reproducible example follows:

import pandas as pd

data1 = {'key': [1, 1, 1, 2, 2, 3, 3, 3, 4, 4], 
         'category': ['A', 'A', 'B', 'B', 'C', 'C', 'B', 'C', 'B', 'B']}
data2 = {'key': [1, 2, 3, 4], 
         'is_thing': ['yes', 'yes', 'yes', 'no']}

df1 = pd.DataFrame(data1)
df2 = pd.DataFrame(data2)

data_desired = {'key': [1, 2, 3, 4], 
                'is_thing': ['yes', 'yes', 'yes', 'no'],
                'category': ['A', 'null', 'C', 'B']}

df_desired = pd.DataFrame(data_desired)

Many thanks for any assistance.

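Comments:

Can a key have two categories with the same maximum count?

@anky_91 Thanks, yes, it can; in that case the desired output would be null.

Are we missing an observation column?

Why null rather than the pandas standard, NaN?

@TuckDrace It might be worth editing the post and letting the answerer(s?) know.

"Immaterial" is the coolest word for "missing data"; I'll remember that ;)

In that case, what other values are possible (by the way, this could be done with a map)?

@AlexanderCécile, true. It could be a map.

Here is one way to do this (the code uses pd.crosstab and map):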

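import numpy as np

# Count occurrences of each category per key (rows: key, columns: category)
m = pd.crosstab(df1['key'], df1['category'])
# isin with a Series compares row-wise by index: count the categories that reach each key's maximum
cond = m.isin(m.max(axis=1)).sum(axis=1)

# Keep the top category only where the maximum is unique; otherwise NaN
d = dict(zip(m.index, np.where(cond.eq(1), m.idxmax(axis=1), np.nan)))
df2['category'] = df2['key'].map(d)
# df_desired = df2.assign(category=df2['key'].map(d))  # for a new df, leaving df2 unchanged
print(df2)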

   key is_thing category
0    1      yes        A
1    2      yes      NaN
2    3      yes        C
3    4       no        B
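
To see why the crosstab answer returns NaN for key 2, it can help to look at the intermediate values. The sketch below rebuilds df1 from the question and prints the count table and the number of categories tied at each key's maximum. The name ties is only illustrative, and m.eq(..., axis=0) is used here as an explicit row-wise comparison in place of isin.

```python
import pandas as pd

# Sample data from the question
df1 = pd.DataFrame({
    'key': [1, 1, 1, 2, 2, 3, 3, 3, 4, 4],
    'category': ['A', 'A', 'B', 'B', 'C', 'C', 'B', 'C', 'B', 'B'],
})

# Rows are keys, columns are categories, cells are occurrence counts
m = pd.crosstab(df1['key'], df1['category'])

# For each key, how many categories reach that key's maximum count
ties = m.eq(m.max(axis=1), axis=0).sum(axis=1)

print(m)
print(ties)
```

A key gets a category only where ties equals 1; key 2 has two categories sharing the maximum, so it is mapped to NaN.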

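Another approach merges the two frames and takes the mode of each group:

import numpy as np
# Attach is_thing to each df1 row, then take the mode per group; a tie (several modes) becomes None
new_df = pd.merge(df1, df2, how='left', on='key')
modes = new_df.groupby(['key', 'is_thing'])['category'].agg(lambda s: s.mode())
modes.map(lambda x: x if np.isscalar(x) else None)

Output (the index is (key, is_thing), so reset it if you want):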

1   yes A
2   yes 
3   yes C
4   no  B
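
The mode-based answer relies on agg accepting a non-scalar result when several modes tie, which may behave differently across pandas versions. A minimal sketch of a variant that always returns a scalar is shown below; single_mode is a hypothetical helper name, not part of pandas, and the merge is skipped because only the key is needed for the mapping.

```python
import numpy as np
import pandas as pd

# Sample data from the question
df1 = pd.DataFrame({
    'key': [1, 1, 1, 2, 2, 3, 3, 3, 4, 4],
    'category': ['A', 'A', 'B', 'B', 'C', 'C', 'B', 'C', 'B', 'B'],
})
df2 = pd.DataFrame({'key': [1, 2, 3, 4],
                    'is_thing': ['yes', 'yes', 'yes', 'no']})


def single_mode(s):
    """Return the unique most frequent value of s, or NaN when several values tie."""
    modes = s.mode()
    return modes.iloc[0] if len(modes) == 1 else np.nan


# One scalar per key: the majority category, or NaN on a tie
top = df1.groupby('key')['category'].agg(single_mode)

# Map onto df2 by key, leaving df2 itself unchanged
result = df2.assign(category=df2['key'].map(top))
print(result)
```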

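A third approach uses groupby and value_counts:

# Caveat: the filter drops a key when any two of its counts tie, not only when the top count is tied
df2['category'] = df2['key'].map(
    df1.groupby('key')['category']
       .value_counts()                            # counts per (key, category)
       .groupby(level=0)
       .filter(lambda x: x.nunique() == len(x))   # keep keys whose counts are all distinct
       .unstack()                                 # rows: key, columns: category
       .idxmax(axis=1)                            # most frequent category per key
)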
print(df2)

   key is_thing category
0    1      yes        A
1    2      yes      NaN
2    3      yes        C
3    4       no        B
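
The filter step in the value_counts approach keeps a key only when all of its counts are distinct, so a key with counts such as A=3, B=1, C=1 would also end up as NaN even though A is a clear majority. Below is a sketch of a variant that only checks for ties at the top count. Key 5 and its is_thing value are hypothetical additions to the question's data, and the names top_count, leaders and winners are illustrative.

```python
import pandas as pd

# Sample data from the question plus a hypothetical key 5 with counts A=3, B=1, C=1
df1 = pd.DataFrame({
    'key': [1, 1, 1, 2, 2, 3, 3, 3, 4, 4, 5, 5, 5, 5, 5],
    'category': ['A', 'A', 'B', 'B', 'C', 'C', 'B', 'C', 'B', 'B',
                 'A', 'A', 'A', 'B', 'C'],
})
df2 = pd.DataFrame({'key': [1, 2, 3, 4, 5],
                    'is_thing': ['yes', 'yes', 'yes', 'no', 'no']})

# Counts per (key, category), and each key's maximum aligned to the same index
counts = df1.groupby('key')['category'].value_counts()
top_count = counts.groupby(level='key').transform('max')

# Rows whose count equals the key's maximum (more than one row per key means a tie)
leaders = counts[counts.eq(top_count)]

# Series mapping key -> leading category, dropping keys that appear more than once
winners = pd.Series(
    leaders.index.get_level_values('category').to_numpy(),
    index=leaders.index.get_level_values('key'),
)
winners = winners[~winners.index.duplicated(keep=False)]

result = df2.assign(category=df2['key'].map(winners))
print(result)
```

With this data, key 2 should still come out as NaN, while key 5 is assigned A because only ties for the highest count are treated as "no majority".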