Python: changing the output of groupby and value_counts so it maps back to a DataFrame

python pandas dataframe

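I have a scenario where I am trying to filter a DataFrame by a specific value and count how many times another identifier occurs. I then convert the result to a dictionary and map it back onto the DataFrame. The problem is that the resulting dictionary cannot be mapped back, because I have introduced extra complexity into the dictionary (additional keys?), and I don't know how to avoid it.

I suppose the simple question is: "How do I use value_counts on my CELL_ID column, filtered by another column called Grid_Type, and map the result back to every row for each CELL_ID?"

What I am doing so far

This counts how many rows contain each CELL_ID, but it does not allow filtering by Grid_Type: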

z = df['CELL_ID'].value_counts()
z1 = z.to_dict()
df['CELL_CNT'] = df['CELL_ID'].map(z1)

The dictionary output for this simple example looks like this:

7015988: 1, 7122961: 1, 6976792: 1

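My bad code
This is what I have been working on so far. I want to be able to return the counts filtered by Grid_Type. For example, I want to count how many times 'Spot' is seen within each CELL_ID.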

z = df[df.Grid_Type == 'Spot'].groupby('CELL_ID')['Grid_Type'].value_counts()
z1 = z.to_dict()
df['SPOT_CNT'] = df['CELL_ID'].map(z1)

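In the example where I try to filter, the dictionary comes back with a more complex result whose keys include Grid_Type. The problem is that I only want to map the counts by CELL_ID. Example dictionary output: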

(7133691, 'Spot'): 3, (7133692, 'Spot'): 3, (7133693, 'Spot'): 2
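The tuple keys come from the two-level index that groupby(...).value_counts() produces. A minimal sketch of one way around it, assuming the sample data below is rebuilt by hand and that rows whose CELL_ID has no 'Spot' should get 0 (that default is an assumption, not something stated in the question):

```python
import pandas as pd

# Sample data rebuilt by hand from the question.
df = pd.DataFrame({
    "CELL_ID": ["001"] * 5 + ["002"] * 3 + ["003"] * 4,
    "Grid_Type": ["Spot", "Square", "Spot", "Square", "Square",
                  "Spot", "Square", "Square",
                  "Square", "Spot", "Spot", "Spot"],
})

spots = df[df["Grid_Type"] == "Spot"]

# The result is indexed by (CELL_ID, Grid_Type), so to_dict() yields
# tuple keys that never match a plain CELL_ID value in map().
multi = spots.groupby("CELL_ID")["Grid_Type"].value_counts()
tuple_keys = multi.to_dict()

# Dropping the second index level leaves a Series keyed by CELL_ID only.
spot_counts = multi.droplevel(1).to_dict()

# IDs with no 'Spot' rows are absent from the dict; fill them with 0.
df["SPOT_CNT"] = df["CELL_ID"].map(spot_counts).fillna(0).astype(int)
print(df)
```

Replacing value_counts() with count() or size() on the filtered frame should give the same CELL_ID-keyed Series directly, since only one Grid_Type remains after filtering.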

Sample data

+---------+-----------+
| CELL_ID | Grid_Type |
+---------+-----------+
|     001 | Spot      |
|     001 | Square    |
|     001 | Spot      |
|     001 | Square    |
|     001 | Square    |
|     002 | Spot      |
|     002 | Square    |
|     002 | Square    |
|     003 | Square    |
|     003 | Spot      |
|     003 | Spot      |
|     003 | Spot      |
+---------+-----------+

Desired result

+---------+-----------+----------+
| CELL_ID | Grid_Type | SPOT_CNT |
+---------+-----------+----------+
|     001 | Spot      |        2 |
|     001 | Square    |        2 |
|     001 | Spot      |        2 |
|     001 | Square    |        2 |
|     001 | Square    |        2 |
|     002 | Spot      |        1 |
|     002 | Square    |        1 |
|     002 | Square    |        1 |
|     003 | Square    |        3 |
|     003 | Spot      |        3 |
|     003 | Spot      |        3 |
|     003 | Spot      |        3 |
+---------+-----------+----------+

Thanks for any help you can provide.

It looks like you have already found an answer, but I would solve the problem as follows:

df = pd.read_csv('spot.txt', sep=r"[ ]{1,}", engine='python', dtype='object')

print(df)

   CELL_ID Grid_Type
0      001      Spot
1      001    Square
2      001      Spot
3      001    Square
4      001    Square
5      002      Spot
6      002    Square
7      002    Square
8      003    Square
9      003      Spot
10     003      Spot
11     003      Spot

df_gb = df['Grid_Type'].groupby([df['CELL_ID']]).value_counts()

print(df_gb)

    CELL_ID  Grid_Type
001      Square       3
         Spot         2
002      Square       2
         Spot         1
003      Spot         3
         Square       1
Name: Grid_Type, dtype: int64



df_gb_dict = df_gb.to_dict()

count_list = []

for idx, row in df.iterrows():
    for k, v in df_gb_dict.items():
        if k[0] == row['CELL_ID'] and k[1] == row['Grid_Type'] and row['Grid_Type'] == 'Spot':
            count_list.append([k[0], k[1], v])
        if k[0] == row['CELL_ID'] and k[1] == row['Grid_Type'] and row['Grid_Type'] == 'Square':
            count_list.append([k[0], k[1], df_gb_dict[(row['CELL_ID'], 'Spot')]])


new_df = pd.DataFrame(count_list, columns=['CELL_ID',  'Grid_Type', 'SPOT_CNT'])

new_df.sort_values(by='CELL_ID', inplace=True)

# reset_index returns a new DataFrame, so the result must be assigned back
new_df = new_df.reset_index(drop=True)

print(new_df)

  CELL_ID Grid_Type  SPOT_CNT
0      001      Spot         2
1      001    Square         2
2      001      Spot         2
3      001    Square         2
4      001    Square         2
5      002      Spot         1
6      002    Square         1
7      002    Square         1
8      003    Square         3
9      003      Spot         3
10     003      Spot         3
11     003      Spot         3
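The nested loop above compares every row against every dictionary key, and it raises a KeyError for any CELL_ID that has 'Square' rows but no 'Spot' rows. A possible vectorized sketch of the same idea, assuming the sample data is rebuilt by hand rather than read from spot.txt:

```python
import pandas as pd

# Sample data rebuilt by hand (stands in for the spot.txt file).
df = pd.DataFrame({
    "CELL_ID": ["001"] * 5 + ["002"] * 3 + ["003"] * 4,
    "Grid_Type": ["Spot", "Square", "Spot", "Square", "Square",
                  "Spot", "Square", "Square",
                  "Square", "Spot", "Spot", "Spot"],
})

counts = df.groupby("CELL_ID")["Grid_Type"].value_counts()

# One row per CELL_ID and one column per Grid_Type;
# (CELL_ID, Grid_Type) pairs that never occur become 0.
wide = counts.unstack(fill_value=0)

# wide["Spot"] is indexed by CELL_ID, so it can be mapped straight
# onto the column without building tuple-keyed dictionaries.
new_df = df.copy()
new_df["SPOT_CNT"] = new_df["CELL_ID"].map(wide["Spot"])
print(new_df)
```

Because the counts are attached to a copy of the original frame, the row order is preserved and no sorting or index reset is needed.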

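In the lambda function:
- it returns a bool for each value: (x) == 'Spot'
- for each group, sum() adds up the True bools
Finally, according to the documentation, transform behaves as follows: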

DataFrame.transform(self, func, axis=0, *args, **kwargs) → 'DataFrame'[source]
     "Call func on self producing a DataFrame with transformed values."  
     "Produced DataFrame will have same axis length as self." <----
...
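The quoted signature is for DataFrame.transform, while the answer calls transform on a groupby object; as far as I know, the groupby version likewise returns a result aligned with the original rows. A small sketch contrasting it with agg, using hand-built sample data and a hypothetical helper named count_spots:

```python
import pandas as pd

# Sample data rebuilt by hand from the question.
df = pd.DataFrame({
    "CELL_ID": ["001"] * 5 + ["002"] * 3 + ["003"] * 4,
    "Grid_Type": ["Spot", "Square", "Spot", "Square", "Square",
                  "Spot", "Square", "Square",
                  "Square", "Spot", "Spot", "Spot"],
})


def count_spots(x):
    """Hypothetical helper: x == 'Spot' is a bool Series; True sums as 1."""
    return (x == "Spot").sum()


grouped = df.groupby("CELL_ID")["Grid_Type"]

per_group = grouped.agg(count_spots)      # one value per CELL_ID
per_row = grouped.transform(count_spots)  # one value per original row

print(len(per_group), len(per_row))
```

The agg result has one entry per group, whereas the transform result has the same length and index as df, which is why it can be assigned directly to a new column.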

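Great answer, thank you. I think I can digest it and use it in a lot of places. I also found that I was able to get the counts with z = df[df.Grid_Type == 'Spot'].groupby('CELL_ID')['Grid_Type'].count(). I think using value_counts() was the wrong approach.
I realized your question only needs the Spot count in the last column, even for the Square rows, so I edited my answer accordingly.
Thanks. Does this create a second DataFrame?
In a technical sense yes, but we assign it to the 'SPOT_CNT' column of the original df: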
df['SPOT_CNT'] = df.groupby('CELL_ID')['Grid_Type'].transform(lambda x: sum(x == 'Spot'))
print(df)

    CELL_ID Grid_Type  SPOT_CNT
0         1      Spot         2
1         1    Square         2
2         1      Spot         2
3         1    Square         2
4         1    Square         2
5         2      Spot         1
6         2    Square         1
7         2    Square         1
8         3    Square         3
9         3      Spot         3
10        3      Spot         3
11        3      Spot         3
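The same per-row count can probably also be written without a lambda, which tends to be faster on large frames because pandas can use its built-in grouped sum. This is a sketch of an alternative, not the answerer's code, and the final astype(int) is a defensive assumption in case the summed booleans come back as floats:

```python
import pandas as pd

# Sample data rebuilt by hand from the question.
df = pd.DataFrame({
    "CELL_ID": ["001"] * 5 + ["002"] * 3 + ["003"] * 4,
    "Grid_Type": ["Spot", "Square", "Spot", "Square", "Square",
                  "Spot", "Square", "Square",
                  "Square", "Spot", "Spot", "Spot"],
})

# Build the bool mask once, group it by CELL_ID, and broadcast each
# group's sum of True values back to every row of that group.
is_spot = df["Grid_Type"].eq("Spot")
df["SPOT_CNT"] = is_spot.groupby(df["CELL_ID"]).transform("sum").astype(int)
print(df)
```

Cells with no 'Spot' rows get 0 here automatically, since summing an all-False group yields 0.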
