Python: Creating a DataFrame column from the result of a conditional sum

python pandas dataframe

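Following on from a question about calculating DataFrame values from a condition, I have a more complex problem that I am struggling with: for a given row, I need to include a sum that is based on that condition. Here is the initial df: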

Key UID VID count   month   option  unit    year
0   1   5   100     1       A       10      2015
1   1   5   200     1       B       20      2015
2   1   5   300     2       A       30      2015
3   1   5   400     2       B       40      2015
4   1   7   450     2       B       45      2015
5   1   5   500     3       B       50      2015

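I want to go through this time-series DataFrame and add a column "unit_count" to every row. The column divides the row's "unit" value by the sum of "count" for that month, where the sum only includes rows whose option is "B". Basically: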

df['unit_count'] = df['unit'] / sum of df['count'] for all records containing 'option' 'B' in the same month

This would extend the DataFrame as follows:

Key UID VID count   month   option  unit    year    unit_count
0   1   5   100     1       A       10      2015    0.050
1   1   5   200     1       B       20      2015    0.100
2   1   5   300     2       A       30      2015    0.035
3   1   5   400     2       B       40      2015    0.047
4   1   7   450     2       B       45      2015    0.053
5   1   5   500     3       B       50      2015    0.100

The code for the example df above is:

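import pandas as pd

# Sample data: one row per (UID, VID, month, option) record
df = pd.DataFrame({'UID': [1, 1, 1, 1, 1, 1],
                   'VID': [5, 5, 5, 5, 7, 5],
                   'year': [2015, 2015, 2015, 2015, 2015, 2015],
                   'month': [1, 1, 2, 2, 2, 3],
                   'option': ['A', 'B', 'A', 'B', 'B', 'B'],
                   'unit': [10, 20, 30, 40, 45, 50],
                   'count': [100, 200, 300, 400, 450, 500]
                   })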

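You only want to look within the same month, so you can group by the month column, then within each group use option == "B" to subset the count column and take its sum, and finally divide the unit column by that sum. The code at the end of this post is a direct translation of that logic.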

Nice solution! I think using .loc[] could make it even better:

df.groupby(['year', 'month']).apply(lambda g: g.unit / g.loc[g.option == 'B', 'count'].sum())

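@MaxU I had the same feeling. I don't know whether it would be any faster, but it is more compact.

@Psidom's solution works well, especially after switching to df.groupby(['year', 'month']) as recommended by @MaxU. @MaxU's more compact solution returns two errors:

ValueError: Buffer dtype mismatch, expected 'Python object' but got 'long'

and

TypeError: incompatible index of inserted column with frame index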

df['unit_count'] = df.groupby('month', group_keys=False).apply(
                      lambda g: g.unit/g['count'][g.option == "B"].sum())
df
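The same result can also be sketched without apply, by zeroing out the non-"B" counts and broadcasting the per-month sum with transform. This is an alternative to the answer's approach rather than its method; the helper names b_count and b_total are made up for this example, and the sample df from the question is assumed:

```python
import pandas as pd

# Sample df from the question
df = pd.DataFrame({'UID': [1, 1, 1, 1, 1, 1],
                   'VID': [5, 5, 5, 5, 7, 5],
                   'year': [2015, 2015, 2015, 2015, 2015, 2015],
                   'month': [1, 1, 2, 2, 2, 3],
                   'option': ['A', 'B', 'A', 'B', 'B', 'B'],
                   'unit': [10, 20, 30, 40, 45, 50],
                   'count': [100, 200, 300, 400, 450, 500]})

# Keep 'count' only where option == 'B'; every other row contributes 0.
b_count = df['count'].where(df['option'] == 'B', 0)

# Sum the B-only counts per (year, month) and broadcast the sum to each row.
b_total = b_count.groupby([df['year'], df['month']]).transform('sum')

# Row-wise division; the result keeps df's original index.
df['unit_count'] = df['unit'] / b_total
```

A month with no "B" rows would have a total of 0 here, so the division would produce inf (or NaN) for that month; the sample data has no such month.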