Python: Creating new rows in Pandas with a SUMIF-style conditional sum
How can I use pandas to create new rows based on a conditional sum? Initial table:
Product Date CAT Value
Product A Apr F31 100
Product A Apr F32 200
Product A Apr F45 300
Product A Apr F46 400
Product A May F31 200
Product A May F32 300
Product A May F45 400
Product A May F46 500
Product B Apr F31 200
Product B Apr F32 300
Product B Apr F45 400
Product B Apr F46 500
Product B May F31 600
Product B May F32 700
Product B May F45 800
Product B May F46 900
I want to build the table below by combining F31 and F32 into F3, which should give me the sum of their values:
Product Date CAT Value
Product A Apr F3 300
Product A Apr F45 300
Product A Apr F46 400
Product A May F3 500
Product A May F45 400
Product A May F46 500
Product B Apr F3 500
Product B Apr F45 400
Product B Apr F46 500
Product B May F3 1300
Product B May F45 800
Product B May F46 900
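Can you help me?
First, let's do a targeted regex replacement that strips only the digits following
CAT A
df['CAT'] = df['CAT'].str.replace(r'(CAT A)(\d+)', r'\1', regex=True)
so that CAT A5
--> CAT A
(regex=True is needed on pandas 2.x, where str.replace treats the pattern as a literal string by default.)
Then group by the key columns and sum: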
df.groupby(['Product','Date','CAT'])['Value'].sum()
Product    Date  CAT
Product A  Apr   CAT A     300
                 CAT B     300
                 CAT C     400
           Jul   CAT C     500
           Jun   CAT B     400
           May   CAT A     500
Product B  Apr   CAT A     500
                 CAT B     400
                 CAT C     500
           May   CAT A    1300
                 CAT B     800
                 CAT C     900
Name: Value, dtype: int64
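The output above uses CAT A/B/C labels rather than the F31/F32 codes shown in the question's table. As a minimal sketch, the same two steps can be applied to the F-codes directly. The pattern `^F3[12]$` is an assumption chosen so that only F31 and F32 are collapsed, and only a subset of the question's rows is used for brevity.

```python
import pandas as pd

# Sample rows taken from the question's table (Product A only, for brevity).
df = pd.DataFrame({
    "Product": ["Product A"] * 8,
    "Date": ["Apr"] * 4 + ["May"] * 4,
    "CAT": ["F31", "F32", "F45", "F46"] * 2,
    "Value": [100, 200, 300, 400, 200, 300, 400, 500],
})

# Collapse F31 and F32 (and only those) into F3. regex=True is required on
# pandas 2.x, where str.replace treats the pattern literally by default.
df["CAT"] = df["CAT"].str.replace(r"^F3[12]$", "F3", regex=True)

# Sum Value within each (Product, Date, CAT) group. The result is a Series
# indexed by those three keys.
totals = df.groupby(["Product", "Date", "CAT"])["Value"].sum()
print(totals)
```

Anchoring the pattern with `^` and `$` keeps labels such as F45 untouched; a looser pattern like `\d+$` would strip the digits from every category.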
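If you want a DataFrame returned instead, add
.reset_index()
To create the DataFrame described above, we need to apply two operations: replace the category labels, then group and sum.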
Data.groupby(['Product','Date','CAT'])['Value'].sum().reset_index(name='Value')
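For illustration, here is a small sketch of two ways to get a DataFrame back from the grouped sum: calling `reset_index` on the resulting Series, or passing `as_index=False` to `groupby`. The three-row frame is made up for the example and assumes the labels have already been combined into F3.

```python
import pandas as pd

# Tiny illustrative frame; CAT labels are assumed to be already combined.
df = pd.DataFrame({
    "Product": ["Product A", "Product A", "Product A"],
    "Date": ["Apr", "Apr", "Apr"],
    "CAT": ["F3", "F3", "F45"],
    "Value": [100, 200, 300],
})

keys = ["Product", "Date", "CAT"]

# Option 1: sum to a Series, then turn the index levels back into columns.
via_reset = df.groupby(keys)["Value"].sum().reset_index(name="Value")

# Option 2: ask groupby to keep the keys as ordinary columns from the start.
via_as_index = df.groupby(keys, as_index=False)["Value"].sum()

print(via_reset)
print(via_as_index)
```

Both variants produce the columns Product, Date, CAT and Value, so the choice is mostly a matter of taste.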
The code snippet is as follows:
import pandas as pd

# Column data from the question's table
Product = ['Product A','Product A','Product A','Product A','Product A','Product A','Product A','Product A','Product B','Product B','Product B','Product B','Product B','Product B','Product B','Product B']
Date = ['Apr','Apr','Apr','Apr','May','May','May','May','Apr','Apr','Apr','Apr','May','May','May','May']
CAT = ['F31','F32','F45','F46','F31','F32','F45','F46','F31','F32','F45','F46','F31','F32','F45','F46']
Value = [100,200,300,400,200,300,400,500,200,300,400,500,600,700,800,900]
# Create the DataFrame
Data = pd.DataFrame({'Product':Product,'Date':Date,'CAT':CAT,'Value':Value})
# Replace the labels F31 and F32 with the combined label F3 (exact-value match)
Data['CAT'] = Data['CAT'].replace('F31','F3')
Data['CAT'] = Data['CAT'].replace('F32','F3')
# Group by the key columns and sum Value; reset_index already returns a DataFrame
DataG = Data.groupby(['Product','Date','CAT'])['Value'].sum().reset_index(name='Value')
Data before applying the operations above (image)
Data after applying the operations above (image)
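The two `replace` calls in the snippet can also be written as a single call with a dictionary, which may be easier to extend when more labels need combining. This is a sketch of that variant, not the answerer's code: the name `cat_map` is hypothetical and only the Product B / May rows from the question are used.

```python
import pandas as pd

# Product B / May rows from the question's table.
data = pd.DataFrame({
    "Product": ["Product B"] * 4,
    "Date": ["May"] * 4,
    "CAT": ["F31", "F32", "F45", "F46"],
    "Value": [600, 700, 800, 900],
})

# One mapping for all relabelling; values not listed are left unchanged.
cat_map = {"F31": "F3", "F32": "F3"}
data["CAT"] = data["CAT"].replace(cat_map)

# Group and sum, then return the keys as ordinary columns.
result = (
    data.groupby(["Product", "Date", "CAT"])["Value"]
        .sum()
        .reset_index(name="Value")
)
print(result)
```

For these rows the F3 total is 600 + 700 = 1300, matching the Product B / May row in the desired table.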
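Thanks. If the labels are not "CAT" but something else, for example "type 1" and "segment 1" instead of "CAT A1" and "CAT A2", how should the code work?
@Santoo df['CAT'].str.replace(r'(type|segment)\s+\d+', r'\1', regex=True) should work, but you should update your sample to reflect your question.
Is 31 a constant that you need to replace with 3? Just want to confirm, since your example below suggests you have two different text fields. Take your time updating your example and I'll check back in a while.
@Manakin I want to merge F31 and F32 into F3. There are also other categories such as "Dep single", "D. recurring" and "D. merged" that have to be merged into "Dep all". It would be great if the code were dynamic.
Sorry for the late reply, I got pulled into a meeting. This is hard, because I think you have a lot of requirements that won't fit in one line of code. You need to create a helper column to group by.
@Manakin Thanks for coming back. Yes, I have now gone the helper-column route.
Perhaps the key part of your question is Series.replace, but here is an example that covers groupby/sum and other use cases.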
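The helper-column route mentioned in the comments could look roughly like the sketch below. The mapping `group_map`, the column name `CAT_group`, the exact spelling of the "Dep ..." labels and the sample values are all assumptions made for illustration; the thread does not show the real data for those categories.

```python
import pandas as pd

# Illustrative rows; the "Dep"/"D." labels and their values are hypothetical.
df = pd.DataFrame({
    "Product": ["Product A"] * 6,
    "Date": ["Apr"] * 6,
    "CAT": ["F31", "F32", "F45", "Dep single", "D. recurring", "D. merged"],
    "Value": [100, 200, 300, 10, 20, 30],
})

# Every label that should be combined points at its combined name.
group_map = {
    "F31": "F3",
    "F32": "F3",
    "Dep single": "Dep all",
    "D. recurring": "Dep all",
    "D. merged": "Dep all",
}

# Helper column: the mapped label where one is defined, else the original CAT.
df["CAT_group"] = df["CAT"].map(group_map).fillna(df["CAT"])

# Group on the helper column and expose it under the original column name.
out = (
    df.groupby(["Product", "Date", "CAT_group"], as_index=False)["Value"]
      .sum()
      .rename(columns={"CAT_group": "CAT"})
)
print(out)
```

Keeping the original `CAT` column intact and grouping on a separate helper column means the mapping can be changed or extended without losing the source labels.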