Python pandas DataFrame: replace the maximum value of a column with a value from a different column based on a condition
I have a DataFrame with several columns, and I want to replace only the maximum value of the "views" column with the value from one of three other columns, depending on certain conditions.
import pandas as pd

data = [["1.Blend Of Vdx Display", "DISPLAY", "Features", "CPE", 1255, 778732, 13373, 7142],
        ["1.Blend Of Vdx Display", "DISPLAY", "TVC", "CPE", 10479, 778732, 13373, 7142],
        ["2.Mobile VDX", "Display", "Features", "CPE", 168, 1000, 150, 160],
        ["2.Mobile VDX", "Display", "Features", "CPE", 2309, 1000, 150, 160]]
df = pd.DataFrame(data, columns=['Placement#Name', 'PRODUCT', 'VIDEONAME', 'COST_TYPE',
                                 'views', 'IMPRESSIONS', 'ENGAGEMENTS', 'DPEENGAMENTS'])
print(df)
           Placement#Name  PRODUCT VIDEONAME COST_TYPE  views  IMPRESSIONS  \
0  1.Blend Of Vdx Display  DISPLAY  Features       CPE   1255       778732
1  1.Blend Of Vdx Display  DISPLAY       TVC       CPE  10479       778732
2            2.Mobile VDX  Display  Features       CPE    168         1000
3            2.Mobile VDX  Display  Features       CPE   2309         1000

   ENGAGEMENTS  DPEENGAMENTS
0        13373          7142
1        13373          7142
2          150           160
3          150           160
I can pick out the row holding the maximum views for each placement like this:
newdf = df.loc[df.groupby('Placement#Name')['views'].idxmax()]
print(newdf)
           Placement#Name  PRODUCT VIDEONAME COST_TYPE  views  IMPRESSIONS  \
1  1.Blend Of Vdx Display  DISPLAY       TVC       CPE  10479       778732
3            2.Mobile VDX  Display  Features       CPE   2309         1000

   ENGAGEMENTS  DPEENGAMENTS
1        13373          7142
3          150           160
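Now I want to replace the views values in newdf (10479 and 2309) based on conditions. Here they should be replaced with the values from the ENGAGEMENTS column, because PRODUCT is display and COST_TYPE is CPE.
So the new newdf output would be: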
print(newdf)
           Placement#Name  PRODUCT VIDEONAME COST_TYPE  views  IMPRESSIONS  \
1  1.Blend Of Vdx Display  DISPLAY       TVC       CPE  13373       778732
3            2.Mobile VDX  Display  Features       CPE    150         1000

   ENGAGEMENTS  DPEENGAMENTS
1        13373          7142
3          150           160
Then I want to carry this change over to the original df.
So the output for the original df would be:
print(df)
           Placement#Name  PRODUCT VIDEONAME COST_TYPE  views  IMPRESSIONS  \
0  1.Blend Of Vdx Display  DISPLAY  Features       CPE   1255       778732
1  1.Blend Of Vdx Display  DISPLAY       TVC       CPE  13373       778732
2            2.Mobile VDX  Display  Features       CPE    168         1000
3            2.Mobile VDX  Display  Features       CPE    150         1000

   ENGAGEMENTS  DPEENGAMENTS
0        13373          7142
1        13373          7142
2          150           160
3          150           160
I think you need:
# take the row with the largest views in each placement; copy it so it can be modified safely
newdf = df.loc[df.groupby('Placement#Name')['views'].idxmax()].copy()
# filter by conditions
mask = (newdf['PRODUCT'].str.upper() == 'DISPLAY') & (newdf['COST_TYPE'] == 'CPE')
newdf.loc[mask, 'views'] = newdf.loc[mask, 'ENGAGEMENTS']
print(newdf)
           Placement#Name  PRODUCT VIDEONAME COST_TYPE  views  IMPRESSIONS  \
1  1.Blend Of Vdx Display  DISPLAY       TVC       CPE  13373       778732
3            2.Mobile VDX  Display  Features       CPE    150         1000

   ENGAGEMENTS  DPEENGAMENTS
1        13373          7142
3          150           160
# remove the old rows and add the updated rows from newdf
# (DataFrame.append was removed in pandas 2.0, so pd.concat is used instead)
df = pd.concat([df.drop(newdf.index), newdf]).sort_index()
print(df)
           Placement#Name  PRODUCT VIDEONAME COST_TYPE  views  IMPRESSIONS  \
0  1.Blend Of Vdx Display  DISPLAY  Features       CPE   1255       778732
1  1.Blend Of Vdx Display  DISPLAY       TVC       CPE  13373       778732
2            2.Mobile VDX  Display  Features       CPE    168         1000
3            2.Mobile VDX  Display  Features       CPE    150         1000

   ENGAGEMENTS  DPEENGAMENTS
0        13373          7142
1        13373          7142
2          150           160
3          150           160
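The drop-and-concatenate step can also be avoided by assigning into the original DataFrame directly. The following is a minimal sketch of that idea, not the answerer's code: it assumes the lowercase column name views from the question's DataFrame, and it relies on idxmax returning one row label per group (the first one if several rows tie for the maximum). The names idx, is_max and cond are illustrative.

```python
import pandas as pd

data = [["1.Blend Of Vdx Display", "DISPLAY", "Features", "CPE", 1255, 778732, 13373, 7142],
        ["1.Blend Of Vdx Display", "DISPLAY", "TVC", "CPE", 10479, 778732, 13373, 7142],
        ["2.Mobile VDX", "Display", "Features", "CPE", 168, 1000, 150, 160],
        ["2.Mobile VDX", "Display", "Features", "CPE", 2309, 1000, 150, 160]]
df = pd.DataFrame(data, columns=['Placement#Name', 'PRODUCT', 'VIDEONAME', 'COST_TYPE',
                                 'views', 'IMPRESSIONS', 'ENGAGEMENTS', 'DPEENGAMENTS'])

# index labels of the row with the largest views in each placement
idx = df.groupby('Placement#Name')['views'].idxmax()

# True only for those per-placement maximum rows
is_max = df.index.isin(idx)

# the condition from the question: display product billed as CPE
cond = (df['PRODUCT'].str.upper() == 'DISPLAY') & (df['COST_TYPE'] == 'CPE')

# overwrite views with ENGAGEMENTS only where both masks hold; other rows are untouched
df.loc[is_max & cond, 'views'] = df['ENGAGEMENTS']

print(df[['Placement#Name', 'views', 'ENGAGEMENTS']])
```

Because the assignment aligns on the index, only rows 1 and 3 change, and no intermediate newdf or re-sorting is needed.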
Another approach:
This checks the conditions row by row and replaces views with ENGAGEMENTS where they hold:
newdf['views'] = newdf.apply(lambda x: x['ENGAGEMENTS'] if (x['PRODUCT'].upper() == 'DISPLAY' and x['COST_TYPE'] == 'CPE') else x['views'], axis=1)
Write the updated values back into the original DataFrame:
df.loc[newdf.index, 'views'] = newdf['views']
Output:
           Placement#Name  PRODUCT VIDEONAME COST_TYPE  views  IMPRESSIONS  \
0  1.Blend Of Vdx Display  DISPLAY  Features       CPE   1255       778732
1  1.Blend Of Vdx Display  DISPLAY       TVC       CPE  13373       778732
2            2.Mobile VDX  Display  Features       CPE    168         1000
3            2.Mobile VDX  Display  Features       CPE    150         1000

   ENGAGEMENTS  DPEENGAMENTS
0        13373          7142
1        13373          7142
2          150           160
3          150           160
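The question mentions three possible replacement columns, but only one rule is spelled out (display + CPE uses ENGAGEMENTS). If more rules are needed, numpy.select may be a convenient way to express them. In the sketch below, only the first rule comes from the question; the second and third rules (CPM uses IMPRESSIONS, CPCV uses DPEENGAMENTS) are hypothetical placeholders to show the shape of the code and should be replaced with the real business rules.

```python
import numpy as np
import pandas as pd

data = [["1.Blend Of Vdx Display", "DISPLAY", "Features", "CPE", 1255, 778732, 13373, 7142],
        ["1.Blend Of Vdx Display", "DISPLAY", "TVC", "CPE", 10479, 778732, 13373, 7142],
        ["2.Mobile VDX", "Display", "Features", "CPE", 168, 1000, 150, 160],
        ["2.Mobile VDX", "Display", "Features", "CPE", 2309, 1000, 150, 160]]
df = pd.DataFrame(data, columns=['Placement#Name', 'PRODUCT', 'VIDEONAME', 'COST_TYPE',
                                 'views', 'IMPRESSIONS', 'ENGAGEMENTS', 'DPEENGAMENTS'])

product = df['PRODUCT'].str.upper()

# rule 1 is taken from the question; rules 2 and 3 are HYPOTHETICAL examples
conditions = [
    (product == 'DISPLAY') & (df['COST_TYPE'] == 'CPE'),
    (product == 'DISPLAY') & (df['COST_TYPE'] == 'CPM'),    # hypothetical rule
    (product == 'DISPLAY') & (df['COST_TYPE'] == 'CPCV'),   # hypothetical rule
]
choices = [df['ENGAGEMENTS'], df['IMPRESSIONS'], df['DPEENGAMENTS']]

# the first matching rule wins; rows matching no rule keep their current views
replacement = np.select(conditions, choices, default=df['views'])

# apply the replacement only to the per-placement maximum rows
is_max = df.index.isin(df.groupby('Placement#Name')['views'].idxmax())
df['views'] = np.where(is_max, replacement, df['views'])

print(df[['Placement#Name', 'COST_TYPE', 'views']])
```

With the sample data every row is display + CPE, so only the first rule fires and the result matches the output above.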