Python: keep DataFrame rows based on a summed column
This is a follow-up to an earlier question of mine that I got help with. Here is the problem. Suppose there is a DataFrame:
import pandas as pd

dic = {'firstname': ['John', 'John', 'John', 'John', 'John', 'Susan', 'Susan',
                     'Susan', 'Susan', 'Susan', 'Mike', 'Mike', 'Mike', 'Mike',
                     'Mike'],
       'lastname': ['Smith', 'Smith', 'Smith', 'Smith', 'Smith', 'Wilson',
                    'Wilson', 'Wilson', 'Wilson', 'Wilson', 'Jones', 'Jones',
                    'Jones', 'Jones', 'Jones'],
       'company': ['KFC', 'BK', 'KFC', 'KFC', 'KFC', 'BK', 'BK', 'WND', 'WND',
                   'WND', 'TB', 'CHP', 'TB', 'CHP', 'TB'],
       'paid': [200, 300, 250, 100, 900, 650, 430, 218, 946, 789, 305, 750, 140, 860, 310],
       'overtime': [205, 554, 840, 100, 203, 640, 978, 451, 356, 779, 650, 950, 230, 250, 980]}
df = pd.DataFrame(dic)
print(df)
which outputs:
firstname lastname company paid overtime
0 John Smith KFC 200 205
1 John Smith BK 300 554
2 John Smith KFC 250 840
3 John Smith KFC 100 100
4 John Smith KFC 900 203
5 Susan Wilson BK 650 640
6 Susan Wilson BK 430 978
7 Susan Wilson WND 218 451
8 Susan Wilson WND 946 356
9 Susan Wilson WND 789 779
10 Mike Jones TB 305 650
11 Mike Jones CHP 750 950
12 Mike Jones TB 140 230
13 Mike Jones CHP 860 250
14 Mike Jones TB 310 980
Originally, I wanted to sum the 'paid' column per group and show only the totals above 1300.
That was solved like this:
df = df.groupby(['lastname', 'firstname','company'], as_index=False).agg({'paid':'sum'})
s = df['paid']>1300
df['limit']=s
df = df.loc[df['limit']==True]
del df['limit']
df = df.sort_values(by=['paid'],ascending=False).reset_index()
del df['index']
print(df)
which outputs:
lastname firstname company paid
0 Wilson Susan WND 1953
1 Jones Mike CHP 1610
2 Smith John KFC 1450
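For reference, the same aggregate-and-filter step can be written more compactly. This is only a sketch of one possible tidy-up, not the code from the earlier question; the names `keys`, `totals` and `summary` are illustrative, and the data is the frame from above written in a shorter form.

```python
import pandas as pd

df = pd.DataFrame({
    'firstname': ['John'] * 5 + ['Susan'] * 5 + ['Mike'] * 5,
    'lastname': ['Smith'] * 5 + ['Wilson'] * 5 + ['Jones'] * 5,
    'company': ['KFC', 'BK', 'KFC', 'KFC', 'KFC', 'BK', 'BK', 'WND', 'WND',
                'WND', 'TB', 'CHP', 'TB', 'CHP', 'TB'],
    'paid': [200, 300, 250, 100, 900, 650, 430, 218, 946, 789, 305, 750, 140, 860, 310],
    'overtime': [205, 554, 840, 100, 203, 640, 978, 451, 356, 779, 650, 950, 230, 250, 980],
})

keys = ['lastname', 'firstname', 'company']  # illustrative name for the group keys

# Sum 'paid' per group, keeping the keys as ordinary columns.
totals = df.groupby(keys, as_index=False)['paid'].sum()

# Boolean indexing stands in for the temporary 'limit' column, and
# reset_index(drop=True) stands in for reset_index() plus del df['index'].
summary = (totals[totals['paid'] > 1300]
           .sort_values('paid', ascending=False)
           .reset_index(drop=True))
print(summary)
```

Writing the result to a new name instead of reassigning `df` keeps the original rows available for later steps.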
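What I want to do now is fairly similar, but I no longer want to sum the values. I just want to keep the original rows of every group whose 'paid' values sum to more than 1300.
Desired output: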
firstname lastname company paid overtime
0 John Smith KFC 200 205
1 John Smith KFC 250 840
2 John Smith KFC 100 100
3 John Smith KFC 900 203
4 Susan Wilson WND 218 451
5 Susan Wilson WND 946 356
6 Susan Wilson WND 789 779
7 Mike Jones CHP 750 950
8 Mike Jones CHP 860 250
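This is a very simple one-line change. Instead of agg, use transform, which returns each group's 'paid' total aligned to every original row:
group_total = df.groupby(['lastname', 'firstname', 'company'])['paid'].transform('sum')
And then use it as a boolean mask on the original, unaggregated df:
df[group_total > 1300]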
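To make the transform approach concrete, here is a minimal end-to-end sketch using the data from the question. The names `keys`, `group_total` and `kept` are illustrative, and the final `reset_index(drop=True)` is an assumption added only so the index runs 0 to 8 as in the desired output.

```python
import pandas as pd

df = pd.DataFrame({
    'firstname': ['John'] * 5 + ['Susan'] * 5 + ['Mike'] * 5,
    'lastname': ['Smith'] * 5 + ['Wilson'] * 5 + ['Jones'] * 5,
    'company': ['KFC', 'BK', 'KFC', 'KFC', 'KFC', 'BK', 'BK', 'WND', 'WND',
                'WND', 'TB', 'CHP', 'TB', 'CHP', 'TB'],
    'paid': [200, 300, 250, 100, 900, 650, 430, 218, 946, 789, 305, 750, 140, 860, 310],
    'overtime': [205, 554, 840, 100, 203, 640, 978, 451, 356, 779, 650, 950, 230, 250, 980],
})

keys = ['lastname', 'firstname', 'company']

# transform('sum') returns a Series with the same index as df: each row
# carries the total 'paid' of the (lastname, firstname, company) group it belongs to.
group_total = df.groupby(keys)['paid'].transform('sum')

# Use the row-aligned totals as a boolean mask, then renumber the kept rows.
kept = df[group_total > 1300].reset_index(drop=True)
print(kept)
```

Because the mask has one entry per original row, every column of `df`, including 'overtime', survives the filter unchanged.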
Edit: thanks to DataNeighbor for pointing out that I should make this a complete answer and write out the last line.
If you add the slicing as well, it will be a complete answer:
df[df.groupby(['lastname', 'firstname', 'company'])['paid'].transform('sum') > 1300]
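As an alternative that was not raised in the thread, pandas also offers `GroupBy.filter`, which keeps or drops whole groups based on a predicate. The sketch below assumes the same data and threshold; the name `kept` and the lambda are illustrative. On large frames with many groups the transform mask is usually the faster option, since `filter` calls a Python function once per group.

```python
import pandas as pd

df = pd.DataFrame({
    'firstname': ['John'] * 5 + ['Susan'] * 5 + ['Mike'] * 5,
    'lastname': ['Smith'] * 5 + ['Wilson'] * 5 + ['Jones'] * 5,
    'company': ['KFC', 'BK', 'KFC', 'KFC', 'KFC', 'BK', 'BK', 'WND', 'WND',
                'WND', 'TB', 'CHP', 'TB', 'CHP', 'TB'],
    'paid': [200, 300, 250, 100, 900, 650, 430, 218, 946, 789, 305, 750, 140, 860, 310],
    'overtime': [205, 554, 840, 100, 203, 640, 978, 451, 356, 779, 650, 950, 230, 250, 980],
})

# filter() evaluates the predicate once per group and returns the rows of
# every group for which it is True, in their original order.
kept = (df.groupby(['lastname', 'firstname', 'company'])
          .filter(lambda g: g['paid'].sum() > 1300)
          .reset_index(drop=True))
print(kept)
```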