Python: keep DataFrame rows based on a summed column

This is a follow-up to an earlier question of mine, which I got help with.

Here is the problem. Suppose there is a DataFrame:

import pandas as pd

dic = {'firstname':['John','John','John','John','John','Susan','Susan',
                    'Susan','Susan','Susan','Mike','Mike','Mike','Mike',
                    'Mike'],
       'lastname':['Smith','Smith','Smith','Smith','Smith','Wilson',
                   'Wilson','Wilson','Wilson','Wilson','Jones','Jones',
                   'Jones','Jones','Jones'],
       'company':['KFC','BK','KFC','KFC','KFC','BK','BK','WND','WND',
                  'WND','TB','CHP','TB','CHP','TB'],
       'paid':[200,300,250,100,900,650,430,218,946,789,305,750,140,860,310],
       'overtime':[205,554,840,100,203,640,978,451,356,779,650,950,230,250,980]}
df = pd.DataFrame(dic)
print(df)
which outputs:

   firstname lastname company  paid  overtime
0       John    Smith     KFC   200       205
1       John    Smith      BK   300       554
2       John    Smith     KFC   250       840
3       John    Smith     KFC   100       100
4       John    Smith     KFC   900       203
5      Susan   Wilson      BK   650       640
6      Susan   Wilson      BK   430       978
7      Susan   Wilson     WND   218       451
8      Susan   Wilson     WND   946       356
9      Susan   Wilson     WND   789       779
10      Mike    Jones      TB   305       650
11      Mike    Jones     CHP   750       950
12      Mike    Jones      TB   140       230
13      Mike    Jones     CHP   860       250
14      Mike    Jones      TB   310       980
Originally, I wanted to sum the paid column per group and show only the totals above 1300. That was solved like this:

# Sum 'paid' per (lastname, firstname, company) group.
totals = df.groupby(['lastname', 'firstname', 'company'], as_index=False).agg({'paid': 'sum'})
# Keep only the groups whose total is above 1300.
totals = totals[totals['paid'] > 1300]
# Largest totals first, with a fresh 0..n-1 index.
totals = totals.sort_values(by='paid', ascending=False).reset_index(drop=True)
print(totals)
which outputs:

  lastname firstname company  paid
0   Wilson     Susan     WND  1953
1    Jones      Mike     CHP  1610
2    Smith      John     KFC  1450
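What I want to do now is fairly similar, but I no longer want to sum the values. I just want to keep the original rows that belong to groups whose 'paid' values sum to more than 1300.

Expected output: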

   firstname lastname company  paid  overtime
0       John    Smith     KFC   200       205
1       John    Smith     KFC   250       840
2       John    Smith     KFC   100       100
3       John    Smith     KFC   900       203
4      Susan   Wilson     WND   218       451
5      Susan   Wilson     WND   946       356
6      Susan   Wilson     WND   789       779
7       Mike    Jones     CHP   750       950
8       Mike    Jones     CHP   860       250

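This is a very simple one-line change. Instead of agg, use transform on the original (un-aggregated) df to get each row's group total:

group_total = df.groupby(['lastname', 'firstname', 'company'])['paid'].transform('sum')
And then,
df[group_total > 1300].reset_index(drop=True)

Edit: thanks to DataNeighbor for pointing out that I should make this a complete answer and write out the last line.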
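
To make the transform step concrete, here is a minimal self-contained sketch using the sample data from the question. The names `keys`, `group_total`, and `kept` are only illustrative, and the threshold of 1300 is taken from the question text.

```python
import pandas as pd

# Same sample data as in the question, written more compactly.
df = pd.DataFrame({
    'firstname': ['John'] * 5 + ['Susan'] * 5 + ['Mike'] * 5,
    'lastname': ['Smith'] * 5 + ['Wilson'] * 5 + ['Jones'] * 5,
    'company': ['KFC', 'BK', 'KFC', 'KFC', 'KFC', 'BK', 'BK', 'WND', 'WND',
                'WND', 'TB', 'CHP', 'TB', 'CHP', 'TB'],
    'paid': [200, 300, 250, 100, 900, 650, 430, 218, 946, 789,
             305, 750, 140, 860, 310],
    'overtime': [205, 554, 840, 100, 203, 640, 978, 451, 356, 779,
                 650, 950, 230, 250, 980],
})

keys = ['lastname', 'firstname', 'company']

# transform('sum') returns one value per original row (the total of that
# row's group), so the result shares df's index and can be compared
# against the threshold to build a boolean mask.
group_total = df.groupby(keys)['paid'].transform('sum')

# Keep the original rows whose group total exceeds 1300, then renumber.
kept = df[group_total > 1300].reset_index(drop=True)
print(kept)
```

Unlike agg, which collapses each group to a single row, transform keeps the shape of the input, which is why the original 'paid' and 'overtime' values survive the filtering.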

If you also add the slicing, it will be a complete answer:
df[df.groupby(['lastname','firstname','company'])['paid'].transform('sum')>1300]
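
As a possible alternative to the transform-based slice, pandas also offers `groupby(...).filter(...)`, which keeps or drops whole groups based on a predicate. The sketch below is not from the answer above: the helper name `keep_groups_over` is hypothetical and the small `demo` frame is made-up data used only to illustrate the call.

```python
import pandas as pd


def keep_groups_over(frame, keys, column, threshold):
    """Hypothetical helper: keep rows whose group sum of `column` exceeds `threshold`."""
    # filter() calls the function once per group and keeps every row of the
    # groups for which it returns True; rows stay in their original order.
    kept = frame.groupby(keys).filter(lambda g: g[column].sum() > threshold)
    return kept.reset_index(drop=True)


# Small made-up frame, only to illustrate the call.
demo = pd.DataFrame({
    'company': ['KFC', 'BK', 'KFC', 'WND', 'WND'],
    'paid': [900, 300, 600, 700, 500],
})
result = keep_groups_over(demo, ['company'], 'paid', 1300)
print(result)
```

On large frames the transform version is usually the faster of the two, since filter runs a Python function per group, but filter can read more naturally when the condition is more involved than a single sum.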