Python 2.7: Get the row data associated with the maximum value in a column (Python/Pandas)
Given the following data:
Sum amount_net amount_gross symbol Date_Time
ts
7/29/2013 2:17 -68 755,101 -755,101 A 7/29/2013 2:17
7/29/2013 2:17 -21 251,945 -251,945 B 7/29/2013 2:17
7/29/2013 2:16 -1 2,200 -2,200 C 7/29/2013 2:16
7/29/2013 2:17 -5 11,000 -11,000 C 7/29/2013 2:17
7/29/2013 2:08 -1 5,384 -5,384 D 7/29/2013 2:08
7/29/2013 2:09 -3 16,151 -16,151 D 7/29/2013 2:09
7/29/2013 2:13 1 5,384 5,384 D 7/29/2013 2:13
7/29/2013 2:02 20 70,000 70,000 F 7/29/2013 2:02
7/29/2013 2:03 22 77,000 77,000 F 7/29/2013 2:03
7/29/2013 2:04 18 63,000 63,000 F 7/29/2013 2:04
7/29/2013 2:05 15 52,500 52,500 F 7/29/2013 2:05
7/29/2013 2:08 15 52,500 52,500 F 7/29/2013 2:08
7/29/2013 2:09 8 28,000 28,000 F 7/29/2013 2:09
7/29/2013 2:10 22 77,000 77,000 F 7/29/2013 2:10
7/29/2013 2:11 22 77,000 77,000 F 7/29/2013 2:11
7/29/2013 2:12 12 42,000 42,000 F 7/29/2013 2:12
7/29/2013 2:13 5 17,500 17,500 F 7/29/2013 2:13
7/29/2013 2:14 30 105,000 105,000 F 7/29/2013 2:14
7/29/2013 2:15 35 122,500 122,500 F 7/29/2013 2:15
7/29/2013 2:16 35 122,500 122,500 F 7/29/2013 2:16
For each symbol, I want to return the Sum, amount_net and amount_gross at that symbol's maximum time, i.e. I want to get:
symbol Time Sum amount_net amount_gross
A 7/29/2013 2:17 -68 755,101 -755,101
B 7/29/2013 2:17 -21 251,945 -251,945
C 7/29/2013 2:17 -5 11,000 -11,000
D 7/29/2013 2:13 1 5,384 5,384
F 7/29/2013 2:16 35 122,500 122,500
Simply group by symbol and sum:
In [11]: df1.groupby('symbol').sum()
Out[11]:
Sum amount_net amount_gross
symbol
A -68 755101 -755101
B -21 251945 -251945
C -6 13200 -13200
D -3 26919 -16151
F 259 906500 906500
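A small sketch of why groupby('symbol').sum() does not quite answer the question, using only the two C rows from the question (amounts already converted to integers). sum() adds up every row in each group, so C comes out as -1 + -5 = -6 rather than the -5 of its latest (2:17) row:

```python
import pandas as pd

# The two "C" rows from the question, with the amounts already converted to int.
df = pd.DataFrame({
    "Date_Time": pd.to_datetime(["2013-07-29 02:16", "2013-07-29 02:17"]),
    "Sum": [-1, -5],
    "amount_net": [2200, 11000],
    "amount_gross": [-2200, -11000],
    "symbol": ["C", "C"],
})

# Select only the numeric columns (summing datetimes is not meaningful),
# then add up every row that belongs to each symbol.
totals = df.groupby("symbol")[["Sum", "amount_net", "amount_gross"]].sum()
print(totals)
```

This reproduces the C line of the output above (-6, 13200, -13200), which mixes both C rows instead of picking the latest one.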
Note: at the moment it looks like amount_net and amount_gross aren't being parsed as integers but as strings; you can convert them with:
df1[['amount_net', 'amount_gross']] = df1[['amount_net', 'amount_gross']].applymap(lambda x: int(x.replace(',', '')))
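The applymap line above converts each cell in Python one at a time. A vectorized alternative, sketched here on a few values from the question, strips the commas with the .str accessor and casts each column in one step. Note that pandas 2.1 deprecated DataFrame.applymap in favor of DataFrame.map. If the data is read from a file, read_csv's thousands=',' parameter can avoid the string stage entirely.

```python
import pandas as pd

# Amounts as they appear in the question: strings with comma thousands separators.
df1 = pd.DataFrame({
    "amount_net": ["755,101", "2,200", "5,384"],
    "amount_gross": ["-755,101", "-2,200", "5,384"],
})

# Remove the separators column-wise and convert each column to int64.
for col in ["amount_net", "amount_gross"]:
    df1[col] = df1[col].str.replace(",", "", regex=False).astype("int64")

print(df1.dtypes)
```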
Sort by time, group by symbol, and then take the last (i.e. the "maximum time") element of each group:
In [28]: df.sort('Date_Time').groupby('symbol').last()
Out[28]:
Date_Time Sum amount_net amount_gross
symbol
A 2013-07-29 02:17:00 -68 755101 -755101
B 2013-07-29 02:17:00 -21 251945 -251945
C 2013-07-29 02:17:00 -5 11000 -11000
D 2013-07-29 02:13:00 1 5384 5384
F 2013-07-29 02:16:00 35 122500 122500
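DataFrame.sort was deprecated in pandas 0.17 and removed in 0.20 in favor of sort_values and sort_index, so the In [28] line fails on a current install. Below is a sketch of the same sort-then-last approach with the current API, on a subset of the question's rows (symbols C and D) with Date_Time as an ordinary datetime column. One subtlety: GroupBy.last() returns the last non-null value of each column separately, so with missing values the result can mix data from different rows. groupby(...).tail(1) on the sorted frame keeps complete rows instead.

```python
import pandas as pd

# Subset of the question's data (symbols C and D), amounts already as ints.
df = pd.DataFrame({
    "Date_Time": pd.to_datetime([
        "2013-07-29 02:16", "2013-07-29 02:17",
        "2013-07-29 02:08", "2013-07-29 02:09", "2013-07-29 02:13",
    ]),
    "Sum": [-1, -5, -1, -3, 1],
    "amount_net": [2200, 11000, 5384, 16151, 5384],
    "amount_gross": [-2200, -11000, -5384, -16151, 5384],
    "symbol": ["C", "C", "D", "D", "D"],
})

# Current spelling of df.sort('Date_Time').groupby('symbol').last()
latest = df.sort_values("Date_Time").groupby("symbol").last()

# Row-preserving variant: keep the complete last row of each group.
latest_rows = df.sort_values("Date_Time").groupby("symbol").tail(1)

print(latest)
```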
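See @Andy's comment about parsing the numbers as integers. I don't think the OP is looking for the sum, but for the row with the maximum value for each symbol (I think the maximum of amount_net, but I haven't checked whether that fits.)
Good point! I misread the OP as looking for the sum, rather than the sum and some other columns...
Sorting each subframe is almost always (?) faster: df.groupby('symbol').apply(lambda x: x.sort().iloc[-1]). Though it might still be a good idea to sort the frame.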
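The per-group x.sort() in the last comment also relies on the removed DataFrame.sort. A different technique, not taken from the thread, avoids sorting altogether: GroupBy.idxmax returns the index label of each symbol's latest Date_Time, and .loc then selects those complete rows. This sketch assumes a unique index (the default RangeIndex here). With the question's ts index, which repeats timestamps across symbols, call reset_index() first. If a symbol has two rows with the same maximum time, idxmax picks the first one.

```python
import pandas as pd

# Subset of the question's data (symbols A, C and F), amounts already as ints.
df = pd.DataFrame({
    "Date_Time": pd.to_datetime([
        "2013-07-29 02:17",
        "2013-07-29 02:16", "2013-07-29 02:17",
        "2013-07-29 02:15", "2013-07-29 02:16",
    ]),
    "Sum": [-68, -1, -5, 35, 35],
    "amount_net": [755101, 2200, 11000, 122500, 122500],
    "amount_gross": [-755101, -2200, -11000, 122500, 122500],
    "symbol": ["A", "C", "C", "F", "F"],
})

# Index label of the latest Date_Time within each symbol.
idx = df.groupby("symbol")["Date_Time"].idxmax()

# Select those whole rows, indexed by symbol like the desired output.
result = df.loc[idx].set_index("symbol")
print(result)
```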