
Python 2.7: Get the row data associated with the maximum value in a column (Python/Pandas)


Given the following data:

               Sum  amount_net  amount_gross    symbol  Date_Time
ts                  
7/29/2013 2:17  -68 755,101 -755,101        A   7/29/2013 2:17
7/29/2013 2:17  -21 251,945 -251,945        B   7/29/2013 2:17
7/29/2013 2:16  -1  2,200   -2,200          C   7/29/2013 2:16
7/29/2013 2:17  -5  11,000  -11,000         C   7/29/2013 2:17
7/29/2013 2:08  -1  5,384   -5,384          D   7/29/2013 2:08
7/29/2013 2:09  -3  16,151  -16,151         D   7/29/2013 2:09
7/29/2013 2:13  1   5,384   5,384           D   7/29/2013 2:13
7/29/2013 2:02  20  70,000  70,000          F   7/29/2013 2:02
7/29/2013 2:03  22  77,000  77,000          F   7/29/2013 2:03
7/29/2013 2:04  18  63,000  63,000          F   7/29/2013 2:04
7/29/2013 2:05  15  52,500  52,500          F   7/29/2013 2:05
7/29/2013 2:08  15  52,500  52,500          F   7/29/2013 2:08
7/29/2013 2:09  8   28,000  28,000          F   7/29/2013 2:09
7/29/2013 2:10  22  77,000  77,000          F   7/29/2013 2:10
7/29/2013 2:11  22  77,000  77,000          F   7/29/2013 2:11
7/29/2013 2:12  12  42,000  42,000          F   7/29/2013 2:12
7/29/2013 2:13  5   17,500  17,500          F   7/29/2013 2:13
7/29/2013 2:14  30  105,000 105,000         F   7/29/2013 2:14
7/29/2013 2:15  35  122,500 122,500         F   7/29/2013 2:15
7/29/2013 2:16  35  122,500 122,500         F   7/29/2013 2:16
I want to return the Sum, amount_net and amount_gross at the maximum time for each symbol, i.e. I want to get:

symbol  Time           Sum  amount_net  amount_gross
A   7/29/2013 2:17  -68 755,101        -755,101
B   7/29/2013 2:17  -21 251,945        -251,945
C   7/29/2013 2:17  -5  11,000          -11,000
D   7/29/2013 2:13  1   5,384             5,384
F   7/29/2013 2:16  35  122,500         122,500

Simply group by symbol and sum:

In [11]: df1.groupby('symbol').sum()
Out[11]:
        Sum  amount_net  amount_gross
symbol
A       -68      755101       -755101
B       -21      251945       -251945
C        -6       13200        -13200
D        -3       26919        -16151
F       259      906500        906500
Note: at the moment it looks like amount_net and amount_gross are not being parsed correctly as integers but as strings; you can convert them using the following:

df1[['amount_net', 'amount_gross']] = df1[['amount_net', 'amount_gross']].applymap(lambda x: int(x.replace(',', '')))
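
The same conversion can also be written column-wise with pandas string methods. This is only a minimal sketch: the two-row frame is a hypothetical sample mirroring the question's comma-formatted strings, and strip_thousands is an illustrative helper name, not part of pandas.

```python
import pandas as pd

# Hypothetical two-row sample; the amounts are strings with thousands separators.
df1 = pd.DataFrame({
    "symbol": ["A", "D"],
    "amount_net": ["755,101", "5,384"],
    "amount_gross": ["-755,101", "5,384"],
})

def strip_thousands(col):
    """Drop the ',' separators from a string column and cast it to int."""
    return col.str.replace(",", "", regex=False).astype(int)

cols = ["amount_net", "amount_gross"]
# DataFrame.apply calls the helper once per column and reassembles a frame.
df1[cols] = df1[cols].apply(strip_thousands)

print(df1.dtypes)
```

If the data is loaded from a CSV file, passing thousands=',' to pandas.read_csv may make this step unnecessary.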

Sort by time, group by symbol, then take the last (i.e. "max time") element from each group:

In [28]: df.sort('Date_Time').groupby('symbol').last()
Out[28]: 
                 Date_Time  Sum  amount_net  amount_gross
symbol                                                   
A      2013-07-29 02:17:00  -68      755101       -755101
B      2013-07-29 02:17:00  -21      251945       -251945
C      2013-07-29 02:17:00   -5       11000        -11000
D      2013-07-29 02:13:00    1        5384          5384
F      2013-07-29 02:16:00   35      122500        122500
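
DataFrame.sort was removed in later pandas releases, so on a current version the sort-then-last idea above would likely be written with sort_values. A minimal sketch, assuming a small hypothetical subset of the question's rows (symbols C and D only) with the amounts already converted to integers:

```python
import pandas as pd

# Hypothetical subset of the question's data, amounts already numeric.
df = pd.DataFrame({
    "symbol": ["C", "C", "D", "D", "D"],
    "Date_Time": pd.to_datetime(
        ["7/29/2013 2:16", "7/29/2013 2:17",
         "7/29/2013 2:08", "7/29/2013 2:09", "7/29/2013 2:13"],
        format="%m/%d/%Y %H:%M",
    ),
    "Sum": [-1, -5, -1, -3, 1],
    "amount_net": [2200, 11000, 5384, 16151, 5384],
    "amount_gross": [-2200, -11000, -5384, -16151, 5384],
})

# Sort chronologically, then keep the last row of each symbol group.
latest = df.sort_values("Date_Time").groupby("symbol").last()

print(latest)
```

Date_Time has to be a real datetime column for this to work; sorting the raw strings would order them lexically rather than chronologically.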

See @Andy's comment about parsing the numbers as integers.

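I think the OP is not looking for the sum, but for the row with the maximum value for each symbol (I think the maximum of amount_net, but I haven't checked whether that fits).

Good point! I misread the question as asking for the sum, rather than for Sum and some other columns...

Sorting each sub-frame is almost always (?) faster: df.groupby('symbol').apply(lambda x: x.sort().iloc[-1]). Although sorting the whole frame is probably a good idea anyway.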
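
As a variation on the per-group selection discussed in the comments, the latest row per symbol can also be picked without sorting at all. This swaps in idxmax, which is not what the thread itself used. A minimal sketch, assuming a hypothetical subset of the question's rows and a unique row index:

```python
import pandas as pd

# Hypothetical subset of the question's data with a default, unique RangeIndex.
df = pd.DataFrame({
    "symbol": ["C", "C", "D", "D", "D"],
    "Date_Time": pd.to_datetime(
        ["7/29/2013 2:16", "7/29/2013 2:17",
         "7/29/2013 2:08", "7/29/2013 2:09", "7/29/2013 2:13"],
        format="%m/%d/%Y %H:%M",
    ),
    "Sum": [-1, -5, -1, -3, 1],
})

# idxmax returns, per symbol, the index label of the row with the latest time.
latest_labels = df.groupby("symbol")["Date_Time"].idxmax()

# Select those rows from the original frame; all columns are kept.
latest_rows = df.loc[latest_labels]

print(latest_rows)
```

Because the lookup goes through index labels, a frame with duplicate index values (such as the ts index in the question) would probably need reset_index() first.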