Python Pandas: select the rows with the most recently observed pairing

The problem:

I only want to select the most recent price record for each pairing of uid and retailer.

Data:

import pandas as pd
import numpy as np
data = {"uid":{"0":"123","1":"345","2":"678","3":"123","4":"345","5":"123","6":"678","7":"369","8":"890","9":"678"},"retailer":{"0":"GUY","1":"GUY","2":"GUY","3":"GUY","4":"GUY","5":"GAL","6":"GUY","7":"GAL","8":"GAL","9":"GUY"},"upload date":{"0":"11/17/17","1":"11/17/17","2":"11/16/17","3":"11/16/17","4":"11/16/17","5":"11/17/17","6":"11/17/17","7":"11/17/17","8":"11/17/17","9":"11/15/17"},"price":{"0":12.00,"1":1.23, "2":34.00, "3":69.69, "4":13.72, "5":49.98, "6":98.02, "7":1.02,"8":98.23,"9":12.69}}
df = pd.DataFrame(data=data)
df = df[['uid','retailer','upload date','price']]
df['upload date']=pd.to_datetime(df['upload date'])
My solution:

idx = df.groupby(['uid','retailer'])['upload date'].max().rename('upload date')
idx = idx.reset_index()
solution = idx.merge(df, how='left', on=['uid','retailer','upload date'])
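Question:

I would like to be able to use the index to get my solution. Alternatively, I would like to be able to use join, or find the max date of each pairing with a function that preserves the original DataFrame's index.

Join error: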

idx.set_index(['uid','retailer','upload date']).join(df, on=['uid','retailer','upload date'])
Returns:

ValueError: len(left_on) must equal the number of levels in the index of "right"

IIUC, use idxmax:

df.loc[df.groupby(['uid','retailer'])['upload date'].idxmax()]
Out[168]: 
   uid retailer upload date  price
5  123      GAL  2017-11-17  49.98
0  123      GUY  2017-11-17  12.00
1  345      GUY  2017-11-17   1.23
7  369      GAL  2017-11-17   1.02
6  678      GUY  2017-11-17  98.02
8  890      GAL  2017-11-17  98.23
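
To make the idxmax step easier to follow, here is a minimal self-contained sketch. It rebuilds the question's records as a list of rows, which is an assumption of convenience: the index becomes integers 0 to 9 rather than the string keys of the original dict. groupby(...).idxmax() returns one index label per group, and df.loc then pulls those rows with their original labels intact.

```python
import pandas as pd

# The question's records, rebuilt as plain rows (integer index 0..9 is assumed here).
rows = [
    ("123", "GUY", "11/17/17", 12.00),
    ("345", "GUY", "11/17/17", 1.23),
    ("678", "GUY", "11/16/17", 34.00),
    ("123", "GUY", "11/16/17", 69.69),
    ("345", "GUY", "11/16/17", 13.72),
    ("123", "GAL", "11/17/17", 49.98),
    ("678", "GUY", "11/17/17", 98.02),
    ("369", "GAL", "11/17/17", 1.02),
    ("890", "GAL", "11/17/17", 98.23),
    ("678", "GUY", "11/15/17", 12.69),
]
df = pd.DataFrame(rows, columns=["uid", "retailer", "upload date", "price"])
df["upload date"] = pd.to_datetime(df["upload date"], format="%m/%d/%y")

# One index label per (uid, retailer) group: the row holding the latest date.
latest_labels = df.groupby(["uid", "retailer"])["upload date"].idxmax()

# Select those rows; the original index labels are preserved.
latest = df.loc[latest_labels]
print(latest)
```

Because the result keeps the labels of df, it can be used directly to index back into the original frame, which is what the question asked for.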
Or use reindex:

df.reindex(df.groupby(['uid','retailer'])['upload date'].idxmax().values)
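
As a rough check that the reindex form selects the same rows as the loc form, the sketch below runs both on a small made-up frame (the three records are illustrative, not the question's full data).

```python
import pandas as pd

# Small illustrative frame: two records for one pairing, one for another.
df = pd.DataFrame({
    "uid": ["123", "123", "345"],
    "retailer": ["GUY", "GUY", "GUY"],
    "upload date": pd.to_datetime(["2017-11-16", "2017-11-17", "2017-11-17"]),
    "price": [69.69, 12.00, 1.23],
})

# Index labels of the latest record in each (uid, retailer) group.
labels = df.groupby(["uid", "retailer"])["upload date"].idxmax()

via_loc = df.loc[labels]                  # label-based selection
via_reindex = df.reindex(labels.values)   # conform df to exactly these labels

print(via_loc)
print(via_reindex)
```

One difference worth knowing: reindex fills rows with NaN for labels that do not exist in df, whereas loc raises a KeyError for them. With labels produced by idxmax on the same frame, both return the same rows.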

If you want to use join, the documentation says: join columns with other DataFrame either on index or on a key column.

To get your expected output, you need to add .reset_index() at the end.

Or do something like:

idx.join(df.set_index(['uid','retailer','upload date']),on=['uid','retailer','upload date'])
Out[177]: 
   uid retailer upload date  price
0  123      GAL  2017-11-17  49.98
1  123      GUY  2017-11-17  12.00
2  345      GUY  2017-11-17   1.23
3  369      GAL  2017-11-17   1.02
4  678      GUY  2017-11-17  98.02
5  890      GAL  2017-11-17  98.23
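
The sketch below illustrates why the index of the right-hand frame matters. With on=..., join matches the caller's key columns against the index of the other frame, so that index needs one level per key. The four records are a made-up subset, and the failing call is a variant of the one in the question (keys left as columns on the caller), so treat it as an illustration rather than an exact reproduction.

```python
import pandas as pd

# Small illustrative frame (a subset-style sample, not the full data).
df = pd.DataFrame({
    "uid": ["123", "123", "345", "345"],
    "retailer": ["GUY", "GUY", "GUY", "GUY"],
    "upload date": pd.to_datetime(
        ["2017-11-17", "2017-11-16", "2017-11-17", "2017-11-16"]
    ),
    "price": [12.00, 69.69, 1.23, 13.72],
})
keys = ["uid", "retailer", "upload date"]

# Latest date per pairing, as a DataFrame with the keys as ordinary columns.
idx = df.groupby(["uid", "retailer"])["upload date"].max().reset_index()

# Three keys against a right frame whose index has a single level: join refuses.
try:
    idx.join(df, on=keys)
    mismatch_raised = False
except ValueError:
    mismatch_raised = True

# Give the right frame a three-level index built from the same keys.
joined = idx.join(df.set_index(keys), on=keys)
print(joined)
```

Moving the keys into the index of df also removes them from its columns, which avoids an overlap between the two frames' column names.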

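idxmax returns the index of the returned value? I'll play around with it. Can you show me how I would use join? I don't understand why it isn't working.

@YaleNewman You can join them using either the index or the columns. I added the join solution, but I prefer idxmax because it is straightforward.

Ah OK, that works because the keys are unique in both DataFrames. But if we had two observations on the 16th, you couldn't set the index to uid, retailer, upload date.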
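
The tie case raised in the last comment can be sketched as follows. This is not from the answer: it swaps in groupby(...).transform("max") to build a boolean mask, and the second record on the 16th (price 70.00) is hypothetical data added to create the tie. The mask keeps every row that shares the latest date while preserving the original index, whereas idxmax keeps only the first such row.

```python
import pandas as pd

# Hypothetical data: two records for the same pairing on the same latest date.
df = pd.DataFrame({
    "uid": ["123", "123", "123"],
    "retailer": ["GUY", "GUY", "GUY"],
    "upload date": pd.to_datetime(["2017-11-16", "2017-11-16", "2017-11-15"]),
    "price": [69.69, 70.00, 12.00],
})
pair = ["uid", "retailer"]

# Broadcast each group's latest date back onto every row of that group.
group_max = df.groupby(pair)["upload date"].transform("max")

# Keep all rows dated at the group maximum; tied rows all survive.
all_latest = df[df["upload date"] == group_max]

# idxmax, by contrast, returns only the first label with the maximum.
first_latest = df.loc[df.groupby(pair)["upload date"].idxmax()]

print(all_latest)
print(first_latest)
```

Which behaviour is wanted depends on the data: the mask is the safer choice if tied records should all be kept, while idxmax guarantees exactly one row per pairing.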