Python pandas: upsampling and resampling with groupby

python pandas

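I have time series with gaps, grouped by id. I want to fill the gaps while respecting the groups. date is unique within each id.

The following works, but it gives me zeros where I want NaN: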

data.groupby('id').resample('D', on='date').sum()\
    .drop('id', axis=1).reset_index()

For some reason, the following do not work:

data.groupby('id').resample('D', on='date').asfreq()\
    .drop('id', axis=1).reset_index()

data.groupby('id').resample('D', on='date').fillna('pad')\
    .drop('id', axis=1).reset_index()

I get the following error:

Upsampling from level= or on= selection is not supported, use .set_index(...) to explicitly set index to datetime-like

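I have tried pandas.Grouper together with set_index, using either a MultiIndex or a single index, but either it does not upsample my date column so that I get continuous dates, or it does not respect the id column.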

The pandas version is 0.23.

Try it yourself:

from datetime import datetime

import pandas as pd

# Two ids, each with the same three dates and gaps in between
data = pd.DataFrame({
'id': [1,1,1,2,2,2],
'date': [
    datetime(2018, 1, 1),
    datetime(2018, 1, 5),
    datetime(2018, 1, 10),
    datetime(2018, 1, 1),
    datetime(2018, 1, 5),
    datetime(2018, 1, 10)],
'value': [100, 110, 90, 50, 40, 60]})

# Works but gives zeros
data.groupby('id').resample('D', on='date').sum()
# Fails
data.groupby('id').resample('D', on='date').asfreq()
data.groupby('id').resample('D', on='date').fillna('pad')

Create a DatetimeIndex and remove the on parameter from resample:
print (data.set_index('date').groupby('id').resample('D').asfreq())
                id
id date           
1  2018-01-01  1.0
   2018-01-02  NaN
   2018-01-03  NaN
   2018-01-04  NaN
   2018-01-05  1.0
   2018-01-06  NaN
   2018-01-07  NaN
   2018-01-08  NaN
   2018-01-09  NaN
   2018-01-10  1.0
2  2018-01-01  2.0
   2018-01-02  NaN
   2018-01-03  NaN
   2018-01-04  NaN
   2018-01-05  2.0
   2018-01-06  NaN
   2018-01-07  NaN
   2018-01-08  NaN
   2018-01-09  NaN
   2018-01-10  2.0
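
For a flat result that keeps the value column, the approach above can be sketched end to end with the sample data from the question. This is a minimal sketch rather than the answerer's exact code: selecting ['value'] before resampling and the name upsampled are assumptions made for illustration.

```python
from datetime import datetime

import pandas as pd

# Sample data from the question: two ids, each with gaps between dates.
data = pd.DataFrame({
    'id': [1, 1, 1, 2, 2, 2],
    'date': [datetime(2018, 1, 1), datetime(2018, 1, 5), datetime(2018, 1, 10)] * 2,
    'value': [100, 110, 90, 50, 40, 60],
})

upsampled = (
    data.set_index('date')        # upsampling needs a DatetimeIndex, not on='date'
        .groupby('id')['value']   # assumption: keep only the value column
        .resample('D')
        .asfreq()                 # days missing from a group become NaN
        .reset_index()            # back to flat id / date / value columns
)

print(upsampled)
```

Each id gets its own continuous daily range, and the days that were missing from the input carry NaN instead of zero.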


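Edit:
If you want to use sum with missing values, you need the min_count=1 parameter:
min_count : int, default 0
The required number of valid values to perform the operation. If fewer than min_count non-NA values are present the result will be NA.
New in version 0.22.0: Added with the default being 0. This means the sum of an all-NA or empty Series is 0, and the product of an all-NA or empty Series is 1.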
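
The effect of min_count described above can be checked on a single Series. This is a small illustrative sketch; the names all_missing, default_sum and strict_sum are made up for the example.

```python
import numpy as np
import pandas as pd

# A Series with no valid values, like a resampled day that has no rows.
all_missing = pd.Series([np.nan, np.nan])

# With the default min_count=0, the sum of an all-NA Series is 0.
default_sum = all_missing.sum()

# With min_count=1, at least one non-NA value is required, so the result is NaN.
strict_sum = all_missing.sum(min_count=1)

print(default_sum, strict_sum)
```

This is why the plain sum() in the question reports zeros for the days that have no data.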
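This does not work because the dates are not unique. I get the following error: cannot reindex a non-unique index with a method or limit. Each id represents one time series, and the series can overlap. For example, say you have multiple weather stations and you want to fill the missing dates with NaN so that you get continuous daily time series.

@CodeMonkey - So the data needs preprocessing: first aggregate with sum or mean to get unique datetimes, then apply this solution.

Give me some time to make a data sample.

I have added sample data for you.

@CodeMonkey - I get no error, can you check now? Tested in pandas 0.24.0.

Thanks! When running the whole thing, I found that one of my own asserts failed. I had accidentally removed some code, so date was not unique within each id. My mistake.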
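
The preprocessing suggested in the comments (aggregate first so that each date is unique within an id, then upsample) might look like the sketch below. The frame raw and its values are hypothetical, and sum is only one possible aggregation; mean would follow the same pattern.

```python
from datetime import datetime

import pandas as pd

# Hypothetical data: id 1 has two rows for 2018-01-01.
raw = pd.DataFrame({
    'id': [1, 1, 1, 2, 2],
    'date': [
        datetime(2018, 1, 1),
        datetime(2018, 1, 1),
        datetime(2018, 1, 3),
        datetime(2018, 1, 1),
        datetime(2018, 1, 2),
    ],
    'value': [10, 5, 7, 1, 2],
})

# Step 1: aggregate so that every (id, date) pair occurs exactly once.
unique_dates = raw.groupby(['id', 'date'], as_index=False)['value'].sum()

# Step 2: upsample each id to daily frequency, leaving NaN in the gaps.
daily = (
    unique_dates.set_index('date')
                .groupby('id')['value']
                .resample('D')
                .asfreq()
                .reset_index()
)

print(daily)
```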
print (data.set_index('date').groupby('id').resample('D').fillna('pad'))
#alternatives
#print (data.set_index('date').groupby('id').resample('D').ffill())
#print (data.set_index('date').groupby('id').resample('D').pad())
               id
id date          
1  2018-01-01   1
   2018-01-02   1
   2018-01-03   1
   2018-01-04   1
   2018-01-05   1
   2018-01-06   1
   2018-01-07   1
   2018-01-08   1
   2018-01-09   1
   2018-01-10   1
2  2018-01-01   2
   2018-01-02   2
   2018-01-03   2
   2018-01-04   2
   2018-01-05   2
   2018-01-06   2
   2018-01-07   2
   2018-01-08   2
   2018-01-09   2
   2018-01-10   2
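
Applied to the value column of the question's sample data, the forward-fill variant can be sketched as follows. It uses ffill(), one of the alternatives listed above; selecting ['value'] and the name filled are assumptions made for illustration.

```python
from datetime import datetime

import pandas as pd

# Sample data from the question.
data = pd.DataFrame({
    'id': [1, 1, 1, 2, 2, 2],
    'date': [datetime(2018, 1, 1), datetime(2018, 1, 5), datetime(2018, 1, 10)] * 2,
    'value': [100, 110, 90, 50, 40, 60],
})

filled = (
    data.set_index('date')
        .groupby('id')['value']
        .resample('D')
        .ffill()          # carry the last known value forward within each id
        .reset_index()
)

print(filled)
```

Because the fill runs inside each group, a value from id 1 is never carried into id 2.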

print (data.groupby('id').resample('D', on='date').sum(min_count=1))
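
As a self-contained check of the min_count=1 behaviour on the sample data, here is a sketch that uses the set_index form rather than on='date'. That substitution, the ['value'] selection and the name summed are assumptions made for the example, not part of the original answer.

```python
from datetime import datetime

import pandas as pd

# Sample data from the question.
data = pd.DataFrame({
    'id': [1, 1, 1, 2, 2, 2],
    'date': [datetime(2018, 1, 1), datetime(2018, 1, 5), datetime(2018, 1, 10)] * 2,
    'value': [100, 110, 90, 50, 40, 60],
})

summed = (
    data.set_index('date')
        .groupby('id')['value']
        .resample('D')
        .sum(min_count=1)   # days with no rows give NaN instead of 0
        .reset_index()
)

print(summed)
```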