Python: how to smartly index dates in data when some dates are missing

python pandas date indexing

I have a dataframe like this:

id      date       value
1       2017-01-01  10
1       2017-01-01  20
1       2017-01-02  10
1       2017-01-02  15
1       2017-01-07  25
2       2017-05-01  10
2       2017-05-01  15
2       2017-05-20  30
3       2010-08-08  40
3       2010-08-11  20
3       2010-08-11  43

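I want to sum the values for each date and add an index column that numbers the dates within each id. For example, the final data should look like this: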

id       date        value    index
1        2017-01-01  30       1
1        2017-01-02  25       2
1        2017-01-07  25       3   
2        2017-05-01  25       1
2        2017-05-20  30       2
3        2010-08-08  40       1
3        2010-08-11  63       2

Pandas is your friend.

>>> df
    id       date  value
0    1 2017-01-01     10
1    1 2017-01-01     20
2    1 2017-01-02     10
3    1 2017-01-02     15
4    1 2017-01-07     25
5    2 2017-05-01     10
6    2 2017-05-01     15
7    2 2017-05-20     30
8    3 2010-08-08     40
9    3 2010-08-11     20
10   3 2010-08-11     43

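Group the data by date and id, then call .sum() to add up the values within each group.
Because as_index=False, the date column does not become the index; sort=False keeps the result from being sorted by date.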
>>> g = df.groupby(['date', 'id'], as_index=False, sort=False).sum()
>>> g
      date  id  value
2 2017-01-01   1     30
3 2017-01-02   1     25
4 2017-01-07   1     25
5 2017-05-01   2     25
6 2017-05-20   2     30
0 2010-08-08   3     40
1 2010-08-11   3     63

The second part of the question is a bit ambiguous, but assuming it means a running count over rows with the same id:
>>> g['index'] = g.assign(count=1).groupby('id')['count'].cumsum()
>>> g
        date  id  value  index
2 2017-01-01   1     30      1
3 2017-01-02   1     25      2
4 2017-01-07   1     25      3
5 2017-05-01   2     25      1
6 2017-05-20   2     30      2
0 2010-08-08   3     40      1
1 2010-08-11   3     63      2

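Here we add a helper column count that is 1 in every row, and assign its cumulative sum within each id to g['index'].
If you actually meant a running total per calendar month, you can get it by grouping on df.date.dt.month and applying a similar approach.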
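
The per-month variant could be sketched roughly as follows. This is only an illustration under assumptions: it reads "per month" as a running row number within each calendar month, it rebuilds the sample data with `pd.to_datetime` so that `.dt.month` is available, and the column name `month_index` is made up here rather than taken from the question.

```python
import pandas as pd

# Sample data from the question; dates are parsed so .dt accessors work.
df = pd.DataFrame({
    "id": [1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 3],
    "date": pd.to_datetime([
        "2017-01-01", "2017-01-01", "2017-01-02", "2017-01-02", "2017-01-07",
        "2017-05-01", "2017-05-01", "2017-05-20",
        "2010-08-08", "2010-08-11", "2010-08-11",
    ]),
    "value": [10, 20, 10, 15, 25, 10, 15, 30, 40, 20, 43],
})

# Sum the values per (date, id), keeping the original row order.
g = df.groupby(["date", "id"], as_index=False, sort=False)["value"].sum()

# Hypothetical column: running row number within each calendar month.
# Note that .dt.month ignores the year, so January 2017 and January 2018
# would fall into the same group.
g["month_index"] = g.groupby(g["date"].dt.month).cumcount() + 1

print(g)
```

If the year should matter as well, grouping on `g["date"].dt.to_period("M")` instead of `.dt.month` is one option.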
Using sum and cumcount:

df1=df.groupby(['id','date'],as_index=False).value.sum()
df1['index']=df1.groupby('id',as_index=False).cumcount().add(1)
df1
Out[167]: 
   id        date  value  index
0   1  2017-01-01     30      1
1   1  2017-01-02     25      2
2   1  2017-01-07     25      3
3   2  2017-05-01     25      1
4   2  2017-05-20     30      2
5   3  2010-08-08     40      1
6   3  2010-08-11     63      2
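
For anyone who wants to run this end to end, here is a self-contained sketch of the same sum-then-cumcount idea. It assumes the dates are parsed with `pd.to_datetime` (the question does not say whether they are strings or datetimes) and relies on `groupby` sorting by its keys by default, so that dates are in ascending order within each id before they are numbered.

```python
import pandas as pd

# Sample data from the question; parsing the dates is an assumption
# made here so that sorting is chronological rather than lexical.
df = pd.DataFrame({
    "id": [1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 3],
    "date": pd.to_datetime([
        "2017-01-01", "2017-01-01", "2017-01-02", "2017-01-02", "2017-01-07",
        "2017-05-01", "2017-05-01", "2017-05-20",
        "2010-08-08", "2010-08-11", "2010-08-11",
    ]),
    "value": [10, 20, 10, 15, 25, 10, 15, 30, 40, 20, 43],
})

# One row per (id, date) with the summed value; the default sort=True
# orders the result by id, then date.
df1 = df.groupby(["id", "date"], as_index=False)["value"].sum()

# cumcount numbers rows 0, 1, 2, ... within each id; add 1 for a 1-based index.
df1["index"] = df1.groupby("id").cumcount() + 1

print(df1)
```

Because the index is just the position of each date within its id, gaps between dates (for example 2017-01-02 to 2017-01-07) do not leave gaps in the numbering.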

"Add an index column related to the date": please explain this in more detail.