Database: reorganizing a date table


I need to transform a table and I don't know where to start. This is the table:

| Customer   Code | Activity | Start Date |
|:---------------:|:--------:|:----------:|
|       100       |     A    | 01/05/2017 |
|       100       |     A    | 19/07/2017 |
|       100       |     B    | 18/09/2017 |
|       100       |     C    | 07/12/2017 |
|       101       |     A    | 11/02/2018 |
|       101       |     B    | 02/04/2018 |
|       101       |     B    | 14/06/2018 |
|       100       |     A    | 13/07/2018 |
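|       100       |     B    | 14/08/2018 |

A customer can perform activities A, B and C, always in that order. To perform activity B, he/she must have performed activity A. To perform activity C, he/she must first have performed activity A and then activity B. The same customer can perform an activity, or the whole cycle, more than once.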

I need to reorganize the table like this, showing the start and the end of each step:

| Customer   Code | Activity | Start Date |  End Date  |
|:---------------:|:--------:|:----------:|:----------:|
|       100       |     A    | 01/05/2017 | 18/09/2017 |
|       100       |     B    | 18/09/2017 | 07/12/2017 |
|       100       |     C    | 07/12/2017 | 13/07/2018 |
|       101       |     A    | 11/02/2018 | 02/04/2018 |
|       101       |     B    | 02/04/2018 |            |
|       100       |     A    | 13/07/2018 | 14/08/2018 |
|       100       |     B    | 14/08/2018 |            |

Thanks! :-)

IIUC, you can use:

    import pandas as pd
    # The dates are written day-first (DD/MM/YYYY), so parse them that way
    df['Start Date'] = pd.to_datetime(df['Start Date'], dayfirst=True)
    # Number each consecutive run of rows with the same customer code
    grp = (df['Customer Code'] != df['Customer Code'].shift()).cumsum().rename('grp')
    # Earliest start date of each activity within each run
    df_out = df.groupby([grp,'Customer Code', 'Activity'])['Start Date'].min().reset_index()
    # A step ends when the same customer's next step starts
    df_out['End Date'] = df_out.groupby('Customer Code')['Start Date'].shift(-1)
    df_out

Output:

       grp  Customer Code Activity Start Date   End Date
    0    1            100        A 2017-05-01 2017-09-18
    1    1            100        B 2017-09-18 2017-12-07
    2    1            100        C 2017-12-07 2018-07-13
    3    2            101        A 2018-02-11 2018-04-02
    4    2            101        B 2018-04-02        NaT
    5    3            100        A 2018-07-13 2018-08-14
    6    3            100        B 2018-08-14        NaT
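Details: first, create grp from the changes in Customer Code, so that consecutive rows with the same customer code fall into the same group, and find the minimum start date of each activity within each grp. Next, group by 'Customer Code' and shift the next activity's start date up into 'End Date'.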
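The grouping steps described above can be tried in isolation with a minimal sketch. The sample frame is rebuilt by hand from the question's table, and `dayfirst=True` is an assumption based on the dates being written as DD/MM/YYYY; the names `changed` and `starts` are only illustrative.

```python
import pandas as pd

# Sample data copied from the table in the question
df = pd.DataFrame({
    "Customer Code": [100, 100, 100, 100, 101, 101, 101, 100, 100],
    "Activity": ["A", "A", "B", "C", "A", "B", "B", "A", "B"],
    "Start Date": ["01/05/2017", "19/07/2017", "18/09/2017", "07/12/2017",
                   "11/02/2018", "02/04/2018", "14/06/2018", "13/07/2018",
                   "14/08/2018"],
})
# Assumption: the dates are day-first
df["Start Date"] = pd.to_datetime(df["Start Date"], dayfirst=True)

# True wherever the customer code differs from the previous row;
# the cumulative sum turns those change points into run numbers.
changed = df["Customer Code"] != df["Customer Code"].shift()
grp = changed.cumsum().rename("grp")
print(grp.tolist())

# Earliest start per activity within each run, then the next start
# of the same customer becomes the end date of the current step.
starts = (df.groupby([grp, "Customer Code", "Activity"])["Start Date"]
            .min()
            .reset_index())
starts["End Date"] = starts.groupby("Customer Code")["Start Date"].shift(-1)
print(starts)
```

Because the final `shift(-1)` is grouped by customer only, the last step of customer 100's first cycle picks up the start of that customer's second cycle as its end date, which matches the desired table in the question.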


A similar approach that removes the duplicates with drop_duplicates:

    # Assumes 'Start Date' was already parsed with pd.to_datetime(..., dayfirst=True)
    df['grp'] = (df['Customer Code'] != df['Customer Code'].shift()).cumsum()
    # Keeps the first row of each run/activity, i.e. the earliest if rows are in date order
    df = df.drop_duplicates(['grp','Customer Code', 'Activity']).copy()
    df['End Date'] = df.groupby('Customer Code')['Start Date'].shift(-1)
    df

Output:

       Customer Code Activity Start Date  grp   End Date
    0            100        A 2017-05-01    1 2017-09-18
    2            100        B 2017-09-18    1 2017-12-07
    3            100        C 2017-12-07    1 2018-07-13
    4            101        A 2018-02-11    2 2018-04-02
    5            101        B 2018-04-02    2        NaT
    7            100        A 2018-07-13    3 2018-08-14
    8            100        B 2018-08-14    3        NaT
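To see which rows this variant discards, the following sketch lists them with `duplicated` and checks that the surviving start dates agree with the `groupby(...).min()` version. It rebuilds the question's sample data by hand and assumes day-first dates and chronologically ordered rows; `keys`, `dropped`, `first_rows` and `via_min` are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({
    "Customer Code": [100, 100, 100, 100, 101, 101, 101, 100, 100],
    "Activity": ["A", "A", "B", "C", "A", "B", "B", "A", "B"],
    "Start Date": ["01/05/2017", "19/07/2017", "18/09/2017", "07/12/2017",
                   "11/02/2018", "02/04/2018", "14/06/2018", "13/07/2018",
                   "14/08/2018"],
})
df["Start Date"] = pd.to_datetime(df["Start Date"], dayfirst=True)
df["grp"] = (df["Customer Code"] != df["Customer Code"].shift()).cumsum()

keys = ["grp", "Customer Code", "Activity"]

# Rows that repeat an activity within the same run of a customer
dropped = df[df.duplicated(keys)]
print(dropped[["Customer Code", "Activity", "Start Date"]])

# Both variants should keep the same start dates for this data
first_rows = df.drop_duplicates(keys)
via_min = df.groupby(keys)["Start Date"].min().reset_index()
same = first_rows["Start Date"].tolist() == via_min["Start Date"].tolist()
print(same)
```

On the sample data the discarded rows are the second A of customer 100 (19/07/2017) and the second B of customer 101 (14/06/2018): each activity is reduced to its first start date within a cycle, as in the desired table in the question.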

Can you explain where the 2nd B for 101 went?