Python pandas: count the number of days in each month between a given start and end date


python pandas datetime


I have a pandas DataFrame with some start and end dates:

ActualStartDate ActualEndDate
0   2019-06-30  2019-08-15
1   2019-09-01  2020-01-01
2   2019-08-28  2019-11-13

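Given these start and end dates, I need to count the number of days in each month between the start and end date, with both endpoints included. I can't think of a good way to do this, but the resulting DataFrame should look something like this: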

ActualStartDate ActualEndDate 2019-06 2019-07 2019-08 2019-09 2019-10 2019-11 2019-12 2020-01 etc
0   2019-06-30  2019-08-15    1       31      15      0       0       0       0       0
1   2019-09-01  2020-01-01    0       0       0       30      31      30      31      1
2   2019-08-28  2019-11-13    0       0       4       30      31      13      0       0

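Note that the actual DataFrame has about 1500 rows with varying start and end dates. I'm open to a different output format, but I've shown the above to give you an idea of what I need to accomplish. Thanks in advance for your help.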

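The idea is to create month periods for each row with date_range and to_period while iterating with DataFrame.itertuples, count them with value_counts, then build a DataFrame with concat, replace missing values with fillna(0), and finally join the result to the original with DataFrame.join.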
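A minimal runnable sketch of this approach, built from the code in the timing cell and applied to the question's three sample rows (assumption: the date columns are parsed with pd.to_datetime). The final sort_index(axis=1) is an addition not in the original answer, because value_counts orders months by frequency rather than chronologically:

```python
import pandas as pd

# Sample input from the question, with the dates parsed to datetime64
df = pd.DataFrame({
    "ActualStartDate": pd.to_datetime(["2019-06-30", "2019-09-01", "2019-08-28"]),
    "ActualEndDate": pd.to_datetime(["2019-08-15", "2020-01-01", "2019-11-13"]),
})

# For each row: expand the inclusive daily range, map each day to its month
# period, and count how many days fall into each month.
L = {
    r.Index: pd.date_range(r.ActualStartDate, r.ActualEndDate).to_period("M").value_counts()
    for r in df.itertuples()
}

# concat puts one row label per column and one month per index entry;
# months outside a row's range are NaN, so fill them with 0, transpose back
# to one row per input row, and sort the month columns chronologically.
counts = pd.concat(L, axis=1).fillna(0).astype(int).T.sort_index(axis=1)

res = df.join(counts)
print(res)
```

For row 0 this gives 1 day in 2019-06, 31 in 2019-07 and 15 in 2019-08, matching the table in the question.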

Performance:

df = pd.concat([df] * 1000, ignore_index=True)  # repeat the sample rows 1000 times to get a larger frame for timing

In [44]: %%timeit
    ...: L = {r.Index: pd.date_range(r.ActualStartDate, r.ActualEndDate).to_period('M').value_counts()
    ...:      for r in df.itertuples()}
    ...: df.join(pd.concat(L, axis=1).fillna(0).astype(int).T)
    ...: 
689 ms ± 5.63 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [45]: %%timeit
    ...: df.join(
    ...:     df.apply(lambda v: pd.Series(pd.date_range(v['ActualStartDate'], v['ActualEndDate'], freq='D').to_period('M')), axis=1)
    ...:     .apply(pd.value_counts, axis=1)
    ...:     .fillna(0)
    ...:     .astype(int))
    ...:     
994 ms ± 5.17 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
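Both timed variants expand every row into individual days. A different technique, not taken from either answer, is to compute each row's overlap with every calendar month arithmetically, so the work grows with the number of months spanned rather than the number of days. A minimal sketch, assuming datetime64 columns and that both the start and end date count as full days (as in the question's expected output):

```python
import pandas as pd

df = pd.DataFrame({
    "ActualStartDate": pd.to_datetime(["2019-06-30", "2019-09-01", "2019-08-28"]),
    "ActualEndDate": pd.to_datetime(["2019-08-15", "2020-01-01", "2019-11-13"]),
})

start = df["ActualStartDate"]
end = df["ActualEndDate"]

# Every calendar month touched by any row
months = pd.period_range(start.min(), end.max(), freq="M")

per_month = {}
for m in months:
    first_day = m.start_time             # midnight on the 1st of the month
    last_day = m.end_time.normalize()    # midnight on the last day of the month
    # Overlap of [start, end] with [first_day, last_day], both ends inclusive;
    # a negative overlap means the row does not touch this month.
    lo = start.clip(lower=first_day)
    hi = end.clip(upper=last_day)
    per_month[m] = ((hi - lo).dt.days + 1).clip(lower=0)

counts = pd.DataFrame(per_month, index=df.index)
res = df.join(counts)
print(res)
```

The loop runs once per month in the overall span, and each iteration is a vectorized operation over all rows.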


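Probably not the most efficient approach, but for about 1500 rows it shouldn't be too bad. Expand each row into a daily date range, convert it to monthly periods, count how many days fall into each period, and join the result back to the original DataFrame, e.g.: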

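import pandas as pd

res = df.join(
    # one row per input row, one column per day (shorter ranges are padded with NaT)
    df.apply(lambda v: pd.Series(pd.date_range(v['ActualStartDate'], v['ActualEndDate'], freq='D').to_period('M')), axis=1)
    # count days per month in each row; the top-level pd.value_counts is deprecated since pandas 2.1
    .apply(lambda s: s.value_counts(), axis=1)
    .fillna(0)   # months outside a row's range -> 0
    .astype(int)
)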

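Gives you (note that rows 0 and 2 in this example end on 2020-08-15 and 2020-11-13, not on the question's 2019 end dates):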

  ActualStartDate ActualEndDate  2019-06  2019-07  2019-08  2019-09  2019-10  2019-11  2019-12  2020-01  2020-02  2020-03  2020-04  2020-05  2020-06  2020-07  2020-08  2020-09  2020-10  2020-11
0      2019-06-30    2020-08-15        1       31       31       30       31       30       31       31       29       31       30       31       30       31       15        0        0        0
1      2019-09-01    2020-01-01        0        0        0       30       31       30       31        1        0        0        0        0        0        0        0        0        0        0
2      2019-08-28    2020-11-13        0        0        4       30       31       30       31       31       29       31       30       31       30       31       31       30       31       13
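Applied to the question's own sample data, a self-contained version of the same chain (swapping the deprecated top-level pd.value_counts for Series.value_counts) might look like this:

```python
import pandas as pd

df = pd.DataFrame({
    "ActualStartDate": pd.to_datetime(["2019-06-30", "2019-09-01", "2019-08-28"]),
    "ActualEndDate": pd.to_datetime(["2019-08-15", "2020-01-01", "2019-11-13"]),
})

# One row per input row, one column per day; shorter ranges are padded with NaT.
periods = df.apply(
    lambda v: pd.Series(
        pd.date_range(v["ActualStartDate"], v["ActualEndDate"], freq="D").to_period("M")
    ),
    axis=1,
)

# Count the days per month in each row; value_counts drops the NaT padding.
counts = periods.apply(lambda s: s.value_counts(), axis=1).fillna(0).astype(int)

res = df.join(counts)
print(res)
```

The counts match the expected table in the question for the three sample rows.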


See this link, which also tries to find the number of days in each month between two given dates.