Python: Get counts by day within given time periods


python python-3.x pandas


Sample DataFrame:

      id      start1        end1      start2        end2
0    Bob  2018-11-29  2018-11-30  2018-12-01  2018-12-31
1  James  2018-10-19  2018-10-31         NaT         NaT
2   Jane  2018-04-05  2018-07-12  2018-11-29  2018-11-30

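Given the sample DataFrame above, I would like to show frequency counts by month and year. Assume that each person (id) was "affected" by something during these time periods. Each person has at most two periods: there is always at least one (start1 and end1), but there may or may not be a second (start2 and end2). I want to show how many people were affected in each month and year, across the entire time range in which anyone was affected.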

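For example, the data above would produce something like the following (I am not sure whether year and month should be one column or several; either is fine):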

My end goal is to view this data at different granularities (e.g., year (all 2018 in this sample data), month/year, week, etc.).

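I don't know how to unpack these columns into a single Series so that I can plot a histogram on one column. I know that once I have them in a single column (e.g., date), I can do something like this: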

df.groupby(df["date"].dt.month).count().plot(kind="bar")

But this only counts by month, and it assumes I already have the dates in a single column.
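As an aside on the month-only limitation: grouping by the month number merges the same month from different years. A minimal sketch of one way around that, assuming a single date column already exists, is to group by a monthly period instead. The dates below are made up for illustration and are not part of the sample data.

```python
import pandas as pd

# Hypothetical single-column data; these dates are invented for illustration.
dates = pd.DataFrame({
    "date": pd.to_datetime(
        ["2017-11-05", "2018-11-29", "2018-11-30", "2018-12-01"]
    )
})

# dt.month alone would merge November 2017 with November 2018;
# a monthly Period keeps the year and the month together.
by_month = dates.groupby(dates["date"].dt.to_period("M")).size()

# The same pattern works for other granularities, e.g. calendar years.
by_year = dates.groupby(dates["date"].dt.to_period("Y")).size()

print(by_month)
print(by_year)
```

This still assumes the dates are already in one column, which is the part the question is really about.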

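I could use datetime and keep adding days in a loop, checking whether each day falls inside each period until the end date is reached, but every time I do something like that I find out that pandas/numpy has a better way. I am looking for that better way.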

You can first reshape the DataFrame with pd.wide_to_long:

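import pandas as pd
from pandas.tseries.offsets import MonthEnd

# Reshape wide to long: start1/end1 and start2/end2 become rows of one start/end pair
newdf = pd.wide_to_long(df, ['start', 'end'], i='id', j='drop')
newdf = newdf.apply(pd.to_datetime)
# Drop the rows for people who have no second period (NaT)
newdf = newdf.dropna()
# Snap each start to the first day of its month and each end to the last day of its month
newdf['start'] = newdf['start'].values.astype('datetime64[M]')
newdf['end'] = newdf['end'] + MonthEnd(0)
newdf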
                start        end
id    drop                      
Bob   1    2018-11-01 2018-11-30
James 1    2018-10-01 2018-10-31
Jane  1    2018-04-01 2018-07-31
Bob   2    2018-12-01 2018-12-31
Jane  2    2018-11-01 2018-11-30

Then we use date_range:

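import numpy as np
# One 'YYYY-MM' label per month in each period (on pandas >= 2.0, pass inclusive='right' instead of closed='right')
l = [pd.date_range(x, y, freq='M', closed='right').strftime('%Y-%m') for x, y in zip(newdf.start, newdf.end)]
pd.Series(np.concatenate(l)).value_counts()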
2018-11    2
2018-05    1
2018-12    1
2018-04    1
2018-06    1
2018-10    1
2018-07    1
dtype: int64

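The question also asks about other granularities (day, week, year). The following is a minimal sketch of one possible extension, not part of the answer above: it expands every period into daily rows and then groups by whatever period is needed. The names long_df, days, and people_per are hypothetical, and it counts distinct people per period rather than rows, which is an assumption about what "how many people were affected" should mean.

```python
import pandas as pd

# The sample data from the question.
df = pd.DataFrame({
    "id": ["Bob", "James", "Jane"],
    "start1": ["2018-11-29", "2018-10-19", "2018-04-05"],
    "end1": ["2018-11-30", "2018-10-31", "2018-07-12"],
    "start2": ["2018-12-01", None, "2018-11-29"],
    "end2": ["2018-12-31", None, "2018-11-30"],
})

# Same reshaping idea as above: one row per (id, period), dates parsed,
# and missing second periods dropped.
long_df = pd.wide_to_long(df, ["start", "end"], i="id", j="period")
long_df = long_df.apply(pd.to_datetime).dropna()

# One row per (id, affected day). date_range includes both endpoints by default.
days = pd.DataFrame(
    [
        (person, day)
        for (person, _), row in long_df.iterrows()
        for day in pd.date_range(row["start"], row["end"], freq="D")
    ],
    columns=["id", "date"],
)


def people_per(freq):
    """Count distinct people affected in each period of the given frequency."""
    return days.groupby(days["date"].dt.to_period(freq))["id"].nunique()


per_day = people_per("D")
per_week = people_per("W")
per_month = people_per("M")

print(per_month)
```

On the sample data, per_month gives the same counts as the value_counts result above. The daily expansion creates one row per person per day, so for very long periods or many people it may use noticeably more memory than the month-only approach.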