Date intervals in a Python DataFrame


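I have a column of dates in a DataFrame, stored as strings. I want to create a column that represents the dates as intervals. My data looks like this: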

import pandas as pd

df1 = pd.DataFrame({'x1': [1,5,2,6,3,7,4,7,9,10,5,3,2,7,3,8,4,3,7,2,5,5,2,2],
                    'date': ['2014-01-01','2014-01-01','2014-01-01','2014-01-01','2014-01-02','2014-01-02',
                             '2014-01-03','2014-01-04','2014-01-05','2014-01-05','2014-01-05','2014-01-05',
                             '2014-01-06','2014-01-07','2014-01-07','2014-01-08','2014-01-09','2014-01-10',
                             '2014-01-10','2014-01-10','2014-01-10','2014-01-11','2014-01-12','2014-01-12']})


          date  x1
0   2014-01-01   1
1   2014-01-01   5
2   2014-01-01   2
3   2014-01-01   6
4   2014-01-02   3
5   2014-01-02   7
6   2014-01-03   4
7   2014-01-04   7
8   2014-01-05   9
9   2014-01-05  10
10  2014-01-05   5
11  2014-01-05   3
12  2014-01-06   2
13  2014-01-07   7
14  2014-01-07   3
15  2014-01-08   8
16  2014-01-09   4
17  2014-01-10   3
18  2014-01-10   7
19  2014-01-10   2
20  2014-01-10   5
21  2014-01-11   5
22  2014-01-12   2
23  2014-01-12   2

But I would like it to look like this:

df1['level'] = ['a','a','a','a','a','a','a','a','b','b','b','b','b','b','b','b','c','c','c','c','c','c','c','c']

          date  x1 level
 0   2014-01-01   1     a
 1   2014-01-01   5     a
 2   2014-01-01   2     a
 3   2014-01-01   6     a
 4   2014-01-02   3     a
 5   2014-01-02   7     a
 6   2014-01-03   4     a
 7   2014-01-04   7     a
 8   2014-01-05   9     b
 9   2014-01-05  10     b
 10  2014-01-05   5     b
 11  2014-01-05   3     b
 12  2014-01-06   2     b
 13  2014-01-07   7     b
 14  2014-01-07   3     b
 15  2014-01-08   8     b
 16  2014-01-09   4     c
 17  2014-01-10   3     c
 18  2014-01-10   7     c
 19  2014-01-10   2     c
 20  2014-01-10   5     c
 21  2014-01-11   5     c
 22  2014-01-12   2     c
 23  2014-01-12   2     c

where a represents the interval ['2014-01-01', '2014-01-04'], b represents ['2014-01-05', '2014-01-08'], and c represents ['2014-01-09', '2014-01-12'].

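One approach is to define a boolean mask for each level and use it to set the values of the level column. To make the comparisons straightforward, I converted the 'date' column to a datetime dtype first: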

In [61]:
df1['date'] = pd.to_datetime(df1['date'])
a_mask = (df1['date']>='2014-01-01') & (df1['date']<='2014-01-04')
b_mask = (df1['date']>='2014-01-05') & (df1['date']<='2014-01-08')
c_mask = (df1['date']>='2014-01-09') & (df1['date']<='2014-01-12')
df1.loc[a_mask, 'level'] = 'a'
df1.loc[b_mask, 'level'] = 'b'
df1.loc[c_mask, 'level'] = 'c'
df1

Out[61]:
         date  x1 level
0  2014-01-01   1     a
1  2014-01-01   5     a
2  2014-01-01   2     a
3  2014-01-01   6     a
4  2014-01-02   3     a
5  2014-01-02   7     a
6  2014-01-03   4     a
7  2014-01-04   7     a
8  2014-01-05   9     b
9  2014-01-05  10     b
10 2014-01-05   5     b
11 2014-01-05   3     b
12 2014-01-06   2     b
13 2014-01-07   7     b
14 2014-01-07   3     b
15 2014-01-08   8     b
16 2014-01-09   4     c
17 2014-01-10   3     c
18 2014-01-10   7     c
19 2014-01-10   2     c
20 2014-01-10   5     c
21 2014-01-11   5     c
22 2014-01-12   2     c
23 2014-01-12   2     c
You can use:
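The snippet that belonged to this answer is missing from the page. As a minimal sketch only, and assuming (from the comment below that mentions apply) that it mapped each date to a label with a function, it might have looked something like the following. The names intervals and date_to_level are hypothetical, and the data is a shortened sample of rows from the question.

```python
import pandas as pd

# Shortened sample: the rows at the interval endpoints from the question.
df1 = pd.DataFrame({
    'x1': [1, 7, 9, 8, 4, 2],
    'date': ['2014-01-01', '2014-01-04', '2014-01-05',
             '2014-01-08', '2014-01-09', '2014-01-12'],
})
df1['date'] = pd.to_datetime(df1['date'])

# Hypothetical lookup table: label -> (start, end), both ends inclusive.
intervals = {
    'a': (pd.Timestamp('2014-01-01'), pd.Timestamp('2014-01-04')),
    'b': (pd.Timestamp('2014-01-05'), pd.Timestamp('2014-01-08')),
    'c': (pd.Timestamp('2014-01-09'), pd.Timestamp('2014-01-12')),
}

def date_to_level(d):
    """Return the label of the first interval containing d, or None."""
    for label, (start, end) in intervals.items():
        if start <= d <= end:
            return label
    return None

# apply calls date_to_level once per row.
df1['level'] = df1['date'].apply(date_to_level)
print(df1)
```

Adding a level here only means adding an entry to the dictionary, at the cost of one Python function call per row.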


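I agree, but I couldn't think of a way to use pd.cut. Your answer is better, so I'll delete mine.

I think it depends on the number of levels: for a small number of levels this will be faster than calling apply, but code-wise it won't scale to a growing number of levels.

I think you're right; it depends on the number of dates.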
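The comments mention pd.cut without showing how it would be used. One possible sketch, not taken from either answer, is to pass datetime bin edges and labels to pd.cut. It assumes half-open bins (right=False), so each edge is the first day of an interval and the last edge is the day after the final interval ends. The data is a shortened sample of rows from the question.

```python
import pandas as pd

# Shortened sample: the rows at the interval endpoints from the question.
df1 = pd.DataFrame({
    'x1': [1, 7, 9, 8, 4, 2],
    'date': ['2014-01-01', '2014-01-04', '2014-01-05',
             '2014-01-08', '2014-01-09', '2014-01-12'],
})
df1['date'] = pd.to_datetime(df1['date'])

# Bin edges: the start of each interval, plus one day past the last interval.
bins = pd.to_datetime(['2014-01-01', '2014-01-05', '2014-01-09', '2014-01-13'])

# right=False makes each bin [start, next_start), so a start date
# belongs to the bin that begins on it.
df1['level'] = pd.cut(df1['date'], bins=bins, labels=['a', 'b', 'c'], right=False)
print(df1)
```

The resulting column has a categorical dtype, and dates outside all bins become NaN. Adding a level only requires one more edge and one more label.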