Python: create a Month column from a Date column (when the Date column contains no month information)
I have data like the following and want to create a column named "Month":
+---------+------------------+------+------+
| Name | Task | Team | Date |
+---------+------------------+------+------+
| John | Market study | A | 1 |
+---------+------------------+------+------+
| Michael | Customer service | B | 1 |
+---------+------------------+------+------+
| Joanna | Accounting | C | 1 |
+---------+------------------+------+------+
| John | Accounting | B | 2 |
+---------+------------------+------+------+
| Michael | Customer service | A | 2 |
+---------+------------------+------+------+
| Joanna | Market study | C | 2 |
+---------+------------------+------+------+
| John | Customer service | C | 1 |
+---------+------------------+------+------+
| Michael | Market study | A | 1 |
+---------+------------------+------+------+
| Joanna | Customer service | B | 1 |
+---------+------------------+------+------+
| John | Market study | A | 2 |
+---------+------------------+------+------+
| Michael | Customer service | B | 2 |
+---------+------------------+------+------+
| Joanna | Accounting | C | 2 |
+---------+------------------+------+------+
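For anyone who wants to reproduce this, the table above can be typed in as a DataFrame. This is only a sketch: the variable name df is an assumption, and the values are copied from the table by hand.

```python
import pandas as pd

# Sample data copied from the table above (df is an assumed variable name).
df = pd.DataFrame({
    "Name": ["John", "Michael", "Joanna"] * 4,
    "Task": [
        "Market study", "Customer service", "Accounting",
        "Accounting", "Customer service", "Market study",
        "Customer service", "Market study", "Customer service",
        "Market study", "Customer service", "Accounting",
    ],
    "Team": ["A", "B", "C", "B", "A", "C", "C", "A", "B", "A", "B", "C"],
    "Date": [1, 1, 1, 2, 2, 2, 1, 1, 1, 2, 2, 2],
})

print(df)
```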
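So basically, I have date information, but the date does not say which month it belongs to. However, I know that the first time a date appears it belongs to Month 1, and the second time it appears it belongs to Month 2. For example, Date 1 appears 3 times and is then interrupted by Date 2. So those first 3 rows belong to Month 1, and the next 3 rows with Date 1 belong to Month 2. I would therefore like my result to look like this: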
+---------+------------------+------+------+---------+
| Name | Task | Team | Date | Month |
+---------+------------------+------+------+---------+
| John | Market study | A | 1 | Month 1 |
+---------+------------------+------+------+---------+
| Michael | Customer service | B | 1 | Month 1 |
+---------+------------------+------+------+---------+
| Joanna | Accounting | C | 1 | Month 1 |
+---------+------------------+------+------+---------+
| John | Accounting | B | 2 | Month 1 |
+---------+------------------+------+------+---------+
| Michael | Customer service | A | 2 | Month 1 |
+---------+------------------+------+------+---------+
| Joanna | Market study | C | 2 | Month 1 |
+---------+------------------+------+------+---------+
| John | Customer service | C | 1 | Month 2 |
+---------+------------------+------+------+---------+
| Michael | Market study | A | 1 | Month 2 |
+---------+------------------+------+------+---------+
| Joanna | Customer service | B | 1 | Month 2 |
+---------+------------------+------+------+---------+
| John | Market study | A | 2 | Month 2 |
+---------+------------------+------+------+---------+
| Michael | Customer service | B | 2 | Month 2 |
+---------+------------------+------+------+---------+
| Joanna | Accounting | C | 2 | Month 2 |
+---------+------------------+------+------+---------+
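Read literally, the rule is: every time a new consecutive run of a Date value begins, that date's month counter goes up by one. A minimal loop-based sketch of that rule, as a baseline and assuming the rows are already in chronological order (the names dates, runs_seen and months are illustrative only):

```python
# The Date column from the tables above, as a plain list.
dates = [1, 1, 1, 2, 2, 2, 1, 1, 1, 2, 2, 2]

runs_seen = {}   # Date value -> number of consecutive runs of it seen so far
previous = None  # Date of the previous row
months = []

for d in dates:
    if d != previous:
        # A new run of this Date value starts here.
        runs_seen[d] = runs_seen.get(d, 0) + 1
    months.append(f"Month {runs_seen[d]}")
    previous = d

print(months)
```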
Other than using some loops, I have no idea how to do this.
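Thanks, everyone.

If I understand the question correctly, you can do the following: create a mask s that puts each run of consecutive equal values into its own group. From s, create a mask s1 that numbers each value within its group. Group by s1 and Date, then apply cumcount and map to build the desired output: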
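# s: run id that increases by 1 whenever Date differs from the previous row
s = df.Date.ne(df.Date.shift()).cumsum()
# s1: position of each row within its run (0, 1, 2, ...)
s1 = df.Date.groupby(s).cumcount()
# count how often each (position, Date) pair has occurred so far; the n-th occurrence is "Month n"
df['Month'] = df.groupby([s1, 'Date']).Name.cumcount().add(1).map(lambda x: 'Month '+str(x))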
Out[897]:
       Name              Task  Team  Date    Month
0      John      Market-study     A     1  Month 1
1   Michael  Customer-service     B     1  Month 1
2    Joanna        Accounting     C     1  Month 1
3      John        Accounting     B     2  Month 1
4   Michael  Customer-service     A     2  Month 1
5    Joanna      Market-study     C     2  Month 1
6      John  Customer-service     C     1  Month 2
7   Michael      Market-study     A     1  Month 2
8    Joanna  Customer-service     B     1  Month 2
9      John      Market-study     A     2  Month 2
10  Michael  Customer-service     B     2  Month 2
11   Joanna        Accounting     C     2  Month 2
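To make the two masks less abstract, here is a self-contained sketch that prints what s and s1 contain for the sample data. Only the Name and Date columns are rebuilt here, and month_no is an illustrative name that is not part of the answer.

```python
import pandas as pd

# Only the columns the answer's code touches, copied from the question.
df = pd.DataFrame({
    "Name": ["John", "Michael", "Joanna"] * 4,
    "Date": [1, 1, 1, 2, 2, 2, 1, 1, 1, 2, 2, 2],
})

# Run id: goes up by 1 each time Date differs from the previous row.
s = df.Date.ne(df.Date.shift()).cumsum()
# Position of each row inside its run.
s1 = df.Date.groupby(s).cumcount()
# How many times each (position, Date) pair has been seen so far, starting at 1.
month_no = df.groupby([s1, "Date"]).Name.cumcount().add(1)

print(s.tolist())         # [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4]
print(s1.tolist())        # [0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2]
print(month_no.tolist())  # [1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2]
```

One thing to keep in mind: because rows are matched up by their position inside a run, this works cleanly here where every run has 3 rows. If a later run of some date is longer than an earlier one, the extra rows would start counting from 1 again.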
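If what happens for the first time? Sorry, I can't see the pattern.

If a date appears for the first time. In the Date column you can see dates 1 and 2. Date 1 appears 3 times, is then interrupted by date 2, then date 1 appears again, and then date 2 appears again. This means that the first time date 1 appears it belongs to Month 1, and the second time it appears it belongs to Month 2.

Sorry, I'm still not quite clear: when the date and name repeat, the month should be 2?

For example, date 1 appears 3 times and is then interrupted by date 2, which also appears 3 times. This means date 1 belongs to Month 1 in its first 3 occurrences, and the next time it appears it belongs to Month 2. I really don't know how to put it more clearly; English is not my first language, and this is not an easy question to ask.

That depends on how people answer. For your result, I suggest looking at where in pandas (or np.where) and np.select.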
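As a rough illustration of the np.where / np.select suggestion: both turn conditions into labels once a month number is already known. The month_no values below are made up for the example, and the "Other" fallback label is an assumption.

```python
import numpy as np
import pandas as pd

# Hypothetical month numbers, just to show the labelling step.
month_no = pd.Series([1, 1, 2, 2, 3])

# np.select: the first matching condition wins, otherwise the default is used.
labels = np.select(
    [month_no == 1, month_no == 2],
    ["Month 1", "Month 2"],
    default="Other",
)

# np.where: a two-way choice on a single condition.
first_or_later = np.where(month_no == 1, "Month 1", "later")

print(labels.tolist())
print(first_or_later.tolist())
```

Neither function finds the runs by itself, so they would only replace the final map step, not the grouping.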
df.Date is equivalent to df['Date'].
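A quick way to check that for yourself, with a tiny made-up frame (the column "my col" is only there to show where attribute access stops working):

```python
import pandas as pd

df = pd.DataFrame({"Date": [1, 2], "my col": [3, 4]})

# Attribute access and bracket access return the same column.
same = df.Date.equals(df["Date"])
print(same)

# A column name with a space (or one that clashes with a DataFrame
# attribute) can only be reached with brackets.
print(df["my col"].tolist())
```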