Python: Can you use pandas to split a datetime range into months?

python pandas

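Is there a way to create new columns that represent each of the months covered by the span between two datetimes? The output for each new monthly column could be a binary value. My idea was something like this (which doesn't work):

From this: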

    end         name        start
0   28/02/2012  joe bloggs  01/01/2012
1   15/03/2012  jane bloggs 01/02/2012
2   17/05/2012  jim bloggs  01/04/2012
3   18/04/2012  john bloggs 01/02/2012

To this:

    end         1   2   3   4   5   name        start
0   28/02/2012  1   1   0   0   0   joe bloggs  01/01/2012
1   15/03/2012  0   1   1   0   0   jane bloggs 01/02/2012
2   17/05/2012  0   0   0   1   1   jim bloggs  01/04/2012
3   18/04/2012  0   1   1   1   0   john bloggs 01/02/2012

First, you have to convert the date columns to datetime using pd.to_datetime, like this:

import pandas as pd

# The dates are DD/MM/YYYY; without dayfirst=True pandas reads ambiguous dates month-first
example['end'] = pd.to_datetime(example['end'], dayfirst=True)
example['start'] = pd.to_datetime(example['start'], dayfirst=True)
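
For anyone who wants to run the conversion step on its own, here is a minimal self-contained sketch. It assumes the frame is called example, as in this answer, and rebuilds the four sample rows from the question by hand; building the frame from a dict is my own assumption about how the data is loaded.

```python
import pandas as pd

# Sample rows copied from the question; dates are DD/MM/YYYY strings.
example = pd.DataFrame({
    "end": ["28/02/2012", "15/03/2012", "17/05/2012", "18/04/2012"],
    "name": ["joe bloggs", "jane bloggs", "jim bloggs", "john bloggs"],
    "start": ["01/01/2012", "01/02/2012", "01/04/2012", "01/02/2012"],
})

# dayfirst=True makes "01/02/2012" parse as 1 February rather than 2 January.
for col in ("end", "start"):
    example[col] = pd.to_datetime(example[col], dayfirst=True)

# Month numbers are now available through the .dt accessor.
print(example["start"].dt.month.tolist())
print(example["end"].dt.month.tolist())
```
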

Then, in a for loop, just set the appropriate values:

example[str(i)] = 0
# Assign through .loc; chained indexing (example[col][mask] = 1) may write to a temporary copy
example.loc[(example['start'].dt.month <= i) & (i <= example['end'].dt.month), str(i)] = 1

(borrowing dt.month from jezrael's answer), which results in:

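import pandas as pd

# The dates are DD/MM/YYYY; without dayfirst=True pandas reads ambiguous dates month-first
example['end'] = pd.to_datetime(example['end'], dayfirst=True)
example['start'] = pd.to_datetime(example['start'], dayfirst=True)

for i in range(1, 13):
    example[str(i)] = 0
    # Flag month i when it lies between the start month and the end month (inclusive)
    example.loc[(example['start'].dt.month <= i) & (i <= example['end'].dt.month), str(i)] = 1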

In[101]: example
Out[101]: 
         end         name      start  1  2  3  4  5  6  7  8  9  10  11  12
0 2012-02-28   joe bloggs 2012-01-01  1  1  0  0  0  0  0  0  0   0   0   0
1 2012-03-15  jane bloggs 2012-02-01  0  1  1  0  0  0  0  0  0   0   0   0
2 2012-05-17   jim bloggs 2012-04-01  0  0  0  1  1  0  0  0  0   0   0   0
3 2012-04-18  john bloggs 2012-02-01  0  1  1  1  0  0  0  0  0   0   0   0
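
The same month flags can also be computed without a Python-level loop. The following is only a sketch under two assumptions: every range stays inside a single calendar year (the month-number comparison breaks otherwise), and the sample frame is rebuilt by hand from the question. The names months, flags and result are mine.

```python
import numpy as np
import pandas as pd

# Sample rows from the question, parsed as DD/MM/YYYY.
example = pd.DataFrame({
    "end": pd.to_datetime(
        ["28/02/2012", "15/03/2012", "17/05/2012", "18/04/2012"], dayfirst=True),
    "name": ["joe bloggs", "jane bloggs", "jim bloggs", "john bloggs"],
    "start": pd.to_datetime(
        ["01/01/2012", "01/02/2012", "01/04/2012", "01/02/2012"], dayfirst=True),
})

months = np.arange(1, 13)  # month numbers 1..12
start_m = example["start"].dt.month.to_numpy()[:, None]  # shape (rows, 1)
end_m = example["end"].dt.month.to_numpy()[:, None]      # shape (rows, 1)

# Broadcasting compares every row against every month in one step,
# giving a (rows, 12) array of 0/1 flags.
flags = ((start_m <= months) & (months <= end_m)).astype(int)

month_cols = pd.DataFrame(
    flags, columns=[str(m) for m in months], index=example.index)
result = pd.concat([example, month_cols], axis=1)
print(result)
```

Building all twelve columns at once and concatenating them also avoids repeated column insertion into the frame, which may matter for larger data.
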

I think you can mainly use:

This will help:

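import pandas as pd

# FILE is the path to the CSV; columns 0 and 2 (end, start) are parsed as DD/MM/YYYY dates
example = pd.read_csv(FILE, parse_dates=[0, 2], dayfirst=True)
for i in [1, 2, 3, 4, 5]:
    i_name = str(i)
    # pd.Timestamp is used instead of pd.datetime, which newer pandas versions no longer provide
    example[i_name] = example.apply(lambda row: row["start"] <= pd.Timestamp(2012, i, 1) <= row["end"], axis=1).astype(int)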

Yes, this is more concise than my solution.
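
One caveat with the first-of-month comparison: it tests whether the first day of each month lies between start and end, so a row that starts mid-month would miss its first month. A possible variant, offered only as a sketch, compares monthly periods instead. The last row below is a hypothetical addition of mine (it is not in the question's data) to show the difference, and the 2012-01 to 2012-05 range is assumed from the desired output.

```python
import pandas as pd

example = pd.DataFrame({
    "end": pd.to_datetime(
        ["28/02/2012", "15/03/2012", "17/05/2012", "18/04/2012", "10/02/2012"],
        dayfirst=True),
    "name": ["joe bloggs", "jane bloggs", "jim bloggs", "john bloggs",
             "mid month (hypothetical)"],
    "start": pd.to_datetime(
        ["01/01/2012", "01/02/2012", "01/04/2012", "01/02/2012", "15/01/2012"],
        dayfirst=True),
})

# Reduce each date to its calendar month, e.g. 2012-01-15 becomes 2012-01.
start_p = example["start"].dt.to_period("M")
end_p = example["end"].dt.to_period("M")

# A month is flagged when it lies between the start month and end month.
for period in pd.period_range("2012-01", "2012-05", freq="M"):
    example[str(period.month)] = ((start_p <= period) & (end_p >= period)).astype(int)

print(example)
```

Because periods carry the year as well as the month, the same comparison should also hold for ranges that cross a year boundary, although the column names would then need to include the year to stay unique.
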