Python: can you split datetime months with pandas?
Is there a way to create new columns representing the individual months covered by the span between two datetimes? The output for each new monthly column could be a binary value. My idea is something like this (which doesn't work), going from this:
          end         name       start
0  28/02/2012   joe bloggs  01/01/2012
1  15/03/2012  jane bloggs  01/02/2012
2  17/05/2012   jim bloggs  01/04/2012
3  18/04/2012  john bloggs  01/02/2012
to this:
          end  1  2  3  4  5         name       start
0  28/02/2012  1  1  0  0  0   joe bloggs  01/01/2012
1  15/03/2012  0  1  1  0  0  jane bloggs  01/02/2012
2  17/05/2012  0  0  0  1  1   jim bloggs  01/04/2012
3  18/04/2012  0  1  1  1  0  john bloggs  01/02/2012
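The answers below either assume an existing `example` DataFrame or read one from a CSV file that the question does not show. To experiment without that file, here is a minimal sketch that rebuilds the sample input inline; the values are copied from the first table, and keeping the dates as dd/mm/yyyy strings is an assumption about how they appear in the file.

```python
import pandas as pd

# Sample input copied from the question's table; dates kept as dd/mm/yyyy strings,
# so they still need converting with pd.to_datetime as shown in the answers.
example = pd.DataFrame({
    "end": ["28/02/2012", "15/03/2012", "17/05/2012", "18/04/2012"],
    "name": ["joe bloggs", "jane bloggs", "jim bloggs", "john bloggs"],
    "start": ["01/01/2012", "01/02/2012", "01/04/2012", "01/02/2012"],
})
print(example)
```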
First, you need to convert the date columns to datetime with pd.to_datetime, like this:
import pandas as pd
# the dates are day-first (dd/mm/yyyy); without dayfirst=True an ambiguous date like 01/02/2012 is read month-first
example['end'] = pd.to_datetime(example['end'], dayfirst=True)
example['start'] = pd.to_datetime(example['start'], dayfirst=True)
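Why `dayfirst=True` matters here: a date such as 01/02/2012 is ambiguous, and pandas reads the first field as the month unless told otherwise. A small check using strings from the sample data; passing an explicit `format=` is a stricter alternative suggested here, not something the answer uses.

```python
import pandas as pd

# "01/02/2012" is ambiguous: 1 February or 2 January?
feb_first = pd.to_datetime("01/02/2012", dayfirst=True)  # day read first: 1 Feb 2012
jan_second = pd.to_datetime("01/02/2012")                # default, month read first: 2 Jan 2012

# An explicit format string removes the ambiguity altogether.
ends = pd.to_datetime(pd.Series(["28/02/2012", "15/03/2012"]), format="%d/%m/%Y")
print(feb_first, jan_second, ends.dt.month.tolist())
```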
Then, in a for loop, just set the appropriate values:
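example[str(i)] = 0
# assign through .loc; chained indexing (example[col][mask] = 1) may only modify a temporary copy
example.loc[(i >= example['start'].dt.month) & (example['end'].dt.month >= i), str(i)] = 1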
(with dt.month borrowed from jezrael's answer). The result:
import pandas as pd
# the dates are day-first (dd/mm/yyyy)
example['end'] = pd.to_datetime(example['end'], dayfirst=True)
example['start'] = pd.to_datetime(example['start'], dayfirst=True)
for i in range(1,13):
    example[str(i)] = 0
    # .loc writes into example itself; chained indexing may only modify a temporary copy
    example.loc[(i >= example['start'].dt.month) & (example['end'].dt.month >= i), str(i)] = 1
In[101]: example
Out[101]:
          end         name       start  1  2  3  4  5  6  7  8  9  10  11  12
0  2012-02-28   joe bloggs  2012-01-01  1  1  0  0  0  0  0  0  0   0   0   0
1  2012-03-15  jane bloggs  2012-02-01  0  1  1  0  0  0  0  0  0   0   0   0
2  2012-05-17   jim bloggs  2012-04-01  0  0  0  1  1  0  0  0  0   0   0   0
3  2012-04-18  john bloggs  2012-02-01  0  1  1  1  0  0  0  0  0   0   0   0
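The same loop can also build each column in one vectorised assignment instead of writing 0 and then patching a slice, which sidesteps chained assignment entirely. A sketch using the question's sample data, rebuilt inline here as an assumption since the original CSV is not shown; like the answer, it compares month numbers only.

```python
import pandas as pd

# Sample data from the question, already converted to datetimes.
example = pd.DataFrame({
    "name": ["joe bloggs", "jane bloggs", "jim bloggs", "john bloggs"],
    "start": pd.to_datetime(["01/01/2012", "01/02/2012", "01/04/2012", "01/02/2012"], format="%d/%m/%Y"),
    "end": pd.to_datetime(["28/02/2012", "15/03/2012", "17/05/2012", "18/04/2012"], format="%d/%m/%Y"),
})

start_month = example["start"].dt.month
end_month = example["end"].dt.month
for i in range(1, 13):
    # One assignment per column: the boolean mask becomes 1/0 via astype(int).
    # Month numbers only, so the year is ignored (same assumption as the answer).
    example[str(i)] = ((start_month <= i) & (end_month >= i)).astype(int)

print(example)
```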
I think you can mostly use the following; this should do it:
example = pd.read_csv(FILE, parse_dates=[0, 2], dayfirst=True)
for i in [1, 2, 3, 4, 5]:
    i_name = str(i)
    # pd.datetime was removed in pandas 2.0; pd.Timestamp builds the same date
    example[i_name] = example.apply(lambda row: row["start"] <= pd.Timestamp(2012, i, 1) <= row["end"], axis=1).astype(int)
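The `apply` call runs a Python lambda once per row. Since the test is just `start <= first of month <= end`, it can also be written with whole-column comparisons. A sketch with the sample data rebuilt inline (an assumption, in place of reading `FILE`). Like the answer, it flags month i only when the 1st of that month falls inside the range, so a span such as 15/01/2012 to 10/02/2012 would get 1 for February but 0 for January.

```python
import pandas as pd

# Sample data from the question (inline instead of read_csv).
example = pd.DataFrame({
    "name": ["joe bloggs", "jane bloggs", "jim bloggs", "john bloggs"],
    "start": pd.to_datetime(["01/01/2012", "01/02/2012", "01/04/2012", "01/02/2012"], format="%d/%m/%Y"),
    "end": pd.to_datetime(["28/02/2012", "15/03/2012", "17/05/2012", "18/04/2012"], format="%d/%m/%Y"),
})

for i in [1, 2, 3, 4, 5]:
    first_of_month = pd.Timestamp(2012, i, 1)  # year 2012 hard-coded, as in the answer
    # Element-wise comparisons on whole columns replace the row-by-row apply.
    in_range = example["start"].le(first_of_month) & example["end"].ge(first_of_month)
    example[str(i)] = in_range.astype(int)

print(example)
```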
Yes, this is more concise than my solution.
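Both answers assume every range lies within 2012: the first compares month numbers only, and the second hard-codes the year. A range crossing a year boundary, e.g. 15/11/2012 to 10/02/2013 (a hypothetical row, not in the question's data), would be mishandled by both. One way to handle it, sketched here with pandas monthly periods (a different technique from either answer), is to give each calendar month its own "YYYY-MM" column:

```python
import pandas as pd

# Hypothetical rows: one from the question, one spanning a year boundary.
example = pd.DataFrame({
    "start": pd.to_datetime(["01/01/2012", "15/11/2012"], format="%d/%m/%Y"),
    "end": pd.to_datetime(["28/02/2012", "10/02/2013"], format="%d/%m/%Y"),
})

# The calendar month each date falls in, e.g. Period('2012-11', 'M').
start_p = example["start"].dt.to_period("M")
end_p = example["end"].dt.to_period("M")

# One column per month from the earliest start to the latest end.
for p in pd.period_range(start_p.min(), end_p.max(), freq="M"):
    example[str(p)] = ((start_p <= p) & (end_p >= p)).astype(int)

print(example)
```

Comparing whole months rather than the 1st of each month means a partially covered month (January in the 15/01 to 10/02 example) still gets a 1.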