Python pandas: group by date and count, then turn the counts into column names
I have this DataFrame:
import pandas as pd
from datetime import datetime
df = pd.DataFrame([
{"_id": "1", "date": datetime.strptime("2020-09-29 07:00:00", '%Y-%m-%d %H:%M:%S'), "status": "started"},
{"_id": "2", "date": datetime.strptime("2020-09-29 14:00:00", '%Y-%m-%d %H:%M:%S'), "status": "end"},
{"_id": "3", "date": datetime.strptime("2020-09-25 17:00:00", '%Y-%m-%d %H:%M:%S'), "status": "started"},
{"_id": "4", "date": datetime.strptime("2020-09-17 09:00:00", '%Y-%m-%d %H:%M:%S'), "status": "end"},
{"_id": "5", "date": datetime.strptime("2020-09-19 07:00:00", '%Y-%m-%d %H:%M:%S'), "status": "end"},
{"_id": "6", "date": datetime.strptime("2020-09-19 08:00:00", '%Y-%m-%d %H:%M:%S'), "status": "end"},
]).set_index('date')
It looks like this:
                    _id   status
date
2020-09-29 07:00:00   1  started
2020-09-29 14:00:00   2      end
2020-09-25 17:00:00   3  started
2020-09-17 09:00:00   4      end
2020-09-19 07:00:00   5      end
2020-09-19 08:00:00   6      end
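I am trying to group by day and count the rows for each status, but I want the status name to be part of the column names. Here is the desired output:
                     status_started  status_end
date
2020-09-29 07:00:00               1           1
2020-09-25 17:00:00               1           0
2020-09-17 09:00:00               0           1
2020-09-19 07:00:00               0           2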
I tried this:
df = df.groupby([pd.Grouper(freq='d'), 'status']).agg({'status': "count"})
df = df.reset_index(level="status")
Output:
                   status
date       status
2020-09-17 end          1
2020-09-19 end          2
2020-09-25 started      1
2020-09-29 end          1
           started      1
But I did not manage to reshape the df into the desired form.

You just need unstack:
df.groupby([pd.Grouper(freq='d'), 'status']).size().unstack('status', fill_value=0)
Output:
status      end  started
date
2020-09-17    1        0
2020-09-19    2        0
2020-09-25    0        1
2020-09-29    1        1
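
The unstacked result names its columns end and started, while the question asked for status_started and status_end. A minimal sketch of one way to close that gap is below. It assumes the same data as the question, rebuilt with pd.to_datetime for brevity; the variable name counts is only illustrative.

```python
import pandas as pd

# Same data as in the question, rebuilt compactly.
df = pd.DataFrame(
    {
        "_id": ["1", "2", "3", "4", "5", "6"],
        "date": pd.to_datetime(
            [
                "2020-09-29 07:00:00",
                "2020-09-29 14:00:00",
                "2020-09-25 17:00:00",
                "2020-09-17 09:00:00",
                "2020-09-19 07:00:00",
                "2020-09-19 08:00:00",
            ]
        ),
        "status": ["started", "end", "started", "end", "end", "end"],
    }
).set_index("date")

counts = (
    df.groupby([pd.Grouper(freq="D"), "status"])  # one group per (day, status)
    .size()                                        # number of rows in each group
    .unstack("status", fill_value=0)               # status values become columns
    .add_prefix("status_")                         # end -> status_end, started -> status_started
    .rename_axis(columns=None)                     # drop the leftover "status" label on the columns
)

# Put the columns in the order shown in the desired output.
counts = counts[["status_started", "status_end"]]
print(counts)
```

The index here holds one midnight timestamp per day rather than the first timestamp of each day as drawn in the question, which is what grouping by day produces.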

You can try crosstab:
d = pd.crosstab(df.index.date, df['status'])\
.rename_axis('date').add_prefix('status_')
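
One thing to be aware of with the crosstab version: df.index.date yields plain datetime.date objects, so the resulting index is an object index rather than a DatetimeIndex. The sketch below, which assumes the same data as the question, converts the index back with pd.to_datetime and reorders the columns to match the desired output. Whether that conversion is needed depends on what you do with the result afterwards.

```python
import pandas as pd

# Same data as in the question, rebuilt compactly.
df = pd.DataFrame(
    {
        "_id": ["1", "2", "3", "4", "5", "6"],
        "date": pd.to_datetime(
            [
                "2020-09-29 07:00:00",
                "2020-09-29 14:00:00",
                "2020-09-25 17:00:00",
                "2020-09-17 09:00:00",
                "2020-09-19 07:00:00",
                "2020-09-19 08:00:00",
            ]
        ),
        "status": ["started", "end", "started", "end", "end", "end"],
    }
).set_index("date")

# Cross-tabulate calendar day against status; cells are row counts.
d = pd.crosstab(df.index.date, df["status"]).add_prefix("status_")

# Turn the datetime.date labels back into a DatetimeIndex.
d.index = pd.to_datetime(d.index)

# Name the index, clear the column label, and fix the column order.
d = d.rename_axis(index="date", columns=None)[["status_started", "status_end"]]
print(d)
```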