Python pandas: reading dates from a CSV and using groupby to get the total business days per month
data.CSV
ID  Activity Month  Activity Date
0   04/2019         04-01-2019
1   05/2019         05-13-2019
2   05/2019         05-25-2019
3   06/2019         06-10-2019
4   06/2019         06-19-2019
5   07/2019         07-15-2019
6   07/2019         07-18-2019
7   07/2019         07-29-2019
8   08/2019         06-03-2019
9   08/2019         06-15-2019
10  08/2019         06-20-2019
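My plan
Read the CSV:
import pandas as pd
df = pd.read_csv('data.csv')
Convert to datetime (the sample dates are month-first, MM-DD-YYYY, so the format is given explicitly):
df['Activity Date'] = pd.to_datetime(df['Activity Date'], format='%m-%d-%Y')
Group by the "Activity Month" column:
grouped = df.groupby(['Activity Month'])['Activity Date'].count()
print(grouped)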
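Perform the business-day calculation while grouping the dates:
This is the part I don't know how to do. I'm lost.
The code I use to count business days: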
import calendar
import datetime
x = datetime.date(2019, 4, 1)
cal = calendar.Calendar()
# itermonthdays2 yields (day, weekday) pairs; day 0 is padding, weekday < 5 means Monday to Friday
working_days = len([d for d in cal.itermonthdays2(x.year, x.month) if d[0] != 0 and d[1] < 5])
print("Total business days for month (" + str(x.month) + ") is " + str(working_days) + " days")
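The first three steps of the plan can be sketched as follows. This is a minimal sketch rather than the asker's exact script: it assumes the file is comma-separated and, so that it runs on its own, reads a few of the sample rows from an in-memory string instead of data.csv. Because the sample dates are written month-first (MM-DD-YYYY), it passes an explicit format string to pd.to_datetime.

```python
import io
import pandas as pd

# Assumption: a comma-separated file; a few sample rows are inlined so the
# sketch is self-contained.
csv_text = """ID,Activity Month,Activity Date
0,04/2019,04-01-2019
1,05/2019,05-13-2019
2,05/2019,05-25-2019
3,06/2019,06-10-2019
4,06/2019,06-19-2019
"""

df = pd.read_csv(io.StringIO(csv_text))

# The dates are month-first, so state the format explicitly.
df['Activity Date'] = pd.to_datetime(df['Activity Date'], format='%m-%d-%Y')

# Number of activity rows per month (a Series indexed by 'Activity Month').
grouped = df.groupby(['Activity Month'])['Activity Date'].count()
print(grouped)
```

Note that this count is the number of rows per month, not the number of business days; that calculation is a separate step.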
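I'm not entirely clear on the problem statement, but if you want to count the business days in each Activity Month, you can wrap your calculation in a function and apply that function to the Activity Month column (apply, whether given a lambda or a named function, essentially runs a for loop over every row of the specified column).
However, storing the same sentence in every cell is a bad idea. It is better to simply return working_days rather than embedding it in a string.
Commenting so I don't forget: I'm on my phone, and I'll take a look and give you an answer in a few hours. I've been working with this library recently too!
Thanks for the effort, @CeliusStingher. I think this is what I wanted. It's just for learning, so this is fine for now. Thanks anyway! By the way, may I ask why we need to add ".reset_index()"? I tried removing it, but then the code doesn't work.
When you run a groupby operation, Activity Month becomes the index of the result rather than a column, so grouped['Activity Month'] can no longer be selected. reset_index() replaces the current index with row numbers and puts the original index back as a new column (unless you pass drop=True).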
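To see what reset_index() changes, the sketch below runs the same groupby on a small frame built from three of the sample rows. Without reset_index() the result is a Series whose index holds the months, so there is no 'Activity Month' column to select. With it, the months come back as an ordinary column.

```python
import pandas as pd

# Three of the sample rows, just enough to show the index behaviour.
df = pd.DataFrame({
    'Activity Month': ['04/2019', '05/2019', '05/2019'],
    'Activity Date': ['04-01-2019', '05-13-2019', '05-25-2019'],
})

counts = df.groupby(['Activity Month'])['Activity Date'].count()
print(type(counts).__name__)   # Series
print(counts.index.name)       # the months live in the index, not in a column

flat = counts.reset_index()
print(type(flat).__name__)     # DataFrame
print(list(flat.columns))      # 'Activity Month' is a regular column again

# drop=True discards the old index instead of keeping it as a column.
dropped = counts.reset_index(drop=True)
print(list(dropped.index))
```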
Running the business-day calculation for each Activity Month in the data gives:
Total business days for month (4) is 22 days
Total business days for month (5) is 23 days
Total business days for month (6) is 20 days
Total business days for month (7) is 23 days
Total business days for month (8) is 22 days
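import calendar
import datetime
# reset_index() turns the 'Activity Month' index back into a regular column
grouped = df.groupby(['Activity Month'])['Activity Date'].count().reset_index()
def get_business_days(month_str):
    # 'MM/YYYY' -> first day of that month
    month, year = month_str.split('/')
    first = datetime.date(int(year), int(month), 1)
    cal = calendar.Calendar()
    # itermonthdays2 yields (day, weekday) pairs; day 0 is padding, weekday < 5 means Monday to Friday
    working_days = len([d for d in cal.itermonthdays2(first.year, first.month) if d[0] != 0 and d[1] < 5])
    return "Total business days for month (" + str(first.month) + ") is " + str(working_days) + " days"
grouped['Activity Month'].apply(get_business_days)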
0 Total business days for month (4) is 22 days
1 Total business days for month (5) is 23 days
2 Total business days for month (6) is 20 days
3 Total business days for month (7) is 23 days
4 Total business days for month (8) is 22 days
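Following the suggestion to return the number itself rather than a sentence, here is one possible variant. The name business_days_in_month is made up for this sketch; it counts Monday to Friday only and knows nothing about public holidays.

```python
import calendar

def business_days_in_month(month_str):
    """Count Mon-Fri days for a month given as 'MM/YYYY' (holidays are not excluded)."""
    month, year = (int(part) for part in month_str.split('/'))
    cal = calendar.Calendar()
    # itermonthdays2 yields (day_of_month, weekday); day 0 marks padding from
    # neighbouring months, and weekday 0-4 is Monday-Friday.
    return sum(1 for day, weekday in cal.itermonthdays2(year, month)
               if day != 0 and weekday < 5)

for m in ['04/2019', '05/2019', '06/2019', '07/2019', '08/2019']:
    print(m, business_days_in_month(m))
```

With a grouped frame that has 'Activity Month' as a column, the result could be stored as a numeric column, for example grouped['Business Days'] = grouped['Activity Month'].apply(business_days_in_month), where 'Business Days' is an illustrative column name.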