Python: How to create a new column containing the part of the day (night, morning, afternoon, evening)

I want to create a new column containing the part of the day. I'm considering {night (00:01 to 06:00), morning (06:01 to 12:00), afternoon (12:01 to 18:00), evening (18:01 to 00:00)}.

Here is the DataFrame:

package_name          name       starttime                 duration  UserId
com.facebook.katana   Facebook   2020-09-19 6:02:06.019    28.077    4
com.android.systemui  System UI  2020-09-19 16:42:34.096   28.077    4
com.android.systemui  System UI  2020-09-19 19:51:35.778   0.329     4
com.facebook.katana   Facebook   2020-09-19 10:56:39.129   0.329     4
com.android.systemui  System UI  2020-09-19 01:48:32.067   3.022     4
I found similar questions on the forum, but I couldn't adapt the code to work with my data.

Tags: python, pandas, csv

You can do the following:
import numpy as np
import pandas as pd
df['starttime'] = pd.to_datetime(df['starttime'])
# Series.between is inclusive on both ends, so the hour ranges must not overlap
conditions = [df['starttime'].dt.hour.between(0, 5), df['starttime'].dt.hour.between(6, 11),
              df['starttime'].dt.hour.between(12, 17), df['starttime'].dt.hour.between(18, 23)]
choices = ['night', 'morning', 'afternoon', 'evening']
df['part_of_day'] = np.select(conditions, choices, default='unknown')
print(df)
           package_name       name                starttime  duration  UserId  \
0   com.facebook.katana   Facebook  2020-09-19 06:02:06.019    28.077       4
1  com.android.systemui  System UI  2020-09-19 16:42:34.096    28.077       4
2  com.android.systemui  System UI  2020-09-19 19:51:35.778     0.329       4
3   com.facebook.katana   Facebook  2020-09-19 10:56:39.129     0.329       4
4  com.android.systemui  System UI  2020-09-19 01:48:32.067     3.022       4
  part_of_day
0     morning
1   afternoon
2     evening
3     morning
4       night
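A short sketch of why the hour ranges passed to `np.select` must not overlap: `np.select` returns the choice of the first condition that is `True`, and `Series.between` is inclusive on both ends by default, so `between(0, 6)` and `between(6, 12)` both match hour 6 and the earlier choice ('night') wins. The hours below are made-up sample values standing in for `df['starttime'].dt.hour`, and `default='unknown'` is an assumed fallback label (newer NumPy releases can raise a `TypeError` when the default is the integer 0 and the choices are strings).

```python
import numpy as np
import pandas as pd

# Hypothetical hours standing in for df['starttime'].dt.hour
hours = pd.Series([5, 6, 12, 18])
choices = ['night', 'morning', 'afternoon', 'evening']

# Overlapping inclusive ranges: hour 6 matches the first two conditions,
# hour 12 the second and third, hour 18 the third and fourth.
overlapping = np.select(
    [hours.between(0, 6), hours.between(6, 12), hours.between(12, 18), hours.between(18, 24)],
    choices,
    default='unknown',
)

# Disjoint inclusive ranges: every hour 0..23 matches exactly one condition.
disjoint = np.select(
    [hours.between(0, 5), hours.between(6, 11), hours.between(12, 17), hours.between(18, 23)],
    choices,
    default='unknown',
)

print(overlapping.tolist())  # ['night', 'night', 'morning', 'afternoon']
print(disjoint.tolist())     # ['night', 'morning', 'afternoon', 'evening']
```

The first-match rule is what silently mislabels the boundary hours; with disjoint ranges the order of the conditions no longer matters.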
If performance matters, use pd.cut here on the datetimes converted to hours:
import pandas as pd
df['starttime'] = pd.to_datetime(df['starttime'])
# Left-closed bins [0, 6), [6, 12), [12, 18), [18, 24): hour 6 (06:00-06:59) is morning
df['new'] = pd.cut(df['starttime'].dt.hour,
                   bins=[0, 6, 12, 18, 24],
                   labels=['night', 'morning', 'afternoon', 'evening'],
                   right=False)
print(df)
           package_name       name                starttime  duration  UserId  \
0   com.facebook.katana   Facebook  2020-09-19 06:02:06.019    28.077       4
1  com.android.systemui  System UI  2020-09-19 16:42:34.096    28.077       4
2  com.android.systemui  System UI  2020-09-19 19:51:35.778     0.329       4
3   com.facebook.katana   Facebook  2020-09-19 10:56:39.129     0.329       4
4  com.android.systemui  System UI  2020-09-19 01:48:32.067     3.022       4
         new
0    morning
1  afternoon
2    evening
3    morning
4      night
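Both answers bin on the hour only, so they cannot honour the question's minute-level boundaries literally (06:00 is night but 06:01 is morning, and midnight closes the evening). A minimal sketch, assuming those boundaries are meant exactly: convert each timestamp to seconds since midnight, move exactly 00:00 to 24:00, and let `pd.cut` use its default right-closed bins. The two boundary timestamps in the sample are hypothetical rows added for illustration, and times between 00:00 and 00:01 end up as night under this scheme.

```python
import pandas as pd

# Start times from the question, plus two hypothetical boundary rows.
df = pd.DataFrame({
    'starttime': pd.to_datetime([
        '2020-09-19 06:02:06.019',
        '2020-09-19 16:42:34.096',
        '2020-09-19 19:51:35.778',
        '2020-09-19 10:56:39.129',
        '2020-09-19 01:48:32.067',
        '2020-09-19 06:00:00.000',  # hypothetical: exactly 06:00 should be night
        '2020-09-19 00:00:00.000',  # hypothetical: midnight closes the evening
    ])
})

# Seconds elapsed since midnight (fractional seconds kept).
seconds = (df['starttime'] - df['starttime'].dt.normalize()).dt.total_seconds()

# The spec ends 'evening' at 00:00, so treat exactly midnight as 24:00.
seconds = seconds.mask(seconds == 0, 24 * 3600)

# pd.cut bins are right-closed by default:
# (00:00, 06:00], (06:00, 12:00], (12:00, 18:00], (18:00, 24:00]
df['part_of_day'] = pd.cut(
    seconds,
    bins=[0, 6 * 3600, 12 * 3600, 18 * 3600, 24 * 3600],
    labels=['night', 'morning', 'afternoon', 'evening'],
)
print(df)
```

Working in seconds rather than hours is the design choice here: it keeps `pd.cut` vectorised while letting the bin edges sit exactly on the times the question specifies.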
Performance test with 5k rows:
df['starttime'] = pd.to_datetime(df['starttime'])
df = pd.concat([df] * 1000, ignore_index=True)
In [79]: %%timeit
...: conditions = [df.starttime.dt.hour.between(0, 6), df.starttime.dt.hour.between(6, 12), df.starttime.dt.hour.between(12, 18), df.starttime.dt.hour.between(18,24)]
...:
...: choices = ['night', 'morning', 'afternoon', 'evening']
...:
...: df['part_of_day'] = np.select(conditions, choices)
...:
5.28 ms ± 451 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [80]: %%timeit
...: df['new'] = pd.cut(df['starttime'].dt.hour,
...: bins=[0,6,12,18,23],
...: labels=['night','morning','afternoon','evening'],
...: include_lowest=True)
...:
2.1 ms ± 13.8 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
With 50k rows:
df['starttime'] = pd.to_datetime(df['starttime'])
df = pd.concat([df] * 10000, ignore_index=True)
In [82]: %%timeit
...: conditions = [df.starttime.dt.hour.between(0, 6), df.starttime.dt.hour.between(6, 12), df.starttime.dt.hour.between(12, 18), df.starttime.dt.hour.between(18,24)]
...:
...: choices = ['night', 'morning', 'afternoon', 'evening']
...:
...: df['part_of_day'] = np.select(conditions, choices)
...:
...:
26.9 ms ± 221 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
In [83]: %%timeit
...: df['new'] = pd.cut(df['starttime'].dt.hour,
...: bins=[0,6,12,18,23],
...: labels=['night','morning','afternoon','evening'],
...: include_lowest=True)
...:
7.46 ms ± 68.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
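The timings above come from IPython's `%%timeit`. A hedged, self-contained sketch for rerunning a similar comparison as a plain script with the standard-library `timeit` module follows. The 50,000 random timestamps (seeded with `np.random.default_rng(0)`), the `number`/`repeat` values and the function names `with_select` and `with_cut` are all assumptions, and both functions use non-overlapping hour ranges so their labels can be checked for equality before timing. Absolute numbers will depend on the machine and on the pandas/NumPy versions.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical test data: 50,000 timestamps spread uniformly over one day.
rng = np.random.default_rng(0)
n_rows = 50_000
offsets = pd.to_timedelta(rng.integers(0, 24 * 3600, size=n_rows), unit='s')
df = pd.DataFrame({'starttime': pd.Timestamp('2020-09-19') + offsets})

labels = ['night', 'morning', 'afternoon', 'evening']


def with_select():
    # np.select with disjoint, inclusive hour ranges (first matching condition wins)
    hour = df['starttime'].dt.hour
    conditions = [hour.between(0, 5), hour.between(6, 11),
                  hour.between(12, 17), hour.between(18, 23)]
    return np.select(conditions, labels, default='unknown')


def with_cut():
    # pd.cut with left-closed bins [0, 6), [6, 12), [12, 18), [18, 24)
    hour = df['starttime'].dt.hour
    return pd.cut(hour, bins=[0, 6, 12, 18, 24], labels=labels, right=False)


# Sanity check: both approaches must produce identical labels.
assert with_select().tolist() == with_cut().astype(str).tolist()

for func in (with_select, with_cut):
    # Best of 5 repeats, each running the function 10 times.
    best = min(timeit.repeat(func, number=10, repeat=5)) / 10
    print(f'{func.__name__}: {best * 1e3:.2f} ms per call')
```

Unlike the transcripts above, each function here extracts `dt.hour` once, so the comparison isolates the cost of `np.select` versus `pd.cut` rather than repeated hour extraction.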