Python: adding multiple rows based on an IF condition
I have the following DataFrame of city bike trips. However, I'm having trouble handling trips that span more than one hour (I want to use YYYYmmDDhh as a composite key in my data model). What I want to do is create a column "keyhour" that I can use to join other tables. If start_hour = end_hour, keyhour should be the YYYYmmDDhh of the start time. However, if end_hour is greater than start_hour, I want to insert that many rows with the same TourID into my DataFrame, to indicate that the trip ran across several hours.
started_at ended_at duration start_station_id start_station_name start_station_description ... end_station_description end_station_latitude end_station_longitude TourID start_hour end_hour
0 2020-05-01 03:03:14.941000+00:00 2020-05-01 03:03:14.941000+00:00 635 484 Karenlyst allé ved Skabos vei ... langs Drammensveien 59.914145 10.715505 0 3 3
1 2020-05-01 03:05:48.529000+00:00 2020-05-01 03:05:48.529000+00:00 141 455 Sofienbergparken sør langs Sofienberggata ... ved Sars gate 59.921206 10.769989 1 3 3
2 2020-05-01 03:13:33.156000+00:00 2020-05-01 03:13:33.156000+00:00 330 550 Thereses gate ved Bislett trikkestopp ... ved Kristian IVs gate 59.914767 10.740971 2 3 3
3 2020-05-01 03:14:14.549000+00:00 2020-05-01 03:14:14.549000+00:00 479 597 Fredensborg ved rundkjøringen ... ved Oslo City 59.912334 10.752292 3 3 3
4 2020-05-01 03:20:12.355000+00:00 2020-05-01 03:20:12.355000+00:00 629 617 Bjerregaardsgate Øst ved Uelands gate ... langs Oslo gate 59.908255 10.767800 4 3 3
So, for example, if started_at = 2020-05-01 03:03:14.941000+00:00, ended_at = 2020-05-01 06:03:14.941000+00:00, start_hour = 3, end_hour = 6, and TourID = 1, I would like to have the following rows:
keyhour;TourID
2020050103 ;1
2020050104 ;1
2020050105 ;1
2020050106 ;1
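plus all the other values associated with that TourID (duration, etc.).
However, I really can't find any way to do this in pandas. Is it possible, or do I have to rewrite the source CSV with pure Python?
Thanks for any advice.

Assuming your DataFrame is named df and you have imported pandas as pd: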
# Convert to datetime and round down to the hour.
# Note: this overwrites the original timestamps with the floored values.
# On pandas 2.2 and later the 'H' alias is deprecated; use 'h' there instead.
df['started_at'] = pd.to_datetime(df['started_at']).dt.floor(freq='H')
df['ended_at'] = pd.to_datetime(df['ended_at']).dt.floor(freq='H')
# Build, for each row, the list of hourly timestamps from started_at to ended_at (inclusive).
df['keyhour'] = df.apply(lambda x: list(pd.date_range(x['started_at'], x['ended_at'], freq="1H")), axis='columns')
# Expand each element of the keyhour list into its own row; the other columns are repeated.
df = df.explode('keyhour')
# Convert keyhour to a string in the requested YYYYmmDDhh format.
df['keyhour'] = df['keyhour'].dt.strftime('%Y%m%d%H')
df
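
To try the approach above on the example from the question, here is a minimal self-contained sketch. The sample rows, the trips variable and the add_keyhour helper are made up for illustration and are not part of the original data or answer. Unlike the snippet above, it floors the timestamps into temporary Series so that started_at and ended_at keep their original values, and it spells the frequency as "60min", an alias that should not depend on whether your pandas version expects 'H' or 'h'.

```python
import pandas as pd

# Hypothetical sample data: one trip inside a single hour, one running from 03:xx to 06:xx.
trips = pd.DataFrame({
    "TourID": [0, 1],
    "started_at": ["2020-05-01 03:03:14.941000+00:00",
                   "2020-05-01 03:03:14.941000+00:00"],
    "ended_at": ["2020-05-01 03:13:49.941000+00:00",
                 "2020-05-01 06:03:14.941000+00:00"],
    "duration": [635, 10800],
})


def add_keyhour(df):
    """Return a copy of df with one row per hour touched by each trip (illustrative helper)."""
    out = df.copy()
    # Floor into temporary Series so the original timestamp columns are left untouched.
    start = pd.to_datetime(out["started_at"]).dt.floor("60min")
    end = pd.to_datetime(out["ended_at"]).dt.floor("60min")
    # For each trip, list every hour from start to end (inclusive) as a YYYYmmDDhh string.
    out["keyhour"] = [
        list(pd.date_range(s, e, freq="60min").strftime("%Y%m%d%H"))
        for s, e in zip(start, end)
    ]
    # One row per list element; all other columns are repeated for each hour.
    return out.explode("keyhour").reset_index(drop=True)


result = add_keyhour(trips)
print(result[["TourID", "keyhour", "duration"]])
```

For TourID 1 this yields four rows with keyhour 2020050103 through 2020050106, each carrying the same duration and other trip values. A trip whose ended_at is earlier than its started_at would produce an empty list and therefore a missing keyhour after explode, so such rows may need to be checked separately.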