Python: resample timestamps hourly/daily and combine the values of the other corresponding rows
Tags: python, pandas, dataframe
I want to resample a timestamp column by hour/day/week/month/year and combine the values from the corresponding rows.
DataFrame (26 columns)
Desired output:
gatewayReqInTime_Date  count  _id    reqId     statusCode  requestUrl    .......  payload
2019-06-19 13:00:00    1      [0]    [R1]      [401]       ["/a"]        .
2019-06-19 14:00:00    2      [1,4]  [R2, R5]  [206, 201]  ["/b", "/e"]  .
2019-06-19 16:00:00    1      [2]    [R3]      [200]       ["/c"]        .
2019-12-03 15:00:00    1      [3]    [R4]      [200]       ["/d"]        .
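I can convert the timestamp to a datetime, resample, and get the count (the first two columns of the output).
However, I am having trouble combining the values. I have tried groupby, agg, and so on.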
df['gateWayReqinTime_Date'] = df['gateWayReqinTime'].apply(lambda d: datetime.datetime.fromtimestamp(int(d)/1000).strftime('%Y-%m-%d %H:%M:%S'))
df2 = (df.groupby(pd.Grouper(key='gateWayReqinTime_Date', freq='H'))
         .size()
         .reset_index(name='Count'))
df2
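P.S. This is my first week using Python as a JavaScript developer. Please suggest approaches that treat the column labels as dynamic rather than hard-coded.

The following approach comes close to what you are asking for: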
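import datetime
import pandas as pd

# Cast every column to str so that ', '.join can aggregate any of them
df = df.astype(str)
# Convert the epoch-millisecond timestamp to a local-time datetime string
df['gateWayReqInTime_Date'] = df['gateWayReqInTime'].apply(lambda d: datetime.datetime.fromtimestamp(int(d)/1000).strftime('%Y-%m-%d %H:%M:%S'))
df.drop('gateWayReqInTime', inplace=True, axis=1)
df['gateWayReqInTime_Date'] = pd.to_datetime(df['gateWayReqInTime_Date'])
# Join the values of every remaining column within each hourly bin
df2 = df.groupby(pd.Grouper(key='gateWayReqInTime_Date', freq='H')).agg(', '.join)
# Number of rows that fell into each hourly bin
df2['count'] = df.groupby(pd.Grouper(key='gateWayReqInTime_Date', freq='H'))['_id'].agg('count')
print(df2.head())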
                        _id   reqId  statusCode  requestUrl                 payload  count
gateWayReqInTime_Date
2019-06-19 12:00:00       0      R1         401          /a              {'a': 'b'}      1
2019-06-19 13:00:00    4, 1  R5, R2    201, 206      /e, /b  {'i': 'j'}, {'c': 'd'}      2
2019-06-19 14:00:00                                                                      0
2019-06-19 15:00:00       2      R3         200          /c              {'e': 'f'}      1
2019-06-19 16:00:00                                                                      0
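
If you would rather get real lists (as in the desired output) than joined strings, and skip the empty bins, something along these lines might work. This is only a minimal sketch under a few assumptions: the sample data is made up to resemble the question, the helper name `resample_collect` is hypothetical, and the epoch-millisecond values are interpreted as UTC-based naive datetimes via `pd.to_datetime(..., unit="ms")` rather than as local time like `datetime.fromtimestamp` does. No column name other than the time column is hard-coded, so it should carry over to a 26-column frame.

```python
import pandas as pd

# Hypothetical sample data modelled on the question (not the asker's real frame).
# The timestamps are built as epoch milliseconds to mimic the original column.
times = ["2019-06-19 13:05", "2019-06-19 14:10", "2019-06-19 16:20",
         "2019-12-03 15:30", "2019-06-19 14:45"]
df = pd.DataFrame({
    "_id": [0, 1, 2, 3, 4],
    "reqId": ["R1", "R2", "R3", "R4", "R5"],
    "statusCode": [401, 206, 200, 200, 201],
    "requestUrl": ["/a", "/b", "/c", "/d", "/e"],
    "gateWayReqInTime": [pd.Timestamp(t).value // 10**6 for t in times],
})


def resample_collect(frame, time_col, freq):
    """Bin rows by `freq` on `time_col` and collect every other column into lists."""
    out = frame.copy()
    # Interpret the epoch-millisecond integers as naive datetimes (UTC-based).
    out[time_col] = pd.to_datetime(out[time_col], unit="ms")
    grouped = out.groupby(pd.Grouper(key=time_col, freq=freq))
    # agg(list) runs on every non-key column, whatever the columns are called.
    result = grouped.agg(list)
    # Put the per-bin row count in front, then drop the bins that got no rows.
    result.insert(0, "count", grouped.size())
    return result[result["count"] > 0]


# "60min" is used for hourly bins because the hourly alias changed from
# "H" to "h" in newer pandas versions; "D" gives daily bins.
hourly = resample_collect(df, "gateWayReqInTime", "60min")
daily = resample_collect(df, "gateWayReqInTime", "D")
print(hourly)
print(daily)
```

Passing a different `freq` string switches between hourly, daily, and coarser bins without touching the rest of the code. The lists keep Python objects in each cell, so the result is convenient for display or export but not for further vectorised arithmetic.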