Python: resample timestamps hourly/daily and combine the values of the corresponding rows


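I want to resample timestamps into hourly/daily/weekly/monthly/yearly periods and combine the values from the rows that fall into each period.

DataFrame (26 columns)

Desired output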

gatewayReqInTime_Date count _id     reqId      statusCode   requestUrl ....... payload
2019-06-19 13:00:00    1    [0]      [R1]      [401]        ["/a"]                .
2019-06-19 14:00:00    2    [1,4]    [R2, R5]  [206, 201]   ["/b", "/e"]          .
2019-06-19 16:00:00    1    [2]      [R3]      [200]        ["/c"]                .
2019-12-03 15:00:00    1    [3]      [R4]      [200]        ["/d"]                .
 
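I can convert the timestamps to datetimes, resample, and get the counts (the first two columns of the output).

However, I am having trouble combining the values. I have tried groupby, agg, and so on.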

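import datetime
import pandas as pd

# Convert epoch milliseconds to datetimes (local time); pd.Grouper needs a datetime column, not strings
df['gateWayReqinTime_Date'] = df['gateWayReqinTime'].apply(lambda d: datetime.datetime.fromtimestamp(int(d)/1000))
# Count the rows that fall into each hourly bucket
df2 = (df.groupby(pd.Grouper(key='gateWayReqinTime_Date', freq='H'))
        .size()
        .reset_index(name='Count'))
df2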

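PS: I am a JavaScript developer in my first week with Python. Please suggest approaches that treat the column labels as dynamic rather than hard-coded.

The following approach comes close to what you asked for: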

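import datetime
import pandas as pd

# Cast every column to str so that ', '.join can aggregate any of them
df = df.astype(str)
# Convert epoch milliseconds to a formatted local-time string, then parse it back to datetime
df['gateWayReqInTime_Date'] = df['gateWayReqInTime'].apply(lambda d: datetime.datetime.fromtimestamp(int(d)/1000).strftime('%Y-%m-%d %H:%M:%S'))
df.drop('gateWayReqInTime', inplace=True, axis=1)
df['gateWayReqInTime_Date'] = pd.to_datetime(df['gateWayReqInTime_Date'])
# Join the values of every remaining column per hour, then add the row count per hour
df2 = df.groupby(pd.Grouper(key='gateWayReqInTime_Date', freq='H')).agg(', '.join)
df2['count'] = df.groupby(pd.Grouper(key='gateWayReqInTime_Date', freq='H'))['_id'].agg('count')

print(df2.head())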

                        _id   reqId statusCode requestUrl                 payload  count
gateWayReqInTime_Date                                       
2019-06-19 12:00:00       0      R1        401         /a               {'a': 'b'}      1
2019-06-19 13:00:00    4, 1  R5, R2   201, 206     /e, /b   {'i': 'j'}, {'c': 'd'}      2
2019-06-19 14:00:00                                                                     0
2019-06-19 15:00:00       2      R3        200         /c               {'e': 'f'}      1
2019-06-19 16:00:00                                                                     0
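
The output above joins values into strings and keeps the empty hours, while the desired output in the question shows lists and only the non-empty periods. A minimal sketch of one way to get closer to that, assuming the timestamp column holds epoch milliseconds: the helper name resample_collect, the sample rows, and the timestamp values are hypothetical, and pd.to_datetime(..., unit="ms") yields UTC times rather than the local times produced by datetime.fromtimestamp. No column other than the timestamp is named in the helper, so it should carry over to a wider frame, provided the frame has no existing column called "count".

```python
import pandas as pd

# Hypothetical sample modelled on the question (the real frame has 26 columns)
df = pd.DataFrame({
    "_id": [0, 1, 2, 3, 4],
    "reqId": ["R1", "R2", "R3", "R4", "R5"],
    "statusCode": [401, 206, 200, 200, 201],
    "requestUrl": ["/a", "/b", "/c", "/d", "/e"],
    # Epoch milliseconds (assumed values, interpreted as UTC)
    "gateWayReqInTime": [
        1560949500000,  # 2019-06-19 13:05
        1560953400000,  # 2019-06-19 14:10
        1560961200000,  # 2019-06-19 16:20
        1575387000000,  # 2019-12-03 15:30
        1560955500000,  # 2019-06-19 14:45
    ],
})


def resample_collect(frame, ts_col, freq="h"):
    """Bucket rows by `freq` and collect every other column into lists."""
    out = frame.copy()
    out[ts_col] = pd.to_datetime(out[ts_col], unit="ms")
    grouped = out.groupby(pd.Grouper(key=ts_col, freq=freq))
    # The grouping key is excluded automatically, so all remaining
    # columns are aggregated without listing their names.
    result = grouped.agg(list)
    result.insert(0, "count", grouped.size())
    # Drop the empty buckets that the regular time grid introduces
    return result[result["count"] > 0].reset_index()


hourly = resample_collect(df, "gateWayReqInTime", freq="h")
daily = resample_collect(df, "gateWayReqInTime", freq="D")
print(hourly)
```

Passing "D" or "W" as freq gives daily or weekly buckets. The month-end and year-end aliases are spelled "M" and "Y" in older pandas releases and "ME" and "YE" in newer ones, so check the installed version before using them.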