Python: resample timestamps hourly/daily and combine the values of the corresponding rows


python pandas dataframe


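I want to resample a time series by hour/day/week/month/year and combine the values from the rows that fall into each period.

DataFrame (26 columns)

Desired output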

gatewayReqInTime_Date count _id     reqId      statusCode   requestUrl ....... payload
2019-06-19 13:00:00    1    [0]      [R1]      [401]        ["/a"]                .
2019-06-19 14:00:00    2    [1,4]    [R2, R5]  [206, 201]   ["/b", "/e"]          .
2019-06-19 16:00:00    1    [2]      [R3]      [200]        ["/c"]                .
2019-12-03 15:00:00    1    [3]      [R4]      [200]        ["/d"]                .

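I can convert the timestamps to datetimes, resample, and get the counts (the first 2 columns of the output).

However, I am having trouble combining the values. I have tried groupby, agg, and so on.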

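import datetime
import pandas as pd

# Convert epoch milliseconds to datetimes (local time). They are kept as
# datetimes, not strings, because pd.Grouper needs a datetime-like key.
df['gateWayReqinTime_Date'] = df['gateWayReqinTime'].apply(lambda d: datetime.datetime.fromtimestamp(int(d)/1000))
# Count the rows that fall into each hourly bin
df2 = (df.groupby(pd.Grouper(key='gateWayReqinTime_Date', freq='H'))
        .size()
        .reset_index(name='Count'))
df2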

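PS: This is my first week using Python as a JavaScript developer. Please suggest an approach that treats the column labels as dynamic rather than hard-coding them.

The following approach comes close to what you are asking for: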

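import datetime
import pandas as pd

# Cast every column to str so that ', '.join works on all of them
df = df.astype(str)
# Epoch milliseconds -> formatted local-time string
df['gateWayReqInTime_Date'] = df['gateWayReqInTime'].apply(lambda d: datetime.datetime.fromtimestamp(int(d)/1000).strftime('%Y-%m-%d %H:%M:%S'))
df.drop('gateWayReqInTime', inplace=True, axis=1)
# pd.Grouper needs a real datetime column, so parse the strings back
df['gateWayReqInTime_Date'] = pd.to_datetime(df['gateWayReqInTime_Date'])
# Join the values of each remaining column per hourly bin
# (pandas 2.2+ prefers the lowercase alias 'h' over 'H')
df2 = df.groupby(pd.Grouper(key='gateWayReqInTime_Date', freq='H')).agg(', '.join)
df2['count'] = df.groupby(pd.Grouper(key='gateWayReqInTime_Date', freq='H'))['_id'].agg('count')

print(df2.head())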

                        _id   reqId statusCode requestUrl                 payload  count
gateWayReqInTime_Date                                       
2019-06-19 12:00:00       0      R1        401         /a               {'a': 'b'}      1
2019-06-19 13:00:00    4, 1  R5, R2   201, 206     /e, /b   {'i': 'j'}, {'c': 'd'}      2
2019-06-19 14:00:00                                                                     0
2019-06-19 15:00:00       2      R3        200         /c               {'e': 'f'}      1
2019-06-19 16:00:00                                                                     0
