Python pandas data aggregation


I'm using pandas to compute statistics on user activity:

    import pandas as pd
    dataset = pd.read_csv('/content/drive/My Drive/Colab Notebooks/data.csv')
    # dataset is a table with columns: src_ip, time, requests
    # time - the 30-second time slot in which the client was active

    g = dataset.groupby("src_ip")
    clients_statistic = pd.DataFrame(columns=["requests_count", "max_requests_in_30s", "time_slots_count"])

    # Use bracket assignment: attribute assignment does not create new columns
    clients_statistic["time_slots_count"] = g["time"].count()
    clients_statistic["requests_count"] = g["requests"].sum()
    clients_statistic["max_requests_in_30s"] = g["requests"].max()

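I already have the value of each user's maximum activity. Now I need the moment (time slot) at which that maximum occurred. I could get it by iterating over the rows, but I don't think iteration is a good idea.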

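You can use a simple condition for this.

For example, suppose that for a particular user with IP src_user_ip you already have the value max_value.

To find the time at which this user had maximum activity, just select:

    data[(data['src_ip'] == src_user_ip) & (data['requests'] == max_value)]

The conditions inside the square brackets produce a True/False mask, which is then used to select the rows you are looking for. Note that pandas combines masks with the element-wise & operator, and each comparison must be wrapped in parentheses.

You can read more about this in the pandas documentation.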
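The mask-based lookup described above can be tried on a small table. This is only a sketch: the sample rows are made up, and the column is assumed to be named src_ip, as in the question's groupby call.

```python
import pandas as pd

# Hypothetical sample data standing in for data.csv (values are made up).
data = pd.DataFrame({
    "src_ip": ["10.0.0.1", "10.0.0.1", "10.0.0.2", "10.0.0.2"],
    "time": [0, 30, 0, 30],
    "requests": [5, 9, 7, 2],
})

src_user_ip = "10.0.0.1"
# Largest request count seen for this one client.
max_value = data.loc[data["src_ip"] == src_user_ip, "requests"].max()

# Each comparison yields a boolean Series; & combines them element-wise.
mask = (data["src_ip"] == src_user_ip) & (data["requests"] == max_value)
peak_rows = data[mask]
print(peak_rows)
```

If the client reaches the same maximum in several time slots, the mask keeps all of those rows.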


This turned out not to be as hard as I first thought:

    # Sort descending by requests so the first row of each group is its peak
    g = dataset.sort_values(by="requests", ascending=False).groupby("src_ip")
    clients_statistic["time_slots_count"] = g["time"].count()
    clients_statistic["requests_count"] = g["requests"].sum()
    clients_statistic["max_requests_in_30s"] = g["requests"].first()
    clients_statistic["max_requests_moment"] = g["time"].first()

Look up argmax.
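Could you compare your solution with mine?

@paladovalex Yes, it will probably work too. The only issue is that the sort isn't needed. With conditions like these you can also express many other queries.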
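The comments above point toward argmax and note that sorting is unnecessary. One possible way to do that for all clients at once is pandas' idxmax on the grouped column. This is a sketch rather than the commenter's exact code; the sample rows are invented, and the column names follow the question.

```python
import pandas as pd

# Hypothetical sample data standing in for data.csv (values are made up).
dataset = pd.DataFrame({
    "src_ip": ["10.0.0.1", "10.0.0.1", "10.0.0.2", "10.0.0.2"],
    "time": [0, 30, 0, 30],
    "requests": [5, 9, 7, 2],
})

g = dataset.groupby("src_ip")
clients_statistic = pd.DataFrame({
    "requests_count": g["requests"].sum(),
    "max_requests_in_30s": g["requests"].max(),
    "time_slots_count": g["time"].count(),
})

# idxmax returns, for each group, the row label of its first maximum.
peak_idx = g["requests"].idxmax()
# Fetch the time at those rows; peak_idx is ordered like the group keys,
# so the values line up with clients_statistic's index.
clients_statistic["max_requests_moment"] = dataset.loc[peak_idx, "time"].to_numpy()
print(clients_statistic)
```

When a client hits its maximum in more than one slot, idxmax reports only the first such row, so this gives one moment per client rather than all of them.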