Python 3.x Pandas DataFrame: how to count how many times a variable repeats within 1 minute
I have the following DataFrame snippet:
Full dataframe: ip time cik crawler
ts
2019-03-11 00:00:01 71.155.177.ide 00:00:01 1262327 0.0
2019-03-11 00:00:02 71.155.177.ide 00:00:02 1262329 0.0
2019-03-11 00:00:05 69.243.218.cah 00:00:05 751200 0.0
2019-03-11 00:00:08 172.173.121.efb 00:00:08 881890 0.0
2019-03-11 00:00:09 216.254.60.idd 00:00:09 1219169 0.0
2019-03-11 00:00:09 64.18.197.gjc 00:00:09 1261705 0.0
2019-03-11 00:00:09 64.18.197.gjc 00:00:09 1261734 0.0
2019-03-11 00:00:10 64.18.197.gjc 00:00:10 1263094 0.0
2019-03-11 00:00:10 64.18.197.gjc 00:00:10 1264242 0.0
2019-03-11 00:00:10 64.18.197.gjc 00:00:10 1264242 0.0
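I want to group by IP and then use some function to count:
1) how many unique CIKs each IP has within one minute
2) how many CIKs in total each IP has within one minute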
I tried the resample function, but I can't figure out how to make it count the way I want.
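A minimal sketch of why count() after resample does not give the first number: on a resampled frame, count() returns the number of non-null values per column in each bin, so a repeated CIK is counted every time it appears. The sketch rebuilds the five rows for 64.18.197.gjc from the sample by hand (only the ip and cik columns, an assumption for brevity); nunique() on the cik column gives the distinct count instead.

```python
import pandas as pd

# The five rows for 64.18.197.gjc, rebuilt by hand from the sample above
ts = pd.to_datetime(["2019-03-11 00:00:09", "2019-03-11 00:00:09",
                     "2019-03-11 00:00:10", "2019-03-11 00:00:10",
                     "2019-03-11 00:00:10"])
group = pd.DataFrame({"ip": "64.18.197.gjc",
                      "cik": [1261705, 1261734, 1263094, 1264242, 1264242]},
                     index=pd.DatetimeIndex(ts, name="ts"))

# count() gives the number of non-null values per column in each 1-minute bin,
# so duplicates are included: this is only the "total" figure
counted = group.resample("1min").count()

# nunique() on the cik column counts each distinct CIK once per bin
unique = group.resample("1min")["cik"].nunique()

print(counted)
print(unique)
```

In the single 00:00 bin, count() reports 5 for both columns while nunique() reports 4, because 1264242 occurs twice.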
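My code is as follows:
import pandas as pd

# path is the directory of the log file; it is defined elsewhere in my script
dataframe = pd.read_csv(path + "log20060702.csv", usecols=['cik', 'ip', 'time', 'crawler'])
# keep non-crawler requests only; .copy() avoids SettingWithCopyWarning on the assignments below
dataframe = dataframe[dataframe['crawler'] == 0].copy()
dataframe['cik'] = pd.to_numeric(dataframe['cik'], downcast='integer')
# 'time' holds only HH:MM:SS, so the date part is not from the data (the printout above shows 2019-03-11)
dataframe['ts'] = pd.to_datetime(dataframe['time'])
dataframe = dataframe.set_index(['ts'])
print("Full dataframe: ", dataframe.head(10))
df_dict = dataframe.groupby("ip")
counter = 0
# print the first 5 IP groups
for key, df_values in df_dict:
    counter += 1
    print("df values: ", df_values)
    # df_values = df_values.resample("5T").count()
    if counter == 5:
        break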
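A sketch of the step the question asks about (grouping by IP per minute), kept close to the loop above: each group df_values still carries the ts DatetimeIndex, so it can be resampled on its own and cik aggregated with nunique and size. The sample rows are rebuilt by hand from the printout (time and crawler columns omitted), and results and combined are illustrative names, not part of the original code.

```python
import pandas as pd

# The ten sample rows from the question: (timestamp, ip, cik)
rows = [
    ("2019-03-11 00:00:01", "71.155.177.ide", 1262327),
    ("2019-03-11 00:00:02", "71.155.177.ide", 1262329),
    ("2019-03-11 00:00:05", "69.243.218.cah", 751200),
    ("2019-03-11 00:00:08", "172.173.121.efb", 881890),
    ("2019-03-11 00:00:09", "216.254.60.idd", 1219169),
    ("2019-03-11 00:00:09", "64.18.197.gjc", 1261705),
    ("2019-03-11 00:00:09", "64.18.197.gjc", 1261734),
    ("2019-03-11 00:00:10", "64.18.197.gjc", 1263094),
    ("2019-03-11 00:00:10", "64.18.197.gjc", 1264242),
    ("2019-03-11 00:00:10", "64.18.197.gjc", 1264242),
]
dataframe = pd.DataFrame(rows, columns=["ts", "ip", "cik"])
dataframe["ts"] = pd.to_datetime(dataframe["ts"])
dataframe = dataframe.set_index("ts")

# results (hypothetical name): one small per-minute frame per IP
results = {}
for ip, df_values in dataframe.groupby("ip"):
    # df_values keeps the DatetimeIndex, so it can be binned per minute on its own
    results[ip] = df_values.resample("1min")["cik"].agg(["nunique", "size"])

# Stack the per-IP frames into one frame indexed by (ip, ts)
combined = pd.concat(results, names=["ip"])
print(combined)
```

The loop makes the per-group logic explicit; the answer below reaches the same (ip, ts)-indexed shape in a single groupby call.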
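Alternatively, if someone could just show me how to group by IP in 1-minute intervals, I can do the rest myself. I'm not looking for a complete solution; any guidance would be much appreciated.

Answer: use groupby with resample and aggregate with nunique and size: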
df = dataframe.groupby("ip").resample('1Min')['cik'].agg(['nunique','size'])
print(df)
nunique size
ip ts
172.173.121.efb 2019-03-11 1 1
216.254.60.idd 2019-03-11 1 1
64.18.197.gjc 2019-03-11 4 5
69.243.218.cah 2019-03-11 1 1
71.155.177.ide 2019-03-11 2 2
Or use pd.Grouper:
df = dataframe.groupby(["ip", pd.Grouper(freq='1Min')])['cik'].agg(['nunique','size'])
print(df)
nunique size
ip ts
172.173.121.efb 2019-03-11 1 1
216.254.60.idd 2019-03-11 1 1
64.18.197.gjc 2019-03-11 4 5
69.243.218.cah 2019-03-11 1 1
71.155.177.ide 2019-03-11 2 2

Comments:
In this case, does .resample('1Min') return the size per minute?
@Erfan - Yes, exactly. In this case you don't have to mention the datetime column explicitly, because resample (like pd.Grouper without a key) uses the DatetimeIndex ts.
How does this work if there are multiple datetime columns? @jezrael
@Erfan - You need a single datetime column; the best way to get one is with melt.
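The last comment suggests reshaping with melt when a frame has several datetime columns. A hedged sketch of that idea: the columns start and end and all their values are hypothetical, made up for illustration. melt stacks both into one ts column, and because ts is then an ordinary column rather than the index, pd.Grouper needs key='ts'. Each original row now appears once per datetime column, so whether that is the right thing to count depends on the data.

```python
import pandas as pd

# Hypothetical frame with two datetime columns per request (illustrative only)
df = pd.DataFrame({
    "ip": ["64.18.197.gjc", "64.18.197.gjc", "71.155.177.ide"],
    "cik": [1261705, 1264242, 1262327],
    "start": pd.to_datetime(["2019-03-11 00:00:09", "2019-03-11 00:00:10",
                             "2019-03-11 00:00:01"]),
    "end": pd.to_datetime(["2019-03-11 00:01:05", "2019-03-11 00:00:50",
                           "2019-03-11 00:00:30"]),
})

# melt stacks start and end into a single ts column (one row per original row and column)
long_df = df.melt(id_vars=["ip", "cik"], value_vars=["start", "end"],
                  var_name="kind", value_name="ts")
long_df["ts"] = pd.to_datetime(long_df["ts"])  # make sure ts has a datetime dtype

# ts is a regular column now, so pd.Grouper must be told which column to bin
counts = (long_df.groupby(["ip", pd.Grouper(key="ts", freq="1min")])["cik"]
          .agg(["nunique", "size"]))
print(counts)
```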