Python: wrong number of records per group when using df.groupby


I adapted the following code to split rows into 5-second groups based on their timestamp:

import pandas as pd

# file_name is the path to the CSV file
df = pd.read_csv(file_name, delimiter=',')
df['dt'] = pd.to_datetime(df['datetime'], unit='s')
for g in df.groupby(pd.Grouper(freq='5s', key='dt')):
    print(f'Start time {g[0]} has {len(g)} records within 5 secs')
But the number of records I get for each group is wrong.

Output:

Start time 2017-05-02 16:00:45 has 2 records within 5 secs
...
A sample of the CSV looks like this:

datetime,x,y,z,label
1493740845,0.0004,-0.0001,0.0045,bad
1493740846,0.0004,0.0006,0.0049,bad
1493740847,0.0002,0.0013,0.0044,bad
1493740848,0.0002,0.0005,0.0046,bad
1493740849,0.0006,0.0006,0.0038,bad
1493740850,0.0009,0.0002,0.0038,bad
...

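Iterating over the groupby yields tuples with two values (the group key and the group's DataFrame). Here g is that tuple, so len(g) is always 2.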
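To make the tuple structure concrete, here is a minimal sketch that rebuilds the six sample rows from the question in memory and inspects the first item produced by the groupby. The names csv_text, first, key, and frame are illustrative only and do not appear in the original code.

```python
import io
import pandas as pd

# Sample rows copied from the CSV excerpt in the question.
csv_text = """datetime,x,y,z,label
1493740845,0.0004,-0.0001,0.0045,bad
1493740846,0.0004,0.0006,0.0049,bad
1493740847,0.0002,0.0013,0.0044,bad
1493740848,0.0002,0.0005,0.0046,bad
1493740849,0.0006,0.0006,0.0038,bad
1493740850,0.0009,0.0002,0.0038,bad
"""

df = pd.read_csv(io.StringIO(csv_text))
df['dt'] = pd.to_datetime(df['datetime'], unit='s')

# Take only the first item the groupby iterator yields.
first = next(iter(df.groupby(pd.Grouper(freq='5s', key='dt'))))

print(type(first).__name__)  # the item is a tuple
print(len(first))            # (key, DataFrame), so this is 2

# Unpack to reach the actual rows of the group.
key, frame = first
print(key, len(frame))
```

The length of the tuple says nothing about how many rows fall in the 5-second window; only the second element, the DataFrame, carries that information.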

I think you can unpack the tuple into the variables name and g, and then it works as needed:

for name, g in df.groupby(pd.Grouper(freq='5s', key='dt')):
    print(f'Start time {name} has {len(g)} records within 5 secs')

Start time 2017-05-02 16:00:45 has 5 records within 5 secs
Start time 2017-05-02 16:00:50 has 1 records within 5 secs
Or, keeping your original loop, use g[1] when taking the length:

for g in df.groupby(pd.Grouper(freq='5s', key='dt')):
    print(f'Start time {g[0]} has {len(g[1])} records within 5 secs')
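If only the per-group counts are needed, the loop over groups can be avoided. The following is a sketch of an alternative that is not part of the answer above: it uses GroupBy.size(), which returns a Series of row counts indexed by the start of each 5-second bin. The in-memory csv_text and the name counts are assumptions made for illustration.

```python
import io
import pandas as pd

# Sample rows copied from the CSV excerpt in the question.
csv_text = """datetime,x,y,z,label
1493740845,0.0004,-0.0001,0.0045,bad
1493740846,0.0004,0.0006,0.0049,bad
1493740847,0.0002,0.0013,0.0044,bad
1493740848,0.0002,0.0005,0.0046,bad
1493740849,0.0006,0.0006,0.0038,bad
1493740850,0.0009,0.0002,0.0038,bad
"""

df = pd.read_csv(io.StringIO(csv_text))
df['dt'] = pd.to_datetime(df['datetime'], unit='s')

# size() counts the rows in each 5-second bin without an explicit loop.
counts = df.groupby(pd.Grouper(freq='5s', key='dt')).size()

for start, n in counts.items():
    print(f'Start time {start} has {n} records within 5 secs')
```

Note that with a time-based Grouper, bins that contain no rows between the first and last timestamp still appear in the result with a count of 0.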

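Comments:

@jezrael Thanks a lot, it works well. I like the philosophy in your profile about the reasoning behind downvotes. I feel the same way.

@user158 - Hmm, in fact it should be better, but now I think the fewer downvotes the better, as I have come to realize ;) They should be used carefully, mainly on answers.

I guess you mean new answers that use a new API? If so, that can be solved by sorting answers by activity. Yes, duplicate answers are annoying; the reason is that people try to exploit the answer-sorting setting. If a reader's answer sorting is set to "active", the newest answer appears at the top, which earns it more upvotes from new readers.

@user158 - Sure, but I have no magic here. For me it comes from coding at work, and many lines of code are learned from other people, so it has really been a long process. But if you have experience with R, MATLAB, or similar software, you can understand the principles of data processing, which makes searching on SO easier. Many problems are already solved; they are just not that easy to find.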