How to create 5-minute time windows to count word occurrences with Python 3

I have a CSV file with two columns, milliseconds and topics. My CSV file looks like this:

 milliseconds, topics
 1.4998308E+12,today is warm
 1.4998309E+12,today is warm
 1.4998310E+12,today is warm
 1.4998314E+12,today is cold
 1.4998315E+12,today is cold
 1.4998317E+12,today is cold
 1.4998318E+12,today is cold
 1.4998320E+12,today is cold
 1.4998322E+12,today is cold
 1.4998323E+12,today is cold
 1.4998324E+12,today is cold
 1.4998326E+12,today is warm
 1.4998328E+12,today is warm
 1.4998331E+12,today is cold
 1.4998333E+12,today is warm
 1.4998336E+12,today is warm
 1.4998336E+12,today is warm
 1.4998337E+12,today is warm
 1.4998338E+12,today is snow
 1.4998339E+12,today is snow
 1.4998340E+12,today is snow
 1.4998341E+12,today is snow
 1.4998342E+12,today is warm
 1.4998343E+12,today is warm
How can I count the words in windows of 5 minutes each? The timestamps run from 6:40:00 to 7:38:20 on 12 July 2017.

 window(1) start from 6:40:00 to 6:44:00
 window(2) start from 6:45:00 to 6:49:00
 window(3) start from 6:49:00 to 6:53:00
 window(4) start from 6:54:00 to 6:58:00 
 window(5) start from 6:59:00 to 7:03:00 
 window(6) start from 7:04:00 to 7:08:00 
 etc
I want to use Python 3 to count the occurrences of snow, warm and cold in each 5-minute interval. The result should look like this:

 warm 3  0   0   0   0   0   2   0   1   3   0   2 total 11 
 cold 0  0   2   2   2   2   0   1   0   0   0   0 total 09
 snow 0  0   0   0   0   0   0   0   0   0   3   1 total 4
where window(1) contains warm 3 times, cold 0 times, snow 0 times, and so on.

pandas groupby is what you need

import pandas as pd
df = pd.read_csv(filename)  # filename: placeholder for the path to the CSV file
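The grouping step expects a frame named topics with a datetime index called milliseconds plus topic and count columns, but the code that builds it is not shown. A minimal sketch of one way to get there, assuming the numbers are milliseconds since the Unix epoch and the topic is the last word of each phrase. The inlined sample and the helper name build_topics are illustrative, not part of the original answer.

```python
import io
import pandas as pd

# A few rows of the question's data, inlined so the sketch runs on its own.
SAMPLE = """milliseconds, topics
1.4998308E+12,today is warm
1.4998309E+12,today is warm
1.4998314E+12,today is cold
1.4998338E+12,today is snow
"""

def build_topics(source):
    """Hypothetical helper: one row per CSV line, indexed by timestamp."""
    # skipinitialspace drops the blank after the comma in the header row.
    df = pd.read_csv(source, skipinitialspace=True)
    # Assumption: the values are milliseconds since the Unix epoch (UTC).
    millis = df["milliseconds"].round().astype("int64")
    df["milliseconds"] = pd.to_datetime(millis, unit="ms")
    # Keep only the last word of each phrase ("today is warm" -> "warm").
    df["topic"] = df["topics"].str.split().str[-1]
    df["count"] = 1
    return df.set_index("milliseconds")[["topic", "count"]]

topics = build_topics(io.StringIO(SAMPLE))
print(topics)
```

With the real file, the path would be passed to build_topics in place of the StringIO object.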
Then we group the topics into 5-minute bins

counts = topics.groupby([pd.Grouper(level='milliseconds', freq='5min'), 'topic']).count()

milliseconds    topic   count
2017-07-12 03:40:00 warm    3
2017-07-12 03:50:00 cold    2
2017-07-12 03:55:00 cold    2
2017-07-12 04:00:00 cold    2
2017-07-12 04:05:00 cold    2
2017-07-12 04:10:00 warm    2
2017-07-12 04:15:00 cold    1
2017-07-12 04:20:00 warm    1
2017-07-12 04:25:00 warm    3
2017-07-12 04:30:00 snow    3
2017-07-12 04:35:00 snow    1
2017-07-12 04:35:00 warm    2
If needed, you can reshape the result with unstack

results = counts.unstack('milliseconds').fillna(0).astype(int)
results.columns = range(len(results.columns))
results['total'] = results.sum(axis=1)
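Grouping on observed (window, topic) pairs skips windows that contain no rows, such as 03:45 in the sample data, so the reshaped table has fewer columns than the window list in the question. A hedged sketch of one alternative, assuming every 5-minute window should appear: floor each timestamp to its window, count with pd.crosstab, then reindex over the full range. The sample rows and variable names are illustrative.

```python
import pandas as pd

# Illustrative sample: the first five timestamps of the question, as integers.
stamps = pd.to_datetime(
    [1499830800000, 1499830900000, 1499831000000, 1499831400000, 1499831500000],
    unit="ms",
)
topic = pd.Series(["warm", "warm", "warm", "cold", "cold"], name="topic")

# Floor every timestamp to the start of its 5-minute window.
window = pd.Series(stamps.floor("5min"), name="window")

# Rows are topics, columns are the windows that actually contain data.
table = pd.crosstab(topic, window)

# Reindex the columns over the full range so empty windows show up as 0.
all_windows = pd.date_range(window.min(), window.max(), freq="5min")
table = table.reindex(columns=all_windows, fill_value=0)

# Use readable HH:MM labels before appending the total column.
table.columns = table.columns.strftime("%H:%M")
table["total"] = table.sum(axis=1)
print(table)
```

The window labels here are UTC; the question's 6:40 start suggests a local offset that this sketch does not apply.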

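Can you show us what you have tried?
I get this error: File "pandas\_libs\hashtable_class_helper.pxi", line 1218, in pandas._libs.hashtable.PyObjectHashTable.get_item KeyError: 'topics'
That is because your CSV has spaces in the header. Rename the columns or change the key.
Now I get this at line 10 of "C:/Users/admin/readFile/window.py": results = counts.unstack('ms').fillna(0) AttributeError: 'int' object has no attribute 'unstack'
Thank you very much, Maarten.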
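The KeyError in the comments is consistent with the header row "milliseconds, topics": with default parsing, the second column is named " topics", with a leading space. A small sketch of two ways to normalize the names, using an inlined sample rather than the asker's file:

```python
import io
import pandas as pd

CSV_TEXT = "milliseconds, topics\n1.4998308E+12,today is warm\n"

# Default parsing keeps the blank after the comma as part of the column name.
raw = pd.read_csv(io.StringIO(CSV_TEXT))
print(list(raw.columns))

# Option 1: strip whitespace from the column names after loading.
stripped = raw.rename(columns=str.strip)
print(list(stripped.columns))

# Option 2: let the parser skip blanks that follow a delimiter.
skipped = pd.read_csv(io.StringIO(CSV_TEXT), skipinitialspace=True)
print(list(skipped.columns))
```

Either option makes df['topics'] resolve; which one fits better depends on whether the data cells also carry stray blanks.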
The topics frame before grouping (one row per CSV line):
milliseconds    topic   count
2017-07-12 03:40:00 warm    1
2017-07-12 03:41:40 warm    1
2017-07-12 03:43:20 warm    1
2017-07-12 03:50:00 cold    1
2017-07-12 03:51:40 cold    1
2017-07-12 03:55:00 cold    1
2017-07-12 03:56:40 cold    1
2017-07-12 04:00:00 cold    1
2017-07-12 04:03:20 cold    1
2017-07-12 04:05:00 cold    1
2017-07-12 04:06:40 cold    1
2017-07-12 04:10:00 warm    1
2017-07-12 04:13:20 warm    1
2017-07-12 04:18:20 cold    1
2017-07-12 04:21:40 warm    1
2017-07-12 04:26:40 warm    1
2017-07-12 04:26:40 warm    1
2017-07-12 04:28:20 warm    1
2017-07-12 04:30:00 snow    1
2017-07-12 04:31:40 snow    1
2017-07-12 04:33:20 snow    1
2017-07-12 04:35:00 snow    1
2017-07-12 04:36:40 warm    1
2017-07-12 04:38:20 warm    1
results = counts.unstack('milliseconds').fillna(0).astype(int)
results.columns = range(len(results.columns))
results['total'] = results.sum(axis=1)
print(results)
topic   0   1   2   3   4   5   6   7   8   9   10  total
cold    0   2   2   2   2   0   1   0   0   0   0   9
snow    0   0   0   0   0   0   0   0   0   3   1   4
warm    3   0   0   0   0   2   0   1   3   0   2   11