
Python data reduction


I'm working with a pandas (version 0.17.1) DataFrame that looks like this:

                         time   type   module     msg_type         content
36636 2016-08-25 17:59:50.051   INFO  MOD_1_NAME  STATUS  Received Status Monitoring from MODULE_1 'Property A' = some_value_1
36637 2016-08-25 17:59:50.051   INFO  MOD_1_NAME  STATUS  Received Status Monitoring from MODULE_1 'Property B' = some_value_2
36638 2016-08-25 17:59:50.051   INFO  MOD_1_NAME  STATUS  Received Status Monitoring from MODULE_1 'Property C' = some_value_3
36639 2016-08-25 17:59:50.051   INFO  MOD_1_NAME  STATUS  Received Status Monitoring from MODULE_1 'Property D' = some_value_4
36715 2016-08-25 17:59:50.964   INFO   MOD_2_NAME  STATUS  Received Status Monitoring from MODULE_2 'Parameter 1' = some_value_a
36716 2016-08-25 17:59:50.964   INFO   MOD_2_NAME  STATUS  Received Status Monitoring from MODULE_2 'Parameter 2' = some_value_b
36717 2016-08-25 17:59:50.964   INFO   MOD_2_NAME  STATUS  Received Status Monitoring from MODULE_2 'Parameter 3' = some_value_c
36718 2016-08-25 17:59:50.964   INFO   MOD_2_NAME  STATUS  Received Status Monitoring from MODULE_2 'Parameter 4' = some_value_d
36719 2016-08-25 17:59:50.964   INFO   MOD_2_NAME  STATUS  Received Status Monitoring from MODULE_2 'Parameter 5' = some_value_e
36720 2016-08-25 17:59:50.964   INFO   MOD_2_NAME  STATUS  Received Status Monitoring from MODULE_2 'Parameter 6' = some_value_f
36721 2016-08-25 17:59:50.964   INFO   MOD_2_NAME  STATUS  Received Status Monitoring from MODULE_2 'Parameter 7' = some_value_g
36722 2016-08-25 17:59:50.964   INFO   MOD_2_NAME  STATUS  Received Status Monitoring from MODULE_2 'Parameter 8' = some_value_h
36723 2016-08-25 17:59:50.964   INFO   MOD_2_NAME  STATUS  Received Status Monitoring from MODULE_2 'Parameter 9' = some_value_i
36724 2016-08-25 17:59:50.964   INFO   MOD_2_NAME  STATUS  Received Status Monitoring from MODULE_2 'Parameter 10' = some_value_j
36725 2016-08-25 17:59:50.964  ERROR   MOD_2_NAME  STATUS  Didn't receive Status Monitoring 'Parameter 11' from MODULE_2!
36726 2016-08-25 17:59:50.964   INFO   MOD_2_NAME  STATUS  Received Status Monitoring from MODULE_2 'Parameter 12' = some_value_k
36727 2016-08-25 17:59:50.964   INFO   MOD_2_NAME  STATUS  Received Status Monitoring from MODULE_2 'Parameter 13' = some_value_l
36785 2016-08-25 18:59:50.051   INFO  MOD_1_NAME  STATUS  Received Status Monitoring from MODULE_1 'Property A' = some_value_1
36786 2016-08-25 18:59:50.051   INFO  MOD_1_NAME  STATUS  Received Status Monitoring from MODULE_1 'Property B' = some_value_2
36787 2016-08-25 18:59:50.051   INFO  MOD_1_NAME  STATUS  Received Status Monitoring from MODULE_1 'Property C' = some_value_3
36788 2016-08-25 18:59:50.051   INFO  MOD_1_NAME  STATUS  Received Status Monitoring from MODULE_1 'Property D' = some_value_4
36827 2016-08-25 19:01:50.964   INFO   MOD_2_NAME  STATUS  Received Status Monitoring from MODULE_2 'Parameter 1' = some_value_a
36828 2016-08-25 19:01:50.964   INFO   MOD_2_NAME  STATUS  Received Status Monitoring from MODULE_2 'Parameter 2' = some_value_b
36829 2016-08-25 19:01:50.964   INFO   MOD_2_NAME  STATUS  Received Status Monitoring from MODULE_2 'Parameter 3' = some_value_c
36830 2016-08-25 19:01:50.964   INFO   MOD_2_NAME  STATUS  Received Status Monitoring from MODULE_2 'Parameter 4' = some_value_d
36831 2016-08-25 19:01:50.964   INFO   MOD_2_NAME  STATUS  Received Status Monitoring from MODULE_2 'Parameter 5' = some_value_e
36832 2016-08-25 19:01:50.964   INFO   MOD_2_NAME  STATUS  Received Status Monitoring from MODULE_2 'Parameter 6' = some_value_f
36833 2016-08-25 19:01:50.964   INFO   MOD_2_NAME  STATUS  Received Status Monitoring from MODULE_2 'Parameter 7' = some_value_g
36834 2016-08-25 19:01:50.964   INFO   MOD_2_NAME  STATUS  Received Status Monitoring from MODULE_2 'Parameter 8' = some_value_h
36835 2016-08-25 19:01:50.964   INFO   MOD_2_NAME  STATUS  Received Status Monitoring from MODULE_2 'Parameter 9' = some_value_i
36836 2016-08-25 19:01:50.964   INFO   MOD_2_NAME  STATUS  Received Status Monitoring from MODULE_2 'Parameter 10' = some_value_j
36837 2016-08-25 19:01:50.964  ERROR   MOD_2_NAME  STATUS  Didn't receive Status Monitoring 'Parameter 11' from MODULE_2!
36838 2016-08-25 19:01:50.964   INFO   MOD_2_NAME  STATUS  Received Status Monitoring from MODULE_2 'Parameter 12' = some_value_k
36839 2016-08-25 19:01:50.964   INFO   MOD_2_NAME  STATUS  Received Status Monitoring from MODULE_2 'Parameter 13' = some_value_l
(The frame has already been reduced to remove uninteresting rows. That is why the index column has gaps.)

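As you can see, several parameters are read from a device at the same time, and each parameter ends up on its own row. I'd like to do some "reduction" or "compaction" so that each read takes up just one row. I'd also like the content column to become a dictionary so that I can easily look up specific items of interest. So the result would look like this: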

                         time   type   module     msg_type         content
36636 2016-08-25 17:59:50.051   INFO  MOD_1_NAME  STATUS  {'Property A' = 'some_value_1', 'Property B' = 'some_value_2', 'Property C' = 'some_value_3', 'Property D' = 'some_value_4'}
36715 2016-08-25 17:59:50.964   INFO   MOD_2_NAME  STATUS  {'Parameter 1' = 'some_value_a', 'Parameter 2' = 'some_value_b', 'Parameter 3' = 'some_value_c', 'Parameter 4' = 'some_value_d', 'Parameter 5' = 'some_value_e', 'Parameter 6' = 'some_value_f', 'Parameter 7' = 'some_value_g','Parameter 8' = some_value_h, 'Parameter 9' = 'some_value_i', 'Parameter 10' = 'some_value_j', 'Parameter 11' = '', 'Parameter 12' = 'some_value_k', 'Parameter 13' = 'some_value_l'}
36785 2016-08-25 18:59:50.051   INFO  MOD_1_NAME  STATUS  {'Property A' = 'some_value_1', 'Property B' = 'some_value_2', 'Property C' = 'some_value_3', 'Property D' = 'some_value_4'}
36827 2016-08-25 19:01:50.964   INFO   MOD_2_NAME  STATUS  {'Parameter 1' = 'some_value_a', 'Parameter 2' = 'some_value_b', 'Parameter 3' = 'some_value_c', 'Parameter 4' = 'some_value_d', 'Parameter 5' = 'some_value_e', 'Parameter 6' = 'some_value_f', 'Parameter 7' = 'some_value_g','Parameter 8' = some_value_h, 'Parameter 9' = 'some_value_i', 'Parameter 10' = 'some_value_j', 'Parameter 11' = '', 'Parameter 12' = 'some_value_k', 'Parameter 13' = 'some_value_l'}
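So basically I want all rows that share the same values in the time and module columns to be "merged" together, with their content columns parsed into a dictionary. (There may also be some "missing" or "empty" readings.) I don't want to filter or drop data, only reduce and summarize it.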

I'm guessing I need some combination of groupby(), transform(), and apply(), but I don't even know where to start.

Part of my difficulty is that I can't inspect the result of groupby() to see whether it is doing what I want.

g1 = df.groupby(['module', 'time'])
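g1 does not show up in the Spyder variable explorer, and printing it shows nothing. I can't access an index attribute or call info() on g1. But I doubt that groupby() is worthwhile here anyway... I don't want to eliminate anything.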

I've been searching for an example, but I keep getting hits that turn out to be false positives. Any help would be appreciated.

Define a function and use it, then:

To understand groups in pandas, you should read the groupby documentation. Another way to understand groups is simply to print them:

grouped = df.groupby(['A', 'B'])
print(grouped.first())  # first row of each group
# print each (name, group) tuple in grouped
for name, grp in grouped:
    print(name)
    print(grp)
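Beyond printing, a GroupBy object can be inspected through a few attributes and methods, which may help when it does not appear in an IDE's variable explorer. The following is a minimal sketch; the tiny frame and its values are made up for illustration and only mimic the shape of the data in the question.

```python
import pandas as pd

# Hypothetical miniature of the log frame; the values are invented for illustration.
df = pd.DataFrame({
    "time": ["17:59:50.051", "17:59:50.051", "17:59:50.964"],
    "module": ["MOD_1_NAME", "MOD_1_NAME", "MOD_2_NAME"],
    "content": [
        "'Property A' = some_value_1",
        "'Property B' = some_value_2",
        "'Parameter 1' = some_value_a",
    ],
})

g1 = df.groupby(["module", "time"])

n_groups = g1.ngroups            # number of distinct (module, time) pairs
sizes = g1.size()                # rows per group, as a Series
keys = list(g1.groups.keys())    # the group labels themselves
first = g1.get_group(("MOD_1_NAME", "17:59:50.051"))  # rows of one group

print(n_groups)
print(sizes)
print(keys)
print(first)
```

Nothing is discarded by grouping itself: every original row belongs to exactly one group, so the group sizes add up to the length of the frame.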
Based on some assumptions of mine, I've worked out a concrete solution for you (see the notes below):

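import re
from collections import OrderedDict

import pandas as pd

df = pd.read_csv('/Users/shawnheide/Desktop/test.csv')

def custom_agg(contents):
    # Build one ordered {name: value} dict from all messages in a group.
    this_dict = OrderedDict()
    for content in contents:
        match = re.findall(r"Property \w+|Parameter \d+", content)
        if match:
            key = match[0]
            match = re.findall(r"some_value_\w+|some_value_\d+", content)
            if match:
                value = match[0]
            else:
                value = ''  # no value in the message, e.g. a failed reading
            this_dict[key] = value
    return this_dict

grps = df.groupby(['time', 'module'], as_index=False)
df_grp = grps.agg({'content': custom_agg})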
Output:

time    module  content
0   2016-08-25 17:59:50.051 MOD_1_NAME  {'Property A': 'some_value_1', 'Property B': 'some_value_2', 'Property C': 'some_value_3', 'Property D': 'some_value_4'}
1   2016-08-25 17:59:50.964 MOD_2_NAME  {'Parameter 1': 'some_value_a', 'Parameter 2': 'some_value_b', 'Parameter 3': 'some_value_c', 'Parameter 4': 'some_value_d', 'Parameter 5': 'some_value_e', 'Parameter 6': 'some_value_f', 'Parameter 7': 'some_value_g', 'Parameter 8': 'some_value_h', 'Parameter 9': 'some_value_i', 'Parameter 10': 'some_value_j', 'Parameter 11': '', 'Parameter 12': 'some_value_k', 'Parameter 13': 'some_value_l'}
2   2016-08-25 18:59:50.051 MOD_1_NAME  {'Property A': 'some_value_1', 'Property B': 'some_value_2', 'Property C': 'some_value_3', 'Property D': 'some_value_4'}
3   2016-08-25 19:01:50.964 MOD_2_NAME  {'Parameter 1': 'some_value_a', 'Parameter 2': 'some_value_b', 'Parameter 3': 'some_value_c', 'Parameter 4': 'some_value_d', 'Parameter 5': 'some_value_e', 'Parameter 6': 'some_value_f', 'Parameter 7': 'some_value_g', 'Parameter 8': 'some_value_h', 'Parameter 9': 'some_value_i', 'Parameter 10': 'some_value_j', 'Parameter 11': '', 'Parameter 12': 'some_value_k', 'Parameter 13': 'some_value_l'}
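For readers who want to try the idea without a CSV file, here is a self-contained sketch of the same reduction. It is not the code above: it swaps agg() for an explicit loop over the groups, because how agg() and apply() treat a function that returns a dict has differed between pandas versions. The sample rows, the PATTERN regex, and the parse_contents name are all assumptions made for this example.

```python
import re
from collections import OrderedDict

import pandas as pd

# Made-up sample rows in the same shape as the log frame in the question.
df = pd.DataFrame({
    "time": ["2016-08-25 17:59:50.051"] * 2 + ["2016-08-25 17:59:50.964"] * 2,
    "module": ["MOD_1_NAME"] * 2 + ["MOD_2_NAME"] * 2,
    "content": [
        "Received Status Monitoring from MODULE_1 'Property A' = some_value_1",
        "Received Status Monitoring from MODULE_1 'Property B' = some_value_2",
        "Received Status Monitoring from MODULE_2 'Parameter 1' = some_value_a",
        "Didn't receive Status Monitoring 'Parameter 2' from MODULE_2!",
    ],
})

# Quoted name starting with "Property" or "Parameter", optionally followed by "= value".
# Anchoring on those words avoids tripping over the apostrophe in "Didn't".
PATTERN = re.compile(r"'((?:Property|Parameter) [^']+)'(?:\s*=\s*(\S+))?")


def parse_contents(contents):
    """Collapse an iterable of log messages into one ordered {name: value} mapping."""
    parsed = OrderedDict()
    for message in contents:
        match = PATTERN.search(message)
        if match:
            name, value = match.groups()
            parsed[name] = value or ""  # a reading without a value becomes ''
    return parsed


# One output record per (time, module) pair, in order of first appearance.
records = []
for (time, module), grp in df.groupby(["time", "module"], sort=False):
    records.append({
        "time": time,
        "module": module,
        "content": parse_contents(grp["content"]),
    })

result = pd.DataFrame(records, columns=["time", "module", "content"])
print(result)
```

A single value can then be looked up with something like result.loc[0, "content"]["Property A"].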

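Things to consider:

First, you should post your data in a format that others can read (i.e. csv, tsv, etc.). This makes it easier for others to import it and help you solve the problem.

Second, your proposed result includes the index and msg_type columns. Given that you are not grouping on those columns, that doesn't make much sense, but really it is just something to think about.

Finally, to get an ordered dictionary you need to use OrderedDict from the collections module, because a plain Python dict does not maintain order (fingers crossed, this feature will arrive in 3.6). Note for current readers: plain dicts are guaranteed to keep insertion order from Python 3.7 onward.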
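The ordering remark can be checked with a few lines of plain Python. This is a small sketch with invented readings; it assumes Python 3.7 or later, where the language guarantees that a plain dict preserves insertion order.

```python
from collections import OrderedDict

# Invented readings, deliberately not in sorted order.
readings = [
    ("Parameter 2", "some_value_b"),
    ("Parameter 10", "some_value_j"),
    ("Parameter 1", "some_value_a"),
]

plain = {}
ordered = OrderedDict()
for name, value in readings:
    plain[name] = value
    ordered[name] = value

# On Python 3.7+ both containers iterate in insertion order.
same_order = list(plain) == list(ordered)

# One remaining difference: OrderedDict equality is order-sensitive, dict equality is not.
dicts_equal = {"a": 1, "b": 2} == {"b": 2, "a": 1}
ordered_equal = OrderedDict([("a", 1), ("b", 2)]) == OrderedDict([("b", 2), ("a", 1)])

print(same_order, dicts_equal, ordered_equal)
```

On older interpreters, such as the Python 2 setups common alongside pandas 0.17, OrderedDict remains the safe choice.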
