Creating a matrix in Python

I have many documents, each containing a list of tagged pairs like the following:

[('ADVP', 'RB'), ('NP', 'NN'), ('NP', 'DT'), ('NP', 'NN'), ('NP', 'NN'),
 ('PP', 'TO'), ('NP', 'PRP'), ('NP', 'RB'), ('NP', 'CD'), ('NP', 'JJ'),
 ('NP', 'NN'), ('PP', 'IN'), ('NP', 'NNS'), ('ADVP', 'RB'), ('NP', 'PRP'),
 ('PP', 'IN'), ('NP', 'DT'), ('NP', 'NN'), ('NP', 'NN'), ('NP', 'DT'),
 ('NP', 'NN'), ('ADVP', 'RB'), ('NP', 'DT'), ('NP', 'JJ'), ('NP', 'NN'),
 ('WHNP', 'WDT'), ('NP', 'JJS'), ('NP', 'CD'), ('NP', 'PRP'), ('VP', 'VBP'),
 ('NP', 'DT'), ('NP', 'NNS'), ('NP', 'PRP'), ('VP', 'VBD'), ('NP', 'DT'),
 ('NP', 'NN'), ('WHADVP', 'WRB'), ('NP', 'DT'), ('NP', 'NNS'), ('NP', 'RB'),
 ('NP', 'DT'), ('NP', 'NNS'), ('PRT', 'RP'), ('NP', 'PRP'), ('ADVP', 'RB'),
 ('NP', 'DT'), ('NP', 'NN'), ('NP', 'PRP'), ('PP', 'IN'), ('NP', 'NN'),
 ('PP', 'IN'), ('NP', 'NN'), ('PP', 'IN'), ('NP', 'NN')]
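I want to create a matrix in Excel in which each unique grammatical category pair (such as ('ADVP', 'RB'), ('NP', 'NN'), ('NP', 'DT')) is a column header, with its frequency as the entry below it.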

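The second, third and later documents may contain grammatical category pairs that do not occur in document one. Any such new pairs must be appended to the column headers.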


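In the end, I want a matrix whose columns are the grammatical pairs and whose rows are the documents. Each entry Mij should give the frequency of the j-th grammatical pair in the i-th document.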

You can use the collections module to count the frequency of each grammatical pair:

import collections
doc1 = [('ADVP', 'RB'), ('NP', 'NN'), ('NP', 'DT'), ('NP', 'NN'), ('NP', 'NN'),   ('PP','TO'), ('NP', 'PRP'), ('NP', 'RB'), ('NP', 'CD'), ('NP', 'JJ'), ('NP', 'NN'), ('PP', 'IN'), ('NP', 'NNS'), ('ADVP', 'RB'), ('NP', 'PRP'), ('PP', 'IN'), ('NP', 'DT'), ('NP', 'NN'), ('NP', 'NN'), ('NP', 'DT'), ('NP', 'NN'), ('ADVP', 'RB'), ('NP', 'DT'), ('NP', 'JJ'), ('NP', 'NN'), ('WHNP', 'WDT'), ('NP', 'JJS'), ('NP', 'CD'), ('NP', 'PRP'), ('VP', 'VBP'), ('NP', 'DT'), ('NP', 'NNS'), ('NP', 'PRP'),('VP', 'VBD'), ('NP', 'DT'), ('NP', 'NN'), ('WHADVP', 'WRB'), ('NP', 'DT'), ('NP', 'NNS'), ('NP', 'RB'), ('NP', 'DT'), ('NP', 'NNS'), ('PRT', 'RP'), ('NP', 'PRP'), ('ADVP', 'RB'), ('NP', 'DT'), ('NP', 'NN'), ('NP', 'PRP'), ('PP', 'IN'), ('NP', 'NN'), ('PP', 'IN'), ('NP', 'NN'), ('PP', 'IN'), ('NP', 'NN')]
count1 = collections.Counter(doc1)
This gives you:

count1.keys()
>>>[('PP', 'IN'), ('WHADVP', 'WRB'), ('NP', 'NNS'), ('WHNP', 'WDT'), ('NP', 'NN'), ('NP', 'JJS'), ('NP', 'DT'), ('NP', 'CD'), ('ADVP', 'RB'), ('PRT', 'RP'), ('VP', 'VBD'), ('NP', 'JJ'), ('NP', 'RB'), ('VP', 'VBP'), ('NP', 'PRP'), ('PP', 'TO')]

count1.values()
>>>[5, 1, 4, 1, 13, 1, 9, 2, 4, 1, 1, 2, 2, 1, 6, 1]
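One property of Counter is useful for the later steps: looking up a pair that never occurred returns 0 instead of raising KeyError, so documents with different sets of pairs can be queried in the same way. A minimal sketch, using a shortened made-up document (doc_small and count_small are illustrative names, not data from the question):

```python
import collections

# Shortened illustrative document, not the full doc1 from above.
doc_small = [('NP', 'NN'), ('NP', 'DT'), ('NP', 'NN'), ('PP', 'IN')]
count_small = collections.Counter(doc_small)

# The single most frequent pair together with its count.
top_pair = count_small.most_common(1)

# A pair that never occurred counts as 0 instead of raising KeyError.
missing = count_small[('VP', 'VBD')]
```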
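Do this for every document.

After that, you need to turn each count into a list with three values, one slot per document. NumPy arrays are easier to handle in this case: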

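import numpy as np

# pairs1, pairs2, pairs3 hold the per-document counts
# (for example pairs1 = dict(count1) from the step above).
# Each count becomes a length-3 vector with one slot per document.
for key in pairs1.keys():
    pairs1[key] = np.array([pairs1[key], 0, 0])

for key in pairs2.keys():
    pairs2[key] = np.array([0, pairs2[key], 0])

for key in pairs3.keys():
    pairs3[key] = np.array([0, 0, pairs3[key]])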
Then merge all three dictionaries:

pairs = {}

for key in pairs1.keys():
    pairs[key] = pairs1[key]

for key in pairs2.keys():
    try:
        pairs[key] = pairs[key] + pairs2[key]
    except KeyError:
        pairs[key] = pairs2[key]

for key in pairs3.keys():
    try:
        pairs[key] = pairs[key] + pairs3[key]
    except KeyError:
        pairs[key] = pairs3[key]
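The same merge can also be sketched without the per-document placeholder vectors, by relying on Counter returning 0 for pairs a document does not contain. This is an alternative under stated assumptions rather than the method used above: build_matrix and the two toy documents are hypothetical, introduced only for illustration, and plain nested lists stand in for the NumPy arrays.

```python
import collections

def build_matrix(docs):
    """Return (columns, matrix) with matrix[i][j] = count of columns[j] in docs[i].

    Columns are listed in the order in which pairs are first seen, so pairs
    that only appear in later documents are appended at the end.
    """
    counts = [collections.Counter(doc) for doc in docs]
    columns = []
    seen = set()
    for count in counts:
        for pair in count:
            if pair not in seen:
                seen.add(pair)
                columns.append(pair)
    # Counter lookups yield 0 for pairs absent from a document.
    matrix = [[count[pair] for pair in columns] for count in counts]
    return columns, matrix

# Toy documents (hypothetical data, much shorter than the real ones).
docs = [
    [('NP', 'NN'), ('NP', 'DT'), ('NP', 'NN')],
    [('PP', 'IN'), ('NP', 'NN')],
]
columns, matrix = build_matrix(docs)
```

Here the pair ('PP', 'IN') first appears in the second document, so it ends up as the last column and the first document gets a 0 in that position.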
Finally, you can write out your matrix:

keys = list(pairs.keys())  # fix the column order once

f = open('myfile.csv', 'w')

# Header row: one column per grammatical pair, written as CHUNK/TAG so the
# comma inside the tuple does not split the column.
f.write('%s\n' % ', '.join('%s/%s' % key for key in keys))

# One row per document (three documents here).
for i in range(3):
    f.write('%s\n' % ', '.join(str(pairs[key][i]) for key in keys))
f.close()

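Thanks for your help. In fact, I have thousands of such documents. Is there any way to automate this process for thousands of documents instead of entering them by hand?

You would build a for loop around the part that reads in the key/value pairs. My solution is quick and dirty. You should be able to write this code with fewer variables (one for the final dict, and another for the file read during each pass of the for loop).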
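For thousands of documents, the loop described in the reply might look like the sketch below. It assumes each document is a text file holding one well-formed Python-style list of tuples (no string split across lines), parses it with ast.literal_eval, and writes the result with the standard csv module. The names read_pairs and write_matrix, the extra "document" column and the CHUNK/TAG header format are assumptions made for illustration.

```python
import ast
import collections
import csv

def read_pairs(path):
    """Parse one document file holding a literal list of (chunk, tag) tuples."""
    with open(path) as handle:
        return ast.literal_eval(handle.read())

def write_matrix(paths, out_path):
    """Write one CSV row per document and one column per grammatical pair."""
    columns = []  # pairs in first-seen order
    counts = []   # one Counter per document
    for path in paths:
        count = collections.Counter(read_pairs(path))
        counts.append(count)
        for pair in count:
            if pair not in columns:
                columns.append(pair)
    with open(out_path, 'w', newline='') as handle:
        writer = csv.writer(handle)
        # Header: document name followed by each pair as CHUNK/TAG.
        writer.writerow(['document'] + ['%s/%s' % pair for pair in columns])
        for path, count in zip(paths, counts):
            # Missing pairs are written as 0 because Counter returns 0 for them.
            writer.writerow([path] + [count[pair] for pair in columns])
    return columns
```

A call such as write_matrix(sorted(glob.glob('docs/*.txt')), 'myfile.csv'), with glob imported and docs/*.txt standing in for wherever the files actually live, would then produce a CSV file that Excel can open directly.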