Python: finding the total count of a word form when it has multiple possible POS tags


I feel like this is a silly question, but here goes. I'm trying to start from data that looks like this:

a word form     lemma    POS                count of occurrence
same word form  lemma    Not the same POS   another count
same word form  lemma    Yet another POS    another count
and end up with a result that looks like this:

the word form    total count    all possible POS and their individual counts
For example, I could have:

ring     total count = 100        noun = 40, verb = 60
My data is stored in a CSV file. I want to do something like this:

for row in all_rows:
    if row[0] is the same as row[0] in the next row, add the values from row[3] together to get the total count

But I can't seem to figure out how to do it. Any help?

If I understand correctly, the simplest way to get what you need is:

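# Mocked CSV data: [word form, lemma, POS, count]
data = [
    ['a', 'lemma', 'pos', 1],
    ['a', 'lemma', 'pos1', 2],
    ['a', 'lemma', 'pos2', 3],
    ['b', 'lemma', 'pos', 5],
]

# Maps each word form to its total count across all POS tags
result = {}

for row in data:
    key = row[0]    # word form
    count = row[3]  # count for this (word form, POS) pair
    if key in result:
        result[key] += count
    else:
        result[key] = count

print(result)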

Result:

{
  'a': 6,
  'b': 5
}
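
The question also asks for the per-POS breakdown, which the snippet above does not keep. Here is a minimal sketch of one way to get both, assuming the CSV has exactly four columns (word form, lemma, POS, count) and no header row. The sample rows and the name `summarize` are made up for illustration and are not from the original post.

```python
import csv
import io
from collections import Counter, defaultdict

# Hypothetical sample standing in for the real CSV file.
SAMPLE_CSV = """ring,ring,noun,40
ring,ring,verb,60
bell,bell,noun,7
"""


def summarize(rows):
    """Map each word form to a Counter of {POS: count}."""
    pos_counts = defaultdict(Counter)
    for word_form, _lemma, pos, count in rows:
        # csv.reader yields strings, so the count has to be converted.
        pos_counts[word_form][pos] += int(count)
    return pos_counts


# For a real file, something like:
#   with open(path, newline="") as f:
#       summary = summarize(csv.reader(f))
summary = summarize(csv.reader(io.StringIO(SAMPLE_CSV)))

for word_form, counts in summary.items():
    total = sum(counts.values())
    breakdown = ", ".join(f"{pos} = {n}" for pos, n in counts.items())
    print(f"{word_form}\ttotal count = {total}\t{breakdown}")
```

Because the counts are keyed by word form in a dictionary, the rows do not need to be sorted or adjacent for this to work.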

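Did you mean "if column [0] is the same as column [0] of the next row..."?

Hmm. My idea was to go row by row, because my data contains many words and I want to keep a total for word forms that look the same but have different POS tags (to ring a bell, to wear a ring). So if element 0 of row 1 (the word form) is the same as element 0 of row 2, add the values from element 3 of those rows to get the total count for that word form...

Yes, technically it's a column. You're right, row[0] is actually a column, so I don't know why I asked, but I was confused at first. Thanks.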
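
The row-by-row idea discussed in the comments (merge a row with the next one when the word form matches) can be sketched with `itertools.groupby`. This assumes rows with the same word form sit next to each other. The sample rows are hypothetical.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical rows: (word form, lemma, POS, count), same word forms adjacent.
rows = [
    ("ring", "ring", "noun", 40),
    ("ring", "ring", "verb", 60),
    ("bell", "bell", "noun", 7),
]

totals = {}
# groupby only merges *consecutive* rows with equal keys, so unsorted data
# would need rows.sort(key=itemgetter(0)) first.
for word_form, group in groupby(rows, key=itemgetter(0)):
    totals[word_form] = sum(row[3] for row in group)

print(totals)
```

If the rows are not guaranteed to be grouped, the dictionary approach from the answer is the safer choice, since it does not depend on row order.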