Python: finding the total count of a word form when it has multiple possible POS tags

python python-3.x nlp

I feel like I have a silly question, but anyway. I'm trying to go from data that looks like this:

a word form     lemma    POS                count of occurrence
same word form  lemma    Not the same POS   another count
same word form  lemma    Yet another POS    another count

to a result that looks like this:

the word form    total count    all possible POS and their individual counts

For example, I could have:

ring     total count = 100        noun = 40, verb = 60

My data is saved in a CSV file. I'd like to do something like this:

for row in all_rows:
    if row[0] is the same as row[0] in the next row, add the values from row[3] together to get the total count

But I can't seem to figure out how to do it. Any help?

If I understand correctly, the simplest way to get what you need is:

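# Mocked CSV data: word form, lemma, POS, count
data = [
    ['a', 'lemma', 'pos', 1],
    ['a', 'lemma', 'pos1', 2],
    ['a', 'lemma', 'pos2', 3],
    ['b', 'lemma', 'pos', 5],
]

# Maps each word form to the sum of its counts over all POS tags
result = {}

for row in data:
    key = row[0]    # word form
    count = row[3]  # count for this (word form, POS) row
    if key in result:
        result[key] += count
    else:
        result[key] = count

print(result)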

Result:

{
  'a': 6,
  'b': 5
}
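
The question also asks for the individual count of each POS tag, which the totals above do not keep. A minimal sketch of one way to get both, assuming the same four-column row layout as the mocked data (the names `totals` and `pos_counts` and the sample rows are illustrative, not from the original post):

```python
from collections import Counter, defaultdict

# Mocked rows in the assumed layout: word form, lemma, POS, count
data = [
    ['ring', 'ring', 'noun', 40],
    ['ring', 'ring', 'verb', 60],
    ['bell', 'bell', 'noun', 5],
]

totals = Counter()                 # word form -> total count
pos_counts = defaultdict(Counter)  # word form -> POS -> count

for form, _lemma, pos, count in data:
    count = int(count)  # values read from a CSV file arrive as strings
    totals[form] += count
    pos_counts[form][pos] += count

for form, total in totals.items():
    print(form, total, dict(pos_counts[form]))
```

With a real file, the rows could come from `csv.reader` instead of the mocked list; the file name and whether it has a header row are not given in the question, so that part is left out here.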

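Did you mean

if column [0] is the same as column [0] in the next row…

?

Hmm. My idea was to go row by row, because my data contains many words and I want to keep a total count for word forms that look the same but carry different POS tags (to ring a bell, to wear a ring). So if the 0th element (i.e. the word form) of row 1 is the same as the 0th element of row 2, add up the values of the 3rd element of those rows to get the total count of the word form… Yes, technically it is a column.

You're right, row[0] is actually the column, so I don't know why I asked, but I was confused at first. Thanks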
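
The row-by-row idea from the comments (keep adding counts while the next row has the same word form) could be sketched with `itertools.groupby`, which groups consecutive items that share a key. This is only a sketch under two assumptions: rows for the same word form must be adjacent, so it sorts first, and each (word form, POS) pair occurs once. The sample rows and the name `summary` are made up for illustration.

```python
from itertools import groupby
from operator import itemgetter

# Mocked rows: word form, lemma, POS, count (deliberately not adjacent)
rows = [
    ['ring', 'ring', 'noun', 40],
    ['bell', 'bell', 'noun', 5],
    ['ring', 'ring', 'verb', 60],
]

# groupby only merges neighbouring rows, so sort by word form first
rows.sort(key=itemgetter(0))

summary = {}
for form, group in groupby(rows, key=itemgetter(0)):
    group = list(group)  # the group iterator can only be consumed once
    summary[form] = (
        sum(int(r[3]) for r in group),     # total count for the word form
        {r[2]: int(r[3]) for r in group},  # per-POS counts (one row per POS assumed)
    )

print(summary)
```

Compared with a dictionary keyed by word form, this stays closer to the "compare with the next row" wording, at the cost of needing sorted input.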