Python: finding the total count of a word form when it has multiple possible POS tags
I feel like this is a silly question, but anyway. I am trying to go from data that looks like this:
a word form      lemma   POS                count of occurrence
same word form   lemma   Not the same POS   another count
same word form   lemma   Yet another POS    another count
to a result that looks like this:
the word form    total count    all possible POS and their individual counts
For example, I could have:
ring    total count = 100    noun = 40, verb = 60
My data is stored in a CSV file. I would like to do something like this:
for row in all_rows:
    if row[0] is the same as row[0] in the next row, add the values from row[3] together to get the total count
But I can't seem to figure out how to do it. Help!

If I understand correctly, the simplest way to achieve what you need is:
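# Mocked CSV data: [word form, lemma, POS, count]
data = [
    ['a', 'lemma', 'pos', 1],
    ['a', 'lemma', 'pos1', 2],
    ['a', 'lemma', 'pos2', 3],
    ['b', 'lemma', 'pos', 5],
]

# Total count per word form, summed over all POS tags
result = {}
for row in data:
    key = row[0]    # word form
    count = row[3]  # count for this (word form, POS) pair
    if key in result:
        result[key] += count
    else:
        result[key] = count

print(result)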
Result:
{
'a': 6,
'b': 5
}
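The question also asks for every possible POS with its individual count, not only the total. A minimal sketch of one way to keep both, assuming the CSV columns are ordered word form, lemma, POS, count and that there is no header row. `SAMPLE_CSV` and `count_by_pos` are hypothetical names, and the `bell` row is made-up sample data; the `ring` numbers are the ones from the question.

```python
import csv
import io
from collections import Counter, defaultdict

# Stand-in for the real CSV file (assumed column order:
# word form, lemma, POS, count; no header row).
SAMPLE_CSV = """ring,ring,noun,40
ring,ring,verb,60
bell,bell,noun,25
"""


def count_by_pos(rows):
    """Map each word form to a Counter of {POS: count}."""
    per_pos = defaultdict(Counter)
    for word_form, _lemma, pos, count in rows:
        # csv.reader yields strings, so convert the count explicitly
        per_pos[word_form][pos] += int(count)
    return per_pos


per_pos = count_by_pos(csv.reader(io.StringIO(SAMPLE_CSV)))

for word_form, pos_counts in per_pos.items():
    total = sum(pos_counts.values())  # total over all POS tags
    print(word_form, total, dict(pos_counts))
```

For a real file, the same function should work with `csv.reader(open(path, newline=''))` in place of the `io.StringIO` stand-in. If the file has a header row, it would need to be skipped first, for example with `next(reader)`.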
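Did you mean "if column [0] is the same as column [0] in the next row…"?

Hmm. My idea was to go row by row, because my data contains multiple words and I want to keep a total for word forms that look the same but have different POS tags (ring a bell, wear a ring). So if element 0 of row 1 (i.e. the word form) is the same as element 0 of row 2, add the values of element 3 from those rows to get the total count for the word form… So yes, technically it is a column.

You are right, row[0] actually is the column, so I don't know why I asked, but I was confused at first. Thanks.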
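The "compare each row with the next row" idea from the question is close to what `itertools.groupby` does: it merges runs of adjacent rows that share a key. A small sketch on the mocked data from the answer, assuming it is acceptable to sort the rows by word form first (without sorting, identical word forms that are not adjacent would end up in separate groups):

```python
from itertools import groupby
from operator import itemgetter

# Same mocked rows as in the answer: [word form, lemma, POS, count]
data = [
    ['a', 'lemma', 'pos', 1],
    ['a', 'lemma', 'pos1', 2],
    ['a', 'lemma', 'pos2', 3],
    ['b', 'lemma', 'pos', 5],
]

# groupby only merges adjacent rows, so sort by word form first
data.sort(key=itemgetter(0))

# Sum the count column within each run of identical word forms
totals = {
    word_form: sum(row[3] for row in rows)
    for word_form, rows in groupby(data, key=itemgetter(0))
}

print(totals)  # {'a': 6, 'b': 5}
```

The dictionary approach in the answer does not depend on row order, which makes it the safer choice when the CSV is not guaranteed to be sorted.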