Python: comparing strings and computing occurrence rates


python string list math


I don't know how to solve this problem.

I have 3 lists. Each entry holds a word, a tag, and the number of times that word occurs in a document:

v1 = [['be', 'VSIS3S0', 1], ['scott', 'NP00000', 2], ['north', 'NCMS000', 1], ['revolution', 'NP00000', 1], ['name', 'VMP00SM', 1]]
v2 = [['mechanic', 'NCMS000', 1], ['be', 'VSIS3S0', 1], ['tool', 'AQ0CS0', 1], ['sam', 'NP00000', 1], ['frida', 'NP00000', 1]]
v3 = [['be', 'VSIP3S0', 1], ['scott', 'NP00000', 1], ['who', 'NP00000', 1]]

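How can I build a function that takes these lists and compares each word? For example, the word be in v1 appears once in each of the three lists, so in this case I append 1*log(3/3) to the result list, where 1 is the maximum occurrence value (the third element of the sublist), the numerator 3 is a constant, and the denominator 3 is used because the word appears in v1, v2 and v3.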

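Next we have scott. Since the word "scott" appears in v1 and v3, we append 2*log(3/2) to the result list, where 2 is the maximum occurrence value of the word, the numerator 3 is the constant, and the denominator 2 is the number of lists that contain it.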

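Next we have north. In this case we append 1*log(3/1) to the result list, where 1 is the maximum occurrence value, the numerator 3 is the constant, and the denominator is 1 because the word "north" only appears in v1.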

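Next we have revolution. In this case we append 1*log(3/1) to the result list, where 1 is the maximum occurrence value, the numerator 3 is the constant, and the denominator is 1 because the word "revolution" only appears in v1.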

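Next we have name. In this case we append 1*log(3/1) to the result list, where 1 is the maximum occurrence value, the numerator 3 is the constant, and the denominator is 1 because the word "name" only appears in v1.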

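We also have to do the same for v2, comparing mechanic, be, tool, and so on. In other words, depending on whether each word also appears in v1 and v3, take its maximum occurrence value and multiply it by log(3/?), where ? is the number of lists that contain the word.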
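Written as a formula, the rule described above is roughly the following, where c_i(w) is the count of word w in list v_i, N = 3 is the number of lists, and df(w) is the number of lists containing w. The base-10 logarithm is an assumption, inferred from the 0.47 values in the expected output (log10 3 ≈ 0.477):

$$
\mathrm{score}(w) = \Big(\max_{i\,:\,w \in v_i} c_i(w)\Big) \cdot \log_{10}\frac{N}{\mathrm{df}(w)}
$$

Worked values for the words of v1:

$$
\begin{aligned}
\mathrm{score}(\texttt{be}) &= 1 \cdot \log_{10}\tfrac{3}{3} = 0 \\
\mathrm{score}(\texttt{scott}) &= 2 \cdot \log_{10}\tfrac{3}{2} \approx 0.352 \\
\mathrm{score}(\texttt{north}) = \mathrm{score}(\texttt{revolution}) = \mathrm{score}(\texttt{name}) &= 1 \cdot \log_{10}\tfrac{3}{1} \approx 0.477
\end{aligned}
$$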

Here is my attempt for v1:

import math

def f1(v1, v2, v3):
    res = []
    for e in v1:  # e is a whole sublist, e.g. ['be', 'VSIS3S0', 1]
        if e != 0:
            if e in v2 and e in v3:
                res.append(0)
            elif e in v2:
                res.append(e * math.log(3/2))
            else:
                res.append(e * math.log(3))
    return res

[0,2.1972245773362196,0,0,0,0]

This is obviously not the right result.

It should return something like:

[['be', 0.0], ['scott', 0.35], ['north', 0.47], ['revolution', 0.47], ['name', 0.47]]
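Part of why the attempt above goes wrong is that e in v2 tests the whole sublist, so the word, the tag and the count must all be equal. be is tagged VSIS3S0 in v1 but VSIP3S0 in v3, and scott has count 2 in v1 but 1 in v3, so neither is found in v3. (The attempt also multiplies the whole list e rather than its count e[2].) A minimal sketch of comparing on the word alone; words_v3 is just an illustrative name:

```python
v1 = [['be', 'VSIS3S0', 1], ['scott', 'NP00000', 2], ['north', 'NCMS000', 1], ['revolution', 'NP00000', 1], ['name', 'VMP00SM', 1]]
v3 = [['be', 'VSIP3S0', 1], ['scott', 'NP00000', 1], ['who', 'NP00000', 1]]

# Whole-sublist membership: word, tag and count must all match.
print(v1[0] in v3)  # False: 'be' has a different tag in v3
print(v1[1] in v3)  # False: 'scott' has a different count in v3

# Word-only membership: collect the words of v3 in a set and look up e[0].
words_v3 = {e[0] for e in v3}
print(v1[0][0] in words_v3)  # True
print(v1[1][0] in words_v3)  # True
```

Building the set once also makes each lookup constant-time instead of scanning the list.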

Based on your description, I came up with this:

import math

v1 = [['be', 'VSIS3S0', 1], ['scott', 'NP00000', 2], ['north', 'NCMS000', 1], ['revolution', 'NP00000', 1], ['name', 'VMP00SM', 1]]
v2 = [['mechanic', 'NCMS000', 1], ['be', 'VSIS3S0', 1], ['tool', 'AQ0CS0', 1], ['sam', 'NP00000', 1], ['frida', 'NP00000', 1]]
v3 = [['be', 'VSIP3S0', 1], ['scott', 'NP00000', 1], ['who', 'NP00000', 1]]

v = [v1, v2, v3]

# Number of entries per word; each word occurs at most once per list here,
# so this equals the number of lists containing the word.
countdict = {}
for vi in v:
    for e in vi:
        countdict[e[0]] = countdict.get(e[0], 0) + 1

# Add count * log10(3 / number of lists) for every list the word appears in.
# 3.0 forces float division under Python 2.
scoredict = {}
for vi in v:
    for e in vi:
        scoredict[e[0]] = scoredict.get(e[0], 0) + (e[2] * math.log10(3.0 / countdict[e[0]]))

print(scoredict)

I keep the output as a dict:

{'be': 0.0, 'revolution': 0.47712125471966244, 'north': 0.47712125471966244, 'name': 0.47712125471966244, 'sam': 0.47712125471966244, 'tool': 0.47712125471966244, 'who': 0.47712125471966244, 'scott': 0.5282737771670437, 'mechanic': 0.47712125471966244, 'frida': 0.47712125471966244}
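The dict above sums e[2] over every list containing the word, so scott gets (2 + 1) * log10(3/2) ≈ 0.528, whereas the question asks for the maximum occurrence value, 2 * log10(3/2) ≈ 0.352 (the 0.35 in the expected output). Below is a minimal sketch that takes the maximum instead and returns the [[word, score], ...] shape from the question. It assumes Python 3 (true division), base-10 logarithms, and at most one entry per word in each list; scores_for is a hypothetical helper name, not part of the original answer:

```python
import math

def scores_for(target, lists):
    """Return [[word, score], ...] for each word in `target`.

    score = (max count of the word over all lists) * log10(N / df), where N is
    the number of lists and df the number of lists containing the word.
    Words are matched on the first element only; tags are ignored.
    Assumes each word appears at most once per list.
    """
    n = len(lists)
    counts = {}  # word -> counts from every list that contains it
    for lst in lists:
        for word, _tag, count in lst:
            counts.setdefault(word, []).append(count)
    result = []
    for word, _tag, _count in target:
        df = len(counts[word])
        result.append([word, max(counts[word]) * math.log10(n / df)])
    return result


v1 = [['be', 'VSIS3S0', 1], ['scott', 'NP00000', 2], ['north', 'NCMS000', 1], ['revolution', 'NP00000', 1], ['name', 'VMP00SM', 1]]
v2 = [['mechanic', 'NCMS000', 1], ['be', 'VSIS3S0', 1], ['tool', 'AQ0CS0', 1], ['sam', 'NP00000', 1], ['frida', 'NP00000', 1]]
v3 = [['be', 'VSIP3S0', 1], ['scott', 'NP00000', 1], ['who', 'NP00000', 1]]

lists = [v1, v2, v3]
for v in lists:
    print([[word, round(score, 2)] for word, score in scores_for(v, lists)])
```

For v1 this gives 0.0 for be, 0.35 for scott and 0.48 for the remaining words once rounded to two places (log10 3 ≈ 0.477, which the question truncates to 0.47). The target list must be one of the lists passed in, otherwise its words are missing from counts.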

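Thanks David, this is almost the result!!! One remark: Python's math.log(3) gives 1.098, but an ordinary calculator shows 0.47... why? Which log function do I need?

I need to use math.log10(x) instead. @David.Zheng

@David.Zheng I implemented your solution and it gave me: countdict[e[0]] = countdict.get(e[0], 0) + 1 TypeError: 'int' object has no attribute '__getitem__'

@jp, it's strange to get an exception like that; it means countdict is an int in your code.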
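The 1.098 vs 0.47 discrepancy in the comments comes from the logarithm base. With one argument, math.log(x) is the natural logarithm, while the log key on most calculators (and the 0.47 values in the question) is base 10. A short illustration using only the standard math module:

```python
import math

x = 3
print(math.log(x))      # natural logarithm (base e), about 1.0986
print(math.log10(x))    # base-10 logarithm, about 0.4771, the calculator "log" value
print(math.log(x, 10))  # base 10 via the optional base argument
```

The Python documentation notes that math.log10(x) is usually more accurate than math.log(x, 10), so log10 is the natural choice here.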