Python 3.x: NLTK Lesk returns different results for the same input

Tags: python-3.x, nltk, wordnet, disambiguation

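I am using the Lesk algorithm to get synsets from a text, but I get different results for the same input. Is this a "feature" of the Lesk algorithm, or am I doing something wrong? Here is the code I am using: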

    self.SynSets =[]
    sentences = sent_tokenize("Python is a widely used general-purpose, high-level programming language.\
        Its design philosophy emphasizes code readability, and its syntax allows programmers to express concepts in fewer lines of code than would be possible in languages such as C++ or Java.\
        The language provides constructs intended to enable clear programs on both a small and large scale.\
        Python supports multiple programming paradigms, including object-oriented, imperative and functional programming or procedural styles.\
        ")
    stopwordsList =  stopwords.words('english')
    self.sentNum=0;
    for sentence in sentences:
        raw_tokens =  word_tokenize(sentence)
        final_tokens = [token.lower() for token in raw_tokens 
                    if(not token in stopwordsList) 
                    #and (len(token) > 3) 
                    and not token.isdigit()]
        for token in final_tokens:
            synset = wsd.lesk(sentence, token)
            if not synset is None:
                self.SynSets.append(synset)

    self.SynSets = set(self.SynSets)
    self.WriteSynSets()
    return self
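In the output I get these results (the first 3 results from 2 different runs):

If there is another (more stable) way to get synsets, I would appreciate your help.

Thanks in advance.


Edit

Here is the full script, which I ran 2 times: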

import nltk
from nltk.tokenize import sent_tokenize
from nltk import word_tokenize
from nltk import wsd
from nltk.corpus import stopwords

SynSets =[]
sentences = sent_tokenize("Python is a widely used general-purpose, high-level programming language.\
    Its design philosophy emphasizes code readability, and its syntax allows programmers to express concepts in fewer lines of code than would be possible in languages such as C++ or Java.\
    The language provides constructs intended to enable clear programs on both a small and large scale.\
    Python supports multiple programming paradigms, including object-oriented, imperative and functional programming or procedural styles.\
    ")
stopwordsList =  stopwords.words('english')

for sentence in sentences:
    raw_tokens =  word_tokenize(sentence)#WordPunctTokenizer().tokenize(sentence)
    #removing stopwords and words, smaller than 3 characters
    final_tokens = [token.lower() for token in raw_tokens 
                if(not token in stopwordsList) 
                #and (len(token) > 3) 
                and not token.isdigit()]
    for token in final_tokens:
        synset = wsd.lesk(sentence, token)
        if not synset is None:
            SynSets.append(synset)


SynSets = set(SynSets)

SynSets = sorted(SynSets)
# Append this run's synsets to the file; the with-block closes it automatically.
with open("synsets.txt", "a") as file:
    file.write("\n-------------------\n")
    for synset in SynSets:
        file.write("{}   ".format(str(synset)))
I got these results (the first 4 synsets written to the file on each of the 2 runs of the program):

  • Synset('allow.v.04')Synset('blowfully.r.01')Synset('clear.v.11')Synset('code.n.02'))

  • Synset('blowfully.r.01')Synset('clear.v.19')Synset('code.n.01')Synset('design.n.04'))

Solution: I figured out where the problem was. After I reinstalled Python 2.7, all the problems disappeared.
So, do not use Python 3.x with the Lesk algorithm.
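
The thread does not establish why the Python 2.7 installation behaved differently. One possible factor, offered here as an assumption rather than a confirmed diagnosis: since Python 3.3, hashes of strings are randomized per process by default, so the iteration order of a set of strings can change from one program start to the next, while Python 2.7 leaves this randomization off by default. Any code that breaks a tie between equally scored senses by iteration order could then return different winners on different runs. The sketch below only shows the hashing effect; the word list and the helper name set_order are made up for illustration.

```python
import os
import subprocess
import sys

# Child program: print the iteration order of a small set of strings.
CHILD = "print(list({'allow', 'clear', 'code', 'design', 'language'}))"


def set_order(hash_seed):
    """Run CHILD in a fresh interpreter with the given PYTHONHASHSEED."""
    env = dict(os.environ, PYTHONHASHSEED=str(hash_seed))
    result = subprocess.run(
        [sys.executable, "-c", CHILD],
        env=env, capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    # Same seed: the order is reproducible across processes.
    print(set_order(1) == set_order(1))
    # Different seeds: the order may (but need not) differ.
    print(set_order(1), set_order(2), sep="\n")
```

If this is the cause, starting the original script with a fixed PYTHONHASHSEED environment variable should make repeated runs agree, which is a cheap experiment before switching interpreters.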

The latest version of NLTK has a wsd function for the Lesk algorithm:

>>> from nltk.wsd import lesk
>>> from nltk import sent_tokenize, word_tokenize
>>> text = "Python is a widely used general-purpose, high-level programming language. Its design philosophy emphasizes code readability, and its syntax allows programmers to express concepts in fewer lines of code than would be possible in languages such as C++ or Java. The language provides constructs intended to enable clear programs on both a small and large scale. Python supports multiple programming paradigms, including object-oriented, imperative and functional programming or procedural styles."
>>> for sent in sent_tokenize(text):
...     for word in word_tokenize(sent):
...         print(word, lesk(sent, word), sent)
[out]:

Python Synset('python.n.02') Python is a widely used general-purpose, high-level programming language.
is Synset('be.v.08') Python is a widely used general-purpose, high-level programming language.
a Synset('angstrom.n.01') Python is a widely used general-purpose, high-level programming language.
widely Synset('wide.r.04') Python is a widely used general-purpose, high-level programming language.
used Synset('use.v.01') Python is a widely used general-purpose, high-level programming language.
general-purpose None Python is a widely used general-purpose, high-level programming language.
, None Python is a widely used general-purpose, high-level programming language.

Also, try disambiguate() from pywsd:

[out]:

[('Python', Synset('python.n.02')), ('is', None), ('a', None), ('widely', Synset('widely.r.03')), ('used', Synset('used.a.01')), ('general-purpose', None), (',', None), ('high-level', None), ('programming', Synset('scheduling.n.01')), ('language', Synset('terminology.n.01')), ('.', None)]
[('Its', None), ('design', Synset('purpose.n.01')), ('philosophy', Synset('philosophy.n.03')), ('emphasizes', Synset('stress.v.01')), ('code', Synset('code.n.03')), ('readability', Synset('readability.n.01')), (',', None), ('and', None), ('its', None), ('syntax', Synset('syntax.n.03')), ('allows', Synset('let.v.01')), ('programmers', Synset('programmer.n.01')), ('to', None), ('express', Synset('express.n.03')), ('concepts', Synset('concept.n.01')), ('in', None), ('fewer', None), ('lines', Synset('wrinkle.n.01')), ('of', None), ('code', Synset('code.n.03')), ('than', None), ('would', None), ('be', None), ('possible', Synset('potential.a.01')), ('in', None), ('languages', Synset('linguistic_process.n.02')), ('such', None), ('as', None), ('C++', None), ('or', None), ('Java', Synset('java.n.03')), ('.', None)]
[('The', None), ('language', Synset('language.n.01')), ('provides', Synset('provide.v.06')), ('constructs', Synset('concept.n.01')), ('intended', Synset('mean.v.03')), ('to', None), ('enable', None), ('clear', Synset('open.n.01')), ('programs', Synset('program.n.08')), ('on', None), ('both', None), ('a', None), ('small', Synset('small.a.01')), ('and', None), ('large', Synset('large.a.01')), ('scale', Synset('scale.n.10')), ('.', None)]
[('Python', Synset('python.n.02')), ('supports', Synset('support.n.11')), ('multiple', None), ('programming', Synset('program.v.02')), ('paradigms', Synset('substitution_class.n.01')), (',', None), ('including', Synset('include.v.03')), ('object-oriented', None), (',', None), ('imperative', Synset('imperative.a.02')), ('and', None), ('functional', Synset('functional.a.01')), ('programming', Synset('scheduling.n.01')), ('or', None), ('procedural', Synset('procedural.a.01')), ('styles', Synset('vogue.n.01')), ('.', None)]
They are not perfect, but they are close to an accurate implementation of Lesk.
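
For readers unfamiliar with the idea, simplified Lesk picks the sense whose dictionary gloss shares the most words with the context. The following is a minimal sketch of that idea with an explicit tie-break, so the answer does not depend on iteration order. It is not the NLTK or pywsd implementation: the function name simplified_lesk is hypothetical, and the glosses in TOY_SENSES are invented stand-ins, not WordNet definitions.

```python
def simplified_lesk(context_tokens, senses):
    """Pick the sense whose gloss shares the most words with the context.

    `senses` maps a sense name to its gloss (a hypothetical toy inventory).
    Ties are broken by sense name, so the result does not depend on
    dictionary or set iteration order. Returns None if nothing overlaps.
    """
    context = {token.lower() for token in context_tokens}
    best_name, best_overlap = None, 0
    for name in sorted(senses):            # fixed iteration order
        gloss = set(senses[name].lower().split())
        overlap = len(context & gloss)
        if overlap > best_overlap:         # strict '>' keeps the first name on ties
            best_name, best_overlap = name, overlap
    return best_name


# Invented glosses for illustration only.
TOY_SENSES = {
    "code.n.01": "a set of rules or laws",
    "code.n.02": "a coding system used for transmitting messages",
    "code.n.03": "the symbolic arrangement of instructions in a computer program",
}

sentence = "Its syntax allows programmers to express concepts in fewer lines of code".split()
print(simplified_lesk(sentence, TOY_SENSES))  # prints code.n.03
```

Iterating over sorted(senses) and comparing with a strict greater-than is one simple way to make ties deterministic; other tie-break rules (for example, preferring the most frequent sense) would work equally well as long as they are explicit.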


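EDITED

To verify that the results are the same on every run, the following should finish without printing anything (that is, no assertion fails):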

from nltk.wsd import lesk
from nltk import sent_tokenize, word_tokenize
text = "Python is a widely used general-purpose, high-level programming language. Its design philosophy emphasizes code readability, and its syntax allows programmers to express concepts in fewer lines of code than would be possible in languages such as C++ or Java. The language provides constructs intended to enable clear programs on both a small and large scale. Python supports multiple programming paradigms, including object-oriented, imperative and functional programming or procedural styles."

lst = []
for sent in sent_tokenize(text):
    lst = []
    for word in word_tokenize(sent):
        lst.append(lesk(sent, word))
    for i in range(10):
        lst2 = []
        for word in word_tokenize(sent):
            lst2.append(lesk(sent, word))
        assert lst2 == lst
I ran the OP's code 10 times and got the same results each time:

import nltk
from nltk.tokenize import sent_tokenize
from nltk import word_tokenize
from nltk import wsd
from nltk.corpus import stopwords

def run():
    SynSets =[]
    sentences = sent_tokenize("Python is a widely used general-purpose, high-level programming language.\
        Its design philosophy emphasizes code readability, and its syntax allows programmers to express concepts in fewer lines of code than would be possible in languages such as C++ or Java.\
        The language provides constructs intended to enable clear programs on both a small and large scale.\
        Python supports multiple programming paradigms, including object-oriented, imperative and functional programming or procedural styles.\
        ")
    stopwordsList =  stopwords.words('english')

    for sentence in sentences:
        raw_tokens =  word_tokenize(sentence)#WordPunctTokenizer().tokenize(sentence)
        #removing stopwords and words, smaller than 3 characters
        final_tokens = [token.lower() for token in raw_tokens 
                    if(not token in stopwordsList) 
                    #and (len(token) > 3) 
                    and not token.isdigit()]
        for token in final_tokens:
            synset = wsd.lesk(sentence, token)
            if not synset is None:
                SynSets.append(synset)
    return sorted(set(SynSets))

run1 = run()

for i in range(10):
    assert run1 == run()
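
Both checks above repeat the calls inside a single interpreter process, whereas the question reports differences after restarting the program. A rough way to compare fresh starts is to launch the same code in several new interpreters and compare what each prints. This is only a sketch: the helper names outputs_across_restarts and is_stable are hypothetical, and the snippet passed in at the bottom is a placeholder for code that prints the sorted synsets.

```python
import subprocess
import sys


def outputs_across_restarts(code, runs=3):
    """Run `code` in `runs` fresh Python interpreters and collect stdout."""
    outputs = []
    for _ in range(runs):
        result = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True, text=True, check=True,
        )
        outputs.append(result.stdout)
    return outputs


def is_stable(code, runs=3):
    """True if every fresh interpreter printed exactly the same text."""
    return len(set(outputs_across_restarts(code, runs))) == 1


if __name__ == "__main__":
    # Placeholder snippet; replace it with code that prints the sorted synsets.
    print(is_stable("print(sorted(['code', 'allow', 'clear']))"))
```

If is_stable returns False for the Lesk script, the variation comes from something that changes between process starts rather than from repeated calls within one process.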

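I have NLTK 3.0.1 installed, and I am using NLTK's wsd.lesk. The problem is that the output differs between runs. Have you seen a similar problem? As for pywsd: thanks, I will try it.

Are you calling it correctly? When I run it 10 times, nothing differs. How are you calling NLTK's lesk?

When I restart the program, I get different results. You can see how I use lesk in the example.

There is no difference. I ran your code 10 times and the results are still the same.

I ran your code as well, and it finished without any exceptions. But then I ran my own code (restarting it and letting the program run to the end each time), and it did produce different results. What is wrong here? :(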