Overriding CorpusView.read_block() is not taken into account

Tags: python-3.x, overriding, nltk, subclass

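I want to use NLTK to process a bunch of text files, splitting them at a specific keyword. I therefore tried to "subclass StreamBackedCorpusView, and override the read_block() method", as the documentation suggests.

However, my knowledge of inheritance is rusty, and it appears that my override is not taken into account. The output of

    corpus = CustomCorpusReader("/path/to/files/", ".*")
    print(corpus.words())

is the same as the output of

    corpus = PlaintextCorpusReader("/path/to/files", ".*")
    print(corpus.words())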

I guess I'm missing something obvious, but what?
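
One possible explanation can be sketched with plain-Python stand-ins rather than NLTK's real classes. If a view's constructor stores any block_reader it receives as an instance attribute named read_block, that attribute shadows a read_block() method defined on a subclass. The names BaseView, MyView, and default_reader below are hypothetical, and whether NLTK behaves this way in a given version is an assumption worth checking against the source of StreamBackedCorpusView and PlaintextCorpusReader.words().

```python
class BaseView:
    """Stand-in for a corpus view; not NLTK's actual implementation."""

    def __init__(self, block_reader=None):
        # Assumption: a supplied block reader is stored on the instance,
        # where it takes precedence over any method defined on the class.
        if block_reader:
            self.read_block = block_reader

    def read_block(self, stream):
        raise NotImplementedError("subclass must override read_block()")


class MyView(BaseView):
    def read_block(self, stream):
        return ["from the override"]


def default_reader(stream):
    # Plays the role of the corpus reader's own word-level block reader.
    return ["from the default reader"]


shadowed = MyView(default_reader)  # caller passes its own block reader
honoured = MyView()                # no block reader passed

print(shadowed.read_block(None))
print(honoured.read_block(None))
```

Under that assumption, overriding read_block() only has an effect when the view is constructed without a block_reader argument, which would mean the corpus reader's words() method needs overriding as well.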

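The documentation actually suggests two ways of defining a custom corpus view:

  • Call the StreamBackedCorpusView constructor, and provide the block reader function via the block_reader argument.
  • Subclass StreamBackedCorpusView, and override the read_block() method.

It also suggests that the first way is easier, and indeed I managed to get it to work as follows: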

    from nltk.corpus import PlaintextCorpusReader
    from nltk.corpus.reader.util import concat

    class CustomCorpusReader(PlaintextCorpusReader):

        def _custom_read_block(self, stream):
            # Block reader: consumes one line of the stream per call.
            block = stream.readline().split()
            print("wtf")
            return [] # obviously this is only for debugging

        def custom(self, fileids=None):
            # One corpus view per file, each given the custom block reader.
            return concat(
                [
                    self.CorpusView(fileid, self._custom_read_block, encoding=enc)
                    for (fileid, enc) in self.abspaths(fileids, True)
                ]
            )


    corpus = CustomCorpusReader("/path/to/files/", ".*")

    print(corpus.custom())
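
As a sketch of what a non-debugging block reader for the keyword-splitting use case might look like, the function below reads lines until it meets a line containing a delimiter keyword and returns the accumulated text as a single token. The name read_keyword_block and the keyword "END" are hypothetical, and the function is exercised on an io.StringIO rather than a real corpus file. It has the same shape as _custom_read_block above (stream in, list of tokens out), so it could presumably be used in its place.

```python
import io


def read_keyword_block(stream, keyword="END"):
    """Return the text up to the next line containing `keyword` as one token."""
    lines = []
    while True:
        line = stream.readline()
        if not line:         # end of the stream
            break
        if keyword in line:  # the delimiter line closes the current chunk
            break
        lines.append(line)
    text = "".join(lines).strip()
    # A block reader returns a list of tokens; here, zero or one chunk.
    return [text] if text else []


stream = io.StringIO("a b\nc\nEND\nd\n")
first = read_keyword_block(stream)
second = read_keyword_block(stream)
third = read_keyword_block(stream)
print(first, second, third)
```

Treating the keyword line as a terminator keeps the reader free of seek() calls; splitting just before the keyword line instead would require remembering the stream position and rewinding.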
    

Oh, there is a way! If nobody else answers, I'll find some time later to write one up =)