Python 3.x: overridden CorpusView.read_block() is not taken into account

I would like to use NLTK to process a bunch of text files, splitting them at a particular keyword. So I tried, as suggested, to "create a subclass of StreamBackedCorpusView and override the read_block() method".

However, my knowledge of inheritance is rusty, and my override does not seem to be taken into account. The output of

corpus = CustomCorpusReader("/path/to/files/", ".*")
print(corpus.words())

is the same as the output of

corpus = PlaintextCorpusReader("/path/to/files", ".*")
print(corpus.words())

I think I am missing something obvious, but what?

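The documentation actually suggests two ways of defining a custom corpus view:

Call the StreamBackedCorpusView constructor and provide a block reader function via the block_reader argument.

Subclass StreamBackedCorpusView and override the read_block() method.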
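A likely reason the second way appears to have no effect here: as far as I can tell from the NLTK source, the StreamBackedCorpusView constructor stores a non-empty block_reader argument on the instance as read_block, and PlaintextCorpusReader.words() always passes its own block reader. An instance attribute takes precedence over a method defined on the class, so the override is shadowed. The sketch below reproduces only that mechanism; `View`, `CustomView` and `reader_from_corpus_reader` are hypothetical stand-ins, not NLTK code.

```python
class View:
    """Simplified stand-in for a stream-backed view (hypothetical, not NLTK's class)."""

    def __init__(self, block_reader=None):
        # A reader passed to the constructor is stored on the instance,
        # where it takes precedence over any method defined on the class.
        if block_reader:
            self.read_block = block_reader

    def read_block(self, stream):
        raise NotImplementedError("Abstract method")


class CustomView(View):
    def read_block(self, stream):
        # The override we would like to be called.
        return ["from the override"]


def reader_from_corpus_reader(stream):
    # Plays the role of the block reader that the corpus reader passes in.
    return ["from the block_reader argument"]


# With a block_reader argument, the subclass override is shadowed.
result_shadowed = CustomView(reader_from_corpus_reader).read_block(None)
# Without one, the subclass override is used.
result_plain = CustomView().read_block(None)

print(result_shadowed)
print(result_plain)
```

If that is indeed what happens, a subclass override can only take effect when the view is constructed without a block_reader argument.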

The documentation also indicates that the first way is simpler, and indeed I managed to get it working as follows:

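from nltk.corpus import PlaintextCorpusReader
from nltk.corpus.reader.util import concat  # explicit import instead of a star import

class CustomCorpusReader(PlaintextCorpusReader):

    def _custom_read_block(self, stream):
        # Block reader: consume one line from the stream and return a list of tokens.
        block = stream.readline().split()
        print("wtf")  # shows that this reader is actually being called
        return [] # obviously this is only for debugging

    def custom(self, fileids=None):
        # Build one corpus view per file, passing the custom block reader
        # as the block_reader argument, then concatenate the views.
        return concat(
            [
                self.CorpusView(fileid, self._custom_read_block, encoding=enc)
                for (fileid, enc) in self.abspaths(fileids, True)
            ]
        )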


corpus = CustomCorpusReader("/path/to/files/", ".*")

print(corpus.custom())
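The block reader above only returns an empty list for debugging. For the original goal of splitting files at a keyword, a block reader could look roughly like the sketch below. The `CHAPTER` separator and the name `read_keyword_block` are assumptions made up for illustration, and the function is exercised on an in-memory stream rather than through NLTK. Each call consumes lines up to and including the next separator line and returns the text before it as a single token.

```python
import io

KEYWORD = "CHAPTER"  # hypothetical separator line, chosen only for illustration


def read_keyword_block(stream, keyword=KEYWORD):
    """Read up to the next separator line and return the section as a one-item list."""
    lines = []
    while True:
        line = stream.readline()
        if not line:                   # end of file
            break
        if line.strip() == keyword:    # separator reached: the section is complete
            break
        lines.append(line)
    section = "".join(lines).strip()
    # A block reader returns a list of tokens; an empty section yields no token.
    return [section] if section else []


# Exercise the reader on an in-memory stream.
demo = io.StringIO("a b\nc\nCHAPTER\nd e\n")
first = read_keyword_block(demo)
second = read_keyword_block(demo)
third = read_keyword_block(demo)  # stream exhausted
print(first, second, third)
```

In the reader class above, such a function would be passed in place of self._custom_read_block, i.e. self.CorpusView(fileid, read_keyword_block, encoding=enc). Note that the reader consumes at least one line per call unless the stream is already at its end, which matters because the view expects each block read to advance the file position.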


Oh, there is a way! If nobody answers, let me find some time later to answer =)