Python 3.x does not take my override of CorpusView.read_block() into account
I want to use NLTK to process a bunch of text files and split them at certain keywords. So, as suggested, I tried to "subclass StreamBackedCorpusView and override the read_block() method".
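For the keyword splitting itself, a block reader along these lines might work. It is a minimal sketch in plain Python, assuming each section starts on a line that begins with a marker. KEYWORD, keyword_block_reader and read_all_blocks are hypothetical names, and read_all_blocks only imitates how a corpus view calls a block reader repeatedly. A block reader just takes a stream and returns a list of tokens, so it can be tried on io.StringIO before it is handed to a corpus view.

```python
import io
from typing import List, TextIO

# Hypothetical marker; replace it with the keyword that separates your sections.
KEYWORD = "CHAPTER"


def keyword_block_reader(stream: TextIO) -> List[str]:
    """Return the next keyword-delimited section as a single token.

    Reads lines until the next line that starts with KEYWORD (or end of
    file), then rewinds so that line begins the following block.
    Returns an empty list only at end of file.
    """
    lines: List[str] = []
    while True:
        pos = stream.tell()
        line = stream.readline()
        if not line:  # end of file
            break
        if line.startswith(KEYWORD) and lines:
            stream.seek(pos)  # leave the keyword line for the next block
            break
        lines.append(line)
    return ["".join(lines)] if lines else []


def read_all_blocks(text: str) -> List[str]:
    """Call the block reader repeatedly, roughly as a corpus view would."""
    stream = io.StringIO(text)
    tokens: List[str] = []
    while True:
        block = keyword_block_reader(stream)
        if not block:
            return tokens
        tokens.extend(block)


print(read_all_blocks("intro\nCHAPTER 1\nfoo\nCHAPTER 2\nbar\n"))
# ['intro\n', 'CHAPTER 1\nfoo\n', 'CHAPTER 2\nbar\n']
```

Rewinding with seek() keeps each keyword line at the start of its own block. This relies on the stream supporting tell() and seek(), which is an assumption to check for the stream your corpus view actually passes in.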
However, my knowledge of inheritance is rusty, and my override does not seem to be taken into account. The output of
corpus = CustomCorpusReader("/path/to/files/", ".*")
print(corpus.words())
is identical to the output of
corpus = PlaintextCorpusReader("/path/to/files", ".*")
print(corpus.words())
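I guess I'm missing something obvious, but what?
The documentation actually proposes two ways to define a custom corpus view: subclass StreamBackedCorpusView and override read_block(), or pass a block reader function to the view's constructor. PlaintextCorpusReader's own methods, such as words(), use the second way with their own block readers, so do the same in your reader: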
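from nltk.corpus import PlaintextCorpusReader
from nltk.corpus.reader.util import concat

class CustomCorpusReader(PlaintextCorpusReader):
    def _custom_read_block(self, stream):
        # A block reader: consume some input from the stream and return a list of tokens.
        block = stream.readline().split()
        print("wtf")
        return []  # obviously this is only for debugging (a real reader would return block)

    def custom(self, fileids=None):
        # Build one corpus view per file, passing our own block reader to the
        # view constructor, then concatenate the views into one sequence.
        return concat(
            [
                self.CorpusView(path, self._custom_read_block, encoding=enc)
                for (path, enc) in self.abspaths(fileids, True)
            ]
        )

corpus = CustomCorpusReader("/path/to/files/", ".*")
print(corpus.custom())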
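Why the override was ignored can be modelled in plain Python. In NLTK, the StreamBackedCorpusView constructor takes an optional block_reader and, when one is given, assigns it to the instance attribute read_block; PlaintextCorpusReader.words() always passes its own reader method, _read_word_block. Instance attributes are found before methods defined on the class or its subclasses, so a read_block() overridden in a view subclass is never reached, while custom() above works because it supplies the block reader itself. ToyView, UpperView and word_block below are hypothetical names in a toy model, not NLTK code:

```python
import io
from typing import Callable, List, Optional, TextIO

BlockReader = Callable[[TextIO], List[str]]


class ToyView:
    """Toy stand-in for a stream-backed corpus view (not NLTK code)."""

    def __init__(self, stream: TextIO, block_reader: Optional[BlockReader] = None):
        self.stream = stream
        if block_reader:
            # Stored on the instance: this shadows any read_block() method
            # defined on the class or on a subclass.
            self.read_block = block_reader

    def read_block(self, stream: TextIO) -> List[str]:
        raise NotImplementedError

    def tokens(self) -> List[str]:
        out: List[str] = []
        while True:
            pos = self.stream.tell()
            out.extend(self.read_block(self.stream))
            if self.stream.tell() == pos:  # nothing consumed: end of input
                return out


class UpperView(ToyView):
    def read_block(self, stream: TextIO) -> List[str]:
        # The "override": upper-case every word on the line.
        return stream.readline().upper().split()


def word_block(stream: TextIO) -> List[str]:
    # Plays the role of the block reader a corpus reader passes in (simplified).
    return stream.readline().split()


text = "a b\nc\n"
print(UpperView(io.StringIO(text)).tokens())              # ['A', 'B', 'C']
print(UpperView(io.StringIO(text), word_block).tokens())  # ['a', 'b', 'c']
```

By the same lookup rule, overriding _read_word_block in the PlaintextCorpusReader subclass should also redirect words(), since that name is resolved on the reader instance. It is a private method, though, so check it against your NLTK version.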
Oh, there is a way! If nobody else answers, I'll find some time to answer later =)