Python 3.x: NLTK ne_tree, word-tokenizing and chunking the rows of a column (Python/Pandas/Jupyter)

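I've just started learning natural language processing, and I'm trying to classify words. Basically, I'm looking for people, places, and organizations.

So far, defining a single line of text in the script works: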

import nltk
from nltk import pos_tag, word_tokenize

ex = 'John'
ne_tree = nltk.ne_chunk(pos_tag(word_tokenize(ex)))
print(ne_tree)

Output:

(S (PERSON John/NNP))

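My question is: can I apply this script to an entire column?

My table looks like this:

ex2:

Order is basically an index I created. My idea is that later I can split the sentences into words, keep a key, and then melt. Text is the content I want to tag.

When I run the code below, I get the following error. Maybe I'm not calling it correctly and need to specify the column? Thanks for your help.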

ne_tree = nltk.ne_chunk(pos_tag(word_tokenize(ex2)))
print(ne_tree)

Error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-80-5d4582e937dd> in <module>
----> 1 ne_tree =  nltk.ne_chunk(pos_tag(word_tokenize(ex3)))
      2 print(ne_tree)

~\AppData\Local\Continuum\anaconda3\lib\site-packages\nltk\tokenize\__init__.py in word_tokenize(text, language, preserve_line)
    142     :type preserve_line: bool
    143     """
--> 144     sentences = [text] if preserve_line else sent_tokenize(text, language)
    145     return [
    146         token for sent in sentences for token in _treebank_word_tokenizer.tokenize(sent)

~\AppData\Local\Continuum\anaconda3\lib\site-packages\nltk\tokenize\__init__.py in sent_tokenize(text, language)
    104     """
    105     tokenizer = load('tokenizers/punkt/{0}.pickle'.format(language))
--> 106     return tokenizer.tokenize(text)
    107 
    108 

~\AppData\Local\Continuum\anaconda3\lib\site-packages\nltk\tokenize\punkt.py in tokenize(self, text, realign_boundaries)
   1275         Given a text, returns a list of the sentences in that text.
   1276         """
-> 1277         return list(self.sentences_from_text(text, realign_boundaries))
   1278 
   1279     def debug_decisions(self, text):

~\AppData\Local\Continuum\anaconda3\lib\site-packages\nltk\tokenize\punkt.py in sentences_from_text(self, text, realign_boundaries)
   1329         follows the period.
   1330         """
-> 1331         return [text[s:e] for s, e in self.span_tokenize(text, realign_boundaries)]
   1332 
   1333     def _slices_from_text(self, text):

~\AppData\Local\Continuum\anaconda3\lib\site-packages\nltk\tokenize\punkt.py in <listcomp>(.0)
   1329         follows the period.
   1330         """
-> 1331         return [text[s:e] for s, e in self.span_tokenize(text, realign_boundaries)]
   1332 
   1333     def _slices_from_text(self, text):

~\AppData\Local\Continuum\anaconda3\lib\site-packages\nltk\tokenize\punkt.py in span_tokenize(self, text, realign_boundaries)
   1319         if realign_boundaries:
   1320             slices = self._realign_boundaries(text, slices)
-> 1321         for sl in slices:
   1322             yield (sl.start, sl.stop)
   1323 

~\AppData\Local\Continuum\anaconda3\lib\site-packages\nltk\tokenize\punkt.py in _realign_boundaries(self, text, slices)
   1360         """
   1361         realign = 0
-> 1362         for sl1, sl2 in _pair_iter(slices):
   1363             sl1 = slice(sl1.start + realign, sl1.stop)
   1364             if not sl2:

~\AppData\Local\Continuum\anaconda3\lib\site-packages\nltk\tokenize\punkt.py in _pair_iter(it)
    316     it = iter(it)
    317     try:
--> 318         prev = next(it)
    319     except StopIteration:
    320         return

~\AppData\Local\Continuum\anaconda3\lib\site-packages\nltk\tokenize\punkt.py in _slices_from_text(self, text)
   1333     def _slices_from_text(self, text):
   1334         last_break = 0
-> 1335         for match in self._lang_vars.period_context_re().finditer(text):
   1336             context = match.group() + match.group('after_tok')
   1337             if self.text_contains_sentbreak(context):

TypeError: expected string or bytes-like object
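The last frame of the traceback is a regular-expression `finditer` call, which only accepts a string or bytes-like object. The sketch below reproduces the same kind of failure with the standard `re` module alone. The pattern and the list are stand-ins chosen for illustration (assumptions): the real pattern lives inside punkt, and the list plays the role of the table that was passed in.

```python
import re

# Stand-in pattern (assumption); punkt uses its own sentence-boundary regex.
pattern = re.compile(r"\S+")

# Hypothetical stand-in for the table: any non-string object fails the same way.
not_a_string = ["John", "Mary"]

try:
    list(pattern.finditer(not_a_string))
    message = None
except TypeError as exc:
    # The regex engine rejects anything that is not str or bytes-like.
    message = str(exc)

print(message)

# A single string, by contrast, is accepted.
matches = [m.group() for m in pattern.finditer("John Smith")]
print(matches)
```

So the tokenizer itself is not at fault; it is simply being handed the whole table instead of one string at a time.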

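You have to apply the function to each row's value, because word_tokenize expects a single string rather than a whole DataFrame or column: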

ex2['results'] = ex2.Text.apply(lambda x: nltk.ne_chunk(pos_tag(word_tokenize(x))))
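If the column may contain empty cells, or if you later want one row per word as described in the question, something along these lines might help. It is only a sketch: `chunk_text` and `tree_to_rows` are hypothetical helper names, the `"O"` label for words outside any entity is a convention chosen here rather than something NLTK emits, and the NLTK tokenizer, tagger, and chunker models are assumed to be downloaded already (the resource names for `nltk.download` vary by NLTK version).

```python
import nltk
from nltk import Tree, pos_tag, word_tokenize


def chunk_text(text):
    """Return the named-entity tree for one string, or None for non-string cells."""
    if not isinstance(text, str):
        return None  # e.g. NaN from an empty cell; skipping avoids the TypeError
    return nltk.ne_chunk(pos_tag(word_tokenize(text)))


def tree_to_rows(tree):
    """Flatten a chunk tree into (word, pos_tag, entity_label) tuples."""
    rows = []
    for node in tree:
        if isinstance(node, Tree):
            # A labelled chunk such as (PERSON John/NNP)
            rows.extend((word, tag, node.label()) for word, tag in node.leaves())
        else:
            # A plain (word, tag) pair outside any named entity
            word, tag = node
            rows.append((word, tag, "O"))  # "O" is an assumed placeholder label
    return rows


# Usage, assuming ex2 has a Text column and the NLTK models are installed:
# ex2['results'] = ex2.Text.apply(chunk_text)
```

The resulting tuples can then be attached to the Order key to get one row per word, for example with pandas `explode`.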
