
Python: POS-tagging a column of tokenized words

Tags: python, python-3.x, pandas, nltk

I have a column in a pandas df that has been tokenized with:

df['token_col'] = df.col.apply(word_tokenize)
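(A toy, runnable version of this setup: the regex split below is only a stand-in for NLTK's word_tokenize so that no NLTK corpora downloads are needed, and the sample sentences are made up to resemble the snapshot further down.)

```python
import re
import pandas as pd

# Made-up sample rows standing in for the real 70+ column df.
df = pd.DataFrame({
    "col": [
        "Assessment of Improvement will be on-going",
        "A member of the administrative team will attend",
    ]
})

# Stand-in tokenizer; the original code uses nltk.tokenize.word_tokenize.
def word_tokenize(text):
    return re.findall(r"[\w-]+|[^\w\s]", text)

df["token_col"] = df.col.apply(word_tokenize)
print(df["token_col"][0])
# ['Assessment', 'of', 'Improvement', 'will', 'be', 'on-going']
```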
Now I am trying to POS-tag those tokenized words with:

df['pos_col'] = nltk.tag.pos_tag(df['token_col'])
df['wordnet_tagged_pos_col'] = [(w,get_wordnet_pos(t)) for (w, t) in (df['pos_col'])]
But I get an error that I don't quite understand:

AttributeError                            Traceback (most recent call last)
<ipython-input-28-99d28433d090> in <module>()
      1 #tag tokenized lists
----> 2 df['pos_col'] = nltk.tag.pos_tag(df['token_col'])
      3 df['wordnet_tagged_pos_col'] = [(w,get_wordnet_pos(t)) for (w, t) in (df['pos_col'])]

C:\Users\egagne\AppData\Local\Continuum\Anaconda3\lib\site-packages\nltk\tag\__init__.py in pos_tag(tokens, tagset, lang)
    125     """
    126     tagger = _get_tagger(lang)
--> 127     return _pos_tag(tokens, tagset, tagger)
    128 
    129 

C:\Users\egagne\AppData\Local\Continuum\Anaconda3\lib\site-packages\nltk\tag\__init__.py in _pos_tag(tokens, tagset, tagger)
     93 
     94 def _pos_tag(tokens, tagset, tagger):
---> 95     tagged_tokens = tagger.tag(tokens)
     96     if tagset:
     97         tagged_tokens = [(token, map_tag('en-ptb', tagset, tag)) for (token, tag) in tagged_tokens]

C:\Users\egagne\AppData\Local\Continuum\Anaconda3\lib\site-packages\nltk\tag\perceptron.py in tag(self, tokens)
    150         output = []
    151 
--> 152         context = self.START + [self.normalize(w) for w in tokens] + self.END
    153         for i, word in enumerate(tokens):
    154             tag = self.tagdict.get(word)

C:\Users\egagne\AppData\Local\Continuum\Anaconda3\lib\site-packages\nltk\tag\perceptron.py in <listcomp>(.0)
    150         output = []
    151 
--> 152         context = self.START + [self.normalize(w) for w in tokens] + self.END
    153         for i, word in enumerate(tokens):
    154             tag = self.tagdict.get(word)

C:\Users\egagne\AppData\Local\Continuum\Anaconda3\lib\site-packages\nltk\tag\perceptron.py in normalize(self, word)
    236         if '-' in word and word[0] != '-':
    237             return '!HYPHEN'
--> 238         elif word.isdigit() and len(word) == 4:
    239             return '!YEAR'
    240         elif word[0].isdigit():

AttributeError: 'list' object has no attribute 'isdigit'
My df is more than 70 columns wide, so here is a small snapshot:

ID_number   Meeting1    Meeting2    Meeting3    Meeting4    Meeting5    col    
123456789   9/15/2015   1/8/2016    4/27/2016   NaN         NaN         [Assessment, of, Improvement, will, be, on-goi...   
987654321   9/22/2016   NaN         2/25/2017   NaN         NaN         [A, member, of, the, administrative, team, wil..   
456789123   10/1/2015   11/30/2015  NaN         NaN         NaN         [During, our, second, and, third, meetings, we...

You can use apply to get the part-of-speech tags, i.e.

df['pos_col'] = df['token_col'].apply(nltk.tag.pos_tag)

df['pos_col']
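For context on why the original call failed: nltk.tag.pos_tag expects one flat list of token strings, but it was handed the whole token_col Series, whose elements are lists, so the tagger's normalize step ends up calling string methods on a list. A dependency-free sketch of that failure mode:

```python
# pos_tag iterates its input and eventually calls str methods such as
# .isdigit() on each element (see normalize() in the traceback above).
one_cell = ["Assessment", "of", "Improvement"]  # a single cell: a list of str
whole_column = [one_cell, ["A", "member"]]      # the Series: a list of lists

print(one_cell[0].isdigit())      # a str has .isdigit() -> False
try:
    whole_column[0].isdigit()     # a list does not
except AttributeError as e:
    print(e)                      # 'list' object has no attribute 'isdigit'
```

Using .apply instead hands the tagger one list of strings per cell, which is the shape it expects.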
Similarly, since you need to apply get_wordnet_pos to every cell of the column:

df['wordnet_tagged_pos_col'] = df['pos_col'].apply(lambda x: [(w, get_wordnet_pos(t)) for (w, t) in x])
df['wordnet_tagged_pos_col']

0    [(Assessment, (N, n)), ( of, (N, n)), ( Improv...
1    [(A, (D, n)), ( member, (N, n)), ( of, (N, n))...
2    [(During, (I, n)), ( our, (J, a)), ( second, (...
Name: wordnet_tagged_pos_col, dtype: object
Hope that helps.
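Putting the two apply steps together on toy data (a sketch: the (word, tag) pairs are hard-coded so nothing needs the NLTK tagger model, and wordnet's POS constants are stubbed with their actual string values 'a'/'v'/'n'/'r'):

```python
import pandas as pd

class wordnet:
    # Stubs with the same values as nltk.corpus.wordnet's constants.
    ADJ, VERB, NOUN, ADV = "a", "v", "n", "r"

def get_wordnet_pos(pos_tag):
    if pos_tag[1].startswith('J'):
        return (pos_tag[0], wordnet.ADJ)
    elif pos_tag[1].startswith('V'):
        return (pos_tag[0], wordnet.VERB)
    elif pos_tag[1].startswith('N'):
        return (pos_tag[0], wordnet.NOUN)
    elif pos_tag[1].startswith('R'):
        return (pos_tag[0], wordnet.ADV)
    else:
        return (pos_tag[0], wordnet.NOUN)

# Hard-coded stand-in for what df['token_col'].apply(nltk.tag.pos_tag)
# would produce on real data.
df = pd.DataFrame({
    "pos_col": [
        [("Assessment", "NN"), ("of", "IN")],
        [("A", "DT"), ("member", "NN")],
    ]
})

# Same per-cell apply as in the answer above.
df["wordnet_tagged_pos_col"] = df["pos_col"].apply(
    lambda x: [(w, get_wordnet_pos(t)) for (w, t) in x]
)
print(df["wordnet_tagged_pos_col"][0])
# [('Assessment', ('N', 'n')), ('of', ('I', 'n'))]
```

Note the single-letter first elements such as ('N', 'n'): the lambda hands get_wordnet_pos only the tag string t, so pos_tag[0] and pos_tag[1] index into that string. That matches the output shown in this thread.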

Can you post a sample of col? @Bharathshetty - added some sample data
Isn't get_wordnet_pos a built-in? @Bharathshetty - nope - here is the function code:
def get_wordnet_pos(pos_tag):
    if pos_tag[1].startswith('J'):
        return (pos_tag[0], wordnet.ADJ)
    elif pos_tag[1].startswith('V'):
        return (pos_tag[0], wordnet.VERB)
    elif pos_tag[1].startswith('N'):
        return (pos_tag[0], wordnet.NOUN)
    elif pos_tag[1].startswith('R'):
        return (pos_tag[0], wordnet.ADV)
    else:
        return (pos_tag[0], wordnet.NOUN)
Thanks, I ran that code and got

ValueError: too many values to unpack (expected 2)
On pos_col or the wordnet col? Here is pos_col:

0    [(Assessment, NNP), ( of, NNP), ( Improvement,...
1    [(A, DT), ( member, NNP), ( of, NNP), ( the, N...
2    [(During, IN), ( our, JJ), ( second, NN), ( an...
Name: pos_col, dtype: object
df['wordnet_tagged_pos_col'] = df['pos_col'].apply(lambda x: [(w, get_wordnet_pos(t)) for (w, t) in x])
df['wordnet_tagged_pos_col']
0    [(Assessment, (N, n)), ( of, (N, n)), ( Improv...
1    [(A, (D, n)), ( member, (N, n)), ( of, (N, n))...
2    [(During, (I, n)), ( our, (J, a)), ( second, (...
Name: wordnet_tagged_pos_col, dtype: object
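A closing observation on that output: get_wordnet_pos was written to accept a whole (word, tag) pair, but the lambda passes it only the tag string, which is why each word ends up paired with a single letter such as ('N', 'n'). Passing the full pair keeps the word intact (a sketch with wordnet's constants stubbed by their string values):

```python
class wordnet:
    # Stubs with the same values as nltk.corpus.wordnet's constants.
    ADJ, VERB, NOUN, ADV = "a", "v", "n", "r"

def get_wordnet_pos(pos_tag):
    if pos_tag[1].startswith('J'):
        return (pos_tag[0], wordnet.ADJ)
    elif pos_tag[1].startswith('V'):
        return (pos_tag[0], wordnet.VERB)
    elif pos_tag[1].startswith('N'):
        return (pos_tag[0], wordnet.NOUN)
    elif pos_tag[1].startswith('R'):
        return (pos_tag[0], wordnet.ADV)
    else:
        return (pos_tag[0], wordnet.NOUN)

print(get_wordnet_pos("NNP"))                  # tag string only -> ('N', 'n')
print(get_wordnet_pos(("Assessment", "NNP")))  # full pair -> ('Assessment', 'n')
```

With the full pair, the per-cell apply would become df['pos_col'].apply(lambda x: [get_wordnet_pos(wt) for wt in x]).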