Python: How to tokenize a text column in a dataframe using NLTK
My df looks like this:
team_name  text
---------  ----
red        this is text from red team
blue       this is text from blue team
green      this is text from green team
yellow     this is text from yellow team
I am trying to get to this:
team_name  text                           text_token
---------  ----                           ----------
red        this is text from red team     'this', 'is', 'text', 'from', 'red', 'team'
blue       this is text from blue team    'this', 'is', 'text', 'from', 'blue', 'team'
green      this is text from green team   'this', 'is', 'text', 'from', 'green', 'team'
yellow     this is text from yellow team  'this', 'is', 'text', 'from', 'yellow', 'team'
What I tried:
df['text_token'] = nltk.word_tokenize(df['text'])
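This does not work. How can I get the result I want? Also, is it possible to compute a frequency distribution?
Stack Overflow has several examples you can study.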
This has already been solved in a linked question, and by:
df['text_token'] = df.apply(lambda row: nltk.word_tokenize(row['text']), axis=1)
Does this answer your question?
Thanks for writing the answer. How do I omit NA values?
Use df['column'].fillna(value=myValue, inplace=True)
Thanks!! How do I get the Freq Dist of text_token for each row?
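For the two follow-up questions above, one possible approach is sketched here: replace missing text with an empty string before tokenizing, then build one nltk.FreqDist per row from that row's token list. The sample rows, the empty-string fill value, and the column name text_freq are illustrative assumptions, not part of the original thread. The sketch also swaps in nltk's regex-based wordpunct_tokenize for nltk.word_tokenize so it runs without downloading the punkt tokenizer data. The filled column is assigned back rather than using inplace=True on a column selection, which newer pandas versions warn about.

```python
import pandas as pd
from nltk import FreqDist
from nltk.tokenize import wordpunct_tokenize  # stand-in for nltk.word_tokenize (assumption)

# Illustrative data with one missing text value (not from the original question).
df = pd.DataFrame({
    "team_name": ["red", "blue", "green"],
    "text": ["this is text from red team", None, "green team is the green team"],
})

# Replace missing text with an empty string so the tokenizer always receives a str.
df["text"] = df["text"].fillna("")

# Tokenize each row; an empty string yields an empty token list.
df["text_token"] = df["text"].apply(lambda s: wordpunct_tokenize(s))

# Build one frequency distribution per row from that row's tokens.
# 'text_freq' is a hypothetical column name chosen for this sketch.
df["text_freq"] = df["text_token"].apply(lambda toks: FreqDist(toks))

# Inspect the most frequent tokens of a single row.
print(df.loc[2, "text_freq"].most_common(2))
```

Rows that should be dropped instead of filled could be removed first with df.dropna(subset=['text']), which avoids empty token lists altogether.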
df['text_token'] = df.apply(lambda row: nltk.word_tokenize(row['text']), axis=1)
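The original attempt most likely fails because the tokenizer expects a single string, while df['text'] is a whole column; applying the tokenizer row by row avoids that. A minimal, self-contained sketch of the line above on the sample data from the question follows. It assumes pandas and NLTK are installed, and it uses wordpunct_tokenize as a stand-in for nltk.word_tokenize so that no punkt data download is needed; on plain lowercase text like this the two should produce the same tokens, and nltk.word_tokenize can be swapped back in once its data is available.

```python
import pandas as pd
from nltk.tokenize import wordpunct_tokenize  # stand-in for nltk.word_tokenize (assumption)

# Sample data copied from the question.
df = pd.DataFrame({
    "team_name": ["red", "blue", "green", "yellow"],
    "text": [
        "this is text from red team",
        "this is text from blue team",
        "this is text from green team",
        "this is text from yellow team",
    ],
})

# Same shape as the answer: tokenize each row's 'text' value.
df["text_token"] = df.apply(lambda row: wordpunct_tokenize(row["text"]), axis=1)

# Applying the tokenizer directly to the column gives the same token lists
# and avoids building a row object for every record.
tokens_via_series = df["text"].apply(lambda s: wordpunct_tokenize(s))

print(df[["team_name", "text_token"]])
```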