
Python: How do I create a DataFrame of word tokens from an existing DataFrame column of strings?


I have a pandas DataFrame df of the following form:

import pandas as pd
df = pd.DataFrame.from_dict({'ID': [1, 2, 3],
                             'Strings': ['Hello, how are you?', 'Nice to meet you!', 'My name is John.']})
I want to tokenize the Strings column and create a new DataFrame new_df:

Sentence    Word
   0        Hello
   0        ,
   0        how
   0        are
   0        you
   0        ?
   1        Nice
   1        to
   1        meet
   1        you
   1        !
   2        My
   2        name
   2        is
   2        John
   2        .

I know that I can tokenize every string in df, but how do I get from there to new_df efficiently?

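import nltk
import pandas as pd
# Wide frame: one row per ID, one column per token position; stack() then gives one token per row
pd.DataFrame(df.Strings.map(nltk.word_tokenize).tolist(), index=df.ID).stack()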
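To see why this works, here is a minimal sketch of the two intermediate steps. It assumes a hypothetical regex tokenizer, simple_tokenize, in place of nltk.word_tokenize so that it runs without downloading NLTK data; for these three sentences it splits the text into the same tokens as the outputs in this thread. The list of token lists becomes a wide frame whose shorter rows are padded with missing values, and stack() turns it into one token per row. pandas 2.1 introduced a new stack implementation (future_stack=True) that, as far as I know, keeps missing values instead of dropping them, so the sketch adds an explicit dropna() as a precaution.

```python
import re

import pandas as pd


def simple_tokenize(text: str) -> list:
    # Hypothetical stand-in for nltk.word_tokenize:
    # runs of word characters, or single punctuation marks
    return re.findall(r"\w+|[^\w\s]", text)


df = pd.DataFrame({'ID': [1, 2, 3],
                   'Strings': ['Hello, how are you?', 'Nice to meet you!', 'My name is John.']})

# Step 1: one row per sentence, one column per token position.
# Sentences with fewer tokens are padded with missing values on the right.
wide = pd.DataFrame(df.Strings.map(simple_tokenize).tolist(), index=df.ID)

# Step 2: stack() moves the token positions into a second index level (ID, position).
# dropna() removes the padding explicitly, whichever stack implementation is in use.
tokens = wide.stack().dropna()
```

The longest sentence has six tokens, so wide has six columns, and tokens ends up with one entry per real token (16 in total).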

To clean up the index, use reset_index:

(pd.DataFrame(df.Strings.map(nltk.word_tokenize).tolist(), index=df.ID)
   .stack()
   .reset_index(level=1, drop=True)
   .reset_index(name='Word'))

    ID   Word
0    1  Hello
1    1      ,
2    1    how
3    1    are
4    1    you
5    1      ?
6    2   Nice
7    2     to
8    2   meet
9    2    you
10   2      !
11   3     My
12   3   name
13   3     is
14   3   John
15   3      .
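If you are on pandas 0.25 or later, Series.explode can produce the exact layout asked for in the question (a Sentence column holding the original row position, plus a Word column) without building the padded wide frame. A minimal sketch, again using the hypothetical regex stand-in simple_tokenize instead of nltk.word_tokenize:

```python
import re

import pandas as pd


def simple_tokenize(text: str) -> list:
    # Hypothetical stand-in for nltk.word_tokenize:
    # runs of word characters, or single punctuation marks
    return re.findall(r"\w+|[^\w\s]", text)


df = pd.DataFrame({'ID': [1, 2, 3],
                   'Strings': ['Hello, how are you?', 'Nice to meet you!', 'My name is John.']})

new_df = (df['Strings']
          .map(simple_tokenize)       # Series of token lists, index 0..2
          .explode()                  # one row per token, original index repeated
          .rename_axis('Sentence')    # name the repeated index
          .reset_index(name='Word'))  # turn the index into a column
```

Swapping simple_tokenize back to nltk.word_tokenize should give the same result for these sentences, as long as the NLTK tokenizer data is installed.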

After applying nltk, the problem reduces to unnesting a column of lists into rows.


Thank you very much for sharing this great solution.
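import nltk
# Replace each sentence with its list of tokens
df.Strings = df.Strings.map(nltk.word_tokenize)
unnesting(df, ['Strings'])  # unnesting: helper function not defined in this excerpt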
Out[22]: 
  Strings  ID
0   Hello   1
0       ,   1
0     how   1
0     are   1
0     you   1
0       ?   1
1    Nice   2
1      to   2
1    meet   2
1     you   2
1       !   2
2      My   3
2    name   3
2      is   3
2    John   3
2       .   3
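The unnesting function called above is not defined in this excerpt. As an assumption, here is one hypothetical way such a helper could be written for the call shown (a DataFrame plus a list of list-valued column names). It mirrors the layout of the output above: the exploded column comes first, the remaining columns are joined back on, and the original index label is repeated once per token.

```python
from itertools import chain

import pandas as pd


def unnesting(df: pd.DataFrame, explode: list) -> pd.DataFrame:
    """Hypothetical helper: expand list-valued columns into one row per element.

    Assumes all columns named in `explode` hold lists of equal length within each row.
    """
    # Repeat each original index label once per element of its list
    idx = df.index.repeat(df[explode[0]].map(len).to_numpy())
    # Flatten every list column into one long column aligned with the repeated index
    flat = pd.DataFrame({col: list(chain.from_iterable(df[col])) for col in explode},
                        index=idx)
    # Join the untouched columns back on the original index labels
    return flat.join(df.drop(columns=explode), how='left')


df = pd.DataFrame({'ID': [1, 2, 3],
                   'Strings': [['Hello', ','], ['Nice'], ['My', 'name', '.']]})
out = unnesting(df, ['Strings'])
```

Because it takes a list of columns, this kind of helper can expand several parallel list columns at once, which is where it differs from a single-column explode.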