Python sentence tokenizer: spaCy to pandas


I am using the spaCy NLP sentence tokenizer and want to write the sentences to a pandas DataFrame.

#!/usr/bin/env python
# -*- coding: utf-8 -*-
from __future__ import unicode_literals

# Extraction
import spacy,en_core_web_sm
import pandas as pd

# Read the text file
nlp = en_core_web_sm.load()
doc = nlp(unicode(open('o.txt').read().decode('utf8')) )

for idno, sentence in enumerate(doc.sents):
    print 'Sentence {}:'.format(idno + 1), sentence

Sentences = list(doc.sents)
df = pd.DataFrame(Sentences)
print df
Output:

Sentence 1: This is a sample sentence.
Sentence 2: This is a second sample sentence.
Sentence 3: This is a third sample sentence.
      0   1  2       3         4         5     6
0  This  is  a  sample  sentence         .  None
1  This  is  a  second    sample  sentence     .
2  This  is  a   third    sample  sentence     .
Expected pandas output:

    0
0   This is a sample sentence.
1   This is a second sample sentence.
2   This is a third sample sentence.

How can I achieve the expected output?

The problem is that pd.DataFrame treats each Span in the list as an iterable of tokens, so every token ends up in its own column. What you can do instead is build a list holding the sentence text and then convert that list to a DataFrame:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
from __future__ import unicode_literals

# Extraction
import spacy,en_core_web_sm
import pandas as pd

# Read the text file
nlp = en_core_web_sm.load()
doc = nlp(unicode(open('o.txt').read().decode('utf8')) )

d = []
for idno, sentence in enumerate(doc.sents):
    d.append({"id": idno, "sentence":str(sentence)})
    print 'Sentence {}:'.format(idno + 1), sentence 
df = pd.DataFrame(d)
df.set_index('id', inplace=True)
print df 
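
For reference, here is a minimal Python 3 sketch of the same idea; it assumes a recent spaCy release where the model is loaded with spacy.load, and uses sent.text to pull out the sentence strings:

import spacy
import pandas as pd

# Load the small English model (installed separately with
# `python -m spacy download en_core_web_sm`)
nlp = spacy.load('en_core_web_sm')

# Read the text file as UTF-8 and run the pipeline over it
with open('o.txt', encoding='utf8') as f:
    doc = nlp(f.read())

# One row per sentence, keeping only the sentence text
df = pd.DataFrame({'sentence': [sent.text for sent in doc.sents]})
print(df)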

You should be able to use pd.read_table(input_file_path) and adjust the parameters so that the text is imported into a single column, let's call it df['text'].
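
For example, something along these lines might work with the o.txt file from the question, assuming the text contains no tab characters (the default separator), so every line lands in a single column:

import pandas as pd

# No header row in the file; put each line into a single 'text' column
df = pd.read_table('o.txt', header=None, names=['text'])
print(df.head())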

Then try this:

df['sents'] = df['text'].apply(lambda x: list(nlp(x).sents))

You will have a new column containing a list of sentence tokens for each row.
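
If you then want one sentence per row, as in the expected output, one option (assuming pandas 0.25+ for DataFrame.explode) is roughly:

# Give each sentence Span its own row, then keep only the text of the Span
sentences = df.explode('sents')['sents'].apply(lambda s: s.text)
df_out = sentences.reset_index(drop=True).to_frame('sentence')
print(df_out)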

Good luck!