Chunking with Python

python nlp

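I am trying to extract all proper nouns from a tagged paragraph. In my code, I first extract the paragraphs separately and then check whether each one contains a proper noun. The problem is that I cannot extract the proper nouns: my code never even enters the block that checks for the specific tag.

My code: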

def noun(sen):
    m = []
    if (sen.split('/')[1].lower().startswith('np')&sen.split('/')[1].lower().endswith('np')):
        w = sen.strip().split('/')[0]
        m.append(w)
    return m


import nltk
rp = open("tesu.txt", 'r')
text = rp.read()
list = []
sentences = splitParagraph(text)
for s in sentences:
 list.append(s)

Sample input from "tesu.txt":

Several/ap defendants/nns in/in the/at Summerdale/np police/nn burglary/nn trial/nn      made/vbd statements/nns indicating/vbg their/pp$ guilt/nn at/in the/at.... 

Bellows/np made/vbd the/at disclosure/nn when/wrb he/pps asked/vbd Judge/nn-tl Parsons/np to/to grant/vb his/pp$ client/nn ,/, Alan/np Clements/np ,/, 30/cd ,/, a/at separate/jj trial/nn ./.

How can I extract all the tagged proper nouns from a paragraph?

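You should follow a consistent code style. I think there are a lot of unnecessary loops in there. There is also an unnecessary method, `splitParagraph`, which basically just calls the already existing `split` method, and you `import re` but never use it afterwards. Also indent your code; as posted it is hard to follow. You should provide an input sample from "tesu.txt" so that we can help you further. Anyway, all of your code can be condensed to: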

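def noun(sentence):
    # `sentence` must be a single 'word/tag' token; a string with more
    # than one '/' makes the unpacking below raise ValueError.
    word, tag = sentence.split('/')
    if tag.lower().startswith('np') and tag.lower().endswith('np'):
        return word
    return False

if __name__ == '__main__':
    words = []
    with open('tesu.txt', 'r') as file_p:
        # Paragraphs are separated by blank lines.
        for paragraph in file_p.read().split('\n\n'):
            # Tagged tokens within a paragraph are separated by whitespace.
            for token in paragraph.split():
                result = noun(token)
                if result:
                    words.append(result)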

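Thanks for the data sample.

You need to:

- Read each paragraph/line.
- Split the line on whitespace to extract each tagged word, e.g. `Summerdale/np`.
- Split the tagged word on `/` and check whether it is tagged `np`.
- If so, add the other half of the split (the actual word) to your list of nouns.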

So, something like the following (based on Bogdan's answer, thanks!):
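A minimal sketch of those steps, assuming paragraphs are separated by blank lines and every token has the form `word/tag`. The function name `extract_proper_nouns` is hypothetical, and the `sample` string is abridged from the sample input in the question so that the snippet runs without the file:

```python
def extract_proper_nouns(text):
    """Return the words tagged exactly 'np' in a word/tag text."""
    nouns = []
    # Paragraphs are assumed to be separated by blank lines.
    for paragraph in text.split('\n\n'):
        # Whitespace separates the tagged tokens, e.g. 'Summerdale/np'.
        for token in paragraph.split():
            # Split on the last '/' so a word containing '/' stays whole.
            word, sep, tag = token.rpartition('/')
            if sep and tag.lower() == 'np':
                nouns.append(word)
    return nouns


# Abridged from the sample input in the question.
sample = (
    "Several/ap defendants/nns in/in the/at Summerdale/np police/nn "
    "burglary/nn trial/nn\n\n"
    "Bellows/np made/vbd the/at disclosure/nn when/wrb he/pps asked/vbd "
    "Judge/nn-tl Parsons/np to/to grant/vb his/pp$ client/nn ,/, "
    "Alan/np Clements/np ,/, 30/cd ./."
)
print(extract_proper_nouns(sample))
```

To run it on the real file, read the text with `open('tesu.txt').read()` and pass it to the function instead of `sample`.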

For your sample data, it produces:

['Summerdale', 'Bellows', 'Parsons', 'Alan', 'Clements']

Update: in fact, you can shorten the whole thing to:

nouns = []
with open('tesu.txt', 'r') as file_p:
    for token in file_p.read().split():
        # Each token looks like 'word/tag'.
        word, tag = token.split('/')
        if tag.lower() == 'np':
            nouns.append(word)
print(nouns)

if you don't care which paragraph the nouns come from.

If the tags are always lowercase, you can also drop the `.lower()`.
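If you do care which paragraph each noun comes from, one possible variant groups the results per paragraph. This is only a sketch under the same assumptions (blank lines between paragraphs, `word/tag` tokens); the name `proper_nouns_by_paragraph` and the short `text` string are illustrative:

```python
def proper_nouns_by_paragraph(text):
    """Return one list of 'np'-tagged words per non-empty paragraph."""
    result = []
    for paragraph in text.split('\n\n'):
        tokens = paragraph.split()
        if not tokens:
            continue  # skip empty chunks caused by extra blank lines
        nouns = []
        for token in tokens:
            # Split on the last '/' to separate the word from its tag.
            word, sep, tag = token.rpartition('/')
            if sep and tag.lower() == 'np':
                nouns.append(word)
        result.append(nouns)
    return result


# Two short paragraphs abridged from the sample input.
text = (
    "in/in the/at Summerdale/np police/nn\n\n"
    "Bellows/np asked/vbd Judge/nn-tl Parsons/np ./."
)
for index, nouns in enumerate(proper_nouns_by_paragraph(text), start=1):
    print(index, nouns)
```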

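Please give us a sample of a tagged paragraph, otherwise we can't tell whether your code is correct.

@DNA I have provided a sample input. Please check. Thanks.

Thanks for your help. But if I try your code, it gives an error: `word, tag = sentence.split('/')` ValueError: too many values to unpack. I have given the sample input above.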
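The `ValueError` reported above is what you would expect when a whole paragraph is passed to `split('/')`: the paragraph contains many `/` separators, so the result has more than two pieces and cannot be unpacked into `word, tag`. A small sketch that reproduces this and shows the per-token split; the `paragraph` string is an abridged piece of the sample input:

```python
paragraph = "the/at Summerdale/np police/nn"

# Splitting the whole paragraph on '/' yields four pieces, not two.
pieces = paragraph.split('/')
print(len(pieces))

try:
    word, tag = paragraph.split('/')
except ValueError as error:
    print("ValueError:", error)

# Splitting on whitespace first gives one word/tag pair per token.
pairs = [token.split('/') for token in paragraph.split()]
print(pairs)
```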