
Python: Counting sentence clauses with NLTK


I'm new to NLP and currently experimenting with Python's NLTK. One of the more confusing things in NLTK is grammar construction. In the examples given in the NLTK book, the grammar is written specifically for each sentence being parsed:

import nltk

grammar1 = nltk.CFG.fromstring("""
  S -> NP VP
  VP -> V NP | V NP PP
  PP -> P NP
  V -> "saw" | "ate" | "walked"
  NP -> "John" | "Mary" | "Bob" | Det N | Det N PP
  Det -> "a" | "an" | "the" | "my"
  N -> "man" | "dog" | "cat" | "telescope" | "park"
  P -> "in" | "on" | "by" | "with"
  """)


sent = "Mary saw Bob".split()
rd_parser = nltk.RecursiveDescentParser(grammar1)
for tree in rd_parser.parse(sent):
    print(tree)
(S (NP Mary) (VP (V saw) (NP Bob)))
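
Once such a tree exists, counting constituents of a given category is straightforward with Tree.subtrees(); the short snippet below simply re-reads the tree printed above (the labels and the count are specific to this toy grammar, where S is the only clause-level category):

from nltk import Tree

tree = Tree.fromstring("(S (NP Mary) (VP (V saw) (NP Bob)))")
# Count subtrees labelled S; in this toy grammar only S marks a clause.
n_clauses = sum(1 for t in tree.subtrees() if t.label() == 'S')
print(n_clauses)  # 1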
I want to analyze a large number of newspaper articles, and writing a dedicated grammar for every sentence is obviously not a feasible task. Specifically, I need to know the number of clauses in each sentence. Does a ready-made set of grammars for such a task already exist, and if not, how would one go about writing one?

All my sentences have already been parsed and part-of-speech tagged, for example:

[(u'Her', 'PRP$'),
 (u'first', 'JJ'),
 (u'term', 'NN'),
 (u'followed', 'VBN'),
 (u'a', 'DT'),
 (u'string', 'NN'),
 (u'of', 'IN'),
 (u'high', 'JJ'),
 (u'profile', 'NN'),
 (u'police', 'NNS'),
 (u'abuse', 'VBP'),
 (u'cases', 'NNS'),
 (u'including', 'VBG'),
 (u'the', 'DT'),
 (u'choking', 'NN'),
 (u'death', 'NN'),
 (u'of', 'IN'),
 (u'a', 'DT'),
 (u'Hispanic', 'NNP'),
 (u'man', 'NN'),
 (u'in', 'IN'),
 (u'1994', 'CD'),
 (u'the', 'DT'),
 (u'Louima', 'NNP'),
 (u'case', 'NN'),
 (u'in', 'IN'),
 (u'1997', 'CD'),
 (u'and', 'CC'),
 (u'the', 'DT'),
 (u'shooting', 'NN'),
 (u'deaths', 'NNS'),
 (u'of', 'IN'),
 (u'a', 'DT'),
 (u'West', 'NNP'),
 (u'African', 'NNP'),
 (u'immigrant', 'NN'),
 (u'in', 'IN'),
 (u'1999', 'CD'),
 (u'and', 'CC'),
 (u'a', 'DT'),
 (u'black', 'JJ'),
 (u'security', 'NN'),
 (u'guard', 'NN'),
 (u'in', 'IN'),
 (u'early', 'JJ'),
 (u'2000', 'CD')]
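
One rough way to get a clause-like count directly from such (token, tag) pairs, without a full grammar, is to chunk verb groups with nltk.RegexpParser and count them. The snippet below is only a heuristic sketch: the VG chunk label is invented here, the pattern (an optional modal followed by one or more verb tags) is a crude approximation of a verb group, and the tagged list is just a shortened copy of the example above.

import nltk

# A (token, POS-tag) list in the same format as the output shown above (shortened).
tagged = [(u'Her', 'PRP$'), (u'first', 'JJ'), (u'term', 'NN'),
          (u'followed', 'VBN'), (u'a', 'DT'), (u'string', 'NN'),
          (u'of', 'IN'), (u'high', 'JJ'), (u'profile', 'NN'),
          (u'police', 'NNS'), (u'abuse', 'VBP'), (u'cases', 'NNS'),
          (u'including', 'VBG'), (u'the', 'DT'), (u'choking', 'NN'),
          (u'death', 'NN')]

# VG is an invented chunk label: an optional modal followed by one or more verbs.
chunker = nltk.RegexpParser(r"""
  VG: {<MD>?<VB.*>+}
""")

tree = chunker.parse(tagged)
rough_clause_count = sum(1 for t in tree.subtrees() if t.label() == 'VG')
print(rough_clause_count)  # 3 verb groups in this fragment

Whether counting verb groups is an acceptable stand-in for counting clauses depends on how strict the definition of "clause" needs to be.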

nltk.data provides some ready-made grammars.

Example of a CFG grammar:

>>> grammar = nltk.data.load('grammars/large_grammars/atis.cfg')
>>> grammar
<Grammar with 5517 productions>
Note that the nltk.parse package also provides interfaces to parsers.
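
A rough sketch of how a ready-made grammar plus one of those parsers could be used to count clauses per sentence is given below. The example sentence is hypothetical (every word has to be a terminal of the grammar, hence the coverage check), and the assumption that clause-level nonterminals in atis.cfg have labels starting with 'S' should be verified against the grammar's actual productions.

import nltk

grammar = nltk.data.load('grammars/large_grammars/atis.cfg')

# A chart parser copes far better with a 5517-production grammar than
# RecursiveDescentParser would.
parser = nltk.parse.BottomUpLeftCornerChartParser(grammar)

# Hypothetical input sentence; adjust to words the grammar actually covers.
sent = 'show me the flights from boston to denver .'.split()

try:
    grammar.check_coverage(sent)       # raises ValueError listing unknown words
except ValueError as err:
    print(err)
else:
    for tree in parser.parse(sent):
        # Assumption: clause-level categories in this grammar start with 'S'.
        clauses = sum(1 for t in tree.subtrees() if str(t.label()).startswith('S'))
        print(clauses)
        break                          # large grammars often yield many parses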

Also, perhaps take a look at some of the other related questions here.

Do you mean that all your sentences are tokenized and POS-tagged?