
Python: Separating NLTK subtrees based on a label


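I have an NLTK parse tree, and I want to separate the leaves of the tree based only on the S label. Note that the S subtrees should not overlap in their leaves.

Take the sentence: He won the Gusher Marathon, finishing in 30 minutes.

The tree form from CoreNLP is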

tree = """(S
  (NP (PRP He))
  (VP
    (VBD won)
    (NP (DT the) (NNP Gusher) (NNP Marathon))
    (, ,)
    (S (VP (VBG finishing) (PP (IN in) (NP (CD 30) (NNS minutes))))))
  (. .))"""
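The idea is to extract the 2 S subtrees and their leaves, but without them overlapping each other. So the expected output should be "He won the Gusher Marathon , ." and "finishing in 30 minutes".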

# Tree manipulation
from nltk.tree import Tree

# Extract phrases from a parsed (chunked) tree
# phrase = label of the sub-trees to extract
# Returns: list of deep copies; recursive
def ExtractPhrases(myTree, phrase):
    myPhrases = []
    # Keep a deep copy of this node if its label matches
    if myTree.label() == phrase:
        myPhrases.append(myTree.copy(True))
    # Recurse into child sub-trees only (leaves are plain strings)
    for child in myTree:
        if isinstance(child, Tree):
            myPhrases.extend(ExtractPhrases(child, phrase))
    return myPhrases

The output I get is

['He won the Gusher Marathon , finishing in 30 minutes .', 'finishing in 30 minutes']

I don't want to manipulate this at the string level but at the tree level, so the expected output is:

["He won the Gusher Marathon ,.",  "finishing in 30 minutes."]

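Here is my sample input:

import nltk

a = '''
FREEDOM FROM RELIGION FOUNDATION
Darwin fish bumper stickers and assorted other atheist paraphernalia are
available from the Freedom From Religion Foundation in the US.
EVOLUTION DESIGNS
Evolution Designs sell the "Darwin fish". It's a fish symbol, like the ones
Christians stick on their cars, but with feet and the word "Darwin" written
inside. The deluxe moulded 3D plastic fish is $4.95 postpaid in the US.
'''

# Split into sentences, tokenize, POS-tag, then run the named-entity chunker
sentences = nltk.sent_tokenize(a)
sentences = [nltk.word_tokenize(sent) for sent in sentences]
tagged_sentences = nltk.pos_tag_sents(sentences)
chunked_sentences = list(nltk.ne_chunk_sents(tagged_sentences))

# Print every subtree labelled 'S' in each chunked sentence
for sent in chunked_sentences:
    for subtree in sent.subtrees(filter=lambda t: t.label() == 'S'):
        print(subtree)

Here is my output: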

(S
  (ORGANIZATION FREEDOM/NN)
  (ORGANIZATION FROM/NNP)
  RELIGION/NNP
  FOUNDATION/NNP
  Darwin/NNP
  fish/JJ
  bumper/NN
  stickers/NNS
  and/CC
  assorted/VBD
  other/JJ
  atheist/JJ
  paraphernalia/NNS
  are/VBP
  available/JJ
  from/IN
  the/DT
  (ORGANIZATION Freedom/NN From/NNP Religion/NNP Foundation/NNP)
  in/IN
  the/DT
  (GSP US/NNP)
  ./.)

(S
  (ORGANIZATION EVOLUTION/NNP)
  (ORGANIZATION DESIGNS/NNP Evolution/NNP)
  Designs/NNP
  sell/VB
  the/DT
  ``/``
  (PERSON Darwin/NNP)
  fish/NN
  ''/''
  ./.)

(S
  It/PRP
  's/VBZ
  a/DT
  fish/JJ
  symbol/NN
  ,/,
  like/IN
  the/DT
  ones/NNS
  Christians/NNPS
  stick/VBP
  on/IN
  their/PRP$
  cars/NNS
  ,/,
  but/CC
  with/IN
  feet/NNS
  and/CC
  the/DT
  word/NN
  ``/``
  (PERSON Darwin/NNP)
  ''/''
  written/VBN
  inside/RB
  ./.)

(S
  The/DT
  deluxe/NN
  moulded/VBD
  3D/CD
  plastic/JJ
  fish/NN
  is/VBZ
  $/$
  4.95/CD
  postpaid/NN
  in/IN
  the/DT
  (GSP US/NNP)
  ./.)