Python: Separating NLTK subtrees based on a label
I have an NLTK parse tree, and I want to separate the leaves of the tree based only on the S label. Note that the S subtrees should not overlap in their leaves.

Given the sentence "He won the Gusher Marathon, finishing in 30 minutes.", the tree produced by CoreNLP is:
tree = '''(S
  (NP (PRP He))
  (VP
    (VBD won)
    (NP (DT the) (NNP Gusher) (NNP Marathon))
    (, ,)
    (S (VP (VBG finishing) (PP (IN in) (NP (CD 30) (NNS minutes))))))
  (. .))'''
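The idea is to extract the two S subtrees and their leaves without letting them overlap. So the expected output should be "He won the Gusher Marathon ,." and "finishing in 30 minutes". This is what I tried: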
from nltk.tree import Tree

# Tree manipulation
# Extract phrases from a parsed (chunked) tree.
# phrase = label of the phrase (subtree) to extract
# Returns: list of deep copies; recursive
def ExtractPhrases(myTree, phrase):
    myPhrases = []
    # Keep a deep copy of this node if its label matches
    if myTree.label() == phrase:
        myPhrases.append(myTree.copy(True))
    # Recurse into child subtrees only; plain-string leaves are skipped
    for child in myTree:
        if isinstance(child, Tree):
            myPhrases.extend(ExtractPhrases(child, phrase))
    return myPhrases
The output I get is:
['He won the Gusher Marathon , finishing in 30 minutes .', 'finishing in 30 minutes']
I don't want to manipulate this at the string level but at the tree level, so the expected output is:
["He won the Gusher Marathon ,.", "finishing in 30 minutes."]
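One way to do this at the tree level is to assign each leaf to its nearest enclosing S node, so a nested S keeps its own leaves and the outer S keeps everything else. Below is a minimal sketch, assuming NLTK 3 (`Tree.fromstring`, `label()`). The helper name `leaves_by_nearest_label` is hypothetical and not part of NLTK.

```python
from nltk.tree import Tree


def leaves_by_nearest_label(tree, label="S"):
    """Group leaves under their nearest enclosing `label` node.

    Returns one list of leaves per `label` node, in pre-order.
    Leaves with no enclosing `label` node are dropped.
    """
    groups = []

    def visit(node, current):
        if isinstance(node, Tree):
            if node.label() == label:
                # Start a new group; descendants go here until a deeper
                # `label` node opens its own group.
                current = []
                groups.append(current)
            for child in node:
                visit(child, current)
        elif current is not None:
            # A leaf (plain string): attach it to the innermost open group.
            current.append(node)

    visit(tree, None)
    return groups


parsed = Tree.fromstring(
    "(S (NP (PRP He)) (VP (VBD won) (NP (DT the) (NNP Gusher) (NNP Marathon)) "
    "(, ,) (S (VP (VBG finishing) (PP (IN in) (NP (CD 30) (NNS minutes)))))) (. .))"
)
clauses = [" ".join(words) for words in leaves_by_nearest_label(parsed)]
print(clauses)
# ['He won the Gusher Marathon , .', 'finishing in 30 minutes']
```

With this grouping the final period stays with the outer S, because it is a direct child of the root rather than of the nested clause.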
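Here is my sample input:

import nltk

a = '''FREEDOM FROM RELIGION FOUNDATION
Darwin fish bumper stickers and assorted other atheist paraphernalia are
available from the Freedom From Religion Foundation in the US.
EVOLUTION DESIGNS
Evolution Designs sell the "Darwin fish". It's a fish symbol, like the ones
Christians stick on their cars, but with feet and the word "Darwin" written
inside. The deluxe moulded 3D plastic fish is $4.95 postpaid in the US.
'''
sentences = nltk.sent_tokenize(a)
sentences = [nltk.word_tokenize(sent) for sent in sentences]
tagged_sentences = nltk.pos_tag_sents(sentences)
chunked_sentences = list(nltk.ne_chunk_sents(tagged_sentences))
for sent in chunked_sentences:
    for subtree in sent.subtrees(filter=lambda t: t.label() == 'S'):
        print(subtree)

Here is my output: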
(S
  (ORGANIZATION FREEDOM/NN)
  (ORGANIZATION FROM/NNP)
  RELIGION/NNP
  FOUNDATION/NNP
  Darwin/NNP
  fish/JJ
  bumper/NN
  stickers/NNS
  and/CC
  assorted/VBD
  other/JJ
  atheist/JJ
  paraphernalia/NNS
  are/VBP
  available/JJ
  from/IN
  the/DT
  (ORGANIZATION Freedom/NN From/NNP Religion/NNP Foundation/NNP)
  in/IN
  the/DT
  (GSP US/NNP)
  ./.)
(S
  (ORGANIZATION EVOLUTION/NNP)
  (ORGANIZATION DESIGNS/NNP Evolution/NNP)
  Designs/NNP
  sell/VB
  the/DT
  ``/``
  (PERSON Darwin/NNP)
  fish/NN
  ''/''
  ./.)
(S
  It/PRP
  's/VBZ
  a/DT
  fish/JJ
  symbol/NN
  ,/,
  like/IN
  the/DT
  ones/NNS
  Christians/NNPS
  stick/VBP
  on/IN
  their/PRP$
  cars/NNS
  ,/,
  but/CC
  with/IN
  feet/NNS
  and/CC
  the/DT
  word/NN
  ``/``
  (PERSON Darwin/NNP)
  ''/''
  written/VBN
  inside/RB
  ./.)
(S
  The/DT
  deluxe/NN
  moulded/VBD
  3D/CD
  plastic/JJ
  fish/NN
  is/VBZ
  $/$
  4.95/CD
  postpaid/NN
  in/IN
  the/DT
  (GSP US/NNP)
  ./.)