Splitting sentences into paragraphs using Python


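I need to split the sentences of a video's captions into paragraphs using Python. I tried nltk.tokenize.texttiling, but didn't get any results. Here is an excerpt of the text: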

– [Voiceover] Bob Dylan is,
you must be 20 years old now,
aren't you?
– [Voiceover] Yeah, I must be 20.
(laughing)
– [Voiceover] Are you?
– [Voiceover] Yeah, I'm 20, I'm 20.
(guitar music)
My hands are cold.
It's a pretty cold studio.
– [Voiceover] The coldest studio.
– [Voiceover] Usually can do this.
There I just want to do it once.
(guitar strumming)
– [Voiceover] When I first heard Bob Dylan
was, I think, about three
years ago in Minneapolis.
– [Voiceover] At that time I
was just sort of doing nothing.
I was there working, I guess.
I was making pretend I was
going to school out there.
I'd just come there from South Dakota.
– [Voiceover] You've sung
now at Goody's here in town.
Have you sung at any of the coffee houses?

This looks simple enough to handle with a regular expression. I don't know which output format you want, but here is an example:

import re

sentence = """
– [Voiceover] Bob Dylan is,
you must be 20 years old now,
aren't you?
– [Voiceover] Yeah, I must be 20.
(laughing)
– [Voiceover] Are you?
– [Voiceover] Yeah, I'm 20, I'm 20.
(guitar music)
My hands are cold.
It's a pretty cold studio.
– [Voiceover] The coldest studio.
– [Voiceover] Usually can do this.
There I just want to do it once.
(guitar strumming)
– [Voiceover] When I first heard Bob Dylan
was, I think, about three
years ago in Minneapolis.
– [Voiceover] At that time I
was just sort of doing nothing.
I was there working, I guess.
I was making pretend I was
going to school out there.
I'd just come there from South Dakota.
– [Voiceover] You've sung
now at Goody's here in town.
Have you sung at any of the coffee houses?

"""

# Split on each speaker marker such as "– [Voiceover]".
start_re = re.compile(r'–\s\[.*?\]')
result = start_re.split(sentence)
# Rejoin each turn's wrapped lines with single spaces and drop empty chunks.
result = [' '.join(s.split()) for s in result if s.strip()]
print(result)
Output:

["Bob Dylan is, you must be 20 years old now, aren't you?", 'Yeah, I must be 20. (laughing)', 'Are you?', "Yeah, I'm 20, I'm 20. (guitar music) My hands are cold. It's a pretty cold studio.", 'The coldest studio.', 'Usually can do this. There I just want to do it once. (guitar strumming)', 'When I first heard Bob Dylan was, I think, about three years ago in Minneapolis.', "At that time I was just sort of doing nothing. I was there working, I guess. I was making pretend I was going to school out there. I'd just come there from South Dakota.", "You've sung now at Goody's here in town. Have you sung at any of the coffee houses?"]
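If one paragraph per speaker turn is an acceptable result, the same split can be wrapped in a small function. This is only a sketch: the name `caption_paragraphs`, the `drop_cues` option, and the decision to treat parenthesized sound cues such as "(guitar music)" as removable are assumptions, not something the question asks for.

```python
import re

# Hypothetical helper; the names and the cue-dropping option are illustrative.
SPEAKER_RE = re.compile(r'–\s\[.*?\]')  # speaker marker, e.g. "– [Voiceover]"
CUE_RE = re.compile(r'\([^)]*\)')       # sound cue, e.g. "(guitar music)"


def caption_paragraphs(text, drop_cues=True):
    """Return one paragraph (a single-line string) per speaker turn."""
    paragraphs = []
    for turn in SPEAKER_RE.split(text):
        if drop_cues:
            turn = CUE_RE.sub(' ', turn)
        turn = ' '.join(turn.split())  # rejoin wrapped caption lines
        if turn:                       # skip empty chunks
            paragraphs.append(turn)
    return paragraphs


sample = """
– [Voiceover] Yeah, I'm 20, I'm 20.
(guitar music)
My hands are cold.
It's a pretty cold studio.
– [Voiceover] The coldest studio.
"""

# Print the turns as paragraphs separated by blank lines.
print('\n\n'.join(caption_paragraphs(sample)))
```

Passing `drop_cues=False` keeps the sound cues inline, which may be preferable if they carry meaning for the reader.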

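I've never heard of such a thing. In any case, are you trying to parse the text of a human dialogue? Turning that into paragraphs seems like utter nonsense: paragraphs are a way of organizing human writing and don't apply to human conversation.

How do you split sentences into paragraphs? Paragraphs are made of sentences, but sentences are made of words.

Sentences can be found with the NLTK module, and if the topic changes completely between sentences, that indicates a new paragraph. This is how TextTiling works.

I think that is for finding sentences, not paragraphs.

I know paragraph recognition is a hard task and the boundaries are subtle, but I would appreciate it if you could recommend any library other than nltk.tokenize.texttiling.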
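The idea mentioned in the comments, starting a new paragraph where the vocabulary shifts between neighbouring sentences, can be illustrated with a toy sketch. This is not NLTK's TextTiling implementation: the function names, the `window` and `threshold` values, and the plain word-count cosine similarity are all assumptions chosen for brevity, and there is no stopword removal or smoothing.

```python
import re
from collections import Counter
from math import sqrt


def word_counts(sentence):
    """Lowercased word counts for one sentence."""
    return Counter(re.findall(r"[a-z']+", sentence.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def paragraphs_by_cohesion(sentences, window=2, threshold=0.1):
    """Break where the `window` sentences before and after a gap share few words.

    Illustrative only: `window` and `threshold` are arbitrary assumptions.
    """
    if not sentences:
        return []
    paragraphs, current = [], [sentences[0]]
    for i in range(1, len(sentences)):
        left = sum((word_counts(s) for s in sentences[max(0, i - window):i]), Counter())
        right = sum((word_counts(s) for s in sentences[i:i + window]), Counter())
        if cosine(left, right) < threshold:
            # Low lexical overlap across the gap: close the current paragraph.
            paragraphs.append(' '.join(current))
            current = []
        current.append(sentences[i])
    paragraphs.append(' '.join(current))
    return paragraphs


sentences = [
    "The studio is cold.",
    "My hands are cold in this studio.",
    "I came from South Dakota.",
    "South Dakota was home.",
]
for paragraph in paragraphs_by_cohesion(sentences):
    print(paragraph)
# prints:
# The studio is cold. My hands are cold in this studio.
# I came from South Dakota. South Dakota was home.
```

Short conversational turns often share very few words, so on real captions a heuristic like this is likely to be noisy and would need tuning.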