Nlp 在项目符号和编号处拆分句子?
我试着把文字输入文字处理器,先把文字分成句子,然后再分成文字 示例段落:Nlp 在项目符号和编号处拆分句子?,nlp,nltk,sentence,Nlp,Nltk,Sentence,我试着把文字输入文字处理器,先把文字分成句子,然后再分成文字 示例段落: When the blow was repeated,together with an admonition in childish sentences, he turned over upon his back, and held his paws in a peculiar manner. 1) This a numbered sentence 2) This is the second numbered sente
When the blow was repeated,together with an admonition in
childish sentences, he turned over upon his back, and held his paws in a peculiar manner.
1) This a numbered sentence
2) This is the second numbered sentence
At the same time with his ears and his eyes he offered a small prayer to the child.
Below are the examples
- This an example of bullet point sentence
- This is also an example of bullet point sentence
我尝试了以下代码
from nltk.tokenize import TweetTokenizer, sent_tokenize
tokenizer_words = TweetTokenizer()
tokens_sentences = [tokenizer_words.tokenize(t) for t in
nltk.sent_tokenize(input_text)]
print(tokens_sentences)
我得到的输出
[
['When', 'the', 'blow', 'was', 'repeated', ',', 'together', 'with', 'an', 'admonition', 'in', 'childish', 'sentences', ',', 'he', 'turned', 'over', 'upon', 'his', 'back', ',', 'and', 'held', 'his', 'paws', 'in', 'a', 'peculiar', 'manner', '.'],
['1', ')', 'This', 'a', 'numbered', 'sentence', '2', ')', 'This', 'is', 'the', 'second', 'numbered', 'sentence','At', 'the', 'same', 'time', 'with', 'his', 'ears', 'and', 'his', 'eyes', 'he', 'offered', 'a', 'small', 'prayer', 'to', 'the', 'child', '.']
['Below', 'are', 'the', 'examples', '-', 'This', 'an', 'example', 'of', 'bullet', 'point', 'sentence',
'-', 'This', 'also','an', 'example', 'of', 'bullet', 'point', 'sentence']
]
[
['When', 'the', 'blow', 'was', 'repeated', ',', 'together', 'with', 'an', 'admonition', 'in', 'childish', 'sentences', ',', 'he', 'turned', 'over', 'upon', 'his', 'back', ',', 'and', 'held', 'his', 'paws', 'in', 'a', 'peculiar', 'manner', '.'],
['1', ')', 'This', 'a', 'numbered', 'sentence']
['2', ')', 'This', 'is', 'the', 'second', 'numbered', 'sentence']
['At', 'the', 'same', 'time', 'with', 'his', 'ears', 'and', 'his', 'eyes', 'he', 'offered', 'a', 'small', 'prayer', 'to', 'the', 'child', '.']
['Below', 'are', 'the', 'examples']
['-', 'This', 'an', 'example', 'of', 'bullet', 'point', 'sentence']
['-', 'This', 'also','an', 'example', 'of', 'bullet', 'point', 'sentence']
]
所需输出
[
['When', 'the', 'blow', 'was', 'repeated', ',', 'together', 'with', 'an', 'admonition', 'in', 'childish', 'sentences', ',', 'he', 'turned', 'over', 'upon', 'his', 'back', ',', 'and', 'held', 'his', 'paws', 'in', 'a', 'peculiar', 'manner', '.'],
['1', ')', 'This', 'a', 'numbered', 'sentence', '2', ')', 'This', 'is', 'the', 'second', 'numbered', 'sentence','At', 'the', 'same', 'time', 'with', 'his', 'ears', 'and', 'his', 'eyes', 'he', 'offered', 'a', 'small', 'prayer', 'to', 'the', 'child', '.']
['Below', 'are', 'the', 'examples', '-', 'This', 'an', 'example', 'of', 'bullet', 'point', 'sentence',
'-', 'This', 'also','an', 'example', 'of', 'bullet', 'point', 'sentence']
]
[
['When', 'the', 'blow', 'was', 'repeated', ',', 'together', 'with', 'an', 'admonition', 'in', 'childish', 'sentences', ',', 'he', 'turned', 'over', 'upon', 'his', 'back', ',', 'and', 'held', 'his', 'paws', 'in', 'a', 'peculiar', 'manner', '.'],
['1', ')', 'This', 'a', 'numbered', 'sentence']
['2', ')', 'This', 'is', 'the', 'second', 'numbered', 'sentence']
['At', 'the', 'same', 'time', 'with', 'his', 'ears', 'and', 'his', 'eyes', 'he', 'offered', 'a', 'small', 'prayer', 'to', 'the', 'child', '.']
['Below', 'are', 'the', 'examples']
['-', 'This', 'an', 'example', 'of', 'bullet', 'point', 'sentence']
['-', 'This', 'also','an', 'example', 'of', 'bullet', 'point', 'sentence']
]
如何在项目符号和编号处拆分句子?
spaCy的解决方案也会很有帮助我不确定spaCy的情况。在Ruby中,您可以使用和
text="当重击的时候,再加上孩子气的训诫,他翻了个身,,以一种特殊的方式握住爪子。\n\n1)这是一个编号的句子\n2)这是第二个编号的句子\n\n同时他用耳朵和眼睛向孩子做了一个小小的祈祷。\n\n下面是例子\n-这是一个要点句的例子\n-这也是一个要点句的例子“
最终_数组=[]
segments=practicsegmenter::Segmenter.new(文本:text).segment
分段。每个do |分段|
final_array这可以是一个解决方案。您可以根据您的数据进行自定义
text = """When the blow was repeated,together with an admonition in
childish sentences, he turned over upon his back, and held his paws in a peculiar manner.
1) This a numbered sentence
2) This is the second numbered sentence
At the same time with his ears and his eyes he offered a small prayer to the child.
Below are the examples
- This an example of bullet point sentence
- This is also an example of bullet point sentence"""
import re
import nltk
sentences = nltk.sent_tokenize(text)
results = []
for sent in sentences:
sent = re.sub(r'(\n)(-|[0-9])', r"\1\n\2", sent)
sent = sent.split('\n\n')
for s in sent:
results.append(nltk.word_tokenize(s))
results
[
['When', 'the', 'blow', 'was', 'repeated', ',', 'together', 'with', 'an', 'admonition', 'in', 'childish', 'sentences', ',', 'he', 'turned', 'over', 'upon', 'his', 'back', ',', 'and', 'held', 'his', 'paws', 'in', 'a', 'peculiar', 'manner', '.'],
['1', ')', 'This', 'a', 'numbered', 'sentence']
['2', ')', 'This', 'is', 'the', 'second', 'numbered', 'sentence']
['At', 'the', 'same', 'time', 'with', 'his', 'ears', 'and', 'his', 'eyes', 'he', 'offered', 'a', 'small', 'prayer', 'to', 'the', 'child', '.']
['Below', 'are', 'the', 'examples']
['-', 'This', 'an', 'example', 'of', 'bullet', 'point', 'sentence']
['-', 'This', 'also','an', 'example', 'of', 'bullet', 'point', 'sentence']
]
这是一个python端口:尝试了上面的python端口,但它给了我奇怪的结果。它可以用任何其他python方法解决吗