Ruby: How do I select the first 280 words of a text, up to the nearest period? (Ruby, Regex)

I need to extract a shorter segment, with a specified number of words, from a longer text. I can do that using

    text = "There was a very big cat that was sitting on the ledge. It was  overlooking the garden. The dog next door watched with curiosity."

    # An exclusive range (0...15) takes exactly the first 15 words.
    text.split[0...15].join(' ')
    # => "There was a very big cat that was sitting on the ledge. It was overlooking"
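I want to select the text up to the next period, so that I don't end with an incomplete sentence.

Is there a way to do this with a regular expression, so that the text runs up to and including the first period after the 15th word?

You can use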

(?:\w+[,.?!]?\s+){14}(?:\w+,?\s+)*?\w+[.?!]
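This matches a word (optionally followed by a comma, period, question mark, or exclamation mark) and whitespace, repeated 14 times. It then lazily repeats a word (optionally followed by a comma) and whitespace, and finally matches one more word followed by a period, question mark, or exclamation mark. The lazy repetition ensures that the match ends at the first sentence terminator at or after the 15th word.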
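As a quick check, the pattern can be applied to the sample text from the question. This is only a minimal sketch: the variable names pattern and excerpt are illustrative, and String#[] is used because it returns just the first match (or nil).

```ruby
text = "There was a very big cat that was sitting on the ledge. It was  overlooking the garden. The dog next door watched with curiosity."

# The pattern from the answer above, unchanged.
pattern = /(?:\w+[,.?!]?\s+){14}(?:\w+,?\s+)*?\w+[.?!]/

# String#[] with a Regexp returns the first match, or nil if there is none.
excerpt = text[pattern]
puts excerpt
# prints: There was a very big cat that was sitting on the ledge. It was  overlooking the garden.
```

Note that the original spacing of the text (including the double space) is preserved, because the match is a substring rather than a re-joined word list.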

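Free-spacing mode strips whitespace from the pattern, which is why the space characters above are placed in a character class ([ ]+). Written conventionally, the regular expression looks like this: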

/(?:\p{Alpha}+[.!?]? +){14,}?\p{Alpha}+[.!?]/
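The free-spacing form mentioned in this answer is not reproduced here, so the following is a reconstruction under that assumption rather than the answerer's exact text. It writes the same pattern with the x flag and compares it with the conventional form; the names free_spacing and conventional are illustrative.

```ruby
text = "There was a very big cat that was sitting on the ledge. It was  overlooking the garden. The dog next door watched with curiosity."

# With the x flag, whitespace in the pattern is ignored, so a literal
# space has to be written inside a character class: [ ]
free_spacing = /
  (?:              # start of a non-capturing group for one word
    \p{Alpha}+     #   one or more letters
    [.!?]?         #   optional sentence-ending punctuation
    [ ]+           #   one or more spaces
  ){14,}?          # repeat the group 14 or more times, lazily
  \p{Alpha}+       # the last word of the excerpt
  [.!?]            # which must end a sentence
/x

conventional = /(?:\p{Alpha}+[.!?]? +){14,}?\p{Alpha}+[.!?]/

puts text[free_spacing]
puts text[free_spacing] == text[conventional]
```

Unlike the \w-based pattern, this one does not allow commas after words, so it would need an extra optional character class to handle text that contains them.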

You can do something along these lines:

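text = "There was a very big cat that was sitting on the ledge. It was  overlooking the garden. The dog next door watched with curiosity."

tgt = 15
# Split into sentences, each keeping its period and one optional trailing whitespace character.
old_text = text.scan(/[^.]+\.\s?/)
new_text = []
# Add whole sentences until the word count exceeds tgt or no sentences remain.
# The empty? check prevents an endless loop when the text is shorter than tgt words.
while !old_text.empty? && new_text.join.scan(/\b\p{Alpha}+\b/).length <= tgt
  new_text << old_text.shift
end

p new_text.join
# => "There was a very big cat that was sitting on the ledge. It was  overlooking the garden. "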

This works for ordinary sentences of any length, and the loop stops as soon as an added sentence pushes the word count past the target.
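The same idea can be wrapped in a method that also treats ? and ! as sentence terminators. This is a sketch, not the answerer's code: excerpt_by_sentences and min_words are hypothetical names, and it stops once at least min_words words have been collected, which differs slightly from the loop above (that one continues until the count exceeds the target).

```ruby
# Hypothetical helper: returns whole sentences from the start of text
# until at least min_words words have been collected.
def excerpt_by_sentences(text, min_words)
  # Each element is one sentence with its terminator(s) and trailing whitespace.
  sentences = text.scan(/[^.!?]+[.!?]+\s*/)
  picked = []
  words = 0
  sentences.each do |sentence|
    break if words >= min_words
    picked << sentence
    words += sentence.scan(/\b\p{Alpha}+\b/).length
  end
  # Drop the whitespace that followed the last sentence.
  picked.join.rstrip
end

text = "There was a very big cat that was sitting on the ledge. It was  overlooking the garden. The dog next door watched with curiosity."
puts excerpt_by_sentences(text, 15)
```

Iterating over the scanned sentences means the method simply returns everything it found when the text is shorter than the target. A trailing fragment with no terminator is ignored by the scan.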

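Thanks. It works well. If I want to allow any sentence-ending punctuation, such as ? and !, would it be (?:\w+\.\?\\s+{15}(?:\w+\s?*?\.\!\?

Use a character set instead of \. (which matches only a literal period).

(?:\w+[\.!?]?\s+{15}(?:\w+\s?*?[\.!?]) seems to work.

Looks good, although there is no need to escape the period inside a character set; there, it matches a literal period by default.

I think I need some adjustment. When I use a large text, it seems to select several groups of 15 words, some of them inside the body of the text rather than at the start. I only want the first match, starting from the beginning of the text, with 15 words.

Perhaps you should clarify your statement of the problem. My understanding is that you want the substring of text that begins at the start of text, ends with a punctuation mark, and contains as few words as possible but no fewer than 15. Also, the title should be changed to remove the reference to "280".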
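The symptom described in the comments (groups of words matched inside the body of a large text) is what String#scan would produce, since it returns every non-overlapping match. Assuming that is what was run, one possible fix is to anchor the pattern with \A and take a single match. The names pattern, anchored, and long_text are illustrative.

```ruby
text = "There was a very big cat that was sitting on the ledge. It was  overlooking the garden. The dog next door watched with curiosity."
# A longer text built by repeating the sample, purely for demonstration.
long_text = ([text] * 3).join(" ")

pattern  = /(?:\w+[,.?!]?\s+){14}(?:\w+,?\s+)*?\w+[.?!]/
# \A matches only at the very start of the string. In Ruby, ^ would
# also match after every newline, so \A is the safer anchor here.
anchored = /\A(?:\w+[,.?!]?\s+){14}(?:\w+,?\s+)*?\w+[.?!]/

# scan returns every non-overlapping match, including ones in the body.
puts long_text.scan(pattern).length

# The anchored pattern can match only once, at the start.
puts long_text[anchored]
```

With the anchor in place, the result is nil when the text does not begin with a word character, which may or may not be the desired behaviour for text that opens with a quotation mark.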