Python: splitting sentences by their ending characters

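A recent project requires me to split input phrases (given as strings) into their component sentences. For example, this string:

"Your mother was a hamster, and your father smelt of elderberries! Now go away, or I shall taunt you a second time. You know what, never mind. This entire sentence is far too silly. Wouldn't you agree? I think it is."

would need to be turned into a list made up of the following elements: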

["Your mother was a hamster, and your father smelt of elderberries",
"Now go away, or I shall taunt you a second time",
"You know what, never mind",
"This entire sentence is far too silly",
"Wouldn't you agree",
"I think it is"]
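For the purposes of this function, a "sentence" is a string terminated by `.`, `?`, or `!`.
Note that the punctuation should be removed from the output, as shown above.

I have a working version, but it is quite ugly and leaves leading and trailing whitespace, and I can't help thinking there is a better way: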

from functools import reduce

def split_sentences(st):
  if type(st) is not str:
    raise TypeError("Cannot split non-strings")
  sl = st.split('.')
  sl = [s.split('?') for s in sl]
  sl = reduce(lambda x, y: x+y, sl) #Flatten the list
  sl = [s.split('!') for s in sl]
  return reduce(lambda x, y: x+y, sl)

Use `re.split` instead, with a regular expression that matches any sentence-ending character (and any whitespace that follows it).
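The answer's own code is not preserved here, so the following is only a minimal sketch of the approach it describes. The pattern `[.?!]\s*`, the function name, and the handling of the trailing empty item are assumptions rather than the answerer's exact code.

```python
import re

def split_sentences(text):
    """Split text into sentences on '.', '?' or '!', dropping the punctuation."""
    if not isinstance(text, str):
        raise TypeError("Cannot split non-strings")
    # Split on one terminator plus any whitespace that follows it.
    parts = re.split(r'[.?!]\s*', text)
    # A terminator at the very end leaves one empty trailing item; drop it.
    return parts[:-1] if parts and not parts[-1] else parts

phrase = ("Your mother was a hamster, and your father smelt of elderberries! "
          "Now go away, or I shall taunt you a second time. You know what, never mind. "
          "This entire sentence is far too silly. Wouldn't you agree? I think it is.")
sentences = split_sentences(phrase)
```

Because the pattern consumes the whitespace after each terminator, the resulting items need no extra `strip()` call for input like the example phrase.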



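You can use a regex `split` to split the string on specific special characters:

import re
text = "Your mother was a hamster, and your father smelt of elderberries! Now go away, or I shall taunt you a second time. You know what, never mind. This entire sentence is far too silly. Wouldn't you agree? I think it is."
# \s* (rather than \s+) also matches the final "." that has no whitespace after it;
# the filter drops the empty string this leaves at the end of the result.
sentences = [s for s in re.split(r'[?.!]\s*', text) if s]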

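You can also do this without regular expressions:

result = [s.strip() for s in String.replace('!', '.').replace('?', '.').split('.') if s.strip()]

Or you could write a cutting-edge algorithm that doesn't copy as much data:

chars = list(String)  # strings are immutable, so work on a list of characters

for i in range(len(chars)):
    if chars[i] in '?!':
        chars[i] = '.'

# Join the characters back into a string first: a list has no split() method
result = [s.strip() for s in ''.join(chars).split('.') if s.strip()]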
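As a possible variation on the regex-free approach, the two chained `replace` calls can be swapped for `str.translate`, which maps both characters in one pass. This is a different technique from the one in the answer, and the function name `split_without_regex` is hypothetical.

```python
def split_without_regex(text):
    # Build a table mapping '!' and '?' to '.', then apply it in a single pass.
    table = str.maketrans('!?', '..')
    pieces = text.translate(table).split('.')
    # Strip surrounding whitespace and drop empty pieces (e.g. after a final '.').
    return [p.strip() for p in pieces if p.strip()]

result = split_without_regex("Wouldn't you agree? I think it is.")
```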


Nice. It might help the OP if you explained the if-else.

If the input string ends with a terminator, `re.split` returns an array with an empty item at the end. An empty string is falsy, so if the last item, array[-1], is empty, the code returns a slice of everything except that last item.
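The behaviour described in that comment can be seen in a short sketch. The pattern and variable names below are assumptions, since the code under discussion is not shown on this page.

```python
import re

parts = re.split(r'[.?!]\s*', "Wouldn't you agree? I think it is.")
# The input ends with a terminator, so the last item of parts is ''.
last_is_empty = not parts[-1]  # an empty string is falsy
# Keep everything except the last item only when that item is empty.
trimmed = parts[:-1] if last_is_empty else parts
```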