Python: join list elements that don't end with punctuation


python string list


I have a list

list1 = ['hello', 'how', 'are', 'you?', 'i', 'am', 'fine', 'thanks.', 'great!']

that I want to join into

list2 = ['hello how are you?', 'i am fine thanks.', 'great!']

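Is there a simple Pythonic way to do this? I considered using itertools.groupby together with join, but the problem is that the elements of a group don't all satisfy the same condition (I can't simply ask whether each of them has punctuation). Basically, whether element x belongs to a group depends on element x+n, where n can be large. That complicates the problem.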

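Don't use groupby(); it would group the words that end with punctuation separately from those that don't, and you would then have to recombine the groups.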
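To make that concrete, here is a small sketch of what groupby() yields when keyed on whether a word ends with punctuation. The helper name ends_sentence is made up for this illustration and is not part of the question or the answers.

```python
import string
from itertools import groupby

list1 = ['hello', 'how', 'are', 'you?', 'i', 'am', 'fine', 'thanks.', 'great!']
punctuation = tuple(string.punctuation)

def ends_sentence(word):
    # Hypothetical helper: True when the word ends with a punctuation character.
    return word.endswith(punctuation)

# groupby() starts a new group every time the key value changes.
groups = [(key, list(words)) for key, words in groupby(list1, key=ends_sentence)]
for key, words in groups:
    print(key, words)
```

The sentence-ending words land in groups of their own, and the adjacent 'thanks.' and 'great!' end up merged into a single group, so the output still has to be stitched back together.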

Use a generator function instead:

import string

def sentence_groups(l, punctuation=tuple(string.punctuation)):
    group = []
    for w in l:
        group.append(w)
        if w.endswith(punctuation):
            yield group
            group = []
    if group:
        yield group

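The generator collects words from the input list until one of them ends with punctuation. At that point it yields the whole group and then starts a new, empty group.

If there are still words in the group when iteration ends, that last group is yielded as well (even though it doesn't end with punctuation).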
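A quick way to check the trailing-group behaviour is to feed the generator a list whose last words carry no punctuation. This sketch repeats the generator so that it runs on its own; the sample input is invented for the illustration.

```python
import string

def sentence_groups(words, punctuation=tuple(string.punctuation)):
    # Same logic as the generator above, repeated so this snippet is self-contained.
    group = []
    for w in words:
        group.append(w)
        if w.endswith(punctuation):
            yield group
            group = []
    if group:
        yield group

# 'see' and 'you' have no closing punctuation but are still yielded as a group.
result = list(sentence_groups(['all', 'good.', 'see', 'you']))
print(result)  # [['all', 'good.'], ['see', 'you']]
```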

Use it together with str.join() to produce the output:

>>> list1 = ['hello', 'how', 'are', 'you?', 'i', 'am', 'fine', 'thanks.', 'great!']
>>> [' '.join(group) for group in sentence_groups(list1)]
['hello how are you?', 'i am fine thanks.', 'great!']

I used all of the punctuation in string.punctuation, which is quite a broad set:

>>> string.punctuation
'!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~'

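If you want to narrow this down, pass in a specific set of punctuation characters as the second argument, or hard-code your own definition.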
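As a sketch of how narrowing changes the result, the example below compares the default with a tuple limited to '.', '?' and '!'. The sample words are invented for the illustration, and the generator is repeated so the snippet runs on its own.

```python
import string

def sentence_groups(words, punctuation=tuple(string.punctuation)):
    # Same logic as the generator in this answer, repeated to stay self-contained.
    group = []
    for w in words:
        group.append(w)
        if w.endswith(punctuation):
            yield group
            group = []
    if group:
        yield group

words = ['well,', 'i', 'am', 'fine.', 'thanks!']

# With the default, the comma after 'well' also closes a group.
default = [' '.join(g) for g in sentence_groups(words)]
print(default)  # ['well,', 'i am fine.', 'thanks!']

# str.endswith() accepts a tuple of suffixes, so pass only sentence enders.
narrow = [' '.join(g) for g in sentence_groups(words, ('.', '?', '!'))]
print(narrow)   # ['well, i am fine.', 'thanks!']
```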

You can use itertools.groupby:

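import itertools
import re

list1 = ['hello', 'how', 'are', 'you?', 'i', 'am', 'fine', 'thanks.', 'great!']
# Split into alternating runs: words without, then words with, a closing '?' or '.'.
new_l = [list(b) for a, b in itertools.groupby(list1, key=lambda x: bool(re.search(r'[?.]$', x)))]
# Glue each run of plain words to the punctuated run that follows it, if there is one.
final_data = [' '.join(new_l[i] + new_l[i+1]) if i+1 < len(new_l) else ' '.join(new_l[i]) for i in range(0, len(new_l), 2)]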
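The code above depends on the runs alternating neatly. A different way to key groupby, shown here only as a sketch and not as this answer's method, is to number each word by how many sentence-ending words come before it, so that every word of a sentence shares one key. The names ends and sentence_ids are assumptions made for this illustration.

```python
import string
from itertools import accumulate, groupby

list1 = ['hello', 'how', 'are', 'you?', 'i', 'am', 'fine', 'thanks.', 'great!']
punctuation = tuple(string.punctuation)

# 1 for each word that closes a sentence, 0 otherwise.
ends = [int(w.endswith(punctuation)) for w in list1]
# Sentence number of a word = count of sentence-ending words that precede it.
sentence_ids = [0] + list(accumulate(ends))[:-1]

# Group (id, word) pairs by id; each group is already one complete sentence.
pairs = zip(sentence_ids, list1)
result = [' '.join(word for _, word in group)
          for _, group in groupby(pairs, key=lambda pair: pair[0])]
print(result)  # ['hello how are you?', 'i am fine thanks.', 'great!']
```

This avoids the recombination step, at the cost of a precomputed key list; whether that reads better than a plain generator is a matter of taste.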

A simple solution:

import string

words = ['hello', 'how', 'are', 'you?', 'i', 'am', 'fine', 'thanks.', 'great!']
sents = []

range_flag = 0  # index of the first word of the current sentence
for index, word in enumerate(words):
  if word[-1] in string.punctuation:
    # Slice from the start of the current sentence up to and including this word.
    sents.append(words[range_flag:index+1])
    print(range_flag, index)
    range_flag = index + 1  # the next sentence starts after this word

print([" ".join(s) for s in sents])

0 3
4 7
8 8
['hello how are you?', 'i am fine thanks.', 'great!']

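It worked! Thanks Martjin!

About what you said regarding groupby: with a suitable key, you wouldn't get separate groups that need to be recombined.

Haha, it takes a lot of work to cram that into the key. If you absolutely need to use groupby, I guess this might be useful :)

@Stefanpochman: do I really need to add or use a key function in my answer?

As in my comment on Ajax's answer: sure, if you try hard enough you can make it work, but the result is practically unreadable. Good luck maintaining that code!

Possible duplicate of [link missing]

FWIW, I'm looking for a Python solution.

@NaruS: that question is unrelated. Not only is it in a different programming language, it doesn't deal with producing groups when a condition occurs in a list. This question isn't looking for a recursive join. If it were, I'm sure we could find a better duplicate, one that actually uses Python.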
