How to remove consecutive single-letter characters from a string in Python - Fatal编程技术网

I have a string like the one below, and I want to remove runs of five or more consecutive single-letter characters.

mystring = "the nucleotide sequence of wheat triticum aestivum l chloroplastid ribosome associated 4 5 s rna is u a g u g a g c g c g a g a c g a g c g u a u a g u g u c a g u g a g u g c a g u g a u g u a u g c a g c u g a g c a u c u a c g a c g a c g a u g a coh"
My output should look like this:

myoutput = "the nucleotide sequence of wheat triticum aestivum l chloroplastid ribosome associated 4 5 s rna is coh"

I tried it as follows:

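count = 0  # length of the current run of single-character words
for i, my in enumerate(mystring.split()):
    if len(my) == 1:
        count = count + 1
    else:
        count = 0
    if count == 5:
        print(i)  # index of the word that completes a run of 5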
In short, I count the words, check whether there is a run of 5 single-letter characters, remove those 5 positions from the list, and so on.

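However, I would like to do this in a more efficient, Pythonic way, without a variable that counts the run length and without deleting five positions at a time.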
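For comparison, here is a minimal sketch that avoids a manual counter by grouping consecutive words with `itertools.groupby` instead of using a regular expression. The function name `drop_runs_groupby` and the `min_run` parameter are hypothetical. It treats any one-character token (letter, digit, or punctuation) as a "single character", and because it splits on whitespace it rejoins the result with single spaces rather than preserving the original spacing.

```python
from itertools import groupby


def drop_runs_groupby(text: str, min_run: int = 5) -> str:
    """Drop runs of at least min_run consecutive one-character words (hypothetical helper)."""
    kept = []
    # groupby yields consecutive words that share the same key, so each group
    # is either a run of one-character words or a run of longer words.
    for is_single, group in groupby(text.split(), key=lambda w: len(w) == 1):
        words = list(group)
        # Keep longer words, and keep single-character runs shorter than min_run.
        if not is_single or len(words) < min_run:
            kept.extend(words)
    # split() discarded the original spacing, so rejoin with single spaces.
    return " ".join(kept)


mystring = ("the nucleotide sequence of wheat triticum aestivum l "
            "chloroplastid ribosome associated 4 5 s rna is u a "
            "g u g a g c g c g a g a c g a g c g u a u a g u g u "
            "c a g u g a g u g c a g u g a u g u a u g c a g c u "
            "g a g c a u c u a c g a c g a c g a u g a coh")
print(drop_runs_groupby(mystring))
```

For the question's string this prints the same text as `myoutput`: the lone `l` and the three-word run `4 5 s` are kept because they are shorter than five.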


I'm happy to provide more details if needed.

I believe a regular expression can solve this problem:

import re

mystring = ("the nucleotide sequence of wheat triticum aestivum l "
            "chloroplastid ribosome associated 4 5 s rna is u a "
            "g u g a g c g c g a g a c g a g c g u a u a g u g u "
            "c a g u g a g u g c a g u g a u g u a u g c a g c u "
            "g a g c a u c u a c g a c g a c g a u g a coh")
print(mystring)

# See https://regex101.com/r/aUDK7K/1
# \b: word boundary
# \w: word char
# \s+: one or more white spaces
# {5,}: 5 or more times
shorten = re.sub(r'(\b\w\s+){5,}', '', mystring)
print(shorten)
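If the threshold needs to change, the same pattern can be built from a parameter. This is a minimal sketch; the function name `drop_runs_regex` and its `min_run` parameter are hypothetical. Note that `\w` also matches digits and the underscore, so tokens such as `4` and `5` count as single characters.

```python
import re


def drop_runs_regex(text: str, min_run: int = 5) -> str:
    # Same pattern as the answer, with the repetition count as a parameter.
    # In the rf-string, {{ and }} are literal braces, so min_run=5 gives
    # the pattern (\b\w\s+){5,}
    pattern = rf"(\b\w\s+){{{min_run},}}"
    return re.sub(pattern, "", text)


mystring = ("the nucleotide sequence of wheat triticum aestivum l "
            "chloroplastid ribosome associated 4 5 s rna is u a "
            "g u g a g c g c g a g a c g a g c g u a u a g u g u "
            "c a g u g a g u g c a g u g a u g u a u g c a g c u "
            "g a g c a u c u a c g a c g a c g a u g a coh")
print(drop_runs_regex(mystring))             # default: runs of 5 or more
print(drop_runs_regex(mystring, min_run=3))  # also removes "4 5 s "
```

Lowering `min_run` to 3 also removes the `4 5 s` run, which shows how directly the `{5,}` quantifier controls the behaviour.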

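This is what regular expressions are for. Do you want to keep "4 5 s"?
If I have understood the problem statement correctly, have a look.
Why is "4 5 s" missing from the output?
@Ronaldaronson Thank you very much for pointing that out. It was a typo, and I have corrected it. Please see the edited question :)
@Jan, this should answer the question. In the general case, the last of the single characters is not necessarily followed by any whitespace, as in the string "this is a example a b c d e f g h", so I suggest: shorten = re.sub(r'\b(\w\s+){4,}\w\b\s*', '', mystring)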
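A quick check of the end-of-string case raised in the comment, as a sketch. It compares the answer's pattern with a variant that also accepts a final single character that has no whitespace after it. The variant puts `\b` after the final `\w`; without it, the final `\w` could match the first letter of a longer word, so `"xx a b c d e coh"` would become `"xx oh"`. The names `ANSWER_PATTERN`, `END_PATTERN`, and `sample` are hypothetical.

```python
import re

# Answer's pattern: every single character must be followed by whitespace.
ANSWER_PATTERN = r"(\b\w\s+){5,}"
# Variant: four or more "char + whitespace" pairs, then one more single
# character that ends a word (\b), optionally followed by whitespace.
END_PATTERN = r"\b(\w\s+){4,}\w\b\s*"

sample = "this is a example a b c d e f g h"
print(repr(re.sub(ANSWER_PATTERN, "", sample)))  # 'this is a example h'
print(repr(re.sub(END_PATTERN, "", sample)))     # 'this is a example '

# The \b keeps the variant from eating the "c" of "coh".
print(re.sub(END_PATTERN, "", "xx a b c d e coh"))  # xx coh
```

The answer's pattern leaves a stray `h` at the end, while the variant removes the whole run (leaving a trailing space that can be stripped with `.strip()` if needed).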