String manipulation in Python


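I recently took a test on HackerRank, and the problem I was given was:

Given a string, return the most concise string that can be formed from it. For example: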

string = 'watson  Represents|watson represents|Watson represents a first step into cognitive systems, a new era of computing.|first step into  Cognitive|Cognitive Systems; a new era|what does watson represent'
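  • The string above contains many duplicates, such as
    "Watson represents".
    We have to ignore extra spacing between characters and differences in lower/upper case, so
    'watson  Represents'
    'watson represents'
    are the same.
  • Semicolons and commas represent the same thing. For example,
    'Cognitive Systems; a new era'
    appears inside
    'Watson represents a first step into cognitive systems, a new era of computing.'
  • The final string should not contain any duplicates, ignoring lowercase/uppercase and any extra spaces.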
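The comparison rules in the list above can be sketched as one small helper. The name normalize is hypothetical (it is not part of the original problem statement), and mapping ';' to ',' is just one way to make the two characters compare equal.

```python
def normalize(segment):
    """Return a comparison key: ';' treated as ',', lowercased, whitespace collapsed."""
    # str.split() with no arguments splits on any run of whitespace,
    # so joining with a single space also strips leading/trailing blanks.
    return " ".join(segment.replace(";", ",").lower().split())


print(normalize("watson  Represents") == normalize("watson represents"))
```

Two segments are then duplicates when the key of one is a substring of the key of the other.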
My answer was:

watson = 'watson  Represents|watson represents|Watson represents a first step into cognitive systems, a new era of computing.|first step into  Cognitive|Cognitive Systems; a new era|what does watson represent'

import re

watson = re.sub(r';', r',', watson)  # replace each semicolon with a comma
watson = watson.strip().split('|')
removeWatson = list(watson)

for i in range(len(watson)):

    for j in range(len(watson)):

        if j == i:
            continue

        if " ".join(watson[i].strip().lower().split()) in " ".join(watson[j].strip().lower().split()):
            removeWatson[i] = ''

print("|".join(filter(None, removeWatson)))
My answer is surely inefficient, and I would like to know whether you can suggest another way to solve this problem.


The most concise string is:
Watson represents a first step into cognitive systems, a new era of computing.|what does watson represent
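As a rough sketch of one way to tidy the approach above, each segment can be normalized once up front instead of inside the inner loop. The function names normalize and most_concise are hypothetical. This does not reduce the quadratic number of comparisons, it only avoids repeating the normalization, and it keeps the first copy when two segments are exact duplicates of each other. Like the original answer, it uses plain substring containment rather than whole-word matching.

```python
def normalize(segment):
    """Comparison key: ';' treated as ',', lowercased, whitespace collapsed."""
    return " ".join(segment.replace(";", ",").lower().split())


def most_concise(text):
    """Drop every '|'-separated segment whose key is contained in another segment's key."""
    segments = [s.strip() for s in text.split("|")]
    keys = [normalize(s) for s in segments]  # normalize each segment exactly once
    kept = []
    for i, key_i in enumerate(keys):
        redundant = False
        for j, key_j in enumerate(keys):
            if i == j:
                continue
            if key_i == key_j:
                # Exact duplicates: only the first occurrence survives.
                if j < i:
                    redundant = True
                    break
            elif key_i in key_j:
                # Strictly contained in a longer segment.
                redundant = True
                break
        if not redundant:
            kept.append(segments[i])
    return "|".join(kept)


watson = ('watson  Represents|watson represents|Watson represents a first step '
          'into cognitive systems, a new era of computing.|first step into  Cognitive|'
          'Cognitive Systems; a new era|what does watson represent')
print(most_concise(watson))
```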

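My reading is that I am being asked to represent the original string exactly, i.e. I should be able to reproduce the original string from the most concise version.

In other words, compress it.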

from __future__ import print_function
from zlib import compress, decompress

string = 'watson  Represents|watson represents|Watson represents a first step into cognitive systems, a new era of computing.|first step into  Cognitive|Cognitive Systems; a new era|what does watson represent'

print("Original length:", len(string))
compressed = compress(string.encode("utf-8"))  # zlib works on bytes, not text
print("Compressed length:", len(compressed))
decompressed = decompress(compressed).decode("utf-8")
print("Decompressed is equal:", decompressed == string)
The result is:

Original length: 198
Compressed length: 116
Decompressed is equal: True

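You can do this in a single loop using the re module.

What is wrong with the result you get? Do you not want it to contain "|what does watson represent"?

Can you post a link to the HackerRank challenge?

It was a HackerRank test, so it is hard to find the link. @RobertR The answer is correct, but the code is very inefficient.

@RakeshRanjanSukla I am having trouble understanding what your real question is. If inefficiency is your problem, perhaps this question is better suited elsewhere.

We have to output the compressed string, not the length.

@RakeshRanjanSukla: You are missing the point: the interesting thing is that zlib's compress is able to squash it. Obviously, changing the program to print the compressed binary representation is trivial.

When I print the binary representation it is completely different, not even unicode characters. Do you know how I would decode it?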
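On the last comment: the output of zlib's compress is raw bytes, not text, so it is not expected to print as readable characters. A minimal sketch, assuming a printable form is wanted, is to wrap the compressed bytes in Base64 and reverse both steps to get the string back. The sample text here is shortened for illustration.

```python
import base64
import zlib

text = "watson represents|what does watson represent"

compressed = zlib.compress(text.encode("utf-8"))          # raw bytes, not printable text
printable = base64.b64encode(compressed).decode("ascii")  # ASCII-safe representation
print(printable)

# Reverse both steps to recover the original string.
restored = zlib.decompress(base64.b64decode(printable)).decode("utf-8")
print(restored == text)
```

Note that Base64 makes the data about a third larger, so for short inputs the printable form may end up longer than the original text.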
string = 'watson  Represents|watson represents|Watson represents a first   step into cognitive systems, a new era of computing.|first step into  Cognitive|Cognitive Systems; a new era|what does watson represent'
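import re

ll = string.split("|")
ll.sort(key=len)  # shortest first, so each segment is only checked against later ones
# normalize: lowercase, drop ';' and ',', collapse runs of whitespace
ll2 = [re.sub(r"\s+", " ", re.sub(r"[;,]+", "", i.lower())) for i in ll]
for k, i in enumerate(ll2):
    # whole-word search for this segment in the remaining segments;
    # re.escape keeps characters such as '.' from acting as regex syntax
    if re.search(r"\b" + re.escape(i) + r"\b", "|".join(ll2[k + 1:])):
        string = string.replace(ll[k], "", 1)  # blank out the redundant segment
# remove the leftover leading, doubled, and trailing '|' separators
print(re.sub(r"^\|+|\|(?=\|)|\|+$", "", string))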