What is the fastest way to do string pattern replacement in Python?


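Given a string

str="a@b = c"

I want to turn it into

str="a@'b'"

That is, quote the "b" and remove the "=" together with everything after it.

What is the best way to do this in Python?

Edit:

The "b" above can be any unknown non-whitespace string of any length.

Assuming the text to replace always follows the "@":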

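str="a@b = c"
# Token between "@" and the first space
replaceChar = str.split('@')[1].split(' ')[0]
# Keep the part before "=", quote the token, then drop the spaces
print(str.split('=')[0].replace(replaceChar, "'{0}'".format(replaceChar)).replace(' ', ''))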

Output:

a@'b'

Running the same code on the following inputs:

str="a@e = c"
str="a@test = c"
str="a@whammy = c"

Output:

a@'e'
a@'test'
a@'whammy'

Is this what you want?
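The same steps can be wrapped in a function and run over all four inputs at once. This is a minimal sketch assuming Python 3; the name quote_after_at is made up for illustration and is not part of the original answer.

```python
def quote_after_at(s):
    """Quote the token after '@' and drop '=' and everything after it."""
    # Token between '@' and the first space (hypothetical helper, same steps as above)
    target = s.split('@')[1].split(' ')[0]
    # Keep the part before '=', quote the token, then strip the spaces
    return s.split('=')[0].replace(target, "'{0}'".format(target)).replace(' ', '')


for text in ["a@b = c", "a@e = c", "a@test = c", "a@whammy = c"]:
    print(quote_after_at(text))
```

Note that str.replace quotes every occurrence of the token, so an input such as "b@b = c" would come out as "'b'@'b'".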

Update

Since someone eventually posted a regex-based method, we can benchmark the two:

import re
import timeit

# Method #1 (string ops)
def stringOps():
    s = "a@whammy = c"
    replaceChar = s.split('@')[1].split(' ')[0]
    s.split('=')[0].replace(replaceChar, "'{0}'".format(replaceChar)).replace(' ', '')

# Method #2 (regex)
def regex():
    s = "a@bam = c"
    re.sub(r'(\w+)(\s*=\s*\w+$)', r"'\1'", s)

# Time one million calls of each function
timestamp1 = timeit.Timer('from __main__ import stringOps;stringOps()')
timestamp2 = timeit.Timer('from __main__ import regex;regex()')
iterations = 1000000
time1 = timestamp1.timeit(iterations)
time2 = timestamp2.timeit(iterations)
print('Method #1 took {0}'.format(time1))
print('Method #2 took {0}'.format(time2))

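Output:

Method #1 took 4.98833298683
Method #2 took 14.708286047

So in this case the regex still appears to be slower, although I do find regexes more readable. Unless you are running a huge number of iterations, I would just use whichever method you are most comfortable with.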
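Part of the regex cost may come from re.sub looking up the compiled pattern in the module cache on every call. A hedged variant of the benchmark, assuming Python 3 and passing callables to timeit instead of import strings, can also time a precompiled pattern. The names PATTERN, string_ops, regex_sub and regex_compiled are illustrative, and no timings are claimed here: results depend on the machine and the Python version.

```python
import re
import timeit

PATTERN = re.compile(r'(\w+)(\s*=\s*\w+$)')  # compiled once, reused on every call


def string_ops(s="a@whammy = c"):
    target = s.split('@')[1].split(' ')[0]
    return s.split('=')[0].replace(target, "'{0}'".format(target)).replace(' ', '')


def regex_sub(s="a@whammy = c"):
    return re.sub(r'(\w+)(\s*=\s*\w+$)', r"'\1'", s)


def regex_compiled(s="a@whammy = c"):
    return PATTERN.sub(r"'\1'", s)


if __name__ == "__main__":
    # Same input for all three so the comparison is like for like
    for func in (string_ops, regex_sub, regex_compiled):
        elapsed = timeit.timeit(func, number=100000)
        print('{0} took {1:.3f}s'.format(func.__name__, elapsed))
```

Unlike the benchmark above, all three functions return their result, so it is easy to check that they agree before comparing times.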

"%s@'%s'"%tuple(txt.split(' =')[0].split('@'))

This works for any values of a and b, as long as they are separated by "@" and c is set off by " =".

Note: it will break if b contains "=" or "@".
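To illustrate the limitation, here is a small sketch (Python 3 assumed) of what happens when b contains "@", next to a str.partition variant that cuts only at the first "@". The function names slice_dice and partition_quote are illustrative; the partition variant is an alternative technique, not the method from this answer.

```python
def slice_dice(txt):
    # The one-liner from above: expects exactly one '@' before ' ='
    return "%s@'%s'" % tuple(txt.split(' =')[0].split('@'))


def partition_quote(txt):
    # Alternative: cut at the first ' =' and at the first '@' only
    left = txt.partition(' =')[0]
    a, _, b = left.partition('@')
    return "%s@'%s'" % (a, b)


print(slice_dice("a@whammy = c"))

try:
    slice_dice("a@b@x = c")  # three pieces for two %s placeholders
except TypeError as exc:
    print("slice_dice failed:", exc)

print(partition_quote("a@b@x = c"))
```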

Edit: added a speed benchmark based on Green Cell's.

Edit again: added the other examples to the benchmark.

import re
import timeit

# Method #1 (string ops) -> Green Cell's
def stringOps():
    s = "a@whammy = c"
    replaceChar = s.split('@')[1].split(' ')[0]
    s.split('=')[0].replace(replaceChar, "'{0}'".format(replaceChar)).replace(' ', '')

time1 = timeit.timeit('from __main__ import stringOps;stringOps()')

# Method #2 (regex) -> Dawg's
def regex():
    s = "a@bam = c"
    re.sub(r'(\w+)(\s*=\s*\w+$)', r"'\1'", s)

time2 = timeit.timeit('from __main__ import regex;regex()')

# Method #3 (slice and dice) -> my own
def slice_dice():
    txt = "a@whammy = c"
    "%s@'%s'" % tuple(txt.split(' =')[0].split('@'))

time3 = timeit.timeit('from __main__ import slice_dice;slice_dice()')

print('Method #1 took {0}'.format(time1))
print('Method #2 took {0}'.format(time2))
print('Method #3 took {0}'.format(time3))

Method #1 took 2.01555299759
Method #2 took 4.66884493828
Method #3 took 1.44083309174

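Since you state that the "b" above can be any unknown non-whitespace string of any length, a regex is probably the best option.

This regex performs the substitution:

/(\w+)(\s*=\s*\w+$)/'\1'/

In Python: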

>>> import re
>>> s="a@b = c"
>>> re.sub(r'(\w+)(\s*=\s*\w+$)', r"'\1'", s)
"a@'b'"

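One caveat: \w+ matches only letters, digits and underscores, so it does not cover every non-whitespace b (for example "b-c"). A possible variant, sketched here under the assumption that b starts right after "@" and contains no whitespace, uses \S+ instead. The pattern and the name regex_quote are illustrative and not part of the original answer.

```python
import re

# '@', then a run of non-whitespace (b), then optional spaces, '=', and the rest
PATTERN = re.compile(r"@(\S+)\s*=.*$")


def regex_quote(s):
    return PATTERN.sub(r"@'\1'", s)


print(regex_quote("a@b = c"))
print(regex_quote("a@b-c = d e"))
```

With the \w+ pattern shown above, the input "a@b-c = d" comes out as "a@b-'c'", because only the "c" before the "=" is captured.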

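Not sure whether this is the fastest or the most efficient, but it is very simple.

It relies on "=" and "@" being constants in the string, with exactly one of each.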

s = "a@b = c"
keep, _ = s.split('=')
keep = keep.strip()
keep = keep.split('@')
keep[1] = "\'" + keep[1] + "\'"
#keep[1] = r"'" + keep[1] + r"'"
#keep[1] = "'" + keep[1] + "'"
result = '@'.join(keep)

As a function:

def f(s):
    keep, _ = s.split('=')
    keep = keep.strip()
    keep = keep.split('@')
    keep[1] = "\'" + keep[1] + "\'"
    return '@'.join(keep)
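A quick usage sketch (Python 3 assumed) shows the function on a couple of inputs and what happens when the "exactly one =" assumption does not hold. The definition is repeated in condensed form so the snippet runs on its own; the sample inputs are made up.

```python
def f(s):
    keep, _ = s.split('=')        # needs exactly one '='
    keep = keep.strip().split('@')
    keep[1] = "'" + keep[1] + "'"
    return '@'.join(keep)


for text in ["a@b = c", "a@whammy = c"]:
    print(f(text))

try:
    f("a@b = c = d")              # two '=' give three pieces to unpack
except ValueError as exc:
    print("f failed:", exc)
```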

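Regex. Just use a regex. It will be "fast" and "optimal" enough. If it is not, you will need to build a performance benchmark (hopefully one that covers the whole problem). But really, there is no point even thinking about this until it "works".

I think you may need to describe your pattern in more detail. For example, Pabtore's answer is correct, but only for this exact string. Is that what you are looking for? Is it always a "b" after the "@"? If not, is it only whatever sits after the @ and before the =? Or is a@bxyz = ... possible?

@JLPeyret Yes, b can indeed be any string "bxyz…"

And the proof/reasoning that this is "faster" (or a more "optimized" approach)? I find it complicated and hard to follow; so, without justification, I can't condone it.

In general I try to avoid regex until it becomes warranted. This is fairly simple to do with string operations, and anyone who writes Python knows what is going on here. If you think it is not up to par, please show your solution.

Often, string operations are preferable to regex. And some people don't like or aren't comfortable with regex. That should be their choice, shouldn't it? The OP can decide which option to use, and you can always post your own answer, can't you? For my part, I find simple string functions are usually faster, though without benchmarking I could be wrong. And yes, I do use regex when I feel it is warranted.

@JLPeyret Whether "some people" like it or not, regex is a necessary part of general programming. Some tasks are too hard or outright impossible with string operations alone. For example: find two digits, followed by one or more spaces/tabs/newlines, followed by three capital letters. Doable with string operations (maybe), but a short one-liner with regex.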
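The example given in the comments (two digits, then one or more spaces/tabs/newlines, then three capital letters) might look like this as a regex. This is only a sketch: the sample strings are made up, and \s also matches a few whitespace characters beyond the three named.

```python
import re

# Two digits, one or more whitespace characters, three uppercase ASCII letters
PATTERN = re.compile(r"\d{2}\s+[A-Z]{3}")

print(bool(PATTERN.search("order 42 \t\nABC shipped")))
print(bool(PATTERN.search("order 42ABC shipped")))
```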