Python 2.7: Splitting a string without splitting on escaped characters

python-2.7

Is there a way to split a string without splitting on escaped characters? For example, I have a string and want to split it on ':' but not on '\:'

http\://www.example.url:ftp\://www.example.url

The result should look like this:

['http\://www.example.url' , 'ftp\://www.example.url']

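Note that ':' does not appear to be a character that needs escaping.

The simplest way I can think of to accomplish this is to split on the character and then add it back in wherever it was escaped.

Sample code (much in need of tidying up):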
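
A minimal sketch of the split-then-rejoin idea described above. This is an illustration, not the answerer's original code; the name `split_then_rejoin` and its `escape` parameter are made up here.

```python
def split_then_rejoin(s, delim=':', escape='\\'):
    """Split on every delim, then glue pieces back together where the
    delimiter was preceded by the escape character."""
    pieces = s.split(delim)
    result = [pieces[0]]
    for piece in pieces[1:]:
        if result[-1].endswith(escape):
            # The delimiter we split on was escaped: put it back.
            result[-1] += delim + piece
        else:
            result.append(piece)
    return result


print(split_then_rejoin(r'http\://www.example.url:ftp\://www.example.url'))
# ['http\\://www.example.url', 'ftp\\://www.example.url']
```

This simple version does not treat a doubled escape specially: a delimiter preceded by `\\` is still considered escaped.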

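As Ignacio says, yes, but not trivially in one go. The issue is that you need to look back to determine whether you are at an escaped delimiter, and the basic string.split does not provide that functionality.

If this isn't inside a tight loop, so performance isn't a significant issue, you can do it by first splitting on the escaped delimiter, then splitting on the plain delimiter, and then merging. Demo code follows: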

# Bear in mind this is not rigorously tested!
def escaped_split(s, delim):
    # split by escaped, then by not-escaped
    escaped_delim = '\\' + delim
    sections = [p.split(delim) for p in s.split(escaped_delim)]
    ret = sections[0]
    for parts in sections[1:]:  # for each further list of "real" splits
        # The last item so far and the first item of this section were
        # separated by an escaped delimiter, so join them back together
        ret[-1] = escaped_delim.join([ret[-1], parts[0]])
        # Add the remaining items of this section (also keeps the final item)
        ret.extend(parts[1:])
    return ret

s = r'http\://www.example.url:ftp\://www.example.url'
print (escaped_split(s, ':')) 
# >>> ['http\\://www.example.url', 'ftp\\://www.example.url']

Alternatively, the logic may be easier to follow if you just split the string by hand.

def escaped_split(s, delim):
    ret = []
    current = []
    itr = iter(s)
    for ch in itr:
        if ch == '\\':
            try:
                # skip the next character; it has been escaped!
                current.append('\\')
                current.append(next(itr))
            except StopIteration:
                pass
        elif ch == delim:
            # split! (add current to the list and reset it)
            ret.append(''.join(current))
            current = []
        else:
            current.append(ch)
    ret.append(''.join(current))
    return ret

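Note that this second version behaves slightly differently when it encounters a double escape followed by a delimiter: this function allows the escape character itself to be escaped, so escaped_split(r'a\\:b', ':') returns ['a\\\\', 'b'], because the first \ escapes the second one, leaving the : to be interpreted as a real delimiter. So that's something to watch out for.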

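There is a much simpler way, using a regular expression with a negative lookbehind assertion:

re.split(r'(?<!\\):', s)

An edited version of Henry's answer that is compatible with Python 3, adds tests and fixes some issues: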
def split_unescape(s, delim, escape='\\', unescape=True):
    """
    >>> split_unescape('foo,bar', ',')
    ['foo', 'bar']
    >>> split_unescape('foo$,bar', ',', '$')
    ['foo,bar']
    >>> split_unescape('foo$$,bar', ',', '$', unescape=True)
    ['foo$', 'bar']
    >>> split_unescape('foo$$,bar', ',', '$', unescape=False)
    ['foo$$', 'bar']
    >>> split_unescape('foo$', ',', '$', unescape=True)
    ['foo$']
    """
    ret = []
    current = []
    itr = iter(s)
    for ch in itr:
        if ch == escape:
            try:
                # skip the next character; it has been escaped!
                if not unescape:
                    current.append(escape)
                current.append(next(itr))
            except StopIteration:
                if unescape:
                    current.append(escape)
        elif ch == delim:
            # split! (add current to the list and reset it)
            ret.append(''.join(current))
            current = []
        else:
            current.append(ch)
    ret.append(''.join(current))
    return ret

There is no built-in function for this.
Here is an efficient, general and tested function that even supports delimiters of any length:
def escape_split(s, delim):
    i, res, buf = 0, [], ''
    while True:
        j, e = s.find(delim, i), 0
        if j < 0:  # end reached
            return res + [buf + s[i:]]  # add remainder
        while j - e and s[j - e - 1] == '\\':
            e += 1  # number of escapes
        d = e // 2  # number of double escapes
        if e != d * 2:  # odd number of escapes
            buf += s[i:j - d - 1] + s[j]  # add the escaped char
            i = j + 1  # and skip it
            continue  # add more to buf
        res.append(buf + s[i:j - d])
        i, buf = j + len(delim), ''  # start after delim

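Here is an efficient solution that handles double escapes correctly, i.e. a delimiter that follows a double escape is not escaped. It ignores an incorrect single escape as the last character of the string.
It is very efficient because it iterates over the input string exactly once, manipulating indices instead of copying strings around. Instead of constructing a list, it returns a generator.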
def split_esc(string, delimiter):
    if len(delimiter) != 1:
        raise ValueError('Invalid delimiter: ' + delimiter)
    ln = len(string)
    i = 0
    j = 0
    while j < ln:
        if string[j] == '\\':
            if j + 1 >= ln:
                yield string[i:j]
                return
            j += 1
        elif string[j] == delimiter:
            yield string[i:j]
            i = j + 1
        j += 1
    yield string[i:j]


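To allow the delimiter to be longer than a single character, simply advance i and j by the length of the delimiter in the "elif" case. This assumes that a single escape character escapes the entire delimiter rather than a single character.
Tested with Python 3.5.1.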
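
The multi-character extension described above could look roughly like the following sketch. The name `split_esc_multi` and the `escape` parameter are hypothetical, and it follows the stated assumption that one escape character protects the whole delimiter. As in the single-character version, escape characters are left in the output.

```python
def split_esc_multi(string, delimiter, escape='\\'):
    """Yield pieces of `string` split on unescaped `delimiter` (length >= 1)."""
    if not delimiter:
        raise ValueError('Delimiter must not be empty')
    ln, dlen = len(string), len(delimiter)
    i = j = 0
    while j < ln:
        if string[j] == escape:
            if j + 1 >= ln:
                # A lone trailing escape is dropped, as in the original.
                yield string[i:j]
                return
            if string.startswith(delimiter, j + 1):
                # One escape protects the entire delimiter.
                j += 1 + dlen
            else:
                # Otherwise it protects only the next character.
                j += 2
        elif string.startswith(delimiter, j):
            yield string[i:j]
            j += dlen
            i = j
        else:
            j += 1
    yield string[i:]


print(list(split_esc_multi(r'a\::b::c', '::')))
```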

I think simple C-style parsing would be much simpler and more robust.
def escaped_split(str, ch):
    if len(ch) > 1:
        raise ValueError('Expected split character. Found string!')
    out = []
    part = ''
    escape = False
    for i in range(len(str)):
        if not escape and str[i] == ch:
            out.append(part)
            part = ''
        else:
            part += str[i]
            escape = not escape and str[i] == '\\'
    if len(part):
        out.append(part)
    return out

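I created this method based on Henry Keiter's answer, but with the following advantages:

Variable escape character and delimiter
Does not remove the escape character if it is not actually escaping anything

Here is the code: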
def _split_string(self, string: str, delimiter: str, escape: str) -> [str]:
    result = []
    current_element = []
    iterator = iter(string)
    for character in iterator:
        if character == escape:
            try:
                next_character = next(iterator)
                if next_character != delimiter and next_character != escape:
                    # Do not copy the escape character if it is intended to escape either the delimiter or the
                    # escape character itself. Copy the escape character if it is not in use to escape one of these
                    # characters.
                    current_element.append(escape)
                current_element.append(next_character)
            except StopIteration:
                current_element.append(escape)
        elif character == delimiter:
            # split! (add current to the list and reset it)
            result.append(''.join(current_element))
            current_element = []
        else:
            current_element.append(character)
    result.append(''.join(current_element))
    return result

This is the test code demonstrating the behavior:
def test_split_string(self):
    # Verify normal behavior
    self.assertListEqual(['A', 'B'], list(self.sut._split_string('A+B', '+', '?')))

    # Verify that escape character escapes the delimiter
    self.assertListEqual(['A+B'], list(self.sut._split_string('A?+B', '+', '?')))

    # Verify that the escape character escapes the escape character
    self.assertListEqual(['A?', 'B'], list(self.sut._split_string('A??+B', '+', '?')))

    # Verify that the escape character is just copied if it doesn't escape the delimiter or escape character
    self.assertListEqual(['A?+B'], list(self.sut._split_string('A?+B', '\'', '?')))

Building on @user629923's suggestion, but much simpler than the other answers:
import re
DBL_ESC = "!double escape!"

s = r"Hello:World\:Goodbye\\:Cruel\\\:World"

map(lambda x: x.replace(DBL_ESC, r'\\'), re.split(r'(?<!\\):', s.replace(r'\\', DBL_ESC)))
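
On Python 3, `map` returns a lazy iterator rather than a list, so the one-liner above would need `list(...)` around it to show the result. Below is a sketch of the same placeholder trick wrapped in a function. `split_keep_escapes` is a hypothetical name, and the code assumes the `DBL_ESC` marker never occurs in the input.

```python
import re

DBL_ESC = "!double escape!"  # assumption: this marker never appears in the input


def split_keep_escapes(s, delim=':'):
    # Hide doubled backslashes so the lookbehind only sees "real" escapes.
    protected = s.replace('\\\\', DBL_ESC)
    # Split on delimiters that are not preceded by a single backslash.
    pieces = re.split(r'(?<!\\)' + re.escape(delim), protected)
    # Restore the doubled backslashes in each piece.
    return [p.replace(DBL_ESC, '\\\\') for p in pieces]


print(split_keep_escapes(r"Hello:World\:Goodbye\\:Cruel\\\:World"))
```

Escape characters are kept in the output, as in the original one-liner.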

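I know this is an old question, but I recently needed a function like this and did not find any that met my requirements.
Rules:

The escape character only takes effect when it is followed by the escape character or the delimiter. For example, if the delimiter is / and the escape character is \, then \a\b\c/abc becomes ['\a\b\c', 'abc'].
Repeated escape characters are themselves escaped (\\ becomes \).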

For the record, and in case someone is looking for the same thing I was, here is my proposed function:
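def str_escape_split(str_to_escape, delimiter=',', escape='\\'):
    """Split a string using a delimiter and an escape character.

    Args:
        str_to_escape ([type]): The text to be split
        delimiter (str, optional): Delimiter used. Defaults to ','.
        escape (str, optional): The escape character. Defaults to '\\'.

    Yields:
        [type]: the split strings, one at a time
    """
    if len(delimiter) > 1 or len(escape) > 1:
        raise ValueError("Delimiter and escape must each be a single character")
    token = ''
    escaped = False
    for c in str_to_escape:
        if c == escape:
            if escaped:
                token += escape
                escaped = False
            else:
                escaped = True
            continue
        if c == delimiter:
            if not escaped:
                yield token
                token = ''
            else:
                token += c
                escaped = False
        else:
            if escaped:
                token += escape
                escaped = False
            token += c
    yield token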

For sanity's sake, I ran some tests:
# The structure is:
# 'string_be_split_escaped
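
As an illustration only, not the author's original test cases, the rules stated above can be checked with a few assertions. The function is restated here (without its docstring) so the block runs on its own.

```python
def str_escape_split(str_to_escape, delimiter=',', escape='\\'):
    # Same logic as the function proposed above, repeated for self-containment.
    if len(delimiter) > 1 or len(escape) > 1:
        raise ValueError("Delimiter and escape must each be a single character")
    token = ''
    escaped = False
    for c in str_to_escape:
        if c == escape:
            if escaped:
                token += escape
                escaped = False
            else:
                escaped = True
            continue
        if c == delimiter and not escaped:
            yield token
            token = ''
            continue
        if escaped and c != delimiter:
            # The escape was not escaping anything special: keep it.
            token += escape
        escaped = False
        token += c
    yield token


# Plain split
assert list(str_escape_split('a,b')) == ['a', 'b']
# Escaped delimiter is kept and the escape is removed
assert list(str_escape_split(r'a\,b')) == ['a,b']
# Doubled escape collapses to one and the delimiter still splits
assert list(str_escape_split('a\\\\,b')) == ['a\\', 'b']
# Escape before an ordinary character is copied through unchanged
assert list(str_escape_split(r'\a\b\c/abc', delimiter='/')) == [r'\a\b\c', 'abc']
```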