Unexpected behavior of a function written to replace Python's split()


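I wrote a function that is meant to work better than the built-in split() (I know it is not idiomatic Python, but I did my best). When I call it with these arguments: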

better_split("After  the flood   ...  all the colors came out."," .")

I expected this result:

['After', 'the', 'flood', 'all', 'the', 'colors', 'came', 'out']

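Surprisingly, however, the function behaves in a way I cannot understand. When it reaches the last two words, it no longer strips the extra " " characters, and instead of adding "came" and "out" to the result list it adds "came out", so I get this: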

['After', 'the', 'flood', 'all', 'the', 'colors', 'came out']

Does anyone with more experience understand why this happens? Thanks in advance for any help.

def better_split(text,markersString):
    markers = []
    splited = []
    for e in markersString:
        markers.append(e)
    for character in text:
        if character in markers:
            point = text.find(character)
            if text[:point] not in character:
                word = text[:point]
                splited.append(word)
                while text[point] in markers and point+1 < len(text):
                    point = point + 1
                text = text[point:]
    print 'final splited = ', splited

A simpler solution

Your better_split function is simpler than you think. I have implemented it the following way:

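def better_split(s, seps):
    result = [s]
    def split_by(sep):
        # Build a function that splits a string on one specific separator
        return lambda s: s.split(sep)
    for sep in seps:
        # Split every piece on this separator and flatten the list of lists
        result = sum(map(split_by(sep), result), [])
    # Do not return empty elements; list() keeps the result a list on Python 3 too
    return list(filter(None, result))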
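As a quick check of this approach against the input from the question, here is a minimal sketch of the same idea written with list comprehensions. It is not the answerer's code: the name `split_on_any` is made up for illustration, and the comprehension replaces `sum(..., [])`, which builds a new list for every addition.

```python
def split_on_any(s, seps):
    """Split s on every character in seps and drop empty pieces."""
    result = [s]
    for sep in seps:
        # Split each current piece on sep and flatten in a single pass
        result = [piece for chunk in result for piece in chunk.split(sep)]
    # Consecutive separators produce empty strings, so filter them out
    return [piece for piece in result if piece]


words = split_on_any("After  the flood   ...  all the colors came out.", " .")
print(words)
# ['After', 'the', 'flood', 'all', 'the', 'colors', 'came', 'out']
```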

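Tips about your code

- You do not need to convert `markersString` into `markers`; you can iterate over `markersString` directly.
- `text[:point] not in character` is always `True` when `point > 1`, so the check is useless.
- `point = text.find(character)` gives you `point = -1` every time `character` is not found in `text`.
- Try to simplify your code. One of Python's rules says: "If the implementation is hard to explain, it's a bad idea." Unfortunately, your code is hard even to read: it contains a lot of redundant statements, plus statements that look as if they should work differently from how they actually work (for example, using `str.find` to get the position of a separator and then using it to take slices without checking it).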
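The `str.find` pitfall from the tips can be reproduced in a few lines. This is only an illustrative sketch; the sample string and the variable names are chosen for the demonstration.

```python
text = "came out"
character = "."

point = text.find(character)      # "." does not occur, so find() returns -1
print(point)                      # -1
print(repr(text[:point]))         # 'came ou': slicing with -1 drops the last character

# An empty string is "in" every string, so an empty slice never passes a `not in` check
print("" in character)            # True

# Checking the result before slicing avoids the silent truncation
word = text[:point] if point != -1 else text
print(word)                       # came out
```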


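The problem comes from how these lines work together.

    for character in text:

This loop iterates over the characters of the original `text`. Rebinding `text` inside the loop does not change what the loop iterates over.

        point = text.find(character)

This searches the current, already shortened `text`, so the position it returns does not belong to the character the loop is currently looking at.

            while text[point] in markers and point+n < len(text):
                point = point + 1
            text = text[point:]

With `n` equal to 1, this loop stops on the last character, so a marker at the very end of `text` is left in place.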
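To see the effect on the question's input, here is a sketch of the question's function ported to Python 3 so that it returns the list instead of printing it. The name `buggy_split` is made up for this illustration; the logic is unchanged.

```python
def buggy_split(text, markers_string):
    """Python 3 port of the question's function; the bug is kept on purpose."""
    markers = list(markers_string)
    splited = []
    for character in text:                 # iterates over the ORIGINAL string
        if character in markers:
            point = text.find(character)   # searches the CURRENT, shortened string
            if text[:point] not in character:
                word = text[:point]
                splited.append(word)
                while text[point] in markers and point + 1 < len(text):
                    point = point + 1
                text = text[point:]        # rebinding does not affect the for loop
    return splited


result = buggy_split("After  the flood   ...  all the colors came out.", " .")
print(result)
# ['After', 'the', 'flood', 'all', 'the', 'colors', 'came out']
```

By the time the loop reaches the first "." of the original "...", the remaining text is already "came out.", so `find` locates the final dot and "came out" is appended as a single word.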


def better_split(text,markersString):
    # simpler replacement for the 'for e in markersString...' loop
    markers = list(markersString)
    splited = []

    # there is no need to assign variable n, we all know it should be 1
    # n = 1    

    def iter_text(text):
        # check if text is an empty string.
        # NOTE: this `text` shadows the `text` of the outer function (local scope),
        # so it is the text passed to iter_text() on each call,
        # not the one better_split() received.
        if not text:
            return
        # [UPDATES 2012-03-17 01:07 EST]
        # add a flag to judge if there are markers in `text`
        _has_marker = False
        for character in text:
            if character in markers:
                # set `_has_marker` to True to indicate `text` has been handled
                _has_marker = True
                point = text.find(character)
                word = text[:point]
                splited.append(word)
                # check if text[point] is legal, to prevent raising of IndexError
                while point + 1 <= len(text) and text[point] in markers:
                    point = point + 1
                text = text[point:]
                # break the loop when you find a marker
                # and change `text` according to it,
                # so that the new loop will get started with changed `text`
                break
        # if no marker was found in `text`, add the whole `text` to `splited`
        if not _has_marker:
            splited.append(text)
        else:
            iter_text(text)

    iter_text(text)

    print 'final splited = ', splited
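The recursive helper above makes one recursive call per marker run, so a very long input could hit Python's recursion limit. As a hedged alternative that is not part of the original answer, the same idea can be written as a single pass over the indices. The name `split_on_markers` is made up for this sketch, and unlike the recursive version it never emits an empty word when the text starts with a marker.

```python
def split_on_markers(text, markers_string):
    """Split text on any character in markers_string, skipping empty words."""
    markers = set(markers_string)
    words = []
    start = None                      # index where the current word began, if any
    for index, character in enumerate(text):
        if character in markers:
            if start is not None:     # a marker ends the current word
                words.append(text[start:index])
                start = None
        elif start is None:           # first character of a new word
            start = index
    if start is not None:             # flush a word that runs to the end of the text
        words.append(text[start:])
    return words


words = split_on_markers("After  the flood   ...  all the colors came out.", " .")
print(words)
# ['After', 'the', 'flood', 'all', 'the', 'colors', 'came', 'out']
```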


`better_split()` is not a good name. How is it "better", and in what way? `yourmodule.split()` is enough to distinguish it from any other `split()` function.

You can implement it with the `re` module, by splitting on a regular expression built from the separators.

If you are not allowed to use `map` and `filter`, they are easy to replace in `"|".join(map(re.escape, separators))` and in `filter(None, re_sep.split(text))`.
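The regex-based implementation itself did not survive on this page, so the following is only a sketch assembled from the fragments that did (`re.escape`, `separators`, `re_sep.split(text)`). The exact pattern is an assumption: the non-capturing group with `+` is added here so that a run of separators counts as one, and `separators` is assumed to be non-empty.

```python
import re


def split(text, separators):
    """Split text on runs of any character in separators, dropping empty pieces."""
    # Escape each separator so characters such as "." are matched literally
    pattern = "|".join(re.escape(s) for s in separators)
    # Assumption of this sketch: "(?:...)+" treats consecutive separators as one
    re_sep = re.compile("(?:" + pattern + ")+")
    # A separator at the start or end still yields an empty string, so filter
    return [s for s in re_sep.split(text) if s]


words = split("After  the flood   ...  all the colors came out.", " .")
print(words)
# ['After', 'the', 'flood', 'all', 'the', 'colors', 'came', 'out']
```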

Without `map` and `filter`, the two expressions become:

"|".join(re.escape(s) for s in separators)

[s for s in re_sep.split(text) if s]

Thanks, Tadeck, for this nice code example!

def spli(text, sep=' '):
    index = 0
    word = ''
    words = []
    while index < len(text):
        if text[index] not in sep:
            # still inside a word: keep accumulating characters
            word += text[index]
        else:
            # a separator ends the current word
            words.append(word)
            word = ''
        index += 1
    if word:
        # flush the last word if the text does not end with a separator
        words.append(word)
    return words
n = 'hello'
print(spli(n))

output:
 ['hello']