Python: remove all special characters and punctuation from a string and limit it to the first 200 characters

python string

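Hi, I need to remove all special characters, punctuation and spaces from a string so that only letters and digits remain. The final string should be limited to the first 200 characters.

I know one solution is: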

string = "Special $#! character's   spaces 888323"

string = ''.join(e for e in string if e.isalnum())[:200]

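But this first removes all the unwanted characters and only then slices the result. Is there something that works like a generator, i.e. that stops (breaks) as soon as the total reaches 200 characters? I want a Pythonic solution. PS: I know I can achieve this with a for loop.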

from itertools import islice
"".join(islice((e for e in string if e.isalnum()), 200))

But personally, I think the for loop sounds a lot better.
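For comparison, here is a minimal sketch of the for-loop version with an early break. The function name `first_alnum` and its `limit` parameter are made up for illustration; they are not from the question.

```python
def first_alnum(text, limit=200):
    """Return the first `limit` alphanumeric characters of `text`."""
    if limit <= 0:
        return ""
    kept = []
    for ch in text:
        if ch.isalnum():
            kept.append(ch)
            if len(kept) == limit:
                break  # stop scanning as soon as the quota is reached
    return "".join(kept)

string = "Special $#! character's   spaces 888323"
print(first_alnum(string, 10))  # Specialcha
```

The loop never looks at characters past the point where the limit is hit, which is the behaviour the question asks for.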

Use a generator expression or a generator function with itertools.islice.
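The generator-function variant might look like the sketch below. The helper name `alnum_chars` is hypothetical; the generator-expression form is the same idea written inline.

```python
from itertools import islice

def alnum_chars(text):
    """Lazily yield the letters and digits of `text`, one at a time."""
    for ch in text:
        if ch.isalnum():
            yield ch

string = "Special $#! character's   spaces 888323"
# islice stops pulling from the generator after 200 items,
# so the rest of the string is never examined.
result = "".join(islice(alnum_chars(string), 200))
print(result)  # Specialcharactersspaces888323
```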

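Note that if the string is not huge and the number n (200 here) is not small compared to the length of the string, you should use str.translate with simple slicing instead, as it will be very fast compared to a Python-level for loop: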

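>>> # Python 2 form: str.translate(None, deletechars)
>>> from string import whitespace, punctuation
>>> s = "Special $#! character's   spaces 888323"
>>> s.translate(None, whitespace+punctuation)[:10]
'Specialcha'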
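On Python 3, str.translate takes a single mapping argument, so the call above raises a TypeError there. A rough equivalent, assuming you only care about ASCII whitespace and punctuation (the name `DELETE_TABLE` is just for illustration):

```python
from string import whitespace, punctuation

# Map every whitespace and ASCII punctuation character to None (delete it).
DELETE_TABLE = str.maketrans("", "", whitespace + punctuation)

s = "Special $#! character's   spaces 888323"
print(s.translate(DELETE_TABLE)[:10])  # Specialcha
```

Unlike the isalnum() filter, this only strips the characters listed in string.punctuation and string.whitespace, so non-ASCII symbols would pass through.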

Some timing comparisons for a huge string:

>>> s = "Special $#! character's   spaces 888323" * 10000
>>> len(s)
390000
# For very small n
>>> %timeit ''.join(islice((e for e in s if e.isalnum()), 200))
10000 loops, best of 3: 20.2 µs per loop
>>> %timeit s.translate(None, whitespace+punctuation)[:200]
1000 loops, best of 3: 383 µs per loop

# For mid-sized n
>>> %timeit ''.join(islice((e for e in s if e.isalnum()), 10000))
1000 loops, best of 3: 930 µs per loop
>>> %timeit s.translate(None, whitespace+punctuation)[:10000]
1000 loops, best of 3: 378 µs per loop

# When n is comparable to length of string.
>>> %timeit ''.join(islice((e for e in s if e.isalnum()), 100000))
100 loops, best of 3: 9.41 ms per loop
>>> %timeit s.translate(None, whitespace+punctuation)[:100000]
1000 loops, best of 3: 385 µs per loop

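If regexes aren't solving your problem, it may be that you're not using enough of them yet :-) A regex one-liner (not counting the import) handles it, here limited to 20 characters because your test data doesn't match your specification.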
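The one-liner itself did not survive in this copy of the answer, so the following is only a guess at its shape: re.sub with an ASCII-only character class, which is an assumption on my part.

```python
import re

string = "Special $#! character's   spaces 888323"
# Delete everything that is not an ASCII letter or digit, then slice.
print(re.sub(r"[^A-Za-z0-9]", "", string)[:20])  # Specialcharactersspa
```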

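While it's not technically a generator, it will work just as well provided you don't have to process truly massive strings.

What it does do is avoid the split and rejoin in your original solution: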

''.join(e for e in something)

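No doubt the regex processing has some cost, but I'd have a hard time believing it's as high as building a temporary list and then collapsing it back into a string. Still, if you're concerned, you should measure, not guess.

If you want an actual generator, it's easy enough to implement one: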

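class alphanum(object):
    """Iterator yielding at most n alphanumeric characters from s."""

    def __init__(self, s, n):
        self.s = s      # source string
        self.n = n      # how many characters may still be yielded
        self.ix = 0     # current position in s

    def __iter__(self):
        return self

    def __next__(self):
        # Python 3 iterator protocol; delegates to next() below.
        return self.next()

    def next(self):
        # Python 2 iterator protocol.
        if self.n <= 0:
            raise StopIteration()
        # Skip characters that are not letters or digits.
        while self.ix < len(self.s) and not self.s[self.ix].isalnum():
            self.ix += 1
        if self.ix == len(self.s):
            raise StopIteration()

        self.ix += 1
        self.n -= 1
        return self.s[self.ix-1]

    def remainder(self):
        # Join whatever has not been consumed yet.
        return ''.join([x for x in self])

for x in alphanum("Special $#! chars", 10):
    print(x)

print(alphanum("Special $#! chars", 10).remainder())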

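I guess this is not a generator. It will first remove the unwanted characters and then slice the result.

@user1162512, no, it's not a generator but, to be honest, unless you're processing multi-megabyte strings, it won't make any difference. I'll clarify the answer.

@user1162512 Can you provide a simple example along with the expected output?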