Python: remove \n and the letters that follow it from the end of words in a list

python string

How do I remove the \n and the letters that follow it? Thanks a lot.

wordlist = ['Schreiben\nEs', 'Schreiben', 'Schreiben\nEventuell', 'Schreiben\nHaruki']
for x in wordlist:
    ...?

This can be done with re.sub:

>>> help(re.sub)
Help on function sub in module re:

sub(pattern, repl, string, count=0)
    Return the string obtained by replacing the leftmost
    non-overlapping occurrences of the pattern in string by the
    replacement repl.  repl can be either a string or a callable;
    if a callable, it's passed the match object and must return
    a replacement string to be used.

You can do this with a regular expression:

import re
wordlist = [re.sub("\n.*", "", word) for word in wordlist]

The regular expression \n.* matches the first \n and anything after it (.*), and replaces the match with an empty string.
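To make that concrete, here is a small sketch of how the pattern behaves. The string `multi` is a made-up input with two newlines, not something from the question. Since "." does not match a newline by default, each match covers one newline plus the rest of that line, and re.sub repeats the replacement for every later line unless count limits it.

```python
import re

wordlist = ['Schreiben\nEs', 'Schreiben', 'Schreiben\nEventuell', 'Schreiben\nHaruki']

# Each match is one newline plus the rest of that line; words without a
# newline contain no match and are returned unchanged.
cleaned = [re.sub(r"\n.*", "", word) for word in wordlist]

# Hypothetical input with two newlines, to show how repeated matches behave.
multi = "Schreiben\nEs\nEventuell"
all_lines_removed = re.sub(r"\n.*", "", multi)            # both tails removed
first_only = re.sub(r"\n.*", "", multi, count=1)          # only the first tail removed
dotall = re.sub(r"\n.*", "", multi, flags=re.DOTALL)      # one match spanning both lines

print(cleaned)
print(repr(all_lines_removed), repr(first_only), repr(dotall))
```

For the word list in the question every variant gives the same result; the difference only shows up when a word contains more than one newline.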

[w[:w.find('\n')] for w in wordlist]

A few timings:

$ python -m timeit -s "wordlist = ['Schreiben\nEs', 'Schreiben', 'Schreiben\nEventuell', 'Schreiben\nHaruki']" "[w[:w.find('\n')] for w in wordlist]"
100000 loops, best of 3: 2.03 usec per loop
$ python -m timeit -s "import re; wordlist = ['Schreiben\nEs', 'Schreiben', 'Schreiben\nEventuell', 'Schreiben\nHaruki']" "[re.sub('\n.*', '', w) for w in wordlist]"
10000 loops, best of 3: 17.5 usec per loop
$ python -m timeit -s "import re; RE = re.compile('\n.*'); wordlist = ['Schreiben\nEs', 'Schreiben', 'Schreiben\nEventuell', 'Schreiben\nHaruki']" "[RE.sub('', w) for w in wordlist]"
100000 loops, best of 3: 6.76 usec per loop

Edit:

The solution above is completely wrong (see Peter Hansen's comment). Here is a corrected one:

def truncate(words, s):
    for w in words:
        i = w.find(s)
        yield w[:i] if i != -1 else w
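A quick way to see the difference between the one-liner and the corrected generator is to run both on the word list from the question. This is only a sketch; the names `naive` and `fixed` are chosen here for illustration.

```python
def truncate(words, s):
    """Yield each word cut at the first occurrence of s, or unchanged if s is absent."""
    for w in words:
        i = w.find(s)
        yield w[:i] if i != -1 else w

wordlist = ['Schreiben\nEs', 'Schreiben', 'Schreiben\nEventuell', 'Schreiben\nHaruki']

# str.find returns -1 when there is no match, so w[:-1] silently drops
# the last character of any word that has no newline.
naive = [w[:w.find('\n')] for w in wordlist]

# The generator checks for -1 and leaves such words untouched.
fixed = list(truncate(wordlist, '\n'))

print(naive)
print(fixed)
```

The second word comes back as 'Schreibe' in `naive` but stays 'Schreiben' in `fixed`.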

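Comments:

I'm not sure, but I guess he also wants to remove the characters that follow the \n.

Yes, I also want to remove the \n and the letters that follow it! Thank you, I'm glad you could help me. There are so many functions in Python. You've been a great help. Of course, I will now read more about the re module. :)

This is a very bad (i.e. completely untested) answer, because it silently truncates words that have no newline. str.find() returns -1 when there is no match, and slicing with [:-1] returns every character except the last one. Please delete it.

@Peter Hansen: Thanks for the report. I was thinking about how to make it a one-liner and forgot about correctness.

@mg, OK... now please fix the for loop in the edit section. "for w in in words:" has an extra "in".

By using RE = re.compile(...).sub and [RE("", w) ...] you avoid looking up the sub() method for each word, which gives a small speedup (~10%).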
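The bound-method idea from the last comment could look roughly like this. The name `strip_tail` is made up for this sketch, and the pattern is the one from the regex answer; no timing is claimed here beyond what the comment reports.

```python
import re

wordlist = ['Schreiben\nEs', 'Schreiben', 'Schreiben\nEventuell', 'Schreiben\nHaruki']

# Compile once and keep a reference to the bound sub method, so the loop
# does not repeat the attribute lookup for every word.
strip_tail = re.compile(r"\n.*").sub

cleaned = [strip_tail("", w) for w in wordlist]
print(cleaned)
```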

>>> wordlist = ['Schreiben\nEs', 'Schreiben', 'Schreiben\nEventuell', 'Schreiben\nHaruki']
>>> [ i.split("\n")[0] for i in wordlist ]
['Schreiben', 'Schreiben', 'Schreiben', 'Schreiben']
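Two small variations on the split approach, as a sketch: passing maxsplit=1 stops splitting after the first newline, and str.partition (a different method, not used in the answers above) returns the part before the first separator directly. Both leave words without a newline unchanged.

```python
wordlist = ['Schreiben\nEs', 'Schreiben', 'Schreiben\nEventuell', 'Schreiben\nHaruki']

# Split at most once; element 0 is everything before the first newline.
by_split = [w.split("\n", 1)[0] for w in wordlist]

# partition returns (head, separator, tail); head is the whole word
# when the separator does not occur.
by_partition = [w.partition("\n")[0] for w in wordlist]

print(by_split)
print(by_partition)
```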