How can I remove a trailing newline in Python?


What is the Python equivalent of Perl's chomp function, which removes the last character of a string if it is a newline?

Try the rstrip() method (see the documentation).

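By default, Python's rstrip() strips all trailing whitespace, not just a single newline. To strip only newlines, do the following:
>>> 'test string \n \r\n\n\r \n\n'.rstrip('\n')
'test string \n \r\n\n\r '

>>> import os
>>> 'foo\n\n'.rstrip(os.linesep)
'foo'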

In addition to rstrip(), there are also the methods strip() and lstrip().
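As a quick illustration of how the three methods differ, here is a minimal sketch. The sample string is made up for the example.

```python
# Hypothetical sample with leading and trailing whitespace
s = "  \thello world\n\n"

left = s.lstrip()    # removes leading whitespace only
right = s.rstrip()   # removes trailing whitespace only
both = s.strip()     # removes whitespace on both ends

print(repr(left))
print(repr(right))
print(repr(both))
```

Whitespace inside the string is left alone in all three cases.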
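The canonical way to strip end-of-line (EOL) characters is to use the string rstrip() method, removing any trailing \r or \n. Here are examples for Mac, Windows, and Unix EOL characters: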
>>> 'Mac EOL\r'.rstrip('\r\n')
'Mac EOL'
>>> 'Windows EOL\r\n'.rstrip('\r\n')
'Windows EOL'
>>> 'Unix EOL\n'.rstrip('\r\n')
'Unix EOL'

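Using '\r\n' as the parameter to rstrip means that it will strip any trailing combination of '\r' or '\n'. That is why it works in all three cases above.
This nuance matters in rare cases. For example, I once had to process a text file that contained an HL7 message. The HL7 standard requires a trailing '\r' as its EOL character. The Windows machine on which I was using this message had appended its own '\r\n' EOL character. Therefore, the end of each line looked like '\r\r\n'. Using rstrip('\r\n') would have taken off the entire '\r\r\n', which is not what I wanted. In that case, I simply sliced off the last two characters instead.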
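The difference can be sketched as follows. The line content is a made-up placeholder, not a real HL7 segment, and the endswith guard is an added assumption so the slice only happens when the Windows EOL is actually present.

```python
# Hypothetical line: the record's own '\r' terminator followed by a Windows '\r\n'
raw = "segment\r\r\n"

# rstrip treats its argument as a set of characters, so all three are removed
stripped = raw.rstrip("\r\n")

# Slicing removes exactly the appended '\r\n' and keeps the original '\r'
sliced = raw[:-2] if raw.endswith("\r\n") else raw

print(repr(stripped))
print(repr(sliced))
```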
Note that, unlike Perl's chomp function, this removes all of the specified characters at the end of the string, not just one:
>>> "Hello\n\n\n".rstrip("\n")
'Hello'

I'd say the "pythonic" way to get lines without trailing newline characters is splitlines().
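A minimal sketch of splitlines() on a made-up string with mixed line endings follows. Note that splitlines() also splits on a few less common boundaries (such as form feed), which may or may not be what you want.

```python
# Hypothetical text mixing Unix, Windows, and old Mac line endings
text = "first\nsecond\r\nthird\rfourth"

lines = text.splitlines()              # line endings removed
kept = text.splitlines(keepends=True)  # line endings preserved

print(lines)
print(kept)
```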
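Note that rstrip doesn't act exactly like Perl's chomp() because it doesn't modify the string. That is, in Perl:
$x="a\n";

chomp $x

results in $x being "a".

But in Python:
x="a\n"

x.rstrip()

will mean that the value of x is still "a\n". Even x=x.rstrip() doesn't always give the same result, as it strips all whitespace from the end of the string, not just one newline at most.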
I don't program in Python, but I came across a recommendation on python.org to use S.rstrip("\r\n") for Python 2.2 or later.
I might use something like this:

import os
s = s.rstrip(os.linesep)

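I think the problem with rstrip("\n") is that you'll probably want to make sure the line separator is portable. (Some antiquated systems are rumored to use "\r\n".) The other gotcha is that rstrip will strip out repeated whitespace. Hopefully os.linesep will contain the right characters. The above works for me.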
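rstrip doesn't do the same thing as chomp, on so many levels. Read the Perl documentation for chomp and you will see that it is very complex indeed.
However, my main point is that chomp removes at most one line ending, whereas rstrip will remove as many as it can.
Here you can see rstrip removing all the newlines:

>>> import os
>>> 'foo\n\n'.rstrip(os.linesep)
'foo'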
A much closer approximation of typical Perl chomp usage can be accomplished with re.sub, like this:

>>> import os, re
>>> re.sub(os.linesep + r'\Z', '', 'foo\n\n')
'foo\n'
Be careful with "foo".rstrip(os.linesep): it only chomps the newline characters of the platform where your Python is being executed. Imagine you are chomping the lines of a Windows file under Linux, for instance:

$ python
Python 2.7.1 (r271:86832, Mar 18 2011, 09:09:48)
[GCC 4.5.0 20100604 [gcc-4_5-branch revision 160292]] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import os, sys
>>> sys.platform
'linux2'
>>> "foo\r\n".rstrip(os.linesep)
'foo\r'
>>>

Use "foo".rstrip("\r\n") instead, as Mike says above.
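A workaround for a special case:
If the newline character is the last character (as is the case with most file input), then for any element in the collection you can index as follows:

foobar = foobar[:-1]

to slice off the newline character.

Note that the following removes every \n and \r in the string, not only trailing ones:

>>> "line 1\nline 2\r\n...".replace('\n', '').replace('\r', '')
'line 1line 2...'

Or you could always get geekier with regexps :)

Have fun!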
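One caveat with the plain [:-1] slice is that it removes the last character whatever it is. A minimal sketch of a guarded version follows; the helper name drop_last_newline is hypothetical.

```python
def drop_last_newline(s):
    # Hypothetical helper: slice off the final character only when it is a newline
    return s[:-1] if s.endswith("\n") else s

with_newline = drop_last_newline("abc\n")
without_newline = drop_last_newline("abc")
unguarded = "abc"[:-1]  # the plain slice eats a real character here

print(repr(with_newline), repr(without_newline), repr(unguarded))
```

This matters for the last line of a file, which often has no trailing newline.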
You can use line = line.rstrip('\n'). This will strip all newlines from the end of the string, not just one.
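If your question is how to clean up all the line breaks in a multi-line str object (oldstr), you can split it into a list on the delimiter '\n' and then join this list into a new str (newstr):

newstr = "".join(oldstr.split('\n'))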
Just use line.strip(). Keep in mind that this removes leading whitespace as well as trailing whitespace.
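Perl's chomp function removes one linebreak sequence from the end of a string only if it is actually there.
Here is how I plan to do that in Python, if process is conceptually the function that I need in order to do something useful with each line from this file:

import os

sep_pos = -len(os.linesep)
with open("file.txt") as f:
    for line in f:
        # Caveat: in text mode Python normally translates line endings to "\n",
        # so this comparison may not match on platforms where os.linesep is "\r\n".
        if line[sep_pos:] == os.linesep:
            line = line[:sep_pos]
        process(line)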
A catch-all (rstrip takes a set of characters, not a regular expression, so no '|' is needed):

line = line.rstrip('\r\n')
You can use strip:

line = line.strip()
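A short demonstration, using a made-up indented line, of what strip() does compared with stripping only the trailing newline:

```python
# Hypothetical line whose indentation might matter
line = "    indented text\n"

stripped = line.strip()       # removes the indentation too
chomped = line.rstrip("\n")   # removes only trailing newlines

print(repr(stripped))
print(repr(chomped))
```

If leading whitespace is significant in your data, rstrip("\n") is probably the safer choice.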

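I find it convenient to be able to get the chomped lines via an iterator, parallel to the way you can get the un-chomped lines from a file object. You can do so with the following code:

import operator

def chomped_lines(it):
    # Strip trailing '\r' and '\n' characters from each line
    return map(operator.methodcaller('rstrip', '\r\n'), it)

Sample usage:

with open("file.txt") as infile:
    for line in chomped_lines(infile):
        process(line)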
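The same idea can be sketched as a generator function and tried without a real file. Here io.StringIO stands in for the file object, and the sample text and the name chomped_lines_gen are assumptions made for the example.

```python
import io

def chomped_lines_gen(it):
    # Generator variant: yield each line without its trailing '\r' / '\n' characters
    for line in it:
        yield line.rstrip("\r\n")

# io.StringIO behaves like an open text file for iteration purposes
fake_file = io.StringIO("alpha\nbeta\n\ngamma")
result = list(chomped_lines_gen(fake_file))

print(result)
```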

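s = s.rstrip()

will remove all newlines at the end of the string s. The assignment is needed because rstrip returns a new string instead of modifying the original string.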
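This would replicate exactly Perl's chomp (minus its behavior on arrays) for the "\n" line terminator:

def chomp(x):
    # Remove at most one trailing line ending
    if x.endswith("\r\n"):
        return x[:-2]
    if x.endswith("\n") or x.endswith("\r"):
        return x[:-1]
    return x

(Note: it does not modify the string in place; it does not strip extra trailing whitespace; it takes \r\n into account.)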
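On Python 3.9 or later, a similar "at most one line ending" behavior can be sketched with str.removesuffix. The function name chomp_one is hypothetical, and this is only one possible variant, not a replacement for the function above.

```python
def chomp_one(s):
    # Assumes Python 3.9+, where str.removesuffix is available.
    # "\r\n" is checked first so a Windows ending is removed as one unit.
    for ending in ("\r\n", "\n", "\r"):
        if s.endswith(ending):
            return s.removesuffix(ending)
    return s

examples = ["a\n\n", "a\r\n", "a\r", "a"]
results = [chomp_one(e) for e in examples]

print(results)
```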
If you care about speed (say you have a long list of strings) and you know the nature of the newline character, string slicing is actually faster than rstrip. A little test to illustrate this:

import time

loops = 50000000

def method1(loops=loops):
    test_string = 'num\n'
    t0 = time.time()
    for num in range(loops):  # use xrange on Python 2
        out_string = test_string[:-1]
    t1 = time.time()
    print('Method 1: ' + str(t1 - t0))

def method2(loops=loops):
    test_string = 'num\n'
    t0 = time.time()
    for num in range(loops):  # use xrange on Python 2
        out_string = test_string.rstrip()
    t1 = time.time()
    print('Method 2: ' + str(t1 - t0))

method1()
method2()

Output:

Method 1: 3.92700004578
Method 2: 6.73000001907
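The same comparison can be sketched with the standard timeit module, which handles the timing loop itself. The iteration count below is an arbitrary choice, and the absolute numbers will vary by machine and Python version.

```python
import timeit

# The setup string defines the test value inside timeit's own namespace
setup = "test_string = 'num\\n'"
number = 1_000_000  # arbitrary iteration count for a quick run

slice_time = timeit.timeit("test_string[:-1]", setup=setup, number=number)
rstrip_time = timeit.timeit("test_string.rstrip()", setup=setup, number=number)

print(f"slice:  {slice_time:.3f} s")
print(f"rstrip: {rstrip_time:.3f} s")
```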
Just use:

line = line.rstrip("\n")

or

line = line.strip("\n")

You don't need any of this complicated stuff.

>>> ' spacious '.rstrip()
' spacious'
>>> "AABAA".rstrip("A")
'AAB'
>>> "ABBA".rstrip("AB")  # both AB and BA are stripped
''
>>> "ABCABBA".rstrip("AB")
'ABC'

There are three types of line endings that we normally encounter: \n, \r and \r\n. A rather simple regular expression in re.sub, namely r"\r?\n?$", is able to catch them all.
(And we gotta catch 'em all, am I right?)

import re

re.sub(r"\r?\n?$", "", the_text, count=1)

With the count argument, we limit the number of occurrences replaced to one, mimicking chomp to some extent. For example:

import re

text_1 = "hellothere\n\n\n"
text_2 = "hellothere\n\n\r"
text_3 = "hellothere\n\n\r\n"

a = re.sub(r"\r?\n?$", "", text_1, count=1)
b = re.sub(r"\r?\n?$", "", text_2, count=1)
c = re.sub(r"\r?\n?$", "", text_3, count=1)

... where a == b == c is True.

It looks like there is no perfect analog of Perl's chomp. In particular, rstrip cannot handle multi-character newline delimiters such as \r\n. However, splitlines does.

# Remove every line break from s
''.join(s.splitlines())

def chomp(s):
    # Remove exactly one trailing line ending, if there is one
    if len(s):
        lines = s.splitlines(True)  # keepends=True retains the delimiters
        last = lines.pop()
        # Splitting the last line again drops only its delimiter
        return ''.join(lines + last.splitlines())
    else:
        return ''
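To see why the splitlines-based approach removes only one line ending, it may help to look at the intermediate values. The input string below is made up for the example.

```python
# Hypothetical input with a Windows ending, a Unix ending, and one extra newline
s = "one\r\ntwo\n\n"

parts = s.splitlines(True)   # keepends=True keeps each delimiter attached
last = parts.pop()           # the final "line" is just the extra newline
result = "".join(parts + last.splitlines())

print(parts)
print(repr(last))
print(repr(result))
```

Only the delimiter of the final piece is dropped, so the earlier line endings survive intact.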

>>> import re

>>> re.sub(r'[\n\r]+$', '', '\nx\r\n')
'\nx'

>>> re.sub(r'[\n\r]+', '', '\nx\r\n')
'x'

>>> re.sub(r'[\n\r]{1,2}$', '', '\nx\r\n\r\n')
'\nx\r'
>>> re.sub(r'[\n\r]{1,2}$', '', '\nx\r\n\r')
'\nx\r'
>>> re.sub(r'[\n\r]{1,2}$', '', '\nx\r\n')
'\nx'

>>> re.sub(r'(?:\r\n|\n)$', '', '\nx\n\n', count=1)
'\nx\n'
>>> re.sub(r'(?:\r\n|\n)$', '', '\nx\r\n\r\n', count=1)
'\nx\r\n'
>>> re.sub(r'(?:\r\n|\n)$', '', '\nx\r\n', count=1)
'\nx'
>>> re.sub(r'(?:\r\n|\n)$', '', '\nx\n', count=1)
'\nx'

import re

if re.search("(\\r|)\\n$", line):
    line = re.sub("(\\r|)\\n$", "", line)

s = '''Hello World \t\n\r\tHi There'''
# import the string module
import string
# use translate to delete every whitespace character
s.translate({ord(c): None for c in string.whitespace})
# 'HelloWorldHiThere'

import re
s = ''' Hello World \t\n\r\tHi '''
print(re.sub(r"\s+", "", s), sep='')  # \s matches all whitespace
# HelloWorldHi

s.replace('\n', '').replace('\t', '').replace('\r', '')
# ' Hello World Hi '

import re
s = '''Hello World \t\n\r\tHi There'''
regex = re.compile(r'[\n\r\t]')
regex.sub("", s)
# 'Hello World Hi There'

s = '''Hello World \t\n\r\tHi There'''
' '.join(s.split())
# 'Hello World Hi There'