Python: a regular expression to distinguish Windows and Linux end-of-line characters

python regex python-2.7

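I am trying to distinguish the Linux/Unix end-of-line character \n from the Windows end-of-line sequence \r\n. I can't seem to find a regular expression that uniquely tells the two cases apart. My code is: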

import regex 

winpattern = regex.compile("[(?m)[\r][\n]$",regex.DEBUG|regex.MULTILINE)

linuxpattern = regex.compile("^*.[^\r][\n]$", regex.DEBUG)

for i, line in enumerate(open('file8.py')):
    for match in regex.finditer(linuxpattern, line):
        print 'Found on line %s: %s' % (i+1, match.groups())

Both winpattern and linuxpattern match Windows as well as Linux line endings. I want linuxpattern to match only the Linux EOL and winpattern to match only the Windows EOL. Any suggestions?

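When a file is opened as a text file, Python 3 uses universal newlines mode by default, which means that it translates all three newline types, \r\n, \r and \n, to \n. Your regular expressions are therefore irrelevant: by the time you have read the file, the information about which newline type it used is already gone.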
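The effect described above can be checked with a small experiment. This is only a sketch, assuming Python 3; the scratch file created with tempfile is a hypothetical stand-in for the real input file, and binary mode is used here simply because it never translates anything.

```python
import os
import tempfile

# Write a scratch file with Windows line endings. Binary mode is used so
# that nothing is translated on the way out.
fd, path = tempfile.mkstemp()  # hypothetical scratch file
with os.fdopen(fd, "wb") as f:
    f.write(b"Hello\r\nWorld\r\n")

# Default text mode: universal newlines translate \r\n to \n while reading.
with open(path) as f:
    translated = f.read()

# Binary mode: the bytes come back exactly as they are stored on disk.
with open(path, "rb") as f:
    raw = f.read()

os.remove(path)

print(repr(translated))
print(repr(raw))
```

Any pattern that looks for \r in the text-mode result has nothing left to find, whatever the regular expression looks like.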

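To disable the newline translation, pass the newline='' argument to open() (in Python 2, use io.open(), which accepts the same argument).

As for the pattern itself, [(?m)[\r][\n]$ means: match any one character from the set [?()m\r (that is, [, ?, (, ), m or a carriage return), then \n, then the end of the line.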
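To make that reading of the pattern concrete, here is a small sketch using the standard re module rather than the third-party regex module from the question (assuming both treat this character class the same way). The inner [ is escaped, which should not change the meaning but avoids the "possible nested set" warning that newer Python versions emit.

```python
import re

# Same character class as in the question: "(", "?", "m", ")", "[" and "\r".
# The "(?m)" is not an inline flag here, because it sits inside [...].
winpattern = re.compile(r"[(?m)\[\r][\n]$", re.MULTILINE)

samples = ["abc\r\n", "abc\n", "why?\n", "spam\n"]
for s in samples:
    # bool(...) is True when the pattern matches somewhere in the sample.
    print(repr(s), bool(winpattern.search(s)))
```

A Unix line that happens to end in ?, m, (, ) or [ is matched as well, so the pattern does not single out Windows line endings.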

$ echo 'Hello
> World
> ' > test.unix
$ cp test.unix test.dos
$ unix2dos test.dos
unix2dos: converting file test.dos to DOS format...
$ python3
Python 3.5.3 (default, Nov 23 2017, 11:34:05) 
[GCC 6.3.0 20170406] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> unix = open('test.unix', newline='').read()
>>> dos = open('test.dos', newline='').read()
>>> unix
'Hello\nWorld\n\n'
>>> dos
'Hello\r\nWorld\r\n\r\n'

>>> import re
>>> winregex = re.compile(r'\r\n')
>>> unixregex = re.compile(r'[^\r]\n')
>>> winregex.findall(unix)
[]
>>> winregex.findall(dos)
['\r\n', '\r\n', '\r\n']
>>> unixregex.findall(unix)
['o\n', 'd\n']
>>> unixregex.findall(dos)
[]
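Note that [^\r]\n consumes the character in front of the newline and therefore misses a newline that directly follows another newline (the third \n in the Unix string above). One possible alternative, sketched here with the same two sample strings, is a negative lookbehind; the names unix_eol and win_eol are made up for this illustration.

```python
import re

unix = "Hello\nWorld\n\n"
dos = "Hello\r\nWorld\r\n\r\n"

# (?<!\r)\n matches a line feed that is NOT preceded by a carriage return.
# The lookbehind consumes nothing, so consecutive newlines are all found.
unix_eol = re.compile(r"(?<!\r)\n")
win_eol = re.compile(r"\r\n")

print(len(unix_eol.findall(unix)), len(unix_eol.findall(dos)))
print(len(win_eol.findall(unix)), len(win_eol.findall(dos)))
```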

>>> unix_lines = re.compile(r'^(.*[^\r\n]\n|\n)', re.MULTILINE)
>>> dos_lines = re.compile(r'^.*\r\n', re.MULTILINE)
>>> unix_lines.findall(dos)
[]
>>> unix_lines.findall(unix)
['Hello\n', 'World\n', '\n']
>>> dos_lines.findall(unix)
[]
>>> dos_lines.findall(dos)
['Hello\r\n', 'World\r\n', '\r\n']
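Putting the pieces together, the per-line report from the question could look roughly like the sketch below. The function name classify_lines and the labels it returns are hypothetical, and the text is assumed to have been read without newline translation, for example with open(path, newline='').read() in Python 3.

```python
import re

# One line = any run of non-terminator characters plus its terminator.
# \r\n is listed first so that it is not split into a bare \r and a bare \n.
LINE = re.compile(r"[^\r\n]*(\r\n|\n|\r)")

KINDS = {"\r\n": "windows", "\n": "unix", "\r": "old-mac"}


def classify_lines(text):
    """Return (line_number, kind) for every terminated line in text."""
    return [(number, KINDS[match.group(1)])
            for number, match in enumerate(LINE.finditer(text), 1)]


mixed = "Hello\r\nWorld\n\n"
for number, kind in classify_lines(mixed):
    print("line %d: %s" % (number, kind))
```

A final line without any terminator is not reported by this sketch, since it has no end-of-line sequence to classify.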