Python: UTF-16 decoding adds an extra blank line on a Windows box

I'm running into an extra line feed problem between Windows and *nix platforms.

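import re

# Read the raw bytes and decode them from UTF-16.
file = open('UTF16file.xml', 'rb')
html = file.read().decode('utf-16')
file.close()

# Replace every match of the original URL with the new one.
regexp = re.compile(self.originalurl, re.S)
(html, changes) = regexp.subn(self.newurl, html)

# Write the re-encoded text back out (text mode 'w+').
file = open('UTF16file-regexed.xml', 'w+')
file.write(html.encode('utf-16'))
file.close()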

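Running this code on my Mac works: I get my file back with no extra line feeds. So far, I've tried:

1. Encoding the regexp as UTF-16 and not decoding the file: breaks on both Windows and OS X

2. Writing in 'wb' mode instead of 'w+': breaks on Windows

Any ideas?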
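
One plausible explanation, assuming Python 2 on Windows: a file opened in text mode ('w' or 'w+') expands every 0x0A byte to 0x0D 0x0A on write, without regard to the encoding. In UTF-16 each character takes two bytes, so the inserted byte shifts the bytes that follow it. The sketch below only simulates that translation with bytes.replace so that it runs on any platform; the sample text is made up for illustration.

```python
# Simulation only: mimics what Windows text mode does to bytes on write.
sample = u"here\r\nis all\r\nmy text n stuff."

# Fixed byte order and no BOM, so the byte layout is reproducible.
encoded = sample.encode("utf-16-le")

# Text mode turns each 0x0A byte into 0x0D 0x0A, even when that byte
# is one half of a two-byte UTF-16 code unit.
translated = encoded.replace(b"\n", b"\r\n")

extra_bytes = len(translated) - len(encoded)   # one extra byte per line feed
garbled = translated.decode("utf-16-le", "replace")
still_intact = (garbled == sample)             # False: the text was damaged
```

The stray byte leaves the stream misaligned after the first line ending, which is why the output no longer matches the input even though the Python-level string was correct.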

C:\Documents and Settings\Nick>python
ActivePython 2.6.4.10 (ActiveState Software Inc.) based on
Python 2.6.4 (r264:75706, Jan 22 2010, 16:41:54) [MSC v.1500 32 bit (Intel)]...
Type "help", "copyright", "credits" or "license" for more information.
>>> txt = """here
... is all
... my text n stuff."""
>>> f = open('u16.txt','wb')
>>> f.write(txt.encode('utf-16'))
>>> f.close()
>>> exit()

C:\Documents and Settings\Nick>notepad u16.txt

looks like:

here is allmy text n stuff.

(although when I copy-paste it from Notepad into FF, it actually breaks the lines)... but:

C:\Documents and Settings\Nick>
    "C:\Program Files\Windows NT\Accessories\wordpad.exe" u16.txt

looks like:

here 
is all
my text n stuff.

(on Windows XP SP3 32-bit)
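
The session above writes the text in binary mode, so the file keeps bare LF line endings. That fits an editor that expects CRLF running the lines together while a more tolerant one breaks them. If the file also has to display correctly in Notepad, one option is to normalize line endings to CRLF before encoding. A minimal sketch, where to_crlf is a hypothetical helper name not taken from the original post:

```python
import re

def to_crlf(text):
    """Return text with every LF or CRLF line ending written as CRLF."""
    # \r?\n matches both bare LF and existing CRLF, so CRLF is not doubled.
    return re.sub(u"\r?\n", u"\r\n", text)

txt = u"here\nis all\nmy text n stuff."

# Encode after normalizing; the result is bytes (BOM included) and
# should be written with a file opened in binary mode ('wb').
data = to_crlf(txt).encode("utf-16")
```

Normalizing on the decoded string, rather than on the encoded bytes, keeps every code unit two bytes wide.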

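Option 2 sounds like the right one. What breaks? Are the two files identical (binary-wise?), or did your Mac OS text editor fix the doubled line feeds for you?

For #2, have you tried opening the file with WordPad (or Notepad++, etc.) instead of Notepad? Most text editors more advanced than notepad.exe will interpret Linux newlines correctly.

Just a small nitpick: you shouldn't shadow the file type with a variable named file. If you want to know more about newlines, read [...]

Christian: thanks for the link and suggestions.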
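
Putting the suggestions together, here is a hedged sketch of the read, substitute, write cycle with both files opened in binary mode. The function name replace_in_utf16_file and its parameters are illustrative; the original code takes its pattern and replacement from self.originalurl and self.newurl, whose values are not shown.

```python
import re

def replace_in_utf16_file(src_path, dst_path, pattern, replacement):
    """Apply a regex substitution to a UTF-16 file; return the match count."""
    # Binary read: no newline translation, the codec handles the BOM.
    with open(src_path, "rb") as src:
        text = src.read().decode("utf-16")

    new_text, changes = re.compile(pattern, re.S).subn(replacement, text)

    # Binary write: the encoded bytes reach the disk unchanged, so the
    # line endings already in the text (LF or CRLF) are preserved.
    with open(dst_path, "wb") as dst:
        dst.write(new_text.encode("utf-16"))
    return changes
```

Because nothing is translated on either side, the output keeps whatever line endings the input had, and the handle names avoid shadowing the built-in file type.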