How do I create a temporary file with Unicode encoding in Python?


python unicode


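When I open a file with open(), I cannot write unicode strings to it. I learned that I need to use codecs and open the file with a Unicode encoding.

Now I need to create some temporary files. I tried the tempfile library, but it has no encoding option. When I try to write any unicode string to a temporary file created with tempfile, it fails: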

#!/usr/bin/python2.6
# -*- coding: utf-8 -*-
import tempfile
with tempfile.TemporaryFile() as fh:
  fh.write(u"Hello World: ä")
  fh.seek(0)
  for line in fh:
    print line

How can I create a temporary file with Unicode encoding in Python?

Edit:

I am using Linux, and the error message I get is:

Traceback (most recent call last):
  File "tmp_file.py", line 5, in <module>
    fh.write(u"Hello World: ä")
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe4' in position 13: ordinal not in range(128)


This is just an example. In practice I am trying to write a string returned by some API.

I came up with a solution: use tempfile to create a temporary file that is not deleted automatically, close it, and then reopen it with codecs:

#!/usr/bin/python2.6
# -*- coding: utf-8 -*-

import codecs
import os
import tempfile

f = tempfile.NamedTemporaryFile(delete=False)
filename = f.name
f.close()

with codecs.open(filename, 'w+b', encoding='utf-8') as fh:
  fh.write(u"Hello World: ä")
  fh.seek(0)
  for line in fh:
    print line

os.unlink(filename)
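
The same close-and-reopen idea can be sketched with io.open and a try/finally, so the file is removed even if writing fails. This is only a sketch, checked against Python 3; the helper name roundtrip_via_named_file is made up for illustration and is not part of the original post.

```python
# -*- coding: utf-8 -*-
import io
import os
import tempfile


def roundtrip_via_named_file(text):
    """Write text to a named temp file as UTF-8 and read it back (hypothetical helper)."""
    # Reserve a file name, then close the binary handle right away.
    handle = tempfile.NamedTemporaryFile(delete=False)
    path = handle.name
    handle.close()
    try:
        # Reopen the same path in text mode with an explicit encoding.
        with io.open(path, "w+", encoding="utf-8") as fh:
            fh.write(text)
            fh.seek(0)
            return fh.read()
    finally:
        # delete=False means cleanup is our job.
        os.unlink(path)


print(roundtrip_via_named_file(u"Hello World: ä"))
```

The try/finally is the only real change from the workaround above: without it, an exception during the write would leave the file behind.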

Removing the u makes your code work for me:

fh.write("Hello World: ä")

I think that is because it is already unicode.

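You are trying to write a unicode object (u"...") to the temporary file, where you should use an encoded string ("..."). You do not have to pass an "encode=" parameter explicitly, because you have already declared the encoding on the second line ("# -*- coding: utf-8 -*-"). Just use fh.write("ä") instead of fh.write(u"ä") and you will be fine.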

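Everyone else's answers are correct; I just want to clarify what is going on:

The difference between the literal 'foo' and the literal u'foo' is that the former is a byte string and the latter is a Unicode object.

First, understand that Unicode is a character set and UTF-8 is an encoding. A Unicode object is about the former: it is a Unicode string, not necessarily a UTF-8 string. In your case, the byte string literals will be encoded as UTF-8, because you declared that encoding at the top of the file.

To get a byte string from a Unicode string, call the .encode() method:

>>> u"ひらがな".encode("utf-8") == "ひらがな"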
True

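Similarly, you can call .encode() on the string inside the write call and get the same effect as removing the u.

If you had not declared the encoding at the top, say because you were reading the Unicode data from another file, you would specify which encoding the data was in before it reached a Python string. That determines how it is represented in bytes (i.e. the str type).

So the error you are getting is only because the tempfile module expects a str object. This does not mean it cannot handle unicode, just that it expects you to pass in a byte string rather than a unicode object, because without an encoding being specified it does not know how to write the text to the temporary file.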
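
The text-versus-bytes distinction described in this answer can be checked with a short sketch. It is written for Python 3, where str is Unicode text and bytes is the byte string type (in Python 2 the corresponding types are unicode and str). The helper ascii_failure_position is a made-up name used only for illustration.

```python
# -*- coding: utf-8 -*-
text = "Hello World: ä"        # Unicode text: 14 characters
data = text.encode("utf-8")    # byte string: "ä" becomes two bytes


def ascii_failure_position(value):
    """Return the index of the first character ASCII cannot encode, or None."""
    try:
        value.encode("ascii")
    except UnicodeEncodeError as exc:
        return exc.start
    return None


print(len(text), len(data))
print(ascii_failure_position(text))
print(data.decode("utf-8") == text)
```

Encoding with the ascii codec fails at the same position as in the traceback from the question, while encoding with utf-8 succeeds and round-trips.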

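In Python 3, tempfile.TemporaryFile has an encoding parameter.

Note that you now need to specify mode='w+' instead of the default binary mode. Also note that in Python 3, string literals are implicitly Unicode, so there is no u prefix.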
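
A minimal Python 3 sketch of that approach, assuming the goal is the same write-then-read-back loop as in the question. The function name write_and_read_back is hypothetical.

```python
import tempfile


def write_and_read_back(text):
    """Write text to a UTF-8 text-mode temporary file and return its lines."""
    # mode="w+" opens the file in text mode; encoding picks the codec.
    with tempfile.TemporaryFile(mode="w+", encoding="utf-8") as fh:
        fh.write(text)
        fh.seek(0)
        return [line for line in fh]


for line in write_and_read_back("Hello World: ä\n"):
    print(line, end="")
```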

If you are stuck with the default binary mode, you need to encode Unicode strings before writing them to the file:

#!/usr/bin/python
# -*- coding: utf-8 -*-
import tempfile
with tempfile.TemporaryFile() as fh:
  fh.write(u"Hello World: ä".encode('utf-8'))
  fh.seek(0)
  for line in fh:
    print line.decode('utf-8')

Unicode specifies a character set, not an encoding, so in either case you need a way to say how the Unicode characters should be encoded.

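Since I am working on a Python program that uses temporary file objects and has to run on both Python 2 and Python 3, I did not find it satisfactory to manually encode every string I write as UTF-8, as the other answers suggest.

Instead, I wrote the following small polyfill (because I could not find anything similar in six) that wraps a binary file-like object into a UTF-8 file-like object: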

from __future__ import unicode_literals
import sys
import codecs
if sys.hexversion < 0x03000000:
    def uwriter(fp):
        return codecs.getwriter('utf-8')(fp)
else:
    def uwriter(fp):
        return fp
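
The polyfill above covers writing only. As a hedged sketch of how the same idea could be extended to reading, codecs.StreamReaderWriter can wrap a binary file object for both directions. The names utf8_stream and roundtrip are assumptions made for this example and do not come from the original answer.

```python
# -*- coding: utf-8 -*-
import codecs
import tempfile


def utf8_stream(fp):
    """Wrap a binary file-like object so it reads and writes text as UTF-8."""
    return codecs.StreamReaderWriter(
        fp, codecs.getreader("utf-8"), codecs.getwriter("utf-8")
    )


def roundtrip(text):
    """Write text through the wrapper, rewind, and read it back."""
    # TemporaryFile() defaults to binary mode, which is what the wrapper expects.
    with utf8_stream(tempfile.TemporaryFile()) as fh:
        fh.write(text)
        fh.seek(0)
        return fh.read()


print(roundtrip(u"Hællo wörld!\n"), end="")
```

Unlike uwriter, this wraps the file on Python 3 as well, so the underlying temporary file should be opened in binary mode on both versions.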

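Setting the default encoding in sys to UTF-8 will solve the encoding problem (this relies on reload and sys.setdefaultencoding, which exist only in Python 2):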

import sys
reload(sys)
sys.setdefaultencoding('utf-8') #set to utf-8 by default this will solve the errors

import tempfile
with tempfile.TemporaryFile() as fh:
  fh.write(u"Hello World: ä")
  fh.seek(0)
  for line in fh:
    print line

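Yes, running the script without the u on a Linux machine produces the correct output:

Hello World: ä

Yes, this works... Actually, in my real program I get the input from some API and it fails, so it is not because of the "u" in my code.

@john: Removing the u probably does not do what you think it does, even if you end up with correct UTF-8 in the final file. If you typed "ä" in a UTF-8 editor, you most likely have two bytes stored in the string. This is easy to check: if so, len("Hello World: ä") will be 15 and "Hello World: ä"[14] will be "\xa4".

Yes. So there is no need to open the tempfile with some magic unicode option; just write an explicitly encoded string: fh.write(u'föo bār'.encode('utf-8')). If most of the characters are CJK, replace 'utf-8' with 'utf-16'.

@9000: Be careful with this approach if you use 'utf-16'. If you do, you have to write the whole file at once, because encode('utf-16') also outputs the file BOM. If you have several strings to write to the same file, the first should use .encode('utf-16') and the later ones .encode('utf-16-le') (on a little-endian machine), which emits no BOM. Using some magic unicode option avoids this pitfall.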
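
The BOM pitfall from the previous comment can be illustrated with a short Python 3 sketch. It compares encoding each chunk separately with an incremental encoder from the codecs module, which is one possible way to emit the BOM only once; the function names naive_chunks and incremental_chunks are made up for this example.

```python
# -*- coding: utf-8 -*-
import codecs


def naive_chunks(parts):
    """Encode every chunk on its own: each call prepends a BOM."""
    return b"".join(p.encode("utf-16") for p in parts)


def incremental_chunks(parts):
    """Encode chunks with one stateful encoder: only the first call emits a BOM."""
    encoder = codecs.getincrementalencoder("utf-16")()
    return b"".join(encoder.encode(p) for p in parts)


parts = ["Hello ", "World: ä"]
naive = naive_chunks(parts)
clean = incremental_chunks(parts)

# codecs.BOM_UTF16 is the BOM in the machine's native byte order.
print(naive.count(codecs.BOM_UTF16), clean.count(codecs.BOM_UTF16))
print(clean.decode("utf-16") == "".join(parts))
```

When the naive output is decoded, the second BOM shows up as a stray U+FEFF character in the middle of the text, which is the corruption the comment warns about.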

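"abc" is a unicode string in Python 3, or when from __future__ import unicode_literals is present.

Sorry, this is suboptimal. See @spinning_plate's answer and my comment on it; things are much simpler.

@9000 I cannot see an answer by spinning_plate here.

@guettli: That must have been some kind of typo; I must have meant dfb's answer, which is currently the accepted answer.

Yes, this works, but I am actually trying to write a string returned by some API, so there is no (u"...") in my code. I have updated my question with this information.

I tried an example with two files, and fh.write(other_file.f()) works.

Usage example for the uwriter wrapper defined above: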
# encoding: utf-8
from tempfile import NamedTemporaryFile
with uwriter(NamedTemporaryFile(suffix='.txt', mode='w')) as fp:
    fp.write('Hællo wörld!\n')
