Incorrect representation of a string in a CSV file on Windows


windows python-2.7 csv unicode


I'm on Win7 with Python 2.7.

I have a string. Original view:

A.p.Møller Mærsk

As UTF-8:

s = 'A. P. M\xc3\xb8ller M\xc3\xa6rsk'

I need to write it to a CSV file. I tried this:

with open('14.09 Anbefalte aksjer.csv', 'w') as csvfile:
    writer = csv.writer(csvfile)
    writer.writerow([s])

Got this:

A. P. MГёller MГ¦rsk

Tried using UnicodeWriter:

class UnicodeWriter:
    """
    A CSV writer which will write rows to CSV file "f",
    which is encoded in the given encoding.
    """

    def __init__(self, f, dialect=csv.excel, encoding="utf-8", **kwds):
        # Redirect output to a queue
        self.queue = StringIO()
        self.writer = csv.writer(self.queue, dialect=dialect, **kwds)
        self.stream = f
        self.encoder = codecs.getincrementalencoder(encoding)()

    def writerow(self, row):
        self.writer.writerow([s.encode("utf-8") for s in row])
        # Fetch UTF-8 output from the queue ...
        data = self.queue.getvalue()
        data = data.decode("utf-8")
        # ... and reencode it into the target encoding
        data = self.encoder.encode(data)
        # write to the target stream
        self.stream.write(data)
        # empty queue
        self.queue.truncate(0)

    def writerows(self, rows):
        for row in rows:
            self.writerow(row)

s = 'A. P. M\xc3\xb8ller M\xc3\xa6rsk'.decode('utf8')
with open('14.09 Anbefalte aksjer.csv', 'w') as csvfile:
    writer = UnicodeWriter(csvfile)
    writer.writerow([s])

Got the same thing again:

A. P. MГёller MГ¦rsk

I tried yet another approach and, once again, got:

A. P. MГёller MГ¦rsk

What's wrong? How can I write it correctly?

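What you are seeing is mojibake: bytes that represent Unicode text encoded in one character encoding are being displayed in another (incompatible) character encoding.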
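The effect can be reproduced in a few lines. The sketch below assumes that the viewer decodes the file with the Windows code page cp1251; the question does not state which code page is in use, so that choice is a guess based on the garbled output. It shows that the bytes themselves are intact and only the interpretation is wrong.

```python
# -*- coding: utf-8 -*-
# Minimal sketch of mojibake; cp1251 as the viewer's code page is an assumption.
text = u"A. P. M\u00f8ller M\u00e6rsk"   # "A. P. Møller Mærsk"

utf8_bytes = text.encode("utf-8")          # what ends up in the file
garbled = utf8_bytes.decode("cp1251")      # what a cp1251 viewer displays

# The damage is only in the display: reversing the wrong step restores the text.
restored = garbled.encode("cp1251").decode("utf-8")
```

Here garbled holds the text with each accented letter replaced by two unrelated characters, while restored equals the original string.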

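If calling .decode('utf8') on a string literal does not raise AttributeError, then you are not on Python 3 (whatever your question says). On Python 2, the csv module does not support Unicode directly, so you have to encode the text manually: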

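#!/usr/bin/env python2
# -*- coding: utf-8 -*-
from __future__ import unicode_literals
import csv

text = "A. P. Møller Mærsk"  # a unicode object, thanks to unicode_literals
# Binary mode: the Python 2 csv module writes bytes and handles newlines itself.
with open('14.09 Anbefalte aksjer.csv', 'wb') as csvfile:
    writer = csv.writer(csvfile)
    writer.writerow([text.encode('utf-8')])  # encode each cell by hand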

If text contains uncorrupted data, then UnicodeWriter and the unicodecsv module should work too.

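Windows tools such as Notepad and Excel assume the encoding of the default Windows locale, so for UTF-8 a byte order mark (BOM, U+FEFF) must be written at the start of the file. Python provides a codec for this, utf-8-sig. Note: by putting #coding:utf8 at the top and saving the source file as UTF-8, you can declare the string directly as a Unicode string. Finally, on Python 2.7 files used with the csv module should be opened in wb mode, otherwise you will see problems with the newlines written on Windows: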

#coding:utf8
import csv
from StringIO import StringIO
import codecs

class UnicodeWriter:
    """
    A CSV writer which will write rows to CSV file "f",
    which is encoded in the given encoding.
    """

    # Use utf-8-sig encoding here.
    def __init__(self, f, dialect=csv.excel, encoding="utf-8-sig", **kwds):
        # Redirect output to a queue
        self.queue = StringIO()
        self.writer = csv.writer(self.queue, dialect=dialect, **kwds)
        self.stream = f
        self.encoder = codecs.getincrementalencoder(encoding)()

    def writerow(self, row):
        self.writer.writerow([s.encode("utf-8") for s in row])
        # Fetch UTF-8 output from the queue ...
        data = self.queue.getvalue()
        data = data.decode("utf-8")
        # ... and reencode it into the target encoding
        data = self.encoder.encode(data)
        # write to the target stream
        self.stream.write(data)
        # empty queue
        self.queue.truncate(0)

    def writerows(self, rows):
        for row in rows:
            self.writerow(row)

s = u'A. P. Møller Mærsk' # declare as Unicode string.
with open('14.09 Anbefalte aksjer.csv', 'wb') as csvfile:
    writer = UnicodeWriter(csvfile)
    writer.writerow([s])

Output:

A. P. Møller Mærsk
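To see what the utf-8-sig codec actually adds, the following small sketch compares it with plain utf-8 and checks that the incremental encoder used in UnicodeWriter emits the BOM only once. The variable names are illustrative and not part of the answer above.

```python
import codecs

text = u"A. P. M\u00f8ller M\u00e6rsk"

plain = text.encode("utf-8")
with_bom = text.encode("utf-8-sig")

# utf-8-sig is plain UTF-8 prefixed with the three BOM bytes EF BB BF.
bom_is_prefix = (with_bom == codecs.BOM_UTF8 + plain)

# An incremental encoder writes the BOM on the first call only,
# so repeated writerow() calls do not scatter BOMs through the file.
encoder = codecs.getincrementalencoder("utf-8-sig")()
first_chunk = encoder.encode(u"row 1\r\n")
second_chunk = encoder.encode(u"row 2\r\n")
```

Only first_chunk starts with codecs.BOM_UTF8; second_chunk is ordinary UTF-8.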

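What are you using to open your CSV file? Excel is very bad at handling Unicode/UTF-8 encoded CSV data when you open the file with a simple double-click. Try the program's import dialog; I think it lets you choose which character encoding to assume. If that doesn't work, try Calc from LibreOffice/OpenOffice, which offers the option as well. Or at least open the file with Notepad++ and check whether the encoding is correct.

Thank you, CBroe. Changing the encoding in Excel did it.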
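If no suitable editor is at hand, the same check can be done from Python by looking at the raw bytes. This is only a rough sketch: describe_encoding is a hypothetical helper name, and the demo writes a throwaway temporary file rather than the file from the question.

```python
import codecs
import os
import tempfile

def describe_encoding(path):
    """Return (has_utf8_bom, is_valid_utf8) for the file at path."""
    with open(path, "rb") as f:
        raw = f.read()
    has_bom = raw.startswith(codecs.BOM_UTF8)
    try:
        raw.decode("utf-8")
        valid_utf8 = True
    except UnicodeDecodeError:
        valid_utf8 = False
    return has_bom, valid_utf8

# Demo on a temporary file written with a BOM, as in the answer above.
fd, demo_path = tempfile.mkstemp(suffix=".csv")
with os.fdopen(fd, "wb") as f:
    f.write(u"A. P. M\u00f8ller M\u00e6rsk\r\n".encode("utf-8-sig"))
result = describe_encoding(demo_path)
os.remove(demo_path)
```

If the file turns out to be valid UTF-8 but still looks garbled, the problem lies in how the viewer decodes it and not in the code that wrote it.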