Python: converting dict values from unicode to utf-8 (or ascii) in csv DictWriter
I'm trying to print some data to a csv file, but unicode is killing my vibe. My data is in dictionary format; here is a snippet:
{'category': u'Best food blog written by a linguist\xa0', 'runners_up': [], 'winner': [u'shesimmers.com'], 'category_url': 'http://www.chicagoreader.com/chicago/best-food-blog-written-by-a-linguist/BestOf?oid=4101663'}
Here is the snippet of my code that uses the DictWriter method:
data = utf_8_encoder(data)
with open('best_food_n_drink.csv', 'w') as csvfile:
    categories = ['category', 'category_url', 'winner', 'runners_up']
    writer = csv.DictWriter(csvfile, delimiter=',', fieldnames=categories)
    writer.writeheader()
    for row in data:
        writer.writerow(row)
utf_8_encoder is a function I defined earlier:
def utf_8_encoder(unicode_csv_data):
    for line in unicode_csv_data:
        line.encode('utf-8')
    return unicode_csv_data
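I keep getting errors such as "'dict' object has no attribute 'encode'". I've tried dropping the encoder function and calling row.values().encode('utf-8') in the for loop at the bottom instead, but that just tells me "'list' object has no attribute 'encode'". I've also tried replacing ('utf-8') with ('ascii', 'ignore'), but I just can't figure it out.

I'm not sure what format you want for the output, but this will encode the strings: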
import csv

def map_to(d):
    # iterate over the key/value pairs
    for k, v in d.items():
        # if v is a list, join and encode; otherwise just encode, as it is a string
        d[k] = ",".join(v).encode("utf-8") if isinstance(v, list) else v.encode("utf-8")

map_to(data)
with open('best_food_n_drink.csv', 'w') as csvfile:
    categories = ['category', 'category_url', 'winner', 'runners_up']
    writer = csv.DictWriter(csvfile, fieldnames=categories)
    writer.writeheader()
    writer.writerow(data)
This outputs something like the following, but with a mix of strings and lists I really don't know how you want it to end up looking:
category,category_url,winner,runners_up
Best food blog written by a linguist ,http://www.chicagoreader.com/chicago/best-food-blog-written-by-a-linguist/BestOf?oid=4101663,shesimmers.com,
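For context on the two error messages in the question, here is a minimal sketch (assuming Python 3 for the byte literals shown in the comments; the shortened `row` is illustrative only). It shows that `encode` is a method of text strings, not of dicts or lists, so it has to be applied to each string value individually.

```python
# A shortened, illustrative version of the row from the question.
row = {
    'category': u'Best food blog written by a linguist\xa0',
    'winner': [u'shesimmers.com'],
}

# encode() exists on text strings only, so both of these are False
# for the containers the question tried to call it on.
dict_has_encode = hasattr(row, 'encode')
list_has_encode = hasattr(list(row.values()), 'encode')

# Encoding works once it is applied to each string on its own.
encoded_category = row['category'].encode('utf-8')
encoded_winners = [name.encode('utf-8') for name in row['winner']]
```

The non-breaking space `\xa0` becomes the two bytes `\xc2\xa0` in UTF-8, which is why the plain ascii codec cannot represent it.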
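Now that it turns out you actually have a list of dicts, you need to iterate over the list, but the logic stays the same: we just run the function on each dict in the loop: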
data = [{'category': u"Best restaurant that's been around forever and is still worth the trip\xa0", 'runners_up': [u'Frontera Grill', u'Chicago Diner ', u'Sabatino\u2019s', u'Twin Anchors'], 'winner': [u'Lula Cafe'], 'category_url': 'http://www.chicagoreader.com/chicago/BestOf?category=1979894&year=2011'},
{'category': u'Best bang for your buck\xa0', 'runners_up': [u'Frasca Pizzeria & Wine Bar', u'Chutney Joe\u2019s', u'"My boyfriend!"'], 'winner': [u'Big Star', u'Sultan\u2019s Market']}]
def map_to(d):
    for k, v in d.items():
        d[k] = ",".join(v).encode("utf-8") if isinstance(v, list) else v.encode("utf-8")

with open('best_food_n_drink.csv', 'w') as csvfile:
    categories = ['category', 'category_url', 'winner', 'runners_up']
    writer = csv.DictWriter(csvfile, fieldnames=categories)
    writer.writeheader()
    # get each dict from the list
    for d in data:
        # run the encode func
        map_to(d)
        writer.writerow(d)
I presume 'category_url' does actually exist in the second dict.
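If a key really were missing from a row, `csv.DictWriter` would still write the row, filling the gap with its `restval` argument (an empty string by default). A minimal sketch, assuming Python 3 and using an in-memory buffer in place of the file; the row contents and the 'N/A' placeholder are hypothetical:

```python
import csv
import io

fieldnames = ['category', 'category_url', 'winner', 'runners_up']

# Hypothetical row that has no 'category_url' key at all.
row = {'category': 'Best bang for your buck',
       'winner': 'Big Star',
       'runners_up': ''}

buffer = io.StringIO()  # in-memory stand-in for the csv file
# restval is written for every fieldname that is missing from a row.
writer = csv.DictWriter(buffer, fieldnames=fieldnames, restval='N/A')
writer.writeheader()
writer.writerow(row)
csv_text = buffer.getvalue()
```

The opposite case, a row with keys that are not in `fieldnames`, raises a ValueError unless `extrasaction='ignore'` is passed.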
To catch None and avoid an encoding error, add one line to the func:
def map_to(d):
    for k, v in d.items():
        # catch None's
        if v is not None:
            d[k] = " ".join(v).encode("utf-8") if isinstance(v, list) else v.encode("utf-8")
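The same list and None handling can be sketched without modifying the input dicts. This version assumes Python 3, where the csv module works with text and the file object does the encoding, so no manual `encode` call is needed. The helper names `flatten_value` and `flatten_row`, the sample row, and the temporary output path are all hypothetical.

```python
import csv
import os
import tempfile


def flatten_value(value, separator=' '):
    """Turn None into '' and a list of strings into a single string."""
    if value is None:
        return ''
    if isinstance(value, list):
        return separator.join(value)
    return value


def flatten_row(row):
    """Return a new dict; the input row is left unchanged."""
    return {key: flatten_value(value) for key, value in row.items()}


rows = [
    {'category': 'Best bang for your buck\xa0',
     'category_url': None,
     'winner': ['Big Star', 'Sultan\u2019s Market'],
     'runners_up': []},
]
fieldnames = ['category', 'category_url', 'winner', 'runners_up']

# Hypothetical output location inside a temporary directory.
path = os.path.join(tempfile.mkdtemp(), 'best_food_n_drink.csv')
# newline='' lets the csv module control line endings itself.
with open(path, 'w', encoding='utf-8', newline='') as csvfile:
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
    writer.writeheader()
    for row in rows:
        writer.writerow(flatten_row(row))
```

Returning a new dict keeps the original lists available, for example if the same data is later dumped to json.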
Depending on what you plan to do with the data, storing it as json may be useful:
import json
with open('best_food_n_drink.json', 'w') as js:
    json.dump(data, js)
Then, to get the data back as a list:
import json
with open('best_food_n_drink.json') as js:
    data = json.load(js)
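A round trip through json keeps both the nested lists and the non-ASCII characters intact. The following is a minimal sketch assuming Python 3; the shortened `data` list and the temporary path are illustrative, and `ensure_ascii=False` is an optional choice that writes the characters as UTF-8 instead of `\uXXXX` escapes.

```python
import json
import os
import tempfile

# Shortened, illustrative version of the data from the answer above.
data = [
    {'category': 'Best bang for your buck\xa0',
     'runners_up': ['Chutney Joe\u2019s'],
     'winner': ['Big Star', 'Sultan\u2019s Market']},
]

# Hypothetical location inside a temporary directory.
path = os.path.join(tempfile.mkdtemp(), 'best_food_n_drink.json')

with open(path, 'w', encoding='utf-8') as js:
    json.dump(data, js, ensure_ascii=False)

with open(path, encoding='utf-8') as js:
    restored = json.load(js)
```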
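With Python 3.4, using
io.open(filename, 'w', encoding='utf8')
instead of
open(filename, 'w')
solved the same problem for me.

Another solution is to write more comprehensive methods that also handle types other than unicode and list. I know this wasn't part of the original question, but anyone may end up here trying to convert a complex dict (with nested dicts, lists, ...), so here is my contribution: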
def array_to_utf(a):
    autf = []
    for v in a:
        if isinstance(v, unicode):
            autf.append(v.encode('utf-8'))
        elif isinstance(v, dict):
            autf.append(dict_to_utf(v))
        elif isinstance(v, list):
            autf.append(array_to_utf(v))
        else:
            autf.append(v)
    return autf

def dict_to_utf(d):
    dutf = {}
    for k, v in d.iteritems():
        if isinstance(v, unicode):
            dutf[k] = v.encode('utf-8')
        elif isinstance(v, list):
            dutf[k] = array_to_utf(v)
        elif isinstance(v, dict):
            dutf[k] = dict_to_utf(v)
        else:
            dutf[k] = v
    return dutf
test = {1: u'1', 2: '2', 3: {'x': u'x', 'y': 'y'}, 4: [u'ara', 's', 123], 5: 123}
print(dict_to_utf(test))
# {1: '1', 2: '2', 3: {'y': 'y', 'x': 'x'}, 4: ['ara', 's', 123], 5: 123}
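The two methods are each recursive and also call each other recursively.

You need to encode the strings, not the list or the dict. What version of python are you using?

@PadraicConningham 2.7.6.

What kind of output do you expect?

What would you do/recommend for the specific format of this csv output? Or is there a way to make it more readable/presentable in your view?

@SpicyClubSauce, is there a reason some of the values are in lists?

No reason, I was just wondering. So what do you think was the fundamental problem in my original code? When I simply converted the dict to a string,
def utf_8_encoder(unicode_csv_data): for line in unicode_csv_data: return ":".join("{}{}".format(key, val) for key, val in line.items())
it didn't work either, saying: "'ascii' codec can't encode character u'\xa0' in position 70: ordinal not in range(128)". Shouldn't we also iterate over each item in the dict for writer.writerows()?

@SpicyClubSauce, the problem with the function in the question is that you return after the first line, so you only encode one key/string, and you never actually change the objects in the dict, so you are still passing the data dict as it was.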
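The last comment names two separate issues: returning from inside the loop, and discarding the result of `encode`. Strings are immutable, so `encode` returns a new object that has to be stored somewhere. A minimal sketch of an encoder that avoids both problems follows; the name `encode_rows` and the sample row are hypothetical, and the values come out as byte strings, which is what the Python 2 csv module expects (on Python 3 it is usually simpler to leave the text alone and open the file with an encoding).

```python
def encode_rows(rows, encoding='utf-8'):
    """Return a new list of dicts whose values are encoded byte strings."""
    encoded_rows = []
    for row in rows:
        encoded = {}
        for key, value in row.items():
            # Lists are joined first, because only strings have encode().
            if isinstance(value, list):
                value = u','.join(value)
            # Keep the result: encode() does not change the string in place.
            encoded[key] = value.encode(encoding)
        encoded_rows.append(encoded)
    # Return only after every row has been processed.
    return encoded_rows


rows = [{'category': u'Best bang for your buck\xa0',
         'winner': [u'Big Star', u'Sultan\u2019s Market']}]
encoded_rows = encode_rows(rows)
```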