Extra double quotes in JSON output in Python
I have a dataset that has no movie information, and I want to add movie information to it, in JSON format, from the OMDb API. I wrote this code in Python 3.5 to do that:
```python
import urllib.request
import csv
import json
import datetime
from collections import defaultdict
from urllib import response

i = 0
columns = defaultdict(list)
# First pass: collect every column of the dataset into lists
with open(r'C:\dataset\dataset.dat') as f:
    reader = csv.DictReader(f)
    for row in reader:
        for (k, v) in row.items():
            columns[k].append(v)

# Second pass: copy each row and append the OMDb response as an extra column
with open(r'C:\dataset\dataset.dat', 'r', encoding='utf-8') as csvinput:
    with open(r'C:\dataset\dataset_edited.dat', 'w', encoding='utf-8') as csvoutput:
        writer = csv.writer(csvoutput)
        for row in csv.reader(csvinput):
            if row[0] == "user_id":
                writer.writerow(row + ["movie_in_json_format"])
            else:
                movieJson = urllib.request.urlopen("http://www.omdbapi.com/?i=tt" + str(columns['item_id'][i]) + "&y=&plot=short&r=json").read()
                movieJson = movieJson.decode('utf-8')
                writer.writerow(row + [movieJson])
                i = i + 1
```
The JSON is written to the file in this format:
"{""Title"":""Citizen Dog"",""Year"":""2004"",""Rated"":""N/A"",""Released"":""09 Mar 2006"",""Runtime"":""100 min"",""Genre"":""Comedy, Fantasy, Romance"",""Director"":""Wisit Sasanatieng"",""Writer"":""Koynuch (novel), Wisit Sasanatieng"",""Actors"":""Mahasamut Boonyaruk, Saengthong Gate-Uthong, Sawatwong Palakawong Na Autthaya, Nattha Wattanapaiboon"",""Plot"":""Pod is a man without a dream. He's a country bumpkin who comes to work at a tinned sardine factory in Bangkok. One day, Pod chops off his finger and packs it in the can, prompting him to go..."",""Language"":""Thai, English, Mandarin"",""Country"":""Thailand"",""Awards"":""2 wins & 1 nomination."",""Poster"":""http://ia.media-imdb.com/images/M/MV5BY2VlNDQwZTctMjBlNy00ZjYyLWEwYzAtNjA1YTNjNjVlMjU1XkEyXkFqcGdeQXVyMTIxMDUyOTI@._V1_SX300.jpg"",""Metascore"":""N/A"",""imdbRating"":""7.5"",""imdbVotes"":""1,544"",""imdbID"":""tt0444778"",""Type"":""movie"",""Response"":""True""}"
It should look like this instead:
{"Title":"Citizen Dog","Year":"2004","Rated":"N/A","Released":"09 Mar 2006","Runtime":"100 min","Genre":"Comedy, Fantasy, Romance","Director":"Wisit Sasanatieng","Writer":"Koynuch (novel), Wisit Sasanatieng","Actors":"Mahasamut Boonyaruk, Saengthong Gate-Uthong, Sawatwong Palakawong Na Autthaya, Nattha Wattanapaiboon","Plot":"Pod is a man without a dream. He's a country bumpkin who comes to work at a tinned sardine factory in Bangkok. One day, Pod chops off his finger and packs it in the can, prompting him to go...","Language":"Thai, English, Mandarin","Country":"Thailand","Awards":"2 wins & 1 nomination.","Poster":"http://ia.media-imdb.com/images/M/MV5BY2VlNDQwZTctMjBlNy00ZjYyLWEwYzAtNjA1YTNjNjVlMjU1XkEyXkFqcGdeQXVyMTIxMDUyOTI@._V1_SX300.jpg","Metascore":"N/A","imdbRating":"7.5","imdbVotes":"1,544","imdbID":"tt0444778","Type":"movie","Response":"True"}
How can I write this JSON to the file in the correct format?
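A minimal reproduction, assuming a shortened JSON string and made-up `user_id`/`item_id` values: `csv.writer` with the default `excel` dialect quotes any field that contains a comma or a double quote, and escapes each embedded `"` by doubling it (`doublequote=True`). Reading the line back with `csv.reader` undoes that escaping, so the doubled quotes are CSV syntax rather than damage to the JSON itself.

```python
import csv
import io

# Shortened stand-in for the OMDb response; "1" and "444778" are made-up ids
movie_json = '{"Title":"Citizen Dog","Year":"2004"}'

buf = io.StringIO()
writer = csv.writer(buf)  # default "excel" dialect: quotechar='"', doublequote=True
writer.writerow(["1", "444778", movie_json])
written = buf.getvalue()
print(written)
# 1,444778,"{""Title"":""Citizen Dog"",""Year"":""2004""}"

# csv.reader reverses the escaping and returns the original JSON text
row = next(csv.reader(io.StringIO(written)))
assert row[2] == movie_json
```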
Note: `encoding='utf-8'` was added to the file I/O because of this error:
'charmap' codec can't encode character '\xf3' in position 3152: character maps to <undefined>
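This kind of error comes from the encoding that `open()` falls back to when none is given, which is the locale's preferred encoding (on Windows, usually a legacy code page). As an illustration only, `cp866` stands in below for a code page that has no `ó`; the sample text and the helper `encodes_cleanly` are hypothetical.

```python
def encodes_cleanly(text, encoding):
    """Return True if text can be encoded with the given codec (hypothetical helper)."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError as exc:
        # For single-byte code pages this prints a "'charmap' codec can't encode ..." message
        print(exc)
        return False
    return True

plot = "Canci\xf3n"  # made-up text containing 'ó' (U+00F3)
print(encodes_cleanly(plot, "cp866"))  # a legacy code page without 'ó'
print(encodes_cleanly(plot, "utf-8"))  # UTF-8 can encode every character
```

Passing `encoding='utf-8'` explicitly, as the question does, removes the dependence on the machine's locale.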
If nothing else works, force the extra quotes off:

```python
writer.writerow([field.strip('"') for field in row + [movieJson]])
```
Solved the problem with this code:

```python
import urllib.request
import csv
from collections import defaultdict

i = 0
columns = defaultdict(list)
# First pass: collect every column of the dataset into lists
with open(r'C:\dataset\dataset.dat', encoding='utf-8') as f:
    reader = csv.DictReader(f)
    for row in reader:
        for (k, v) in row.items():
            columns[k].append(v)

# newline='' lets the csv writer control the line endings itself
with open(r'C:\dataset\dataset.dat', 'r', encoding='utf-8') as csvinput, \
        open(r'C:\dataset\dataset_edited.csv', 'w', encoding='utf-8', newline='') as f_writ:
    csvReader = csv.reader(csvinput)
    # quotechar="'" means the " characters inside the JSON are no longer doubled
    # (an apostrophe in the data, e.g. "He's", is doubled instead)
    writer = csv.writer(f_writ, delimiter=',',
                        lineterminator='\r\n',
                        quotechar="'")
    for row in csvReader:
        if row[0] == "user_id":
            writer.writerow(row + ["movie_in_json_format"])
        else:
            moviejson = urllib.request.urlopen("http://www.omdbapi.com/?i=tt" + str(columns['item_id'][i]) + "&y=&plot=short&r=json").read()
            moviejson = moviejson.decode('utf-8')
            writer.writerow(row + [moviejson])
            i = i + 1
```
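To see what `quotechar="'"` changes, here is a small sketch with a made-up, shortened record: the JSON's double quotes are written untouched, but the field is now wrapped in single quotes and any apostrophe in the data (such as "He's" in the Plot) is doubled. Whatever reads the file later has to be given the same `quotechar` to recover the original string.

```python
import csv
import io

# Shortened, made-up record containing both " and '
movie_json = '{"Title":"Citizen Dog","Plot":"He\'s a country bumpkin"}'

buf = io.StringIO()
writer = csv.writer(buf, delimiter=',', lineterminator='\r\n', quotechar="'")
writer.writerow(["1", movie_json])
written = buf.getvalue()
print(written)
# 1,'{"Title":"Citizen Dog","Plot":"He''s a country bumpkin"}'

# The reader needs the same quotechar to undo the doubling of '
row = next(csv.reader(io.StringIO(written), quotechar="'"))
assert row[1] == movie_json
```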
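I guess the particular CSV dialect you are using escapes quotes by doubling them. Think about it: how would a CSV parser read the resulting CSV file?
@roeland I don't know :(
Try reading the file back with the csv module's parser; you should get the original string back. Alternatively, you could write the data file entirely as JSON instead of wrapping JSON inside CSV, which would make later parsing much simpler.
Thanks for this.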
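The last suggestion (writing the data as JSON instead of JSON-inside-CSV) could be sketched like this, assuming one JSON object per line (the "JSON Lines" convention) and hypothetical helpers `write_json_lines`/`read_json_lines`. In real use `fp` would be a file opened with `encoding='utf-8'`, and the `movie` value would be `json.loads(movieJson)` rather than a literal.

```python
import io
import json

def write_json_lines(records, fp):
    """Write each record as one JSON object per line (hypothetical helper)."""
    for record in records:
        # ensure_ascii=False keeps characters such as 'ó' as-is, so fp must be UTF-8
        fp.write(json.dumps(record, ensure_ascii=False) + "\n")

def read_json_lines(fp):
    """Parse a file written by write_json_lines back into a list of dicts."""
    return [json.loads(line) for line in fp if line.strip()]

# Hypothetical row: the original CSV columns plus the parsed OMDb response
records = [
    {"user_id": "1", "item_id": "0444778",
     "movie": {"Title": "Citizen Dog", "Year": "2004"}},
]

buf = io.StringIO()  # stands in for open(path, 'w', encoding='utf-8')
write_json_lines(records, buf)
buf.seek(0)
loaded = read_json_lines(buf)
print(loaded[0]["movie"]["Title"])  # Citizen Dog
```

Because the movie data stays a nested object, no quoting layer sits between the file and the JSON parser.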