Python: How to identify multiple words and their corresponding values from each line of a file, e.g. "status":"ok"

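I am trying to create a script that essentially lets me build a list containing specific items from each line, so that they can be inserted into an SQL DB. I have multiple lines in the text file addresses.txt that look like this: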

{"status":"OK","message":"OK","data":[{"type":"addressAccessType","addressAccessId":"0a3f508f-e7c8-32b8-e044-0003ba298018","municipalityCode":"0766","municipalityName":"Hedensted","streetCode":"0072","streetName":"Værnegården","streetBuildingIdentifier":"13","mailDeliverySublocationIdentifier":"","districtSubDivisionIdentifier":"","postCodeIdentifier":"8000","districtName":"Århus","presentationString":"Værnegården 13, 8000 Århus","addressSpecificCount":1,"validCoordinates":true,"geometryWkt":"POINT(553564 6179299)","x":553564,"y":6179299}]}
For example, I want to remove

"type":"addressAccessType","addressAccessId":"0a3f508f-e7c8-32b8-e044-0003ba298018"
and end up with a list of columns and a list of values that can be written to an output file (output_data.txt in the code below), like:

INSERT INTO ADDRESSES (%s) VALUES (%s)
This is what I have so far:

# Writes %s into the file output_data.txt
address_line = """INSERT INTO ADDRESSES (%s) VALUES (%s)"""

# Reads every line from the file messy_data.txt
messy_string = file("addresses.txt").readlines()

cols = messy_string[0].split(",")  #Defines each word in the first line separated by , as a column name
colstr = ','.join(cols) # formatted string that will plug in nicely
output_data = file("output_data.txt", 'w') # Creates the output file: output_data.txt
for r in messy_string[0:]: # loop through everything after first line
    #r = r.replace(':',',')
    #temp_replace = r.translate(None,'"{}[]()')
    #address_list = temp_replace.split(",")
    #address_list = [x.encode('utf-8') for x in address_list]
    vals = r.split(",") # split at ,
    valstr = ','.join(vals) # join with commas for sql
    output_data.write(address_line % (colstr, valstr))  # write to file

output_data.close()
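I have left some of my commented-out attempts in, in case they help. I also noticed that when I use address_list = temp_replace.split(","), all my utf-8 characters get mangled, and I don't know why or how to fix it.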

Update: After looking at this example, I came up with the following code to solve my problem:

# Reads every line from the file coordinates.txt
messy_string = file("coordinates.txt").readlines()

# Reads with the json module
x = json.loads(messy_string)

x = json.loads(x)
f = csv.writer(open('test.csv', 'wb+'))

for x in x:
    f.writerow([x['status'],
                x['message'],
                x['data']['type'],
                x['data']['addressAccessId'],
                x['data']['municipalityCode'],
                x['data']['municipalityName'],
                x['data']['streetCode'],
                x['data']['streetName'],
                x['data']['streetBuildingIdentifier'],
                x['data']['mailDeliverySublocationIdentifier'],
                x['data']['districtSubDivisionIdentifier'],
                x['data']['postCodeIdentifier'],
                x['data']['districtName'],
                x['data']['presentationString'],
                x['data']['addressSpecificCount'],
                x['data']['validCoordinates'],
                x['data']['geometryWkt'],
                x['data']['x'],
                x['data']['y']])
However, this does not solve my problem, and I now get the following error:

Traceback (most recent call last):
  File "test2.py", line 10, in <module>
    x = json.loads(messy_string)
  File "C:\Python27\lib\json\__init__.py", line 338, in loads
    return _default_decoder.decode(s)
  File "C:\Python27\lib\json\decoder.py", line 365, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
TypeError: expected string or buffer

Can anyone help? Thanks in advance.

Each line looks like valid JSON to me. You can simply parse the JSON and pick the keys you want to keep, just as you would with a dictionary:

import json

messy_string = file("addresses.txt").readlines()

for line in messy_string:
  try:
    parsed = json.loads(line)
    column_names = parsed.keys()
    column_values = parsed.values()
    print parsed
  except:
    raise 'Could not parse line'
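
To make the idea concrete, here is a minimal Python 3 sketch that parses one line, drops the unwanted keys, and fills in the INSERT template from the question. It assumes every line has the shape shown in the question, with the address fields nested in the first element of the "data" list. The helper name `build_insert`, the `DROP` set, and the shortened sample line are made up for illustration. The quoting is deliberately naive; for a real database, a parameterized query from the database driver would be the safer choice.

```python
import json

# Keys the question wants to leave out (chosen here for illustration).
DROP = {"type", "addressAccessId"}

def build_insert(line, table="ADDRESSES"):
    """Turn one JSON line into an INSERT statement string (hypothetical helper)."""
    record = json.loads(line)      # json.loads needs a single string, not a list of lines
    address = record["data"][0]    # "data" is a list holding one dict per address
    kept = {k: v for k, v in address.items() if k not in DROP}
    columns = ", ".join(kept)
    # Naive quoting for illustration only: double up single quotes inside values.
    values = ", ".join("'%s'" % str(v).replace("'", "''") for v in kept.values())
    return "INSERT INTO %s (%s) VALUES (%s)" % (table, columns, values)

# A shortened version of the line from the question.
sample = ('{"status":"OK","message":"OK","data":[{"type":"addressAccessType",'
          '"addressAccessId":"0a3f508f-e7c8-32b8-e044-0003ba298018",'
          '"streetName":"Værnegården","postCodeIdentifier":"8000","x":553564}]}')

print(build_insert(sample))
```

Because the parsed line is an ordinary dictionary, choosing columns is just a matter of which keys are kept; no manual splitting on commas or stripping of braces is needed.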

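Thank you, Karthik. I tried your solution, but I get "SyntaxError: invalid syntax" when I try to write column_values to the output file with clean_data.write(address_line % column_values). I am still very new to Python, so any elaboration would be much appreciated.

column_values is a list; %s works on strings. Try print "%s" % ",".join(column_values)

Thanks for your answer. I only started learning Python last week, and I don't know where you want me to put the print. I also got the following error: "TypeError: exceptions must be old-style classes or derived from BaseException, not str". I am trying to create a script that lets me convert JSON text into CSV text with the columns I choose. Could you elaborate a bit more, and maybe connect the different parts? Thank you in advance.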
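
One way the pieces requested in the last comment could be connected is sketched below in Python 3. It is only a sketch under assumptions: the function name `lines_to_csv` and the chosen column list are hypothetical, and the input is assumed to have the same shape as the line in the question. It also touches the two errors mentioned in the thread: `json.loads` is called on one line at a time rather than on the list returned by `readlines()`, and an exception object is raised instead of a bare string, which is what triggers the "exceptions must be ... derived from BaseException" message.

```python
import csv
import io
import json

def lines_to_csv(lines, columns, out):
    """Write the chosen columns of every address in every JSON line to `out` as CSV."""
    writer = csv.writer(out)
    writer.writerow(columns)                  # header row
    for line in lines:
        line = line.strip()
        if not line:
            continue                          # skip blank lines
        try:
            record = json.loads(line)         # one line (a string) at a time
        except ValueError as exc:
            # Raise an exception object, never a plain string.
            raise ValueError("Could not parse line: %r" % line) from exc
        for address in record.get("data", []):
            writer.writerow([address.get(col, "") for col in columns])

# With real files, open them as text with an explicit encoding so that
# characters such as "æ" and "Å" survive the round trip:
#   with open("addresses.txt", encoding="utf-8") as src, \
#        open("test.csv", "w", newline="", encoding="utf-8") as dst:
#       lines_to_csv(src, ["streetName", "postCodeIdentifier"], dst)

sample_lines = ['{"status":"OK","data":[{"streetName":"Værnegården",'
                '"streetBuildingIdentifier":"13","postCodeIdentifier":"8000",'
                '"districtName":"Århus"}]}']
buffer = io.StringIO()
lines_to_csv(sample_lines, ["streetName", "postCodeIdentifier", "districtName"], buffer)
print(buffer.getvalue())
```

Writing to an in-memory buffer keeps the example self-contained; passing an opened file object instead, as in the commented lines, produces the CSV file on disk.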