Reading a structured file with Python
I have a file with data similar to this:
[START]
Name = Peter
Sex = Male
Age = 34
Income[2020] = 40000
Income[2019] = 38500
[END]
[START]
Name = Maria
Sex = Female
Age = 28
Income[2020] = 43000
Income[2019] = 42500
Income[2018] = 40000
[END]
[START]
Name = Jane
Sex = Female
Age = 41
Income[2020] = 60500
Income[2019] = 57500
Income[2018] = 54000
[END]
I would like to read this data into a pandas DataFrame, so that in the end it looks similar to this:
Name Sex Age Income[2020] Income[2019] Income[2018]
Peter Male 34 40000 38500 NaN
Maria Female 28 43000 42500 40000
Jane Female 41 60500 57500 54000
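So far I have not been able to determine whether this is a standard data file format (it has some similarities to JSON, but is still very different).
Is there an elegant and fast way to read this data into a DataFrame?

Elegant I do not know, but a simple way, yes. Python is very good at parsing simply formatted text. Here, [START] begins a new record, [END] ends it, and within a record there are key = value lines. You can easily build a custom parser that produces a list of records and feed it into a DataFrame: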
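import pandas as pd

inblock = False   # True while we are between [START] and [END]
fieldnames = []   # column names, in order of first appearance
data = []         # one dict per record
# filename is the path to the data file
with open(filename) as f:
    for line in f:
        if inblock:
            if line.strip() == '[END]':
                inblock = False
            elif '=' in line:
                # split on the first '=' only, so values may contain '='
                k, v = (i.strip() for i in line.split('=', 1))
                record[k] = v
                if k not in fieldnames:
                    fieldnames.append(k)
        else:
            if line.strip() == '[START]':
                inblock = True
                record = {}
                data.append(record)
df = pd.DataFrame(data, columns=fieldnames)
df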
This gives the expected result:
Name Sex Age Income[2020] Income[2019] Income[2018]
0 Peter Male 34 40000 38500 NaN
1 Maria Female 28 43000 42500 40000
2 Jane Female 41 60500 57500 54000
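Note that the parser stores every value as a string, so Age and the Income columns are not numeric yet. A minimal sketch of one way to convert them, assuming the column names from the example; the small frame built here is only a hypothetical stand-in for the parser's output:

```python
import pandas as pd

# Hypothetical stand-in for the parsed frame: all values are strings
df = pd.DataFrame(
    [
        {"Name": "Peter", "Sex": "Male", "Age": "34",
         "Income[2020]": "40000", "Income[2019]": "38500"},
        {"Name": "Maria", "Sex": "Female", "Age": "28",
         "Income[2020]": "43000", "Income[2019]": "42500",
         "Income[2018]": "40000"},
    ],
    columns=["Name", "Sex", "Age",
             "Income[2020]", "Income[2019]", "Income[2018]"],
)

# Pick Age plus every Income[...] column (assumed naming pattern)
numeric_cols = ["Age"] + [c for c in df.columns if c.startswith("Income[")]

# Convert column by column; values that cannot be parsed become NaN
df[numeric_cols] = df[numeric_cols].apply(pd.to_numeric, errors="coerce")
```

Columns with a missing value, such as Income[2018] here, end up as floats because NaN is a float.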
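I do not think this is a common format, so you should probably read it like an ordinary file, parse it yourself, and then load the result into a DataFrame.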
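One way to do that separate parsing step is sketched below with a regular expression instead of a line-by-line state machine. This is only an illustration: the names BLOCK_RE and parse_records are made up here, and the sketch assumes the file is small enough to read into memory and that blocks are never nested.

```python
import re

# Matches one [START] ... [END] block; DOTALL lets '.' span newlines
BLOCK_RE = re.compile(r"\[START\](.*?)\[END\]", re.DOTALL)

def parse_records(text):
    """Return a list of dicts, one per [START]/[END] block in text."""
    records = []
    for body in BLOCK_RE.findall(text):
        record = {}
        for line in body.splitlines():
            if "=" in line:
                # split on the first '=' only
                key, value = (part.strip() for part in line.split("=", 1))
                record[key] = value
        records.append(record)
    return records

# Shortened sample in the same format as the question's data
sample = """[START]
Name = Peter
Age = 34
Income[2020] = 40000
[END]
[START]
Name = Maria
Age = 28
Income[2020] = 43000
Income[2018] = 40000
[END]
"""
records = parse_records(sample)
```

The resulting list of dicts can be passed straight to pd.DataFrame(records); keys that a record lacks show up as NaN in that row.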