Python: 2-element lists to CSV
I ran into a problem while parsing a horrible txt file. I have managed to extract the information I need into lists:
['OS-EXT-SRV-ATTR:host', 'compute-0-4.domain.tld']
['OS-EXT-SRV-ATTR:hostname', 'commvault-vsa-vm']
['OS-EXT-SRV-ATTR:hypervisor_hostname', 'compute-0-4.domain.tld']
['OS-EXT-SRV-ATTR:instance_name', 'instance-00000008']
['OS-EXT-SRV-ATTR:root_device_name', '/dev/vda']
['hostId', '985035a85d3c98137796f5799341fb65df21e8893fd988ac91a03124']
['key_name', '-']
['name', 'Commvault_VSA_VM']
['OS-EXT-SRV-ATTR:host', 'compute-0-28.domain.tld']
['OS-EXT-SRV-ATTR:hostname', 'dummy-vm']
['OS-EXT-SRV-ATTR:hypervisor_hostname', 'compute-0-28.domain.tld']
['OS-EXT-SRV-ATTR:instance_name', 'instance-0000226e']
['OS-EXT-SRV-ATTR:root_device_name', '/dev/hda']
['hostId', '7bd08d963a7c598f274ce8af2fa4f7beb4a66b98689cc7cdc5a6ef22']
['key_name', '-']
['name', 'Dummy_VM']
['OS-EXT-SRV-ATTR:host', 'compute-0-20.domain.tld']
['OS-EXT-SRV-ATTR:hostname', 'mavtel-sif-vsifarvl11']
['OS-EXT-SRV-ATTR:hypervisor_hostname', 'compute-0-20.domain.tld']
['OS-EXT-SRV-ATTR:instance_name', 'instance-00001da6']
['OS-EXT-SRV-ATTR:root_device_name', '/dev/vda']
['hostId', 'dd82c20a014e05fcfb3d4bcf653c30fa539a8fd4e946760ee1cc6f07']
['key_name', 'mav_tel_key']
['name', 'MAVTEL-SIF-vsifarvl11']
I want element 0 to be the header and element 1 to fill the rows, for example:
OS-EXT-SRV-ATTR:host, OS-EXT-SRV-ATTR:hostname,...., name
compute-0-4.domain.tld, commvault-vsa-vm,....., Commvault_VSA_VM
compute-0-28.domain.tld, dummy-vm,...., Dummy_VM
Here is my code so far:
import re

with open('metadata.txt', 'r') as infile:
    lines = infile.readlines()

for line in lines:
    if re.search(r'hostId|properties|OS-EXT-SRV-ATTR:host|OS-EXT-SRV-ATTR:hypervisor_hostname|name', line):
        line = re.sub(r'\t+', ' ', line)  # re.sub returns a new string, so assign it
        # drop all whitespace, then turn the '|' separators into commas
        formatted = ''.join(line.split()).replace('|', ',')
        fields = formatted.split(',')
        new_list = fields[1:-1]  # strip the empty parts before the first and after the last '|'
I am very new to Python, so I sometimes run out of ideas for how to get things working.
Looks like a job for pandas:
import pandas as pd
list_to_export = [['OS-EXT-SRV-ATTR:host', 'compute-0-4.domain.tld'],
['OS-EXT-SRV-ATTR:hostname', 'commvault-vsa-vm'],
['OS-EXT-SRV-ATTR:hypervisor_hostname', 'compute-0-4.domain.tld'],
['OS-EXT-SRV-ATTR:instance_name', 'instance-00000008'],
['OS-EXT-SRV-ATTR:root_device_name', '/dev/vda'],
['hostId', '985035a85d3c98137796f5799341fb65df21e8893fd988ac91a03124'],
['key_name', '-'],
['name', 'Commvault_VSA_VM'],
['OS-EXT-SRV-ATTR:host', 'compute-0-28.domain.tld'],
['OS-EXT-SRV-ATTR:hostname', 'dummy-vm'],
['OS-EXT-SRV-ATTR:hypervisor_hostname', 'compute-0-28.domain.tld'],
['OS-EXT-SRV-ATTR:instance_name', 'instance-0000226e'],
['OS-EXT-SRV-ATTR:root_device_name', '/dev/hda'],
['hostId', '7bd08d963a7c598f274ce8af2fa4f7beb4a66b98689cc7cdc5a6ef22'],
['key_name', '-'],
['name', 'Dummy_VM'],
['OS-EXT-SRV-ATTR:host', 'compute-0-20.domain.tld'],
['OS-EXT-SRV-ATTR:hostname', 'mavtel-sif-vsifarvl11'],
['OS-EXT-SRV-ATTR:hypervisor_hostname', 'compute-0-20.domain.tld'],
['OS-EXT-SRV-ATTR:instance_name', 'instance-00001da6'],
['OS-EXT-SRV-ATTR:root_device_name', '/dev/vda'],
['hostId', 'dd82c20a014e05fcfb3d4bcf653c30fa539a8fd4e946760ee1cc6f07'],
['key_name', 'mav_tel_key'],
['name', 'MAVTEL-SIF-vsifarvl11']]
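data_dict = {}
for i in list_to_export:
    if i[0] not in data_dict:
        data_dict[i[0]] = [i[1]]
    else:
        data_dict[i[0]].append(i[1])
# orient='index' plus .T turns each header into a column; index=False omits the row index
pd.DataFrame.from_dict(data_dict, orient='index').T.to_csv('filename.csv', index=False)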
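For comparison, here is a minimal sketch of the same group-then-transpose idea using only the standard library, in case pandas is not available. pairs_to_csv is a hypothetical helper name, and the sketch assumes every record lists the same keys, because values are lined up purely by position (a record missing a key would shift the values that follow it).

```python
import csv
import io
from itertools import zip_longest


def pairs_to_csv(pairs, outfile):
    """Hypothetical helper: group [header, value] pairs by header, write one CSV row per record."""
    columns = {}  # header -> list of values, headers kept in first-seen order
    for header, value in pairs:
        columns.setdefault(header, []).append(value)
    writer = csv.writer(outfile)
    writer.writerow(list(columns))
    # zip_longest transposes the columns into rows, padding short columns with ''
    writer.writerows(zip_longest(*columns.values(), fillvalue=''))


pairs = [
    ['OS-EXT-SRV-ATTR:host', 'compute-0-4.domain.tld'],
    ['name', 'Commvault_VSA_VM'],
    ['OS-EXT-SRV-ATTR:host', 'compute-0-28.domain.tld'],
    ['name', 'Dummy_VM'],
]
buf = io.StringIO()
pairs_to_csv(pairs, buf)
print(buf.getvalue())
```

To write to disk instead, pass a file opened with open('filename.csv', 'w', newline='').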
You can build the 2D array step by step by keeping track of the headers and of each entry from the text file:
# data: the list of [header, value] pairs extracted from the text file
headers = list(set(entry[0] for entry in data))  # obtain unique headers (set order is arbitrary)
num_rows = 1  # one row for the headers
for entry in data:  # figure out how many rows we are going to need
    if entry[0] == 'name':  # 'name' appears once per record, so count it
        num_rows += 1
num_cols = len(headers)
mat = [[0 for _ in range(num_cols)] for _ in range(num_rows)]
mat[0] = headers  # add headers as first row
header_lookup = {header: i for i, header in enumerate(headers)}
row = 1
for header, val in data:
    col = header_lookup[header]
    mat[row][col] = val  # add entries to each subsequent row
    if header == 'name':  # 'name' is the last field of each record
        row += 1
print(mat)
Output (the column order comes from a set, so it can differ between runs):
[['hostId', 'OS-EXT-SRV-ATTR:host', 'name', 'OS-EXT-SRV-ATTR:hostname', 'OS-EXT-SRV-ATTR:instance_name', 'OS-EXT-SRV-ATTR:root_device_name', 'OS-EXT-SRV-ATTR:hypervisor_hostname', 'key_name'], ['985035a85d3c98137796f5799341fb65df21e8893fd988ac91a03124', 'compute-0-4.domain.tld', 'Commvault_VSA_VM', 'commvault-vsa-vm', 'instance-00000008', '/dev/vda', 'compute-0-4.domain.tld', '-'], ['7bd08d963a7c598f274ce8af2fa4f7beb4a66b98689cc7cdc5a6ef22', 'compute-0-28.domain.tld', 'Dummy_VM', 'dummy-vm', 'instance-0000226e', '/dev/hda', 'compute-0-28.domain.tld', '-'], ['dd82c20a014e05fcfb3d4bcf653c30fa539a8fd4e946760ee1cc6f07', 'compute-0-20.domain.tld', 'MAVTEL-SIF-vsifarvl11', 'mavtel-sif-vsifarvl11', 'instance-00001da6', '/dev/vda', 'compute-0-20.domain.tld', 'mav_tel_key']]
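Because set() does not preserve order, the columns above come out shuffled. A minimal sketch of an alternative, assuming Python 3.7+ (where dicts keep insertion order) and that 'name' still closes each record: dict.fromkeys keeps the headers in first-seen order, and rows are appended as they complete, so no up-front row count is needed. The data list is shortened here for illustration.

```python
# data: the list of [header, value] pairs from the question (shortened here)
data = [
    ['OS-EXT-SRV-ATTR:host', 'compute-0-4.domain.tld'],
    ['name', 'Commvault_VSA_VM'],
    ['OS-EXT-SRV-ATTR:host', 'compute-0-28.domain.tld'],
    ['name', 'Dummy_VM'],
]

# dict.fromkeys de-duplicates while keeping the first-seen order of the headers
headers = list(dict.fromkeys(entry[0] for entry in data))
header_lookup = {header: i for i, header in enumerate(headers)}

mat = [headers]
row = [''] * len(headers)
for header, val in data:
    row[header_lookup[header]] = val
    if header == 'name':  # assumption: 'name' is the last field of every record
        mat.append(row)
        row = [''] * len(headers)

print(mat)
```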
If you need to write the new 2D array to a file so that it is no longer "horrible":
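A minimal sketch of that step with the standard csv module: csv.writer writes each inner list as one row, so the header row comes first automatically. The shortened mat and the file name matrix.csv are placeholders for illustration.

```python
import csv

# mat: the 2D list built above, headers first (shortened placeholder data)
mat = [
    ['OS-EXT-SRV-ATTR:host', 'name'],
    ['compute-0-4.domain.tld', 'Commvault_VSA_VM'],
    ['compute-0-28.domain.tld', 'Dummy_VM'],
]

# newline='' lets the csv module control line endings, as the csv docs recommend
with open('matrix.csv', 'w', newline='') as outfile:
    csv.writer(outfile).writerows(mat)
```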
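Looking at your input file, I see that it contains the output of the OpenStack nova show command, mixed with other content. There are basically two kinds of lines: valid ones and invalid ones (duh).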
Valid lines have this structure:
'| key | value |'
Invalid lines contain anything else.
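To see why this structure is easy to test for, here is what splitting a valid line at '|' produces: the parts before the first and after the last '|' are empty, and only the middle two carry data. A line read from a file also ends with '\n', which strip() removes as well.

```python
line = '| key | value |'
parts = line.split('|')
print(parts)                       # ['', ' key ', ' value ', '']
print([p.strip() for p in parts])  # ['', 'key', 'value', '']
```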
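So we can define that every valid line
- can be split into exactly four parts at the | character, and
- has an empty first and last part, while the two middle parts must be filled.
Unpacking the result of line.split('|') into four variables a, b, c, d raises a ValueError if the line does not split into exactly four parts. When we then make sure that a and d are empty and that b and c are not, we have a valid line.
Furthermore, if b equals 'Property' and c equals 'Value', we have reached a header line, and what follows must describe a "new record".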
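The rules above can be sketched as a single classification step. classify_line is a hypothetical helper (it is not part of the answer's code); it returns 'invalid', 'header', or 'field' together with the key/value pair, relying on the fact that unpacking the wrong number of parts raises ValueError.

```python
def classify_line(line):
    """Hypothetical helper: classify one line of `nova show` output.

    Returns ('invalid', None), ('header', None) or ('field', (key, value)).
    """
    try:
        # exactly four '|'-separated parts, otherwise unpacking raises ValueError
        a, b, c, d = (part.strip() for part in line.split('|'))
    except ValueError:
        return ('invalid', None)
    if a or d or not b or not c:
        return ('invalid', None)  # outer parts must be empty, inner parts filled
    if b == 'Property' and c == 'Value':
        return ('header', None)   # start of a new record
    return ('field', (b, c))


for text in ['| Property | Value |\n', '| name | Dummy_VM |\n', '+------+------+\n']:
    print(classify_line(text))
```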
This function does exactly that:
def parse_metadata_file(path):
    """ parses a data file generated by `nova show` into records """
    with open(path, 'r', encoding='utf8') as file:
        record = {}
        for line in file:
            try:
                # unpack line into 4 fields: "| key | val |"
                a, key, val, z = map(str.strip, line.split('|'))
            except ValueError:
                # skip lines that do not split into exactly four parts
                continue
            if a != '' or z != '' or key == '' or val == '':
                # skip lines that fail the sanity check
                continue
            if key == 'Property' and val == 'Value':
                # header line: output the current record (if any) and start a new one
                if record:
                    yield record
                record = {}
            else:
                # write property to current record
                record[key] = val
        # output last record
        if record:
            yield record
It yields a new dict for every record it finds and ignores every line that fails the sanity check. In effect, the function produces a stream of dicts.
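To try the parser without a file on disk, here is a minimal sketch of the same logic that accepts any iterable of lines, such as an io.StringIO. parse_metadata_lines is a hypothetical variant name, and the sample text is a made-up nova show layout built from values in the question.

```python
import io


def parse_metadata_lines(lines):
    """Hypothetical variant of parse_metadata_file that reads any iterable of lines."""
    record = {}
    for line in lines:
        try:
            a, key, val, z = map(str.strip, line.split('|'))
        except ValueError:
            continue  # not exactly four '|'-separated parts
        if a or z or not key or not val:
            continue  # fails the sanity check
        if key == 'Property' and val == 'Value':
            if record:
                yield record  # header line closes the previous record
            record = {}
        else:
            record[key] = val
    if record:
        yield record


sample = """\
+----------+------------------+
| Property | Value            |
+----------+------------------+
| key_name | -                |
| name     | Commvault_VSA_VM |
+----------+------------------+
some unrelated text
+----------+------------------+
| Property | Value            |
+----------+------------------+
| name     | Dummy_VM         |
+----------+------------------+
"""

records = list(parse_metadata_lines(io.StringIO(sample)))
print(records)
# [{'key_name': '-', 'name': 'Commvault_VSA_VM'}, {'name': 'Dummy_VM'}]
```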
Now we can use the csv module to write this stream of dicts to a CSV file:
import csv

# list of fields we are interested in
fields = ['hostId', 'properties', 'OS-EXT-SRV-ATTR:host', 'OS-EXT-SRV-ATTR:hypervisor_hostname', 'name']

with open('output.csv', 'w', encoding='utf8', newline='') as outfile:
    writer = csv.DictWriter(outfile, fieldnames=fields, extrasaction='ignore')
    writer.writeheader()
    writer.writerows(parse_metadata_file('metadata.txt'))
The csv module has a DictWriter, which is designed to accept dicts as input and write them as CSV rows according to the given key names:
- With extrasaction='ignore', it does not matter whether the current record has more fields than required.
- With the fields list, extracting a different set of fields becomes very easy.
- Configure the writer to suit your needs (for example its delimiter or quoting).
- This:
    writer.writerows(parse_metadata_file('metadata.txt'))
  is a convenient shorthand for:
    for record in parse_metadata_file('metadata.txt'):
        writer.writerow(record)
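The points above can be checked with a small in-memory sketch: keys not listed in fieldnames are dropped because of extrasaction='ignore', a missing key is written as the default restval (an empty string), and writerows produces the same text as the explicit loop. The records here are shortened placeholders built from the question's values.

```python
import csv
import io

records = [
    {'name': 'Commvault_VSA_VM', 'OS-EXT-SRV-ATTR:host': 'compute-0-4.domain.tld', 'key_name': '-'},
    {'name': 'Dummy_VM', 'key_name': '-'},  # no host key: written as '' (default restval)
]
fields = ['name', 'OS-EXT-SRV-ATTR:host']  # 'key_name' is not listed, so it is ignored


def write(use_writerows):
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fields, extrasaction='ignore')
    writer.writeheader()
    if use_writerows:
        writer.writerows(records)
    else:
        for record in records:
            writer.writerow(record)
    return buf.getvalue()


print(write(True).splitlines())
# ['name,OS-EXT-SRV-ATTR:host', 'Commvault_VSA_VM,compute-0-4.domain.tld', 'Dummy_VM,']
```

Without extrasaction='ignore' (the default is 'raise'), writing the first record would raise a ValueError because of the unlisted key_name field.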