Saving text elements into a dictionary in Python
I have a text file in the following format:
attr1 1,3,7,6,8,12,24,56
attr2 1,2,3
attr4 56,45,48,23,24,25,29,90,56,57,58,59
attr5 1,2,3,45,6,7,8,9,34,33
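I want to create a dict in which the numbers are the keys, and each key maps to a list of the attrs whose line contains that number. To be more specific, for the example above the dict should be: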
1: [attr1,attr2,attr5]
2: [attr2,attr5]
3: [attr1,attr2,attr5]
6: [attr1, attr5]
etc...
I tried to implement this and wrote the following code, but it does not work. Here is my code:
file2 = open("attrs.txt","r")
lines2 = file2.readlines()
d = dict()
list1 = []
for x in lines2:
    x = x.strip()
    x = x.split('\t')
    y = x[0]
    list1.append(x[1].split(','))
    for i in list1:
        d[i] = y
You can use collections.defaultdict:
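import collections
import re

# Parse each non-blank line into [name, [int, ...]]
with open('filename.txt') as f:
    file_data = [[a, list(map(int, b.split(',')))] for a, b in [re.split(r'\s+', i.strip()) for i in f if i.strip()]]

# Invert the mapping: each number collects the names of the lines it appears in
d = collections.defaultdict(list)
for a, b in file_data:
    for i in b:
        d[i].append(a)
print(dict(d))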
Output (key order depends on the Python version):
{1: ['attr1', 'attr2', 'attr5'], 2: ['attr2', 'attr5'], 3: ['attr1', 'attr2', 'attr5'], 6: ['attr1', 'attr5'], 7: ['attr1', 'attr5'], 8: ['attr1', 'attr5'], 9: ['attr5'], 12: ['attr1'], 23: ['attr4'], 24: ['attr1', 'attr4'], 25: ['attr4'], 90: ['attr4'], 29: ['attr4'], 33: ['attr5'], 34: ['attr5'], 45: ['attr4', 'attr5'], 48: ['attr4'], 56: ['attr1', 'attr4', 'attr4'], 57: ['attr4'], 58: ['attr4'], 59: ['attr4']}
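In the output above, key 56 lists 'attr4' twice because the attr4 line contains 56 twice. If that is not wanted, a minimal sketch of one way to skip repeats while keeping file order follows. The names `sample` and `index` are made up for illustration, and `io.StringIO` stands in for the real file.

```python
import collections
import io

# Sample data copied from the question; io.StringIO stands in for the real file.
sample = io.StringIO(
    "attr1 1,3,7,6,8,12,24,56\n"
    "attr2 1,2,3\n"
    "attr4 56,45,48,23,24,25,29,90,56,57,58,59\n"
    "attr5 1,2,3,45,6,7,8,9,34,33\n"
)

index = collections.defaultdict(list)
for line in sample:
    if not line.strip():
        continue  # skip blank lines
    name, numbers = line.split()
    for number in map(int, numbers.split(',')):
        # Append only if this attr is not already recorded for the number
        if name not in index[number]:
            index[number].append(name)

print(index[56])  # ['attr1', 'attr4']
```

The `name not in index[number]` check is a linear scan, which is fine for short lists; for large inputs a set per key is the more usual choice.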
Or a shorter but more complex solution using itertools.groupby:
import itertools
new_data = list(itertools.chain(*[[[i, a] for i in b] for a, b in file_data]))
final_result = {a:[b for _, b in c] for a, c in itertools.groupby(sorted(new_data, key=lambda x:x[0]), key=lambda x:x[0])}
Output:
{1: ['attr1', 'attr2', 'attr5'], 2: ['attr2', 'attr5'], 3: ['attr1', 'attr2', 'attr5'], 6: ['attr1', 'attr5'], 7: ['attr1', 'attr5'], 8: ['attr1', 'attr5'], 9: ['attr5'], 12: ['attr1'], 23: ['attr4'], 24: ['attr1', 'attr4'], 25: ['attr4'], 90: ['attr4'], 29: ['attr4'], 33: ['attr5'], 34: ['attr5'], 45: ['attr4', 'attr5'], 48: ['attr4'], 56: ['attr1', 'attr4', 'attr4'], 57: ['attr4'], 58: ['attr4'], 59: ['attr4']}
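The sorted(...) call in the groupby solution matters: itertools.groupby only groups consecutive items that share a key. A small sketch on a hand-picked subset of the data illustrates the difference; the helper name `group` is hypothetical.

```python
import itertools

# (number, attr) pairs for a small subset of the question's data, in file order
pairs = [(1, 'attr1'), (3, 'attr1'), (1, 'attr2'), (3, 'attr2'), (2, 'attr2')]

def group(items):
    """Group (number, attr) pairs by number, exactly as groupby sees them."""
    return [(k, [attr for _, attr in g])
            for k, g in itertools.groupby(items, key=lambda p: p[0])]

# Without sorting, equal keys that are not adjacent end up in separate groups
unsorted_groups = group(pairs)

# Sorting by the number first brings equal keys together
sorted_groups = group(sorted(pairs, key=lambda p: p[0]))

print(unsorted_groups)
print(sorted_groups)
# [(1, ['attr1', 'attr2']), (2, ['attr2']), (3, ['attr1', 'attr2'])]
```

Because Python's sort is stable, the attrs within each group keep the order they had in the file.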
If you are happy to use a third-party library, pandas offers one approach:
import pandas as pd
from io import StringIO
mystr = StringIO("""
attr1 1,3,7,6,8,12,24,56
attr2 1,2,3
attr4 56,45,48,23,24,25,29,90,56,57,58,59
attr5 1,2,3,45,6,7,8,9,34,33""")
# replace mystr with 'file.csv'
df = pd.read_csv(mystr, sep=r'\s+', header=None, names=['attrs', 'lists'])
# convert identifier column to int
df['attrs'] = df['attrs'].str[4:].map(int)
# split each comma-separated string and convert its items to int
df['lists'] = [list(map(int, x.split(','))) for x in df['lists']]
d = df.set_index('attrs')['lists'].to_dict()
# {1: [1, 3, 7, 6, 8, 12, 24, 56],
# 2: [1, 2, 3],
# 4: [56, 45, 48, 23, 24, 25, 29, 90, 56, 57, 58, 59],
# 5: [1, 2, 3, 45, 6, 7, 8, 9, 34, 33]}
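The dict shown above maps each attr number to its list of values, which is the reverse of the mapping the question asks for. A plain-Python sketch of inverting it follows; the dict literal is copied from the comment above, and the name `inverted` and the rebuilt 'attrN' labels are illustrative assumptions.

```python
# d as shown in the comment above: attr number -> list of values
d = {1: [1, 3, 7, 6, 8, 12, 24, 56],
     2: [1, 2, 3],
     4: [56, 45, 48, 23, 24, 25, 29, 90, 56, 57, 58, 59],
     5: [1, 2, 3, 45, 6, 7, 8, 9, 34, 33]}

inverted = {}
for attr_number, values in d.items():
    name = 'attr%d' % attr_number  # rebuild the original label (assumed format)
    for value in values:
        names = inverted.setdefault(value, [])
        if name not in names:  # 56 occurs twice under attr4; record it once
            names.append(name)

print(inverted[1])   # ['attr1', 'attr2', 'attr5']
print(inverted[56])  # ['attr1', 'attr4']
```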
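You can do this with native Python functionality. I use a set for each dictionary value and convert the sets to sorted lists at the end, in case your data contains duplicate entries.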
keys = set()
d = dict()
with open("attrs.txt", "r") as f:
    for line in f:
        attr, newkeys = line.strip().split()
        newkeys = [int(x) for x in newkeys.split(',')]
        for key in newkeys:
            # create the set the first time a key is seen
            if key not in keys:
                d[key] = set()
                keys.add(key)
            d[key].add(attr)
# convert each set to a sorted list
for key in list(d):
    d[key] = sorted(d[key])
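It depends on your goal and on how complex the actual data format is. For input files with more columns, or when there are no other goals to accomplish while reading the file, Pandas (as @jpp did) would be the better approach. Using defaultdict and set: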
#!/usr/bin/env python3
import io
import collections
from pprint import pprint
fdata = """attr1 1,3,7,6,8,12,24,56
attr2 1,2,3
attr4 56,45,48,23,24,25,29,90,56,57,58,59
attr5 1,2,3,45,6,7,8,9,34,33
"""
# with open('attrs.txt') as f:
with io.StringIO(fdata) as f:
    d = collections.defaultdict(set)
    for line in f:
        name, keys = line.strip().split()
        for k in keys.split(','):
            d[int(k)].add(name)
pprint(d)
You could also do it this way:
#
# 1) parse the input into an array of dictionaries
#    having a one attribute list for each key
#    (line2 is assumed to hold the file's full text, with tab-separated columns)
#
attribKeys = [ dict.fromkeys(ak[1].split(","),[ak[0]]) for ak in [ line.split("\t") for line in line2.strip().split("\n")] ]
#
# 2) merge the dictionaries concatenating lists for each key
#
import functools as fn
d = fn.reduce(lambda d,a: dict(list(d.items()) + [(k,d.get(k,[])+v) for k,v in a.items()]),attribKeys)
# d will contain:
#
# {'1': ['attr1', 'attr2', 'attr5'],
# '2': ['attr2', 'attr5'],
# '3': ['attr1', 'attr2', 'attr5'],
# '6': ['attr1', 'attr5'],
# '7': ['attr1', 'attr5'],
# '8': ['attr1', 'attr5'],
# '9': ['attr5'],
# '12': ['attr1'],
# '23': ['attr4'],
# '24': ['attr1', 'attr4'],
# '25': ['attr4'],
# '29': ['attr4'],
# '33': ['attr5'],
# '34': ['attr5'],
# '45': ['attr4', 'attr5'],
# '48': ['attr4'],
# '56': ['attr1', 'attr4'],
# '57': ['attr4'],
# '58': ['attr4'],
# '59': ['attr4'],
# '90': ['attr4']}
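Note that the keys in this result are strings, since the numbers are never converted. If integer keys are wanted, as in the question, a short sketch of converting them afterwards follows; only a few entries are copied from the result above, and the names `d_int` and `ordered` are illustrative.

```python
# A few entries copied from the result shown above (string keys)
d = {'1': ['attr1', 'attr2', 'attr5'],
     '12': ['attr1'],
     '2': ['attr2', 'attr5']}

# Convert every key to int, keeping the lists unchanged
d_int = {int(k): v for k, v in d.items()}

# Optionally order by numeric key; string keys would sort '12' before '2'
ordered = dict(sorted(d_int.items()))

print(list(ordered))  # [1, 2, 12]
```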
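Your problem is list1…
I tried your code, but I got this error: ValueError: too many values to unpack (expected 2)
@LeeYaan Strange, are you sure the data posted in your question is in exactly the same format as the data in your file?
@LeeYaan Strange, it works for me when I read it from a file. Which version raises the error? The collections.defaultdict solution, or the one using itertools?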
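The thread does not establish what caused the ValueError. One possibility, offered here only as an assumption, is that the real file has spaces after the commas or trailing whitespace, so that splitting on whitespace yields more than two pieces. A sketch of a more tolerant line parser under that assumption follows; `parse_line` is a hypothetical helper name.

```python
def parse_line(line):
    """Return (name, [ints]) for one line, or None if the line is not usable."""
    # Split only at the first run of whitespace, so spaces inside the
    # comma-separated part cannot produce extra pieces.
    parts = line.split(None, 1)
    if len(parts) != 2:
        return None  # blank line, or a name with no numbers
    name, numbers = parts
    # int() ignores surrounding whitespace; empty items are skipped
    return name, [int(n) for n in numbers.split(',') if n.strip()]

print(parse_line('attr2 1, 2, 3 \n'))  # ('attr2', [1, 2, 3])
print(parse_line('\n'))                # None
```

Lines for which the helper returns None can simply be skipped before building the dict.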