Python: count and remove duplicates in keys while keeping the values
I have collected some data and put it into a dictionary, as shown below:
gen_dict = {
"item_C_v001" : "jack",
"item_C_v002" : "kris",
"item_A_v003" : "john",
"item_B_v006" : "peter",
"item_A_v005" : "john",
"item_A_v004" : "dave"
}
I am trying to print the results in the following format:
Item Name | No. of Vers. | User
item_A | 3 | dave, john
item_B | 1 | peter
item_C | 2 | jack, kris
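Here, versions of the same item are collapsed into one row, together with a count of how many versions there are and the names of the users.
I am having trouble consolidating the user names. I used set(), but the same set of all users is printed on all 3 output rows.
Also, while my "Item Name" and "No. of Vers." columns appear to be correct, is there any way to check that the number of versions found really matches the name? With a small data set I can count by hand, but what if the data set is large?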
strip_ver_list = []
user_list = []
for item_name, user in gen_dict.iteritems():
    # Strip out the version digits
    strip_ver = item_name[:-3]
    strip_ver_list.append(strip_ver)
    user_list.append(user)
# This will count and remove the duplicates
versions_num = dict((duplicate, strip_ver_list.count(duplicate)) for duplicate in strip_ver_list)
for name, num in sorted(versions_num.iteritems()):
    print "Version Name : {0}\nNo. of Versions : {1}\nUsers : {2}".format(name, num, set(user_list))
This is the result I get:
Item Name | No. of Vers. | User
item_A | 3 | set(['dave', 'john', 'jack', 'kris', 'peter'])
item_B | 1 | set(['dave', 'john', 'jack', 'kris', 'peter'])
item_C | 2 | set(['dave', 'john', 'jack', 'kris', 'peter'])
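This is the only approach I could come up with. If there is another viable way to solve this, please share it.
You need to group the list by item name and extract the users from each group; otherwise the user list will always be a single global list of users.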
I would use defaultdict to aggregate the data. Roughly:
>>> from collections import defaultdict
>>> gen_dict = {
... "item_C_v001" : "jack",
... "item_C_v002" : "kris",
... "item_A_v003" : "john",
... "item_B_v006" : "peter",
... "item_A_v005" : "john",
... "item_A_v004" : "dave"
... }
Now:
>>> versions_num = defaultdict(lambda:dict(versions=set(), users = set()))
>>> for item_name, user in gen_dict.items():
...     strip_ver = item_name[:-5]
...     version_num = item_name[-3:]
...     versions_num[strip_ver]['versions'].add(version_num)
...     versions_num[strip_ver]['users'].add(user)
...
Finally:
>>> for item, data in versions_num.items():
...     print("Item {} \tno. of Versions: {}\tUsers:{}".format(item, len(data['versions']), ",".join(data['users'])))
...
Item item_B no. of Versions: 1 Users:peter
Item item_A no. of Versions: 3 Users:john,dave
Item item_C no. of Versions: 2 Users:kris,jack
>>>
If you want the output sorted:
>>> for item, data in sorted(versions_num.items()):
...     print("Item {} \tno. of Versions: {}\tUsers:{}".format(item, len(data['versions']), ",".join(data['users'])))
...
Item item_A no. of Versions: 3 Users:john,dave
Item item_B no. of Versions: 1 Users:peter
Item item_C no. of Versions: 2 Users:kris,jack
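To get the exact table layout asked for in the question, the same grouping can be combined with sorted user names. The following is only a minimal Python 3 sketch: the helper name build_rows and the column widths are illustrative choices, and it assumes every key ends in "_v" followed by the version digits.

```python
from collections import defaultdict

gen_dict = {
    "item_C_v001": "jack",
    "item_C_v002": "kris",
    "item_A_v003": "john",
    "item_B_v006": "peter",
    "item_A_v005": "john",
    "item_A_v004": "dave",
}


def build_rows(data):
    """Group keys by item name and return (name, version_count, users) rows."""
    groups = defaultdict(lambda: {"versions": set(), "users": set()})
    for key, user in data.items():
        # Assumption: every key looks like "<item>_v<digits>".
        name, version = key.rsplit("_v", 1)
        groups[name]["versions"].add(version)
        groups[name]["users"].add(user)
    # Item names are unique, so sorting the (name, group) pairs sorts by name.
    return [
        (name, len(group["versions"]), ", ".join(sorted(group["users"])))
        for name, group in sorted(groups.items())
    ]


rows = build_rows(gen_dict)
print("Item Name | No. of Vers. | User")
for name, count, users in rows:
    print("{:<9} | {:<12} | {}".format(name, count, users))
```

Splitting on the last "_v" instead of slicing a fixed number of characters means the code keeps working if the version number grows beyond three digits.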
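I would use a defaultdict to record the users and a plain dict to record the counts, with dict.get() supplying 0 for a name that has not been seen yet.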
from collections import defaultdict
gen_dict = {
    "item_C_v001" : "jack",
    "item_C_v002" : "kris",
    "item_A_v003" : "john",
    "item_B_v006" : "peter",
    "item_A_v005" : "john",
    "item_A_v004" : "dave"
}
user_dict = defaultdict(set)
count_dict = {}
for item_name, user in gen_dict.items():
    user_dict[item_name[:-3]].add(user) # Sure you want -3 not -5? [:-3] keeps the trailing "_v"
    count_dict[item_name[:-3]] = count_dict.get(item_name[:-3], 0) + 1
for name, num in sorted(count_dict.items()):
    print("Version Name : {0}\nNo. of Versions : {1}\nUsers : {2}".format(
        name, num, ', '.join(item for item in user_dict[name])))
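The question also asks how to check that the counts are right when the data is too large to count by hand, and the comment in the code above questions the slice length. One possible approach, sketched here in Python 3, is to parse each key with a regular expression instead of a fixed slice, fail loudly on any key that does not fit, and confirm that the per-item counts add up to the number of keys. KEY_PATTERN and split_key are names made up for this illustration, and the "<item>_v<digits>" key format is an assumption based on the sample data.

```python
import re
from collections import Counter

gen_dict = {
    "item_C_v001": "jack",
    "item_C_v002": "kris",
    "item_A_v003": "john",
    "item_B_v006": "peter",
    "item_A_v005": "john",
    "item_A_v004": "dave",
}

# Assumed key format: an item name, then "_v", then one or more digits.
KEY_PATTERN = re.compile(r"^(?P<name>.+)_v(?P<version>\d+)$")


def split_key(key):
    """Return (item_name, version) or raise if the key is malformed."""
    match = KEY_PATTERN.match(key)
    if match is None:
        raise ValueError("unexpected key format: {!r}".format(key))
    return match.group("name"), match.group("version")


# Count how many keys belong to each item name.
counts = Counter(split_key(key)[0] for key in gen_dict)

# Sanity check: every key was attributed to exactly one item name.
assert sum(counts.values()) == len(gen_dict)

print(sorted(counts.items()))
```

Because malformed keys raise an error instead of being silently sliced, a wrong count caused by an unexpected key shape shows up immediately.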
An example in IPython:
In [1]: gen_dict = {
...: "item_C_v001" : "jack",
...: "item_C_v002" : "kris",
...: "item_A_v003" : "john",
...: "item_B_v006" : "peter",
...: "item_A_v005" : "john",
...: "item_A_v004" : "dave"
...: }
Get the keys; we need them more than once:
In [2]: keys = tuple(gen_dict.keys())
Find the set of items:
In [3]: items = set(j[:-5] for j in keys)
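Table header and template:
In [4]: header = 'Item Name | No. of Vers. | User'
In [5]: template = '{:14}|{:<15}|{}'
In [6]: print(header)
Item Name | No. of Vers. | User
In [7]: for i in items:
...:     relevant = tuple(j for j in keys if j.startswith(i))
...:     users = set(gen_dict[x] for x in relevant)
...:     print(template.format(i, len(relevant), ' '.join(users)))
...:
item_A        |3              |john dave
item_B        |1              |peter
item_C        |2              |kris jack
What is replicate?
Edited; I had left a part out.
I guess using defaultdict means I do not have to create new dicts, and lets me keep "reusing" the same one?
@disbidia I am not entirely sure what you mean, but that sounds reasonable... This structure is already pushing what I consider unwieldy, and it might be worth writing a class to encapsulate this logic.
I am doing something similar, but I am trying to count the number of versions each user owns for each key. Using the example from this thread, item_C_v has 2 items in total, but I want the User column to read jack(1), kris(1). Is that possible? I tried [(k, len(list(v))) for k, v in itertools.groupby(sorted(gen_dict.values()))], but that lists the total number of versions each user has without taking the individual keys into account.
@yan I am afraid I cannot help you much with that. Maybe someone else can help you :)
I did not know that an integer value could be passed to get() and used as a "counter" to do the counting.
If the key is not found, get() returns None (or the default you pass as the second argument) instead of raising a KeyError the way indexing a dict does.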
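One of the comments asks whether the User column can show, for each item, how many versions each user owns, for example jack(1), kris(1). That comment received no answer in the thread, so the following is only one possible Python 3 sketch: it keeps a Counter of users per item name. The names per_item and format_users are illustrative, and the keys are again assumed to end in "_v" followed by digits.

```python
from collections import Counter, defaultdict

gen_dict = {
    "item_C_v001": "jack",
    "item_C_v002": "kris",
    "item_A_v003": "john",
    "item_B_v006": "peter",
    "item_A_v005": "john",
    "item_A_v004": "dave",
}

# Map each item name to a Counter of {user: number of versions owned}.
per_item = defaultdict(Counter)
for key, user in gen_dict.items():
    # Assumption: every key looks like "<item>_v<digits>".
    name = key.rsplit("_v", 1)[0]
    per_item[name][user] += 1


def format_users(user_counts):
    """Render a Counter as 'user(count)' entries, sorted by user name."""
    return ", ".join(
        "{}({})".format(user, n) for user, n in sorted(user_counts.items())
    )


print("Item Name | No. of Vers. | User")
for name in sorted(per_item):
    user_counts = per_item[name]
    total = sum(user_counts.values())
    print("{:<9} | {:<12} | {}".format(name, total, format_users(user_counts)))
```

A Counter returns 0 for a missing key, so `per_item[name][user] += 1` plays the same role here as the `count_dict.get(key, 0) + 1` idiom discussed in the comments.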