Maximum value by group in Python
I have a Python list of lists. Each entry has the form [category, type, item, score]. For each category and type, I want to return the item with the highest score.
[["Edibles", "Fruit", "Apple", 3],
"Edibles", "Fruit", "Grapes", 8],
"Edible", "Candy", "Hershey", 4],
"Edible", "Candy", "Snickers", 6],
"NonEdible", "Bikes", "Yamaha", 5],
"NonEdible", "Bikes", "Suzuki", 7],
"NonEdible", "Cars", "Kia", 8],
"NonEdible", "Cars", "Toyota", 9]]
Expected output
[["Edibles", "Fruit", "Grapes", 8],
"Edible", "Candy", "Snickers", 6],
"NonEdible", "Bikes", "Suzuki", 7],
"NonEdible", "Cars", "Toyota", 9]]
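I can do this with multiple loops that build temporary lists, but the computation becomes very slow as the input grows. I'm looking for an efficient solution.

You can use itertools.groupby, but you need to sort the list before grouping: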
from itertools import groupby
lst = [["Edibles", "Fruit", "Apple", 3],
["Edibles", "Fruit", "Grapes", 8],
["Edible", "Candy", "Hershey", 4],
["Edible", "Candy", "Snickers", 6],
["NonEdible", "Bikes", "Yamaha", 5],
["NonEdible", "Bikes", "Suzuki", 7],
["NonEdible", "Cars", "Kia", 8],
["NonEdible", "Cars", "Toyota", 9]]
#if lst is already sorted, skip this step:
lst = sorted(lst, key=lambda k: (k[0], k[1]))
out = [max(g, key=lambda k: k[-1]) for _, g in groupby(lst, lambda k: (k[0], k[1]))]
from pprint import pprint
pprint(out)
Prints:
[['Edible', 'Candy', 'Snickers', 6],
['Edibles', 'Fruit', 'Grapes', 8],
['NonEdible', 'Bikes', 'Suzuki', 7],
['NonEdible', 'Cars', 'Toyota', 9]]
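
As a hedged sketch of why the sort step matters: `itertools.groupby` only merges consecutive rows that share a key, so unsorted input can produce the same group more than once. The helper name `max_per_group` and the shuffled sample rows below are illustrative assumptions, not part of the answer above.

```python
from itertools import groupby
from operator import itemgetter


def max_per_group(rows):
    """Return the highest-scoring row for each (category, type) pair.

    Hypothetical helper: sorts a copy of the rows by the group key first,
    because groupby only groups rows that sit next to each other.
    """
    group_key = itemgetter(0, 1)  # (category, type)
    score = itemgetter(3)         # score is the fourth field
    ordered = sorted(rows, key=group_key)
    return [max(group, key=score) for _, group in groupby(ordered, key=group_key)]


# Illustrative rows where entries of the same group are not adjacent.
rows = [
    ["Edibles", "Fruit", "Apple", 3],
    ["NonEdible", "Cars", "Kia", 8],
    ["Edibles", "Fruit", "Grapes", 8],
    ["NonEdible", "Cars", "Toyota", 9],
]

# Without sorting, groupby starts a new group at every key change.
unsorted_groups = [key for key, _ in groupby(rows, key=itemgetter(0, 1))]
result = max_per_group(rows)
```

Here `unsorted_groups` holds four keys even though there are only two distinct groups, while `max_per_group` returns one row per group.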
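
Use pandas.
DataFrames make it easy to manipulate, analyze, and visualize data.

import pandas as pd

# set up the DataFrame
data = [["Edibles", "Fruit", "Apple", 3],
        ["Edibles", "Fruit", "Grapes", 8],
        ["Edible", "Candy", "Hershey", 4],
        ["Edible", "Candy", "Snickers", 6],
        ["NonEdible", "Bikes", "Yamaha", 5],
        ["NonEdible", "Bikes", "Suzuki", 7],
        ["NonEdible", "Cars", "Kia", 8],
        ["NonEdible", "Cars", "Toyota", 9]]
df = pd.DataFrame(data)

# groupby max
output = df.groupby([0, 1]).agg(max).reset_index()

           0      1         2  3
0     Edible  Candy  Snickers  6
1    Edibles  Fruit    Grapes  8
2  NonEdible  Bikes    Yamaha  7
3  NonEdible   Cars    Toyota  9

# if needed, output to a list
output.to_numpy()

array([['Edible', 'Candy', 'Snickers', 6],
       ['Edibles', 'Fruit', 'Grapes', 8],
       ['NonEdible', 'Bikes', 'Yamaha', 7],
       ['NonEdible', 'Cars', 'Toyota', 9]], dtype=object)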
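
One caveat worth noting: `agg(max)` takes the maximum of each column independently, which is why the Bikes row above pairs "Yamaha" (the alphabetically largest item) with 7 (Suzuki's score). The same applies to any `groupby(...).agg(max)` over mixed columns. A minimal sketch that keeps whole rows instead, assuming the same four-field layout, selects the row at each group's `idxmax` of the score column; the column names are illustrative.

```python
import pandas as pd

data = [["Edibles", "Fruit", "Apple", 3],
        ["Edibles", "Fruit", "Grapes", 8],
        ["Edible", "Candy", "Hershey", 4],
        ["Edible", "Candy", "Snickers", 6],
        ["NonEdible", "Bikes", "Yamaha", 5],
        ["NonEdible", "Bikes", "Suzuki", 7],
        ["NonEdible", "Cars", "Kia", 8],
        ["NonEdible", "Cars", "Toyota", 9]]

# Column names are assumptions added for readability.
df = pd.DataFrame(data, columns=["category", "type", "item", "score"])

# idxmax returns, per group, the index label of the row with the highest score.
best_index = df.groupby(["category", "type"])["score"].idxmax()

# Selecting those labels keeps each winning row intact.
best_rows = df.loc[best_index].reset_index(drop=True)
result = best_rows.values.tolist()
```

With this selection the Bikes group yields the Suzuki row, matching the expected output in the question.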
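
A simple dictionary is quick and efficient.
Your list is malformed: each sublist is missing its opening bracket.
You can do it in a single pass with a dictionary: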
data = [["Edibles", "Fruit", "Apple", 3],
        ["Edibles", "Fruit", "Grapes", 8],
        ["Edible", "Candy", "Hershey", 4],
        ["Edible", "Candy", "Snickers", 6],
        ["NonEdible", "Bikes", "Yamaha", 5],
        ["NonEdible", "Bikes", "Suzuki", 7],
        ["NonEdible", "Cars", "Kia", 8],
        ["NonEdible", "Cars", "Toyota", 9]
       ]
highest_val_dict = {}
for curr_list in data:
    curr_key = (curr_list[0], curr_list[1])  # (category, type) is the key
    curr_item = curr_list[2]
    curr_val = curr_list[3]
    # Default of -1 assumes scores are never negative.
    highest_pair = highest_val_dict.get(curr_key, (None, -1))
    if curr_val > highest_pair[1]:
        highest_val_dict[curr_key] = (curr_item, curr_val)

>>> for key, val in highest_val_dict.items():
...     print(f'{key[0]}, {key[1]}, {val[0]}, {val[1]}')
...
Edibles, Fruit, Grapes, 8
Edible, Candy, Snickers, 6
NonEdible, Bikes, Suzuki, 7
NonEdible, Cars, Toyota, 9
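
The `-1` default passed to `get` assumes scores never drop below zero. A minimal variant that avoids that assumption checks for a missing key instead. The function name `best_by_group` and the negative-score rows are hypothetical, chosen only to exercise that case.

```python
def best_by_group(rows):
    """Single pass: keep the best (item, score) seen so far for each key."""
    best = {}
    for category, kind, item, score in rows:
        key = (category, kind)
        # Store the row if the key is new or the score beats the current best.
        if key not in best or score > best[key][1]:
            best[key] = (item, score)
    # Rebuild full rows in first-seen key order (dicts preserve insertion order).
    return [[category, kind, item, score]
            for (category, kind), (item, score) in best.items()]


# Illustrative rows; the negative scores are an assumption for this sketch.
rows = [
    ["Edibles", "Fruit", "Apple", -3],
    ["Edibles", "Fruit", "Grapes", -8],
    ["NonEdible", "Cars", "Kia", 8],
    ["NonEdible", "Cars", "Toyota", 9],
]
result = best_by_group(rows)
```

The work is still one pass over the input with constant-time dictionary lookups, so no sorting is needed.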

You can do this with the pandas library.
Install pandas with:
pip install pandas
Your code then becomes:
In [2271]: import pandas as pd
In [2272]: l = [["Edibles", "Fruit", "Apple", 3],
...: ["Edibles", "Fruit", "Grapes", 8],
...: ["Edible", "Candy", "Hershey", 4],
...: ["Edible", "Candy", "Snickers", 6],
...: ["NonEdible", "Bikes", "Yamaha", 5],
...: ["NonEdible", "Bikes", "Suzuki", 7],
...: ["NonEdible", "Cars", "Kia", 8],
...: ["NonEdible", "Cars", "Toyota", 9]]
In [2275]: df = pd.DataFrame(l, columns=['category','type','item','score'])
In [2284]: df.groupby(['category','type'], as_index=False).agg(max).values.tolist()
Out[2284]:
[['Edible', 'Candy', 'Snickers', 6],
['Edibles', 'Fruit', 'Grapes', 8],
['NonEdible', 'Bikes', 'Yamaha', 7],
['NonEdible', 'Cars', 'Toyota', 9]]

You can use a regular dict: store all the values for each unique key in a list, then take the maximum of each list:
data = [
["Edibles", "Fruit", "Apple", 3],
["Edibles", "Fruit", "Grapes", 8],
["Edible", "Candy", "Hershey", 4],
["Edible", "Candy", "Snickers", 6],
["NonEdible", "Bikes", "Yamaha", 5],
["NonEdible", "Bikes", "Suzuki", 7],
["NonEdible", "Cars", "Kia", 8],
["NonEdible", "Cars", "Toyota", 9]]
dct = {}
for item in data:
    # Key on (category, type); collect (item, score) pairs per key.
    dct.setdefault((item[0], item[1]), []).append((item[-2], item[-1]))
for k, v in dct.items():
    print(list(k) + list(max(v, key=lambda x: x[1])))
Output:
['Edibles', 'Fruit', 'Grapes', 8]
['Edible', 'Candy', 'Snickers', 6]
['NonEdible', 'Bikes', 'Suzuki', 7]
['NonEdible', 'Cars', 'Toyota', 9]
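
Because this approach keeps every (item, score) pair per key, it can also report ties, which `max` alone hides by returning only the first maximum it finds. The sketch below assumes that returning all top-scoring items is what is wanted; the name `top_items_with_ties` and the tied "Mango" row are made up for illustration.

```python
from collections import defaultdict


def top_items_with_ties(rows):
    """Return every row whose score equals the maximum of its group."""
    groups = defaultdict(list)
    for category, kind, item, score in rows:
        groups[(category, kind)].append((item, score))

    result = []
    for (category, kind), pairs in groups.items():
        top_score = max(score for _, score in pairs)
        # Keep all items that reach the top score, in input order.
        result.extend([category, kind, item, score]
                      for item, score in pairs if score == top_score)
    return result


rows = [
    ["Edibles", "Fruit", "Apple", 3],
    ["Edibles", "Fruit", "Grapes", 8],
    ["Edibles", "Fruit", "Mango", 8],  # hypothetical tie with Grapes
    ["NonEdible", "Cars", "Kia", 8],
    ["NonEdible", "Cars", "Toyota", 9],
]
result = top_items_with_ties(rows)
```

The trade-off is memory: all pairs are held until the end, whereas the single-pass dictionary keeps only one entry per group.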

What does your multi-loop solution look like? Sort the list by score, loop over it, and pick one from each type. The time complexity will be O(n log n).

Installing pandas just for this is a bit overkill.

It lets you write minimal code, only 2 lines in this case. So you can choose to write clean code in the fewest lines instead of writing long, complicated loops.
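
The sort-then-pick idea from the first comment could look roughly like this: sort by score in descending order, then keep the first row seen for each (category, type). This is one reading of the comment, not code from the thread, and `pick_top_after_sort` is a hypothetical name.

```python
def pick_top_after_sort(rows):
    """Sort by score (highest first), then keep the first row per group."""
    seen = set()
    picked = []
    # The sort dominates the cost: O(n log n) overall.
    for row in sorted(rows, key=lambda r: r[3], reverse=True):
        key = (row[0], row[1])  # (category, type)
        if key not in seen:
            seen.add(key)
            picked.append(row)
    return picked


rows = [
    ["Edibles", "Fruit", "Apple", 3],
    ["Edibles", "Fruit", "Grapes", 8],
    ["Edible", "Candy", "Hershey", 4],
    ["Edible", "Candy", "Snickers", 6],
    ["NonEdible", "Bikes", "Yamaha", 5],
    ["NonEdible", "Bikes", "Suzuki", 7],
    ["NonEdible", "Cars", "Kia", 8],
    ["NonEdible", "Cars", "Toyota", 9],
]
result = pick_top_after_sort(rows)
```

The rows come back ordered by score rather than by group, so sort the result again if a particular group order matters.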