Python list group by

I have a Python list like this:

Category    Title   ProductId   Rating
'Electronics, Books, Bundles'   Lautner e-Reader Cover  161553  4
'Electronics, Books, Bundles'   Lautner stand in e-Reader Cover 161552  3
'Electronics, Books, Bundles'   Lautner Chocolate NOOK Case 594451  5
'Electronics, Books, Bundles'   Oliver e-Reader Cover   161685  1
'Electronics, Books, Covers'    Dessin Leather Cover for Nook Color; Nook Tablet Digital Reader 594033  4.3
'Electronics, Books, Covers'    Emerson Quote e-Reader Cover    161542  2.8
'Electronics, Books, Covers'    Industriell Easel e-Reader Cover    161682  3.7
'Electronics, Books, Covers'    Jonathan Adler Book Reader Cover Hd - Elephant  594548  4.9
'Electronics, Scanners, Covers' Lyra Light Front Cover for NOOK eR  161683  4
'Electronics, Scanners, Covers' Nook Tablet Dessin Cover in Marine  161686  3.8
'Electronics, Scanners, Covers' Nook Tablet Horizontal Stand Cover in Red   594202  4.2
'Electronics, Scanners, Covers' Canvas Bella Library Cover  161554  3
'Electronics, Books, Radios'    Groovy Protective Stand Cover: Custom Designed for 7-inch NOOK HD   594549  3.8
'Electronics, Books, Radios'    Hd Groovy Stand In Blue- Nook   594514  4.1
'Electronics, Books, Radios'    Hutton Envelope in Bark 161560  2.9
'Electronics, Books, Radios'    Italian Leather-Style Chesterton Cover for NOOK Reader  161561  4
From all of these list values, I want the top k entries for each category. Top 2 should give the following result:

'Electronics, Books, Bundles'   Lautner Chocolate NOOK Case 594451  5
'Electronics, Books, Bundles'   Lautner e-Reader Cover  161553  4
'Electronics, Books, Covers'    Jonathan Adler Book Reader Cover Hd - Elephant  594548  4.9
'Electronics, Books, Covers'    Dessin Leather Cover for Nook Color; Nook Tablet Digital Reader 594033  4.3
'Electronics, Books, Radios'    Hd Groovy Stand In Blue- Nook   594514  4.1
'Electronics, Books, Radios'    Italian Leather-Style Chesterton Cover for NOOK Reader  161561  4
'Electronics, Scanners, Covers' Nook Tablet Horizontal Stand Cover in Red   594202  4.2
'Electronics, Scanners, Covers' Lyra Light Front Cover for NOOK eR  161683  4
Adding what I have tried:

import operator
import sys

sorted_data = sorted(data, key=operator.itemgetter(1), reverse=True)

k = int(sys.argv[1])
for result in sorted_data[:k]:
    print(result)

Here, I pass "k" to the Python file as a command-line argument.

This may be what you need:

data = ''''Electronics, Books, Bundles'   Lautner e-Reader Cover  161553  4
'Electronics, Books, Bundles'   Lautner stand in e-Reader Cover 161552  3
'Electronics, Books, Bundles'   Lautner Chocolate NOOK Case 594451  5
'Electronics, Books, Bundles'   Oliver e-Reader Cover   161685  1
'Electronics, Books, Covers'    Dessin Leather Cover for Nook Color; Nook Tablet Digital Reader 594033  4.3
'Electronics, Books, Covers'    Emerson Quote e-Reader Cover    161542  2.8
'Electronics, Books, Covers'    Industriell Easel e-Reader Cover    161682  3.7
'Electronics, Books, Covers'    Jonathan Adler Book Reader Cover Hd - Elephant  594548  4.9
'Electronics, Scanners, Covers' Lyra Light Front Cover for NOOK eR  161683  4
'Electronics, Scanners, Covers' Nook Tablet Dessin Cover in Marine  161686  3.8
'Electronics, Scanners, Covers' Nook Tablet Horizontal Stand Cover in Red   594202  4.2
'Electronics, Scanners, Covers' Canvas Bella Library Cover  161554  3
'Electronics, Books, Radios'    Groovy Protective Stand Cover: Custom Designed for 7-inch NOOK HD   594549  3.8
'Electronics, Books, Radios'    Hd Groovy Stand In Blue- Nook   594514  4.1
'Electronics, Books, Radios'    Hutton Envelope in Bark 161560  2.9
'Electronics, Books, Radios'    Italian Leather-Style Chesterton Cover for NOOK Reader  161561  4'''


groups = [item.split("' ") for item in data.split('\n')]
grouped_data = {}

for group in groups:
    item = [group[1].strip()]
    group = group[0].strip("'")
    if group not in grouped_data:
        grouped_data[group] = item
    else:
        grouped_data[group] += item

def topN(data, n):
    data = [item.split() for item in data]
    data = sorted(data, key=lambda x: float(x[-1]), reverse=True)[:n]
    data = [' '.join(item) for item in data]
    return data

result = {}
for k, v in grouped_data.items():
    result[k] = topN(v, 2)

final_result = [': '.join([group1, item1]) for group1, value1 in result.items() for item1 in value1]
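
The answer above keeps each item as one string and splits it again when sorting. As a variation, here is a minimal sketch that parses a line into separate fields once. The helper name `parse_line` and the `sample` line are assumptions made for illustration, and the sketch assumes that titles never contain a single quote and that the last two fields are always the product id and the rating.

```python
def parse_line(line):
    """Split one raw line into (category, title, product_id, rating)."""
    # The category is the text between the first pair of single quotes.
    _, category, rest = line.split("'", 2)
    # The last two whitespace-separated tokens are the id and the rating.
    title, product_id, rating = rest.strip().rsplit(None, 2)
    return category, title.strip(), int(product_id), float(rating)


# Hypothetical sample line in the same layout as the question's data.
sample = "'Electronics, Books, Covers'    Emerson Quote e-Reader Cover    161542  2.8"
print(parse_line(sample))
```

With numeric ratings in place, sorting no longer depends on re-splitting strings.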

This may not be the most efficient solution, but it is an understandable one.

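You want the best results for each category, so first we need to identify the category of each element. We do this by splitting at the ' character, since that is the simplest delimiter, and dropping the first, empty string with [1:].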

separated = [element.split("'")[1:] for element in data]
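
To see why the `[1:]` slice is needed, here is a small sketch of what the split produces for a single line. The `line` value is a hypothetical sample in the question's layout, written with single spaces for readability.

```python
# Hypothetical sample line (single spaces between fields).
line = "'Electronics, Books, Bundles' Lautner e-Reader Cover 161553 4"

parts = line.split("'")
# The line starts with a quote, so the first piece is an empty string.
print(parts)      # ['', 'Electronics, Books, Bundles', ' Lautner e-Reader Cover 161553 4']

# Dropping it leaves the category followed by the rest of the line.
print(parts[1:])  # ['Electronics, Books, Bundles', ' Lautner e-Reader Cover 161553 4']
```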
Since we are interested in the items identified by the first string, a dictionary seems to be a suitable data structure:

from collections import defaultdict
data_dict = defaultdict(list)
for line in separated:
    data_dict[line[0]].append(line)
Now that the data is in a convenient format, we can sort the lists in the dict:

for key in data_dict:
    data_dict[key].sort(key=lambda item: float(item[1].split()[-1]), reverse=True)

From this dictionary we can easily produce our results:

k = 2
results = []
for key in data_dict.keys():
    results.extend(data_dict[key][:k])
The key point is to use a suitable data structure, here a dictionary. Here is the short solution:

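# build a dict mapping each category to its lines
# (data is assumed to be a list of line strings without the header row)
from collections import defaultdict
data_dict = defaultdict(list)
for line in data:
    data_dict[line.split("'")[1]].append(line)


# function working on the dict:
def top_results(data_dict, k):
    results = []
    for key in data_dict:
        # sort each category by rating (the last field), best first
        ranked = sorted(data_dict[key], key=lambda line: float(line.split()[-1]), reverse=True)
        results.extend(ranked[:k])
    return results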
However, it may be more appropriate to keep working with the dictionary rather than going back to a less suitable list.

To summarize:

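Identify a suitable data structure, here a dict. Get your keys; split works for this. Reorganize the data into a convenient format. Sort the lists with list.sort. A sort key is needed; here we simply use the last field from str.split(), converted to a number, because that is your rating.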
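
The steps summarized above can be sketched end to end as follows. This is a minimal illustration, not a definitive implementation: the names `rating` and `top_k_per_category` and the shortened `data` sample are assumptions, and the input is assumed to be a list of raw lines without the header row.

```python
from collections import defaultdict

# Hypothetical sample: raw lines in the question's layout, header row omitted.
data = [
    "'Electronics, Books, Bundles' Lautner e-Reader Cover 161553 4",
    "'Electronics, Books, Bundles' Lautner stand in e-Reader Cover 161552 3",
    "'Electronics, Books, Bundles' Lautner Chocolate NOOK Case 594451 5",
    "'Electronics, Books, Covers' Emerson Quote e-Reader Cover 161542 2.8",
    "'Electronics, Books, Covers' Industriell Easel e-Reader Cover 161682 3.7",
]


def rating(line):
    """Return the rating, assumed to be the last whitespace-separated field."""
    return float(line.split()[-1])


def top_k_per_category(lines, k):
    """Group lines by the quoted category and keep the k best-rated of each."""
    by_category = defaultdict(list)
    for line in lines:
        by_category[line.split("'")[1]].append(line)
    return {
        category: sorted(items, key=rating, reverse=True)[:k]
        for category, items in by_category.items()
    }


top = top_k_per_category(data, 2)
for category, lines in top.items():
    for line in lines:
        print(line)
```

Returning a dict keeps the per-category structure, in line with the remark above that staying with the dictionary may be preferable to flattening back into a list.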
Using iterators and the like, you can get relatively efficient performance. Note: this uses only the Python standard library.

import heapq
import itertools

# some_list is assumed to hold rows of (category, title, product_id, rating)
# groupby only merges adjacent rows, so sort by 'Category' first
some_list = sorted(some_list, key=lambda element: element[0])

# group by 'Category'
groups = itertools.groupby(some_list, key=lambda element: element[0])

# take top two of each group based on 'Rating'
top_two_of_each = (heapq.nlargest(2, values, key=lambda value: value[3])
                   for _, values in groups)

# flatten the nested iterators
top_two_of_each_flattened = itertools.chain.from_iterable(top_two_of_each)

# convert iterator into a list
top_two_of_each_flattened_as_list = list(top_two_of_each_flattened)
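
For a version that can be run as is, here is a small sketch of the same groupby plus heapq.nlargest idea. The `rows` sample and the function name `top_k` are assumptions for illustration; rows are assumed to be tuples of (category, title, product_id, rating) with numeric ratings. The rows are deliberately out of order to show why sorting before grouping matters.

```python
import heapq
import itertools
from operator import itemgetter

# Hypothetical sample rows: (category, title, product_id, rating).
rows = [
    ("Electronics, Books, Bundles", "Lautner e-Reader Cover", 161553, 4),
    ("Electronics, Books, Covers", "Emerson Quote e-Reader Cover", 161542, 2.8),
    ("Electronics, Books, Bundles", "Lautner Chocolate NOOK Case", 594451, 5),
    ("Electronics, Books, Bundles", "Oliver e-Reader Cover", 161685, 1),
    ("Electronics, Books, Covers", "Industriell Easel e-Reader Cover", 161682, 3.7),
]


def top_k(rows, k):
    """Return the k best-rated rows of each category, categories in sorted order."""
    # groupby only merges adjacent rows, so bring equal categories together first.
    ordered = sorted(rows, key=itemgetter(0))
    groups = itertools.groupby(ordered, key=itemgetter(0))
    return [
        row
        for _, group in groups
        for row in heapq.nlargest(k, group, key=itemgetter(3))
    ]


result = top_k(rows, 2)
for row in result:
    print(row)
```

heapq.nlargest avoids fully sorting each group, which helps when k is small compared with the group size.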

I assume you are looking for something like this.

Your list is too long, which is why I used a simple list here.

1) I don't see a Python list here. 2) I don't think the OP has made any effort.
What is your data? What you show first doesn't look like a Python list at all. Is it a text file? Or is it really a list of strings, with list[0] = "Category Title ProductId Rating" and list[1] = "'Electronics, Books, Bundles' Lautner e-Reader Cover 161553 4"?
It is a list. I posted the data in tabular form to make it easier to understand. data[1] = 'Electronics, Books, Bundles', Lautner e-Reader Cover, 161553, 4