Find duplicates in one column, return the unique items, and list the corresponding values from another column in Python

I want to remove the duplicates in column 1 and, using Python, return for each unique item the list of associated values from column 2.

The input is

1 2
Jack London 'Son of the Wolf'
Jack London 'Chris Farrington'
Jack London 'The God of His Fathers'
Jack London 'Children of the Frost'
William Shakespeare  'Venus and Adonis' 
William Shakespeare 'The Rape of Lucrece'
Oscar Wilde 'Ravenna'
Oscar Wilde 'Poems'

and the output should be

1 2
Jack London 'Son of the Wolf, Chris Farrington, Able Seaman, The God of His Fathers,Children of the Frost'
William Shakespeare 'The Rape of Lucrece,Venus and Adonis' 
Oscar Wilde 'Ravenna,Poems'

where the second column combines all the values associated with each item. I tried the set() function on a dictionary:

dic={'Jack London': 'Son of the Wolf', 'Jack London': 'Chris Farrington', 'Jack London': 'The God of His Fathers'}
set(dic)

set(['Jack London'])

but it only returns the first key of the dictionary.
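For reference, this behavior comes from the dict literal itself rather than from set(): when a key is repeated in a literal, Python keeps a single entry and each later value overwrites the earlier one, so dic already holds only one key before set() ever sees it. A minimal check (the comments show Python 3 output):

```python
# A dict literal with the same key repeated keeps only one entry;
# each later value overwrites the earlier one.
dic = {'Jack London': 'Son of the Wolf',
       'Jack London': 'Chris Farrington',
       'Jack London': 'The God of His Fathers'}

print(len(dic))   # 1
print(dic)        # {'Jack London': 'The God of His Fathers'}
print(set(dic))   # {'Jack London'}
```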

In Python, a dictionary can hold only one value per key. But that value can be a collection of items:

>>> d = {'Jack London': ['Son of the Wolf', 'Chris Farrington']}
>>> d['Jack London']
['Son of the Wolf', 'Chris Farrington']
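To make the difference concrete, here is a small sketch reusing the example dictionary: appending to the list stored under a key keeps the earlier titles, while assigning to the key replaces whatever was there.

```python
d = {'Jack London': ['Son of the Wolf', 'Chris Farrington']}

# Appending to the list stored under the key keeps the earlier titles.
d['Jack London'].append('The God of His Fathers')
print(d['Jack London'])
# ['Son of the Wolf', 'Chris Farrington', 'The God of His Fathers']

# Assigning to the key, by contrast, throws the old value away.
overwritten = {'Jack London': 'Son of the Wolf'}
overwritten['Jack London'] = 'Chris Farrington'
print(overwritten)
# {'Jack London': 'Chris Farrington'}
```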

To build such a dictionary from a sequence of key-value pairs, you can do the following:

dct = {}  # items: an iterable of (author, title) pairs
for author, title in items:
    if author not in dct:
        # Create a new entry for the author
        dct[author] = [title]
    else:
        # Add another item to the existing entry
        dct[author].append(title)
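The answer doesn't show where items comes from, so here is a self-contained version of the loop above. It assumes (hypothetically) that items is the question's data as a list of (author, title) tuples with the "1 2" header row dropped, and it ends by printing one line per author in roughly the shape the question asks for:

```python
# Hypothetical `items`: the question's rows as (author, title) tuples,
# with the "1 2" header row left out.
items = [
    ('Jack London', 'Son of the Wolf'),
    ('Jack London', 'Chris Farrington'),
    ('Jack London', 'The God of His Fathers'),
    ('Jack London', 'Children of the Frost'),
    ('William Shakespeare', 'Venus and Adonis'),
    ('William Shakespeare', 'The Rape of Lucrece'),
    ('Oscar Wilde', 'Ravenna'),
    ('Oscar Wilde', 'Poems'),
]

dct = {}
for author, title in items:
    if author not in dct:
        # First time we see this author: start a new list
        dct[author] = [title]
    else:
        # Author already present: add the title to the existing list
        dct[author].append(title)

# One line per author, in the "author 'title, title'" shape from the question
for author, titles in dct.items():
    print("{} '{}'".format(author, ', '.join(titles)))
```

On Python 3.7 and later, dicts preserve insertion order, so the authors print in the order they first appear: Jack London, William Shakespeare, Oscar Wilde.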

The loop body can be made more concise, like this:

dct = {}
for author, title in items:
    dct.setdefault(author, []).append(title)
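A standard-library alternative that the answer doesn't use is collections.defaultdict(list), which creates the empty list automatically the first time a key is accessed. A minimal sketch with a few interleaved rows, to show that this approach does not depend on the input order:

```python
from collections import defaultdict

# Hypothetical rows, deliberately interleaved by author.
items = [
    ('Jack London', 'Son of the Wolf'),
    ('William Shakespeare', 'Venus and Adonis'),
    ('Jack London', 'Chris Farrington'),
    ('Oscar Wilde', 'Ravenna'),
]

# defaultdict(list) calls list() to create an empty list for a missing key,
# so neither a membership test nor setdefault is needed.
dct = defaultdict(list)
for author, title in items:
    dct[author].append(title)

print(dict(dct))
```

One design caveat: a defaultdict also inserts an empty list when you merely read a missing key (dct['Nobody'] adds an entry), so converting it with dict() once it is built avoids surprises later.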

You should use itertools.groupby, since your list is already ordered so that all rows for the same author are adjacent:

rows = [('1', '2'),
        ('Jack London', 'Son of the Wolf'),
        ('Jack London', 'Chris Farrington'),
        ('Jack London', 'The God of His Fathers'),
        ('Jack London', 'Children of the Frost'),
        ('William Shakespeare', 'Venus and Adonis'),
        ('William Shakespeare', 'The Rape of Lucrece'),
        ('Oscar Wilde', 'Ravenna'),
        ('Oscar Wilde', 'Poems')]
# I'm not sure how you get to this point, but this is the structure you end up with

from itertools import groupby
from operator import itemgetter

grouped = groupby(rows, itemgetter(0))
result = {group: ', '.join([value[1] for value in values]) for group, values in grouped}

This gives you the following result:

In [1]: pprint(result)
{'1': '2',
 'Jack London': 'Son of the Wolf, Chris Farrington, The God of His Fathers, '
                'Children of the Frost',
 'Oscar Wilde': 'Ravenna, Poems',
 'William Shakespeare': 'Venus and Adonis, The Rape of Lucrece'}
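One caveat worth spelling out: itertools.groupby only merges consecutive rows with the same key, so this works because each author's rows are adjacent in the input. If they might not be, sort by the key first. A sketch with hypothetical out-of-order rows, which also drops the "1 2" header row (the result above keeps it as a '1': '2' entry):

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical rows, deliberately out of order.
rows = [('1', '2'),
        ('Jack London', 'Son of the Wolf'),
        ('Oscar Wilde', 'Ravenna'),
        ('Jack London', 'Chris Farrington'),
        ('Oscar Wilde', 'Poems')]

# Without sorting, groupby starts a new group every time the key changes,
# so an author whose rows are not adjacent shows up more than once.
unsorted_keys = [key for key, _ in groupby(rows[1:], itemgetter(0))]
print(unsorted_keys)
# ['Jack London', 'Oscar Wilde', 'Jack London', 'Oscar Wilde']

# Skip the header row, sort by author (sorted() is stable, so each author's
# titles keep their original order), then group.
body = sorted(rows[1:], key=itemgetter(0))
result = {author: ', '.join(title for _, title in group)
          for author, group in groupby(body, itemgetter(0))}
print(result)
# {'Jack London': 'Son of the Wolf, Chris Farrington', 'Oscar Wilde': 'Ravenna, Poems'}
```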

How are you delimiting the columns?

@AdamSmith I don't think that matters; he isn't asking how to parse the input.

It's tempting to just write the code that does this for you, but I don't think either of us would learn much from that. Here is an example I think is helpful:

I think the following result is closer to the desired specification: result = {group: [x[1:][0] for x in values] for group, values in grouped}

@JimDennis True. I should even do

data = {group: [col[1] for col in values] for group, values in grouped}; result = ("{} {}".format(row[0], ", ".join(row[1])) for row in data.items())

Yes, technically he said "the output should be"... but I assume he is actually more interested in the resulting data structure than in the literal output. My suggestion, as well as the answer by 奥古斯拉 (which I upvoted), is based on that interpretation of his question rather than on the literal "output" requirement.
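On the column-delimiting question raised in the comments: assuming each line is an author name followed by a title wrapped in single quotes (as in the sample input), one possible way to parse it is to split at the first single quote with str.partition. The raw text and the parse_line helper below are illustrative assumptions, not code from the thread:

```python
# Hypothetical raw input, assuming each line is "<author> '<title>'" and the
# first line is the "1 2" header from the question.
raw = """1 2
Jack London 'Son of the Wolf'
Jack London 'Chris Farrington'
William Shakespeare  'Venus and Adonis' 
William Shakespeare 'The Rape of Lucrece'
Oscar Wilde 'Ravenna'"""


def parse_line(line):
    """Split "<author> '<title>'" at the first single quote (hypothetical helper)."""
    author, _, rest = line.partition("'")
    # Drop surrounding whitespace and the closing quote from the title.
    return author.strip(), rest.strip().rstrip("'")


data = {}
for line in raw.splitlines()[1:]:   # skip the "1 2" header
    if not line.strip():
        continue                     # ignore blank lines
    author, title = parse_line(line)
    data.setdefault(author, []).append(title)

for author, titles in data.items():
    print("{} '{}'".format(author, ', '.join(titles)))
```

This assumption breaks if an author's name contains an apostrophe (for example O'Brien), because the split would then happen inside the name; if the real file uses a proper delimiter such as tabs, the csv module would be a safer way to read it.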