How can a Python coder remove all duplicates across rows in a CSV file, except for X of them?

sorting csv python-3.x

Here is a sample CSV file for this problem:

Jack,6
Sam,10
Milo,9
Jacqueline,7
Sam,5
Sam,8
Sam,10

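For context, these are the names and scores of people who took a quiz. We can see that Sam has taken the test 4 times, but I only want to keep X results for the same person (and they need to be the most recent entries). Suppose we want no more than 3 results per person.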

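I realized that without some extra information it probably wouldn't be possible to keep no more than 3 results per person. Here is the updated CSV file: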

Jack,6,1793
Sam,10,2079
Milo,9,2132
Jacqueline,7,2590
Sam,5,2881
Sam,8,3001
Sam,10,3013

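The third column is basically the number of seconds since the "epoch", which is a reference point in time. With this, I figured I could simply sort the file by the epoch column from lowest to highest and use set() to remove all but a certain number of duplicates in the name column, while also removing the scores that belong to the removed entries.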

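In theory, this should leave me with the 3 most recent results per person, but in practice I have no idea how to adapt the set() function to do this, unless there is another way. So my question is: what possible ways are there to achieve this?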

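You can use a list, and each time you add an entry, check the length of the list: if it has more than three entries, pop the first entry (or do the check after looping over the file). This assumes the file is in chronological order.
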
from collections import defaultdict

# looping over a csv file gives one row at a time
# so we will emulate that
raw_data = [
    ('Jack', '6'),
    ('Sam', '10'),
    ('Milo', '9'),
    ('Jacqueline', '7'),
    ('Sam', '5'),
    ('Sam', '8'),
    ('Sam', '10'),
    ]

# this will hold our information, and works by providing an empty
# list for any missing key
student_data = defaultdict(list)
for row in raw_data:  # note 1
    # separate the row into its component items, and convert
    # score from str to int
    name, score = row
    score = int(score)
    # get the current list for the student, or a brand-new list
    student = student_data[name]
    student.append(score)
    # after adding the score to the end, remove the oldest score
    # so that the list never holds more than three items
    if len(student) > 3:
        student.pop(0)

# print the items for debugging
for item in student_data.items():
    print(item)

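which results in the following (the order of the names depends on the Python version; from 3.7 onward they print in insertion order):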
('Milo', [9])
('Jack', [6])
('Sam', [5, 8, 10])
('Jacqueline', [7])
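
As a variation on the list-and-pop approach above, here is a minimal sketch that uses `collections.deque` with `maxlen`, which drops the oldest item automatically once the limit is reached. The name `MAX_RESULTS` is an assumption introduced for illustration (it stands for the "X" in the question); this is not part of the original answer.

```python
from collections import defaultdict, deque

MAX_RESULTS = 3  # assumed limit per person (the "X" in the question)

# same emulated rows as in the answer above
raw_data = [
    ('Jack', '6'),
    ('Sam', '10'),
    ('Milo', '9'),
    ('Jacqueline', '7'),
    ('Sam', '5'),
    ('Sam', '8'),
    ('Sam', '10'),
]

# every missing key gets an empty deque that holds at most MAX_RESULTS items
student_data = defaultdict(lambda: deque(maxlen=MAX_RESULTS))
for name, score in raw_data:
    # appending to a full deque silently discards its oldest item
    student_data[name].append(int(score))

# convert the deques to plain lists for printing or further processing
latest = {name: list(scores) for name, scores in student_data.items()}
print(latest)
```

Like the list version, this still assumes the rows arrive in chronological order.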


Note 1: to use an actual csv file, you need the following code:
import csv

with open('some_file.csv', newline='') as raw_file:
    csv_file = csv.reader(raw_file)
    for row in csv_file:
        ...
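
Since the question asks for the extra rows to be removed from the file itself, here is a hedged sketch of how the kept rows could be written back out with `csv.writer`. The helper `keep_latest` is hypothetical, and `io.StringIO` objects stand in for real files so the example runs on its own; with real files you would pass objects opened with `open(..., newline='')` instead.

```python
import csv
import io
from collections import defaultdict

def keep_latest(rows, limit=3):
    """Return rows in their original order, keeping only the last
    `limit` rows for each name (hypothetical helper)."""
    rows = [row for row in rows if row]  # skip blank lines
    kept = defaultdict(list)             # name -> indexes of rows to keep
    for index, row in enumerate(rows):
        indexes = kept[row[0]]
        indexes.append(index)
        if len(indexes) > limit:
            indexes.pop(0)               # forget the oldest row for this name
    wanted = sorted(i for indexes in kept.values() for i in indexes)
    return [rows[i] for i in wanted]

# StringIO objects emulate the input and output files
source = io.StringIO("Jack,6\nSam,10\nMilo,9\nJacqueline,7\nSam,5\nSam,8\nSam,10\n")
target = io.StringIO()

writer = csv.writer(target, lineterminator="\n")
writer.writerows(keep_latest(csv.reader(source)))

result = target.getvalue()
print(result)
```

With the sample data, the oldest of Sam's four rows (`Sam,10`) is the only one left out, and the other rows keep their original order.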

To handle the timestamps, you can use:
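
The code that accompanied this step is not reproduced here. As a rough sketch only, and not the answer's own code, one way to use the third column is to sort the rows by the epoch value first and then apply the same keep-the-last-three logic. The function name `latest_scores` and the inline `csv_text` are assumptions made for illustration.

```python
import csv
import io
from collections import defaultdict

# the updated sample file from the question, inlined so the sketch is runnable
csv_text = """Jack,6,1793
Sam,10,2079
Milo,9,2132
Jacqueline,7,2590
Sam,5,2881
Sam,8,3001
Sam,10,3013
"""

def latest_scores(rows, limit=3):
    """Map each name to its `limit` most recent (score, epoch) pairs."""
    # sort oldest-first on the epoch column; int() makes the comparison
    # numeric instead of alphabetical
    ordered = sorted(rows, key=lambda row: int(row[2]))
    per_name = defaultdict(list)
    for name, score, epoch in ordered:
        entries = per_name[name]
        entries.append((int(score), int(epoch)))
        if len(entries) > limit:
            entries.pop(0)  # drop the oldest entry for this name
    return dict(per_name)

results = latest_scores(csv.reader(io.StringIO(csv_text)))
for item in results.items():
    print(item)
```

Because the rows are sorted before they are grouped, this version no longer depends on the file already being in chronological order.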

Thanks for the answer, but I don't quite understand how this would work if "raw_data" is a CSV file. I'd appreciate it if you could explain how your code works, perhaps by adding comments to the code?

@Roughbladez: Updated the answer, and you're welcome. :)