Python: selecting CSV rows with unique values in a specific column
I have a CSV file that contains the following rows:
A,apple,102
A,orange,103
B,banana,101
C,peach,102
B,orange,104
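and so on.
I want to remove the rows whose first-column value has already appeared, keeping only the first row for each value. For the input above, the output should be: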
A,apple,102
B,banana,101
C,peach,102
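You can create an empty set and add the first-column value of each row to it. If the value is already in the set, skip to the next row. For example: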
import csv

column_values = set()  # first-column values seen so far
new_rows = []

with open('example.csv', 'r', newline='') as csvfile:
    reader = csv.reader(csvfile)
    for row in reader:
        if row[0] in column_values:
            continue  # duplicate first-column value: skip this row
        column_values.add(row[0])
        new_rows.append(row)

with open('updated.csv', 'w', newline='') as csvfile:
    writer = csv.writer(csvfile)
    writer.writerows(new_rows)
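One thing to watch for with the loop above: csv.reader yields an empty list for a blank line, so row[0] would raise an IndexError on such input. A minimal sketch of the same set-based idea wrapped in a function that skips rows too short to index; the name dedupe_rows and its column parameter are my own, chosen for illustration:

```python
import csv
from io import StringIO


def dedupe_rows(rows, column=0):
    """Return the first row seen for each distinct value in `column`."""
    seen = set()
    kept = []
    for row in rows:
        if len(row) <= column:
            # csv.reader yields [] for blank lines; skip rows too short to index.
            continue
        if row[column] in seen:
            continue
        seen.add(row[column])
        kept.append(row)
    return kept


# Sample data from the question, with one blank line added to show the guard.
sample = StringIO(
    "A,apple,102\nA,orange,103\n\nB,banana,101\nC,peach,102\nB,orange,104\n"
)
result = dedupe_rows(csv.reader(sample))
for row in result:
    print(row)
```

Passing column=1 or column=2 would deduplicate on the second or third column instead.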
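The itertools recipes in the Python documentation include a unique_everseen recipe (slightly modified here). It may be overkill for this case, but it works: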
from io import StringIO
from csv import reader
from operator import itemgetter

def unique_everseen(iterable, key):
    "List unique elements, preserving order. Remember all elements ever seen."
    seen = set()
    seen_add = seen.add
    for element in iterable:
        k = key(element)
        if k not in seen:
            seen_add(k)
            yield element

txt = '''A,apple,102
A,orange,103
B,banana,101
C,peach,102
B,orange,104'''

with StringIO(txt) as file:
    rows = reader(file)
    unique_rows = unique_everseen(rows, key=itemgetter(0))
    for row in unique_rows:
        print(row)
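I use itemgetter(0) as the key to select the first column of each row. You can then use csv.writer to write the rows to a new file. Of course, you will have to replace StringIO(txt) with open('file.csv', 'r').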
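Putting those pieces together with real files might look like the sketch below. The helper name write_unique_rows and the output name file_out.csv are assumptions of mine, and the demo writes the sample data to a temporary directory so it does not touch any existing file:

```python
import csv
import os
import tempfile
from operator import itemgetter


def unique_everseen(iterable, key):
    """Yield elements whose key has not been seen before, preserving order."""
    seen = set()
    for element in iterable:
        k = key(element)
        if k not in seen:
            seen.add(k)
            yield element


def write_unique_rows(src_path, dst_path, column=0):
    """Copy src_path to dst_path, keeping the first row per value in `column`."""
    # newline='' is what the csv module documentation recommends for file objects.
    with open(src_path, newline='') as src, open(dst_path, 'w', newline='') as dst:
        rows = unique_everseen(csv.reader(src), key=itemgetter(column))
        # Rows are written as they are read, so the file is never fully in memory.
        csv.writer(dst).writerows(rows)


# Demo in a temporary directory (hypothetical file names).
with tempfile.TemporaryDirectory() as tmp:
    src_path = os.path.join(tmp, 'file.csv')
    dst_path = os.path.join(tmp, 'file_out.csv')
    with open(src_path, 'w', newline='') as f:
        f.write('A,apple,102\nA,orange,103\nB,banana,101\nC,peach,102\nB,orange,104\n')
    write_unique_rows(src_path, dst_path)
    with open(dst_path, newline='') as f:
        output = f.read()

print(output)
```

Because unique_everseen is a generator, only the set of seen keys grows with the input, not the list of rows.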
If you are willing to use a third-party library, you can use pandas:
import pandas as pd
from io import StringIO

x = StringIO("""A,apple,102
A,orange,103
B,banana,101
C,peach,102
B,orange,104""")

# read the file and drop rows whose 'letter' (first column) was already seen;
# replace x with 'file.csv' to read from disk
df = pd.read_csv(x, names=['letter', 'fruit', 'value'])\
    .drop_duplicates('letter', keep='first')

# export to the output csv
df.to_csv('file_out.csv', index=False, header=False)

print(df)
   letter   fruit  value
0       A   apple    102
2       B  banana    101
3       C   peach    102
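Which rows survive depends on the subset and keep arguments of drop_duplicates, so it is worth checking them against what you actually want. A short sketch comparing the three keep settings on the first column; the variable names are my own, and the column names follow the example above:

```python
import pandas as pd
from io import StringIO

csv_text = """A,apple,102
A,orange,103
B,banana,101
C,peach,102
B,orange,104"""

df = pd.read_csv(StringIO(csv_text), names=['letter', 'fruit', 'value'])

# keep='first' (the default): the first row for each letter survives.
first = df.drop_duplicates(subset='letter', keep='first')
# keep='last': the last row for each letter survives.
last = df.drop_duplicates(subset='letter', keep='last')
# keep=False: every row whose letter occurs more than once is dropped.
singletons = df.drop_duplicates(subset='letter', keep=False)

print(first['fruit'].tolist())       # ['apple', 'banana', 'peach']
print(last['fruit'].tolist())        # ['orange', 'peach', 'orange']
print(singletons['fruit'].tolist())  # ['peach']
```

For the output requested in the question, keep='first' on the first column is the setting that matches.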