Python: extracting unique values from a CSV
I am using the following Python script to remove duplicates from a CSV file:
    with open('test.csv', 'r') as in_file, open('final.csv', 'w') as out_file:
        seen = set()  # set for fast O(1) amortized lookup
        for line in in_file:
            if line in seen:
                continue  # skip duplicate
            seen.add(line)
            out_file.write(line)
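I am trying to modify it so that, instead of writing the list without duplicates to final.csv, it writes the unique values it finds. That is pretty much the opposite of what it does now. Does anyone have an example?

Use a dict to record how many times each line occurs. You can then go through the dict, add only the unique items to the seen set, and write them to final.csv: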
    from collections import defaultdict

    uniques = defaultdict(int)
    with open('test.csv', 'r') as in_file, open('final.csv', 'w') as out_file:
        seen = set()  # set for fast O(1) amortized lookup
        for line in in_file:
            uniques[line] += 1  # count how often each line occurs
        for k, v in uniques.items():  # items() on Python 3; iteritems() was Python 2 only
            if v == 1:  # keep only lines that occur exactly once
                seen.add(k)
                out_file.write(k)
Or, using a Counter:
    from collections import Counter

    with open('test.csv', 'r') as in_file, open('final.csv', 'w') as out_file:
        lines = Counter(in_file.readlines())  # count how often each line occurs
        seen = set(k for k in lines if lines[k] == 1)  # lines occurring exactly once
        for itm in seen:
            out_file.write(itm)
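This writes only the lines that occur exactly once, which may or may not be right depending on what you mean by "unique". If you instead want to write every line, but only one instance of each, using the last approach: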
    from collections import Counter

    with open('test.csv', 'r') as in_file, open('final.csv', 'w') as out_file:
        lines = Counter(in_file.readlines())  # keys are the distinct lines
        for itm in lines:
            out_file.write(itm)
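To make the two readings of "unique" concrete, here is a minimal sketch that works on an in-memory list instead of files. The helper name `split_lines` and the sample rows are made up for illustration, and the ordering relies on `Counter` keeping insertion order, which assumes Python 3.7 or later.

```python
from collections import Counter


def split_lines(lines):
    """Return (lines that occur exactly once, one copy of each distinct line).

    Both lists are in first-seen order (assumes Python 3.7+ dict ordering).
    """
    counts = Counter(lines)
    once_only = [line for line in counts if counts[line] == 1]
    deduplicated = list(counts)
    return once_only, deduplicated


# Hypothetical sample rows standing in for the contents of test.csv
sample = ["a,1\n", "b,2\n", "a,1\n", "c,3\n"]
once_only, deduplicated = split_lines(sample)
print(once_only)     # ['b,2\n', 'c,3\n']
print(deduplicated)  # ['a,1\n', 'b,2\n', 'c,3\n']
```

Either list can then be written out with `out_file.writelines(...)`. Unlike iterating over a `set`, this keeps the rows in the order they first appeared in the input.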
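You could collect the duplicates in a separate variable and use it to remove the non-unique values from the set.
Your question doesn't make sense. "A list without duplicates" and "the unique values found" are the same thing. By unique values, do you mean values of which only one instance was found?
No, they are not the same. He means the entries that have no duplicates, i.e. those that occur only once in the original file.
@Tom That's not how I read that sentence. At the very least, it is very unclear and imprecise, and it is not "the opposite" of "what it does now".
@jpmc26 Tom's guess at the OP's real problem is reasonable, given the title the OP gave the question.
Why not use defaultdict? It is appropriate here; there is no need to manage the count dictionary by hand.
@GBOFI Edited to use defaultdict.
I don't like defaultdict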
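The suggestion in the comments, collecting the duplicates in a separate variable and using it to filter, might look roughly like the sketch below. The function name `lines_seen_once` is hypothetical, and it takes a list of lines instead of reading test.csv so that it can be run on its own.

```python
def lines_seen_once(lines):
    """Return the lines that occur exactly once, in their original order."""
    lines = list(lines)  # allow any iterable; it is traversed twice
    seen = set()         # every line encountered so far
    duplicated = set()   # lines encountered more than once
    for line in lines:
        if line in seen:
            duplicated.add(line)
        else:
            seen.add(line)
    # A line outside `duplicated` occurs exactly once, so no repeats are emitted
    return [line for line in lines if line not in duplicated]


# Hypothetical sample rows
result = lines_seen_once(["x\n", "y\n", "x\n", "z\n"])
print(result)  # ['y\n', 'z\n']
```

This avoids counting altogether: two sets are enough to tell "seen once" from "seen more than once".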