Python: extracting unique values from a CSV


I'm using the following Python script to remove duplicates from a CSV file:

with open('test.csv','r') as in_file, open('final.csv','w') as out_file:
    seen = set() # set for fast O(1) amortized lookup
    for line in in_file:
        if line in seen: continue # skip duplicate

        seen.add(line)
        out_file.write(line)
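I'm trying to modify it so that, instead of writing the list without duplicates to final.csv, it writes the unique values it finds. That is sort of the opposite of what it does now. Does anyone have an example?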

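Use a dict to record how many times each line occurs. You can then walk the dict, add only the items that occur once to the `seen` set, and write them to `final.csv`: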

from collections import defaultdict

uniques = defaultdict(int)  # maps each line to its number of occurrences
with open('test.csv','r') as in_file, open('final.csv','w') as out_file:
    seen = set() # set for fast O(1) amortized lookup
    for line in in_file:
        uniques[line] += 1
    for k, v in uniques.items():  # Python 3; on Python 2 use iteritems()
        if v == 1:  # keep only lines that occur exactly once
            seen.add(k)
            out_file.write(k)
Alternatively, use a `Counter`:

from collections import Counter

with open('test.csv','r') as in_file, open('final.csv','w') as out_file:
    lines = Counter(in_file.readlines())  # maps each line to its count
    seen = set(k for k in lines if lines[k] == 1)  # lines occurring exactly once
    for itm in seen:  # note: a set does not preserve the file's line order
        out_file.write(itm)
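This outputs only the lines that occur exactly once, which may or may not be what you want, depending on what you mean by "unique". If you instead want to output every line, but only one instance of each, using the last approach: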

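from collections import Counter

with open('test.csv','r') as in_file, open('final.csv','w') as out_file:
    lines = Counter(in_file.readlines())  # keys are the distinct lines
    for itm in lines:  # each distinct line is written once
        out_file.write(itm)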
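
To make the difference between the two readings of "unique" concrete, here is a small sketch that works on an in-memory list rather than on files. The helper names `only_once` and `first_of_each` are made up for illustration, and the sketch assumes Python 3.7+, where `Counter` and `dict` keep keys in first-seen order.

```python
from collections import Counter

def only_once(lines):
    """Return the lines that occur exactly once, in first-seen order."""
    counts = Counter(lines)
    return [line for line, n in counts.items() if n == 1]

def first_of_each(lines):
    """Return one instance of every distinct line, in first-seen order."""
    return list(dict.fromkeys(lines))

sample = ["a\n", "b\n", "a\n", "c\n"]
print(only_once(sample))      # ['b\n', 'c\n']
print(first_of_each(sample))  # ['a\n', 'b\n', 'c\n']
```

Note that lines are compared as whole strings, so a final line without a trailing newline would not match an otherwise identical earlier line.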

You can collect the duplicates in another variable and use them to remove the non-unique values from the set.
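
One way to read this suggestion is as a single pass with two sets. This is only a sketch: the function names `lines_seen_once` and `write_lines_seen_once` and the `dupes` variable are hypothetical, not taken from the answer.

```python
def lines_seen_once(lines):
    """Return the set of lines that appear exactly once in `lines`."""
    seen = set()   # every distinct line encountered so far
    dupes = set()  # lines encountered more than once
    for line in lines:
        if line in seen:
            dupes.add(line)
        seen.add(line)
    # Removing the duplicates leaves only the lines that occurred once.
    return seen - dupes

def write_lines_seen_once(in_path, out_path):
    """Write the lines that occur once in in_path to out_path."""
    with open(in_path, 'r') as in_file, open(out_path, 'w') as out_file:
        # sorted() only makes the output deterministic; a set has no order.
        for line in sorted(lines_seen_once(in_file)):
            out_file.write(line)
```

Calling `write_lines_seen_once('test.csv', 'final.csv')` would mirror the file names used in the question. Unlike the `Counter` approach, this stores no counts, only membership in the two sets.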

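Your question doesn't make any sense. "The list without duplicates" and "the unique values found" are the same thing. By unique values, do you mean values for which only one instance was found?

No, they are not the same. He means the list of lines that have no duplicates, i.e. lines that exist only once in the original file.

@Tom That is not how I read that sentence. At the very least it is very unclear and imprecise, and it is not "the opposite" of "what it does now".

@jpmc26 Tom's guess at the OP's real problem is reasonable, based on the title the OP gave the question.

A `defaultdict` is appropriate here, and the answer does not use one. There is no need to manage the count dictionary by hand.

@GBOFI Modified to use `defaultdict`.

I don't like `defaultdict`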