How to remove duplicate entries in a file with Python

python perl file

I have a file, input.txt, that contains the following lines:

1_306500682 2_315577060 3_315161284 22_315577259 22_315576763 

2_315578866 2_315579020 3_315163106 1_306500983 

2_315579517 3_315162181 1_306502338 2_315578919 

1_306500655 2_315579567 3_315161256 3_315161708

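Within each line, when several entries share the same value before the underscore (_), I want to keep only the first of them. For the example above, output.txt should contain: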

1_306500682 2_315577060 3_315161284 22_315577259 

2_315578866 3_315163106 1_306500983 

2_315579517 3_315162181 1_306502338 

1_306500655 2_315579567 3_315161256

Please help.

Perl, from the command line:

perl -lane 'my %s;print join " ", grep /^(\d+)_/ && !$s{$1}++, @F' file

Output:

1_306500682 2_315577060 3_315161284 22_315577259

2_315578866 3_315163106 1_306500983

2_315579517 3_315162181 1_306502338

1_306500655 2_315579567 3_315161256
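For readers more comfortable with Python, the logic of the Perl one-liner can be sketched roughly as follows. The function name dedupe_line is made up for illustration. Like the Perl grep, this sketch silently drops any entry that does not start with digits followed by an underscore, and it starts with a fresh "seen" set for every line.

```python
import re

# Same pattern as the Perl one-liner: leading digits followed by "_".
PREFIX = re.compile(r'^(\d+)_')

def dedupe_line(line):
    """Keep the first entry for each numeric prefix (hypothetical helper)."""
    seen = set()   # reset for every line, like `my %s` inside the -n loop
    kept = []
    for field in line.split():   # whitespace split, like Perl's -a
        match = PREFIX.match(field)
        if match and match.group(1) not in seen:
            seen.add(match.group(1))
            kept.append(field)
    return ' '.join(kept)

print(dedupe_line('1_306500682 2_315577060 3_315161284 22_315577259 22_315576763'))
```

For the first sample line this prints the same four entries as the Perl output above.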

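You can use a separate set to keep track of the prefixes seen so far on a line, and collect the entries whose prefix has not appeared yet into a list. Once a line has been processed this way, it is easy to build a replacement line that contains only the non-duplicate entries. Note: this is just a slightly more efficient version of inspectorG4dget's current answer.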

with open('input.txt', 'rt') as infile, \
     open('non_repetitive_input.txt', 'wt') as outfile:
    for line in infile:
        values, prefixes = [], set()
        for word, prefix in ((entry, entry.partition('_')[0])
                                for entry in line.split()):
            if prefix not in prefixes:
                values.append(word)
                prefixes.add(prefix)
        outfile.write(' '.join(values) + '\n')

Contents of the output file:

1_306500682 2_315577060 3_315161284 22_315577259

2_315578866 3_315163106 1_306500983

2_315579517 3_315162181 1_306502338

1_306500655 2_315579567 3_315161256
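To check the approach against the sample from the question without touching the disk, here is a small sketch that works on any file-like objects. The name dedupe_stream is hypothetical. It relies on dicts preserving insertion order (guaranteed from Python 3.7), so dict.setdefault keeps the first entry seen for each prefix.

```python
import io

def dedupe_stream(infile, outfile):
    """Write each input line with only the first entry per prefix kept."""
    for line in infile:
        first_by_prefix = {}   # prefix -> first entry seen on this line
        for entry in line.split():
            # setdefault stores the entry only if the prefix is new.
            first_by_prefix.setdefault(entry.partition('_')[0], entry)
        outfile.write(' '.join(first_by_prefix.values()) + '\n')

# Sample data copied from the question.
sample = io.StringIO(
    '1_306500682 2_315577060 3_315161284 22_315577259 22_315576763\n'
    '2_315578866 2_315579020 3_315163106 1_306500983\n'
    '2_315579517 3_315162181 1_306502338 2_315578919\n'
    '1_306500655 2_315579567 3_315161256 3_315161708\n'
)
result = io.StringIO()
dedupe_stream(sample, result)
print(result.getvalue(), end='')
```

The same function can be passed real file objects opened with open() in place of the StringIO buffers.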

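StackOverflow is a site where you post questions about problems you have run into, not a list of requirements you expect other people to fulfil for you. So have you tried to solve this yourself and then hit a problem? What errors did you get? Can you show some code? Yes, this is what you are trying to do, and this is what it should look like.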
