How do I remove duplicate lines from a file using Python?

python python-2.7

Is it possible to remove duplicate lines from a file that is opened in read mode, without changing the file name?

Can anyone please help me?

If the file is opened in read mode, you cannot append to it or edit it; reopen the file in write mode. See the earlier post.
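A minimal sketch of the reopen approach, assuming the file fits in memory: read all lines with the file opened in read mode, then reopen the same file name in write mode (which truncates it) and write back only the first occurrence of each line. The function name remove_duplicate_lines is hypothetical. Lines are compared exactly, so a final line without a trailing newline is not treated as a duplicate of an otherwise identical line that has one.

```python
def remove_duplicate_lines(path):
    """Rewrite `path` in place, keeping only the first occurrence of each line."""
    # Pass 1: open read-only and load every line (file must fit in memory).
    with open(path, "r") as f:
        lines = f.readlines()

    seen = set()
    unique = []
    for line in lines:
        if line not in seen:  # first time we see this line: keep it, in order
            seen.add(line)
            unique.append(line)

    # Pass 2: reopen the same file name in write mode; "w" truncates it first.
    with open(path, "w") as f:
        f.writelines(unique)

    return len(lines) - len(unique)  # number of duplicate lines removed
```

Usage: remove_duplicate_lines("example.txt") returns how many lines were dropped; the file keeps its name.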

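I have done something similar before. The code below finds every instance of an old string and replaces it with a new string, in any file matching a given path. It could be adapted to look for duplicate lines instead of a specific piece of text passed in.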
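Adapted to duplicate lines, the same pattern that the replace() function in replace.py uses (open once in "r+" mode, read everything, rewind, truncate, write back) might look like the sketch below. It assumes the file fits in memory, and the function name dedupe is hypothetical. Because the file is opened only once, its name never changes.

```python
def dedupe(file_path):
    """Remove repeated lines from file_path in place; return how many were removed."""
    with open(file_path, "r+") as f:
        buf = f.readlines()        # load every line (file must fit in memory)
        seen = set()
        new = []
        for line in buf:
            if line in seen:
                continue           # skip a line we have already kept
            seen.add(line)
            new.append(line)
        cnt = len(buf) - len(new)
        if cnt:                    # only rewrite when something changed
            f.seek(0)              # rewind to the start of the file
            f.truncate()           # empty the file at position 0
            f.writelines(new)      # write back the unique lines, in original order
    return cnt
```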

Here are a few example uses of the existing program (invoked from the command line):

Example 1 - if a file named example.txt is in the current working directory:

python replace.py "old blerb" "new blerb" example.txt

Example 2 - if you want to match any .sql files in another directory (in this case, the current user's documents directory):

python replace.py "old syntax" "new syntax" "%userprofile%\documents\*.sql"

replace.py code:

#!/usr/bin/python

import sys, os, fnmatch

# globals
debug_on = True

def replace(old_str, new_str, file_path):
    with open(file_path, "r+") as f:
        buf = f.readlines()
        f.seek(0)
        cnt = 0
        new = []
        for line in buf:
            if old_str in line:
                l = line.replace(old_str, new_str)
                cnt += 1
            else:
                l = line
            new.append(l)
        if cnt == 0:
            if debug_on:
                print "  no matches found in this file"
        else:
            f.truncate()
            for line in new:
                f.write(line)
            if debug_on:
                print "  "+str(cnt)+" matches replaced"

def get_files(f_str):
    d, ptrn = os.path.split(f_str)
    files = []
    for f in os.listdir(d):
        fx = os.path.split(f)[1]
        if fnmatch.fnmatch(fx, ptrn):
            if '\\' not in f and '/' not in f:
                new_file = os.path.join(d,f)
            else:
                new_file = f
            files.append(new_file)

    if len(files) == 0:
        print "No files found in this directory matching the pattern:", ptrn
        sys.exit()

    return files

def main():
    # check cmd line args provided...
    args = len(sys.argv) -1
    if args != 3:
        print "\nUsage: python replace.py <old_str> <new_str> <file_path|pattern>\n"
        print "The file path will assume the current working directory if none " \
              "is given."
        print "Search is case-sensitive\n"
        print "Example 1 - if a file named example.txt is in your cwd:\n" \
              'python replace.py "old blerb" "new blerb" example.txt\n'
        print "Example 2 - if you wanted to match any .sql files in another directory:\n" \
              'python replace.py "old syntax" "new syntax" "%userprofile%\documents\*.sql"'
        raw_input("\n...press Enter to exit")
        sys.exit()

    # identify files to be evaluated...
    f_str = sys.argv[3]
    if debug_on:
        print "f_str set to:", f_str

    # append path if required
    if '\\' not in f_str and '/' not in f_str:
        f_str = os.path.join(os.getcwd(),f_str)
        if debug_on:
            print "f_str adjusted to:", f_str

    # build list of files
    if '*' in f_str:
        files = get_files(f_str)
    else:
        files = [f_str]

    # do replacement for each file...
    for f in files:
        if debug_on:
            print "\nAbout to call replace, args:\n  ", sys.argv[1], sys.argv[2], f
        replace(sys.argv[1], sys.argv[2], f)

if __name__ == '__main__':
    main()
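For files too large to load with readlines(), one possible variant streams the file line by line into a temporary file and then moves that file over the original name, using tempfile.mkstemp, shutil.copymode, and shutil.move. This is a sketch and not part of replace.py; dedupe_streaming is a hypothetical name, and the set of lines already seen still has to fit in memory.

```python
import os
import shutil
from tempfile import mkstemp

def dedupe_streaming(file_path):
    """Stream unique lines into a temp file, then move it over file_path."""
    # Create the temp file in the same directory so the move is normally a rename.
    fd, tmp_path = mkstemp(dir=os.path.dirname(os.path.abspath(file_path)))
    seen = set()
    removed = 0
    try:
        with os.fdopen(fd, "w") as out, open(file_path, "r") as src:
            for line in src:                  # one line at a time, not readlines()
                if line in seen:
                    removed += 1
                    continue
                seen.add(line)
                out.write(line)
        shutil.copymode(file_path, tmp_path)  # keep the original permission bits
        shutil.move(tmp_path, file_path)      # same file name, deduplicated contents
    except Exception:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)               # do not leave the temp file behind
        raise
    return removed
```

Writing to a separate file first means the original is only replaced after the whole copy succeeds.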

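By definition, you can only read a file opened in read mode. You can read and write a file without changing its name, but you need the appropriate mode, unless the OP only wants to "read" the unique lines. OP, do you also want to "write" to the file? Please clarify.

Use readlines to put all the lines into a list, use a set to remove the duplicates, then seek back to the start of the file and write.

If I use a set, the order of the lines changes.

If you have the appropriate permissions, you can temporarily change the file mode.

The actual code to remove duplicates is easy. Just read line by line and use a dict or a set to check whether you have already seen a line before writing it.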
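To illustrate the ordering point raised in the comments: building a set() from the lines removes duplicates but loses their order, while an insertion-ordered mapping keeps the first occurrence of each line where it was. The sketch below uses collections.OrderedDict.fromkeys, chosen here as an assumption because it preserves order on both Python 2.7 and Python 3; on Python 3.7+ a plain dict.fromkeys behaves the same way. Both function names are hypothetical.

```python
from collections import OrderedDict

def unique_unordered(lines):
    # set() removes duplicates, but its iteration order is arbitrary.
    return list(set(lines))

def unique_in_order(lines):
    # OrderedDict keys keep insertion order, so each line stays at its first
    # position and later repeats are dropped.
    return list(OrderedDict.fromkeys(lines))

print(unique_in_order(["b", "a", "b", "c", "a"]))  # ['b', 'a', 'c']
```

To apply this to a file, pass it the result of f.readlines() and write the returned list back with the file reopened in write mode.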