How can I emulate sort and uniq on the files in a directory using Python?


python python-3.x


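I am trying to sort and de-duplicate 30 files of different sizes into a single file.
Each file contains lines separated by newlines; that is, every line of each file is plain text.
Here is what I tried: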

lines_seen = set() # holds lines already seen
outfile = open('out.txt', "w")
for line in open('d:\\testing\\*', "r"):
    if line not in lines_seen: # not a duplicate
        outfile.write(line)
        lines_seen.add(line)
outfile.close()

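The folder is named testing and contains 30 different files, which I am trying to merge into the file out.txt. The output should be the sorted, unique text, with one entry per line of the output file.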
I thought that writing d:\\testing\\* would read every file in the folder, which would make this easy. But I get an error:

Traceback (most recent call last):
  File "sort and unique.py", line 3, in <module>
    for line in open('d:\\testing\\*', "r"):
OSError: [Errno 22] Invalid argument: 'd:\\testing\\*'


How can I get rid of this error and efficiently combine all the files into one output without any failures?

Note: the machine has 8 GB of RAM, and the folder is about 10 GB.

You just need to use os.listdir to loop over all the files. Something like this:

import os

lines_seen = set()  # holds lines already seen
path = r'd:\testing'
with open('out.txt', "w") as outfile:
    for file in os.listdir(path):  # added: loop over every file in the folder
        current_file = os.path.join(path, file)
        with open(current_file, "r") as infile:
            for line in infile:
                # make sure every line ends with a newline (a file's last line may not)
                line = line.rstrip("\n") + "\n"
                if line not in lines_seen:  # not a duplicate
                    outfile.write(line)
                    lines_seen.add(line)
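The loop above removes duplicates but writes lines in the order they are first seen; it does not sort them. Below is a minimal sketch that also sorts, assuming the unique lines fit in memory (with a ~10 GB folder and 8 GB of RAM, that only holds if there are many duplicates) and that the files are UTF-8 text. The function name sort_unique_dir is hypothetical, and the output file should live outside the input folder so it is not read back in.

```python
import os


def sort_unique_dir(path, out_path):
    """Write the sorted, de-duplicated lines of every file in `path` to `out_path`.

    Hypothetical helper; assumes all unique lines fit in memory and the files are UTF-8.
    """
    unique_lines = set()
    for name in os.listdir(path):
        current_file = os.path.join(path, name)
        if not os.path.isfile(current_file):
            continue  # skip sub-directories
        with open(current_file, "r", encoding="utf-8") as infile:
            for line in infile:
                # Strip the newline so a last line without "\n" still matches its duplicates
                unique_lines.add(line.rstrip("\n"))
    with open(out_path, "w", encoding="utf-8") as outfile:
        for line in sorted(unique_lines):  # sort once, after de-duplicating
            outfile.write(line + "\n")
```

De-duplicating first and sorting afterwards means only the unique lines are sorted, which is cheaper than sorting everything and then removing neighbours.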


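Let me try it... I'll get back to you. Is it possible to make the program multiprocess or multithreaded so that it runs faster?

The star syntax requires shell expansion; open() does not expand wildcards. The glob module can do this for you.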
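Combining the glob suggestion with the question's memory constraint (8 GB of RAM, ~10 GB of data), one option is an external sort: sort chunks that fit in memory, spill each sorted chunk to a temporary file, then merge the sorted files and drop adjacent duplicates. This is a swapped-in technique rather than something the answers above describe; the function names, the chunk size, and UTF-8 input are all assumptions in this sketch.

```python
import glob
import heapq
import os
import tempfile


def _write_sorted_run(lines, tmp_dir):
    """Hypothetical helper: sort one chunk, drop its duplicates, spill it to a temp file."""
    fd, run_path = tempfile.mkstemp(dir=tmp_dir, suffix=".run")
    with os.fdopen(fd, "w", encoding="utf-8") as run:
        for line in sorted(set(lines)):
            run.write(line + "\n")
    return run_path


def external_sort_unique(pattern, out_path, chunk_lines=1_000_000):
    """Hypothetical helper: sort + uniq every line of every file matching `pattern`.

    At most `chunk_lines` lines are held in memory at a time; each full chunk
    becomes a sorted "run" file, and the runs are merged lazily with heapq.merge.
    """
    with tempfile.TemporaryDirectory() as tmp_dir:
        run_paths = []
        chunk = []
        for file_name in glob.glob(pattern):  # glob expands the * wildcard
            if not os.path.isfile(file_name):
                continue  # skip sub-directories
            with open(file_name, "r", encoding="utf-8") as infile:
                for line in infile:
                    chunk.append(line.rstrip("\n"))
                    if len(chunk) >= chunk_lines:
                        run_paths.append(_write_sorted_run(chunk, tmp_dir))
                        chunk = []
        if chunk:
            run_paths.append(_write_sorted_run(chunk, tmp_dir))

        runs = [open(p, "r", encoding="utf-8") for p in run_paths]
        try:
            # Strip newlines again so the merge compares exactly what was sorted
            streams = [map(lambda s: s.rstrip("\n"), run) for run in runs]
            previous = None
            with open(out_path, "w", encoding="utf-8") as outfile:
                for line in heapq.merge(*streams):
                    if line != previous:  # merged input is sorted, so duplicates are adjacent
                        outfile.write(line + "\n")
                        previous = line
        finally:
            for run in runs:
                run.close()  # close before TemporaryDirectory deletes the files
```

For the question's folder, a call could look like external_sort_unique(r"d:\testing\*", r"d:\out.txt"); keeping out.txt outside the testing folder stops the pattern from matching it. A larger chunk_lines means fewer run files (all of which stay open during the merge) at the cost of more memory per chunk.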