Language agnostic: deleting consecutive, identical, duplicate files


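I have a server running Windows Server 2003 R2 Enterprise with directories that each hold between 50,000 and 250,000 text files of about 1 KB. The file names are sequential (for example, MLLP000001.rcv, MLLP000002.rcv, and so on), and identical files are always consecutive. Once a subsequent file differs, I can expect never to receive another copy of the earlier file.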

I need a script that does the following, but I don't know where to start:

for each file in the target directory index 'i'
{
  for each file in the target directory index 'j' = i+1
  {
    compare the hash values of files i and j

    if the hashes are identical
      delete file j
    if the hashes differ
      set i = j // to skip past the files that are now deleted
      break
  }
}

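I tried a DOS batch script, but it got really messy: I couldn't break out of the inner loop, and it tripped over itself because the outer loop holds a list of the files in the directory while that list keeps changing. As far as I know, VBScript has no hash function.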

It sounds like you could do the following:

Set Files to an array of the files in the given directory, sorted by name.
Set PreviousHash to the hash of the first file in Files.

For each CurrentFile after the first in Files:
    Set CurrentHash to the hash of CurrentFile.
    If CurrentHash is equal to PreviousHash, then delete CurrentFile.
    Else, set PreviousHash to CurrentHash.
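
A minimal Python sketch of this single-pass approach, assuming Python is available on the server (the question does not say so). The function names, the "*.rcv" pattern, and the choice of SHA-256 are illustrative assumptions, not part of the original answer. The directory listing is taken once and sorted before anything is deleted, which avoids the changing-list problem described in the question.

```python
import hashlib
from pathlib import Path


def file_hash(path):
    """Return the SHA-256 hex digest of a file's contents."""
    # Files are about 1 KB, so reading each one whole is cheap.
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def delete_consecutive_duplicates(directory, pattern="*.rcv"):
    """Delete every file whose contents match the last file that was kept.

    Returns the names of the deleted files, in order.
    """
    # Sorted snapshot taken up front: deletions cannot disturb the loop.
    # Zero-padded names such as MLLP000001.rcv sort in numeric order.
    files = sorted(Path(directory).glob(pattern))
    deleted = []
    previous_hash = None
    for current in files:
        current_hash = file_hash(current)
        if current_hash == previous_hash:
            current.unlink()  # same content as the kept file: remove it
            deleted.append(current.name)
        else:
            previous_hash = current_hash  # a new run of content starts here
    return deleted
```

Calling delete_consecutive_duplicates with the path of one of the directories processes it in a single pass, hashing each file once. It would be prudent to try it on a copy of the data first, since the deletions are permanent.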

Since the files are only 1 KB each, why not do a byte-for-byte comparison and avoid hashing altogether?
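
A rough Python sketch of that idea, again assuming Python can be run on the server. The function name and the "*.rcv" pattern are hypothetical. Each file is read once and compared directly with the contents of the last file that was kept, so no hash is computed at all.

```python
from pathlib import Path


def delete_consecutive_duplicates_bytewise(directory, pattern="*.rcv"):
    """Delete every file whose bytes equal those of the last kept file.

    Returns the names of the deleted files, in order.
    """
    # Sorted snapshot of the directory, taken before any deletion.
    files = sorted(Path(directory).glob(pattern))
    deleted = []
    previous_bytes = None
    for current in files:
        current_bytes = current.read_bytes()  # about 1 KB per file
        if current_bytes == previous_bytes:
            current.unlink()  # exact byte-for-byte duplicate
            deleted.append(current.name)
        else:
            previous_bytes = current_bytes  # remember the newly kept file
    return deleted
```

Only one file's contents are held in memory at a time, and a direct comparison cannot produce a false match the way a hash collision could in principle.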