How to find files that exist in different directories under a given path in Perl

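I'm looking for a way to find files that live in several directories under a given path. In other words, the file names in those directories will be the same. My script seems to have a hierarchy problem when it looks for the correct path to grep the file names for processing. I have a fixed path as input, and the script needs to look under that path and find the files from there. However, my script seems to stop two levels down and start processing there, instead of descending to the last directory in the hierarchy (in my example, it should reach "ln" and "nn" and only then start running the subroutines).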

The fixed input path is:

/nfs/disks/version_2.0/

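The files I want to post-process with the subroutines live under several directories below that path. Basically, I want to check whether file1.abc exists in every one of the directories temp1, temp2 and temp3 under the ln directory, and likewise whether file2.abc exists in temp1, temp2 and temp3 under the nn directory.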

The full paths of the files I want to check look like this:

/nfs/disks/version_2.0/dir_a/ln/temp1/file1.abc
/nfs/disks/version_2.0/dir_a/ln/temp2/file1.abc
/nfs/disks/version_2.0/dir_a/ln/temp3/file1.abc

/nfs/disks/version_2.0/dir_a/nn/temp1/file2.abc
/nfs/disks/version_2.0/dir_a/nn/temp2/file2.abc
/nfs/disks/version_2.0/dir_a/nn/temp3/file2.abc

My script is as follows:

#! /usr/bin/perl -w 
my $dir = '/nfs/fm/disks/version_2.0/' ;
opendir(TEMP, $dir) || die $! ;
foreach my $file (readdir(TEMP)) {
    next if ($file eq "." || $file eq "..") ;
    if (-d "$dir/$file") {
        my $d = "$dir/$file";   
        print "Directory:- $d\n" ;
        &getFile($d);
        &compare($file) ;
    }
}

Note that I added print "Directory:- $d\n" for debugging purposes, and it printed the following:
/nfs/disks/version_2.0/dir_a/
/nfs/disks/version_2.0/dir_b/

So I know it is handing the wrong path to the subroutines below.
Can anyone point out where the mistake in my script is? Thanks.
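To be clear: the script should recurse through the directories and look for files with a specific file name? In that case, I think the following code is the problem: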
if (-d "$dir/$file") {
    my $d = "$dir/$file";   
    print "Directory:- $d\n" ;
    &getFile($d);
    &compare($file) ;
}

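I assume &getFile($d) is meant to descend into a directory (that is, the recursive step). That is fine. However, &compare($file) looks like the action you want to perform when the thing you are looking at is not a directory. So that block should look like this: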
if (-d "$dir/$file") {
    &getFile("$dir/$file");  # the recursive step, for directories inside of this one
} elsif( -f "$dir/$file" ){
    &compare("$dir/$file");  # the action on files inside of the current directory
}

The general pseudocode should look like this:
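sub myFind {
    my $dir = shift;
    # Read the directory entries (stat does not list a directory's contents)
    opendir(my $dh, $dir) or die "Can't open directory $dir: $!";
    my @entries = readdir $dh;
    closedir $dh;
    foreach my $file (@entries) {
        next if $file eq "." || $file eq "..";
        my $obj = "$dir/$file";
        if( -d $obj ){
            myFind( $obj );               # recurse into subdirectories
        } elsif( -f $obj ){
            doSomethingWithFile( $obj );  # placeholder for the per-file action
        }
    }
}
myFind( "/nfs/fm/disks/version_2.0" );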

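As a side note: this script is reinventing the wheel. You only need to write a script that processes a single file. You can do the rest entirely from the shell: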
find /nfs/fm/disks/version_2.0 -type f -name "the-filename-you-want" -exec your_script.pl {} \;

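Wow, it's like reliving the 1990s! Perl has evolved, and you really need to learn the newer stuff. It looks like you learned Perl at version 3.0 or 4.0. Here are some pointers: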

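Use "use warnings;" instead of -w on the command line.

Use "use strict;". This requires you to predeclare variables with my, which scopes them to the local block, or to the file if they are not in a local block. This helps catch a lot of errors.

Don't put & in front of subroutine names.

Use and, or, and not instead of &&, ||, and !.

Learn about Perl modules, which can save you a lot of time and effort.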
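
To make these pointers concrete, here is a minimal sketch of the top-level loop from the question written in that style. The helper name list_subdirs and the temporary directory tree are made up for illustration; they are not part of the original script.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Path qw(make_path);

# Hypothetical helper: return the immediate subdirectories of $dir, sorted.
sub list_subdirs {
    my ($dir) = @_;
    opendir(my $dh, $dir) or die qq(Can't open directory "$dir": $!);
    my @subdirs = grep { $_ ne "." and $_ ne ".." and -d "$dir/$_" } readdir $dh;
    closedir $dh;
    @subdirs = sort @subdirs;
    return @subdirs;
}

# Demo on a throwaway tree instead of the real /nfs path.
my $root = tempdir(CLEANUP => 1);
make_path("$root/dir_a", "$root/dir_b");

my @subdirs = list_subdirs($root);    # called without a leading &
print "Directory: $root/$_\n" for @subdirs;
```

The lexical directory handle ($dh) replaces the bareword TEMP handle, so it is scoped to the subroutine like any other my variable.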

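When someone says "detect duplicates", I immediately think of hashes. If you use a hash keyed by file name, you can easily see whether there are duplicate files.
Of course, a hash can only hold one value per key. Fortunately, in Perl 5.x that value can be a reference to another data structure.
So I suggest you use a hash that contains references to lists (arrays, in the old parlance). You can push each instance of a file onto that list.
Using your example, your data structure would look like this: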
my %file_hash = (
    "file1.abc" => [
       "/nfs/disks/version_2.0/dir_a/ln/temp1",
       "/nfs/disks/version_2.0/dir_a/ln/temp2",
       "/nfs/disks/version_2.0/dir_a/ln/temp3",
    ],
    "file2.abc" => [
       "/nfs/disks/version_2.0/dir_a/nn/temp1",
       "/nfs/disks/version_2.0/dir_a/nn/temp2",
       "/nfs/disks/version_2.0/dir_a/nn/temp3",
    ],
);
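
A small sketch of how such a structure might be filled in and read back. The path list is hard-coded from the example paths in the question purely for illustration; in a real run the paths would come from walking the directory tree.

```perl
use strict;
use warnings;
use File::Basename qw(basename dirname);

# Example paths copied from the question; nothing is read from disk here.
my @paths = qw(
    /nfs/disks/version_2.0/dir_a/ln/temp1/file1.abc
    /nfs/disks/version_2.0/dir_a/ln/temp2/file1.abc
    /nfs/disks/version_2.0/dir_a/nn/temp1/file2.abc
);

my %file_hash;
for my $path (@paths) {
    # push creates the array reference the first time a name is seen
    push @{ $file_hash{ basename($path) } }, dirname($path);
}

for my $name (sort keys %file_hash) {
    my $count = scalar @{ $file_hash{$name} };
    print "$name was seen in $count location(s)\n";
}
```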

Here is a program:
#! /usr/bin/env perl
#
use strict;
use warnings;
use feature qw(say);        #Can use `say` which is like `print "\n"`;

use File::Basename; #imports `dirname` and `basename` commands
use File::Find;             #Implements Unix `find` command.

use constant DIR => "/nfs/disks/version_2.0";

# Find all duplicates
my %file_hash;
find (\&wanted, DIR);

# Print out all the duplicates
foreach my $file_name (sort keys %file_hash) {
    if (scalar (@{$file_hash{$file_name}}) > 1) {
        say qq(Duplicate File: "$file_name");
        foreach my $dir_name (@{$file_hash{$file_name}}) {
            say "    $dir_name";
        }
    }
}

sub wanted {
    return if not -f $_;    

    if (not exists $file_hash{$_}) {
        $file_hash{$_} = [];
    }
    push @{$file_hash{$_}}, $File::Find::dir;
}

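Here are a few things about File::Find:

The work takes place in the subroutine wanted.

$_ is the file name, and I can use it to see whether this is a file or a directory.
$File::Find::name is the full name of the file, including the path.
$File::Find::dir is the name of the directory.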
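
To see what those three variables hold, here is a minimal sketch that runs find over a throwaway tree. The temporary directory and the single test file are assumptions made up for the demo, not part of the program above.

```perl
use strict;
use warnings;
use File::Find;
use File::Temp qw(tempdir);
use File::Path qw(make_path);

# Throwaway tree: <root>/ln/temp1/file1.abc
my $root = tempdir(CLEANUP => 1);
make_path("$root/ln/temp1");
open(my $fh, ">", "$root/ln/temp1/file1.abc") or die "Can't create test file: $!";
close $fh;

my @seen;
find(sub {
    return if not -f $_;    # $_ is the bare name; find() has chdir'ed into its directory
    push @seen, { name => $_, dir => $File::Find::dir, full => $File::Find::name };
}, $root);

for my $entry (@seen) {
    print "name: $entry->{name}\n";
    print "dir:  $entry->{dir}\n";
    print "full: $entry->{full}\n";
}
```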

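If the array reference doesn't exist yet, I create it with $file_hash{$_} = []. This isn't necessary, but I find it comforting, and it can prevent errors. To use $file_hash{$_} as an array, I have to dereference it. I do that by putting a @ in front of it and wrapping it in braces: @{$file_hash{$_}}. The braces are needed here, because @$file_hash{$_} would be parsed as a slice of a hash reference named $file_hash.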
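
A short sketch of the dereferencing forms, using a made-up key and made-up values rather than anything read from disk:

```perl
use strict;
use warnings;

my %file_hash;
my $name = "file1.abc";

# Optional: push would create the array reference on its own.
$file_hash{$name} = [] if not exists $file_hash{$name};
push @{ $file_hash{$name} }, "temp1", "temp2";

my @dirs  = @{ $file_hash{$name} };          # the whole array, dereferenced
my $first = $file_hash{$name}->[0];          # a single element via the arrow
my $count = scalar @{ $file_hash{$name} };   # number of elements

print "$name: $count entries, first is $first\n";
```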

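Once all the files have been found, I can print out the whole structure. The only thing I do is check that each array has more than one member. If there is only one member, there are no duplicates.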

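In response to Grace
Hi David W., thank you very much for the explanation and the sample script. Sorry, maybe I wasn't clear in my problem statement. I don't think I can use a hash for the path lookup in the data structure. There are several hundred *.abc files and the number is not fixed. Each *.abc file has the same name in every directory structure, but the contents actually differ from one directory to the next.
For example, file1.abc under "/nfs/disks/version_2.0/dir_a/ln/temp1" has different contents from file1.abc under "/nfs/disks/version_2.0/dir_a/ln/temp2" and "/nfs/disks/version_2.0/dir_a/ln/temp3". My goal is to grep the list of *.abc files in each directory structure (temp1, temp2 and temp3) and compare that list of file names against a master list. Could you help me work out how to solve this? Thanks. Grace
I only print the files in my sample code, but instead of printing them you can open them and process them. After all, you now have both the file name and the directory. Here is the core of my program again. This time, I open each file so its contents can be examined: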
foreach my $file_name (sort keys %file_hash) {
    if (scalar (@{$file_hash{$file_name}}) > 1) {
        #say qq(Duplicate File: "$file_name");
        foreach my $dir_name (@{$file_hash{$file_name}}) {
            #say "    $dir_name";
            open (my $fh, "<", "$dir_name/$file_name")
              or die qq(Can't open file "$dir_name/$file_name" for reading);
            # Process your file here...
            close $fh;
        }
    }
}
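
For the comparison Grace describes (the *.abc names found in each temp directory checked against a master list), one possible sketch follows. The master list contents, the helper name report_missing, and the temporary tree are all assumptions for illustration; the real master list would come from wherever it is actually kept.

```perl
use strict;
use warnings;
use File::Find;
use File::Temp qw(tempdir);
use File::Path qw(make_path);

# Assumed master list of names every directory should contain.
my @master = qw(file1.abc file3.abc);

# Throwaway tree standing in for .../ln/temp1 and .../ln/temp2.
my $root = tempdir(CLEANUP => 1);
for my $file ("ln/temp1/file1.abc", "ln/temp1/file3.abc", "ln/temp2/file1.abc") {
    my ($dir) = $file =~ m{^(.*)/};
    make_path("$root/$dir");
    open(my $fh, ">", "$root/$file") or die qq(Can't create "$root/$file": $!);
    close $fh;
}

# Collect the *.abc names found in each directory: dir => { name => 1 }
my %found;
find(sub { $found{$File::Find::dir}{$_} = 1 if -f $_ and /\.abc$/ }, $root);

# Hypothetical helper: master-list names that are absent from each directory.
sub report_missing {
    my ($found, @names) = @_;
    my %missing;
    for my $dir (sort keys %$found) {
        my @absent = grep { !$found->{$dir}{$_} } @names;
        $missing{$dir} = \@absent if @absent;
    }
    return %missing;
}

my %missing = report_missing(\%found, @master);
for my $dir (sort keys %missing) {
    print "$dir is missing: @{ $missing{$dir} }\n";
}
```

Keying the outer hash by directory rather than by file name fits this case better, since the question is what each directory lacks, not where each name occurs.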

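If you are looking at file contents, you can use a digest (checksum) of each file to determine whether the contents match. This reduces a file to a string of 16 to 28 characters, which could even be used as the hash key instead of the file name. T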