Recursive unique search in particular subdirectories - Perl

perl file-io

This is my directory structure:

                                    Current
                   /                    |                       \
           a                            d                       g
        /      \                   /             \              | 
        b       c                e              morning         evenin
       /  \    /   \             |
     hello hi  bad good          f
                                 /  \   
                               good night

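Here a, b, c, d, e, f and g are directories, and everything else is a file. I want to search recursively inside the Current folder, except that the search should not descend into the g folder directly under Current. Also, since the "good" file in Current/a/c/good and Current/d/e/f/good is the same, it should be listed only once.

Can you help me with how to do this?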

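Paulchenkiller's suggestion in the comments is a good one. The File::Find module searches recursively and makes it easy to handle files and directories during the traversal. Here is something similar to what you are looking for. It uses the preprocess option to prune directories and the wanted option to collect all the file names.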

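Without File::Find, the same kind of traversal can be written by hand with opendir and readdir:

use strict;
use warnings;

my $path = "/some/path";
my $filenames = {};    # hash keys are file names, so each name is kept only once

recursive( $path );

print join( "\n", sort keys %$filenames ), "\n";

sub recursive
{
    my $p = shift;

    opendir my $d, $p or die "Cannot open directory $p: $!";

    # use a lexical variable so that recursive calls cannot clobber $_
    while( defined( my $entry = readdir $d ) )
    {
        next if $entry =~ /^\./; # this will skip '.' and '..' (but also '.blabla')

        # recurse into directories, record everything else by name
        if( -d "$p/$entry" )
        {
            recursive( "$p/$entry" );
        }
        else
        {
            $filenames->{ $entry } = 1;
        }
    }

    closedir $d;
}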

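With File::Find, using preprocess to skip g and wanted to record each file name once:

#!/usr/bin/env perl

use strict;
use warnings;
use File::Find;

# File names already seen; using them as hash keys lists each name only once.
my (%processed_files);

find( { wanted => \&wanted,
        preprocess => \&dir_preprocess,
      }, '.',
);

for ( sort keys %processed_files ) {
    printf qq|%s\n|, $_;
}

# Called with the entries of each directory before they are visited.
# In the starting directory only, drop the subdirectory named 'g'.
sub dir_preprocess {
    my (@entries) = @_;
    if ( $File::Find::dir eq '.' ) {
        @entries = grep { ! ( -d && $_ eq 'g' ) } @entries;
    }
    return @entries;
}

# Called for every entry; record plain files (not symlinks) not seen before.
sub wanted {
    if ( -f && ! -l && ! defined $processed_files{ $_ } ) {
        $processed_files{ $_ } = 1;
    }
}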

Take a look at File::Find.

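I didn't get the "next if" part. And where does it check that files with the same name should appear only once?

There is a comment right after it. Do you know at least a little Perl?

It wasn't there when I asked the question!

As for the second part of the question, determining whether two files are the same is a bit more interesting. I usually use Digest::SHA to build a unique string for each file, save it in a hash, and use that string to determine whether I have seen the file before. It is laborious because it requires reading each file twice, but it is accurate about the contents. It depends partly on exactly what you mean by files in different directories being "the same".

@StuartWatt: I assumed that two files with the same name are identical. But it is not that laborious: instead of saving the file name ($_) as the key of the %processed_files hash, compute its SHA and save that as the key, with the file name as the value. There is no need to read the file twice, or am I missing something?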
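The digest idea from the last two comments could be sketched roughly as follows. This is only an illustration, not code from either answer: the helper name unique_by_content is made up, SHA-256 is an arbitrary choice of algorithm, and the root directory is assumed to be passed without a trailing slash. Each file is read once, its digest becomes the hash key, and the first path seen with that content becomes the value.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use File::Find;
use Digest::SHA;

# Hypothetical helper: returns a hash ref mapping SHA-256 hex digest to the
# first path found with that content. The directory named $skip is ignored,
# but only when it sits directly under $root.
sub unique_by_content {
    my ( $root, $skip ) = @_;
    my %seen;

    find(
        {
            # Prune $skip at the top level; sort for a repeatable visit order.
            preprocess => sub {
                my @entries = @_;
                if ( $File::Find::dir eq $root ) {
                    @entries = grep { !( -d && $_ eq $skip ) } @entries;
                }
                return sort @entries;
            },
            # Digest every plain file once; keep only the first path per digest.
            wanted => sub {
                return unless -f && !-l;
                my $digest = Digest::SHA->new(256)->addfile( $_, 'b' )->hexdigest;
                $seen{$digest} //= $File::Find::name;
            },
        },
        $root
    );

    return \%seen;
}

# Runs only when arguments are given, e.g.: perl unique.pl Current g
if (@ARGV) {
    my ( $root, $skip ) = @ARGV;
    my $unique = unique_by_content( $root, $skip // 'g' );
    print "$_\n" for sort values %$unique;
}
```

Keying on the digest treats two files as "the same" when their contents match, whatever they are called. Keying on the file name, as in the scripts above, treats them as the same when the names match, so which one is right depends on what "the same" means for your data.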