bash shell script does not work properly when using cmp with output redirection

I am trying to write a bash script that removes duplicate files from a folder, keeping only one copy. The script looks like this:

#!/bin/sh

for f1 in `find ./ -name "*.txt"`
do
    if test -f $f1
    then
        for f2 in `find ./ -name "*.txt"`
        do
            if [ -f $f2 ] && [ "$f1" != "$f2" ]
            then
                # if cmp $f1 $f2 &> /dev/null # DOES NOT WORK
                if cmp $f1 $f2
                then
                    rm $f2
                    echo "$f2 purged"
                fi 
            fi
        done
    fi 
done

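I want to redirect both stdout and stderr to /dev/null so they are not printed to the screen. But with the commented-out statement the script does not work as expected: it deletes every file except the first one.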

I can provide more information if needed.

Thanks

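&> is bash syntax, so you need to change the shebang line (the first line) to #!/bin/bash (or the appropriate path to bash).

Under a plain POSIX /bin/sh such as dash, the line

if cmp $f1 $f2 &> /dev/null

is parsed as "cmp $f1 $f2 &" (run cmp in the background) followed by a separate command "> /dev/null" that consists of nothing but a redirection. The if tests the exit status of that last command, which is always 0, so every other file is treated as a duplicate of the first one and removed.

Alternatively, if you really are using the Bourne shell (/bin/sh), you have to use the old-style redirection, i.e.

cmp ... >/dev/null 2>&1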

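Also note that Bash 4 only introduced the appending form &>>; plain &> already works in Bash 3.x. If you need to append both streams on Bash 3.x, you still need the old-style redirection (>>file 2>&1).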

IHTH
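A minimal sketch of the portable forms, assuming nothing more than a POSIX sh (mktemp -d in the demo is not in POSIX, but GNU and BSD systems provide it). The helper names same_file_content and same_file_content_silent are made up for illustration. The second helper relies on cmp -s, which POSIX defines as writing nothing and reporting the result only through the exit status, so stdout does not need redirecting at all.

```shell
#!/bin/sh
# Sketch: two portable ways to ask "do these files have identical content?"
# The helper names are hypothetical, not part of any tool.

same_file_content() {
    # Old-style redirection: stdout to /dev/null, then stderr to wherever
    # stdout now points. Works in any POSIX shell, unlike bash's "&>".
    cmp -- "$1" "$2" >/dev/null 2>&1
}

same_file_content_silent() {
    # cmp -s writes nothing for differing files and only sets the exit
    # status (0 = identical, 1 = different, >1 = error such as a missing
    # file); 2>/dev/null additionally hides any error message.
    cmp -s -- "$1" "$2" 2>/dev/null
}

# Demo in a throwaway directory.
tmp=$(mktemp -d)
printf 'same\n'  > "$tmp/a.txt"
printf 'same\n'  > "$tmp/b.txt"
printf 'other\n' > "$tmp/c.txt"
if same_file_content "$tmp/a.txt" "$tmp/b.txt"; then
    echo "a.txt and b.txt are identical"
fi
if ! same_file_content_silent "$tmp/a.txt" "$tmp/c.txt"; then
    echo "a.txt and c.txt differ"
fi
rm -rf "$tmp"
```

In the loop from the question, either helper could stand in for the whole test, e.g. if same_file_content_silent "$f1" "$f2".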

A few comments:

First, this:

for f1 in `find ./ -name "*.txt"`
do
    if test -f $f1
    then

is the same as (finding only regular files with the txt extension)

A better syntax (bash only) is

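Finally, the whole thing is wrong, because if a filename contains spaces, the f1 variable will not get the full path name. So instead of the for loop, do this: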

find ./ -type f -name "*.txt" -print | while read -r f1

As @Sir Athos pointed out, filenames can even contain newlines (\n), so it is better to use

find . -type f -name "*.txt" -print0 | while IFS= read -r -d '' f1
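A slightly fuller sketch of that loop, assuming bash (read -d '' is a bash extension) and a find that supports -print0 (GNU and BSD find both do). list_txt_files is a hypothetical helper that only prints each name in brackets, which makes it easy to see that names with spaces arrive in one piece.

```shell
#!/usr/bin/env bash
# Sketch: visit every regular *.txt file under a directory, even when the
# names contain spaces or newlines. list_txt_files is a hypothetical helper.

list_txt_files() {
    local dir=$1 f1
    # -print0 terminates each name with a NUL byte, the one character that
    # cannot appear in a path. IFS= keeps leading/trailing blanks, -r keeps
    # backslashes literally, and -d '' makes read stop at each NUL.
    find "$dir" -type f -name '*.txt' -print0 |
    while IFS= read -r -d '' f1; do
        # Quote "$f1" every time it is used so it stays a single argument.
        printf '[%s]\n' "$f1"
    done
}
```

Usage: list_txt_files . (replace the printf with the real work). Because the while loop sits on the right-hand side of a pipe, bash runs it in a subshell, so variables assigned inside the loop are not visible after it ends.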

Second:

Again, use "$f1" instead of $f1, because $f1 can contain spaces.
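A quick way to see the difference, assuming the default IFS; count_args is a throwaway helper that only prints how many arguments it received.

```shell
#!/bin/sh
# count_args is a hypothetical helper: it prints its number of arguments.
count_args() { echo "$#"; }

f1='./my notes.txt'
count_args $f1     # unquoted: split at the space, prints 2
count_args "$f1"   # quoted: the whole path is one argument, prints 1
```

Unquoted expansions are also subject to globbing, so a name containing * or ? could expand to other files; quoting prevents that as well.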

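Third:

Doing N*N comparisons is not very efficient. You should compute a checksum (md5, or better sha256) of each txt file. When the checksums are the same, the files are duplicates.

If you don't trust checksums, compare byte by byte only the files that have the same checksum. Files with different checksums are definitely not duplicates. ;)

Checksumming is slow, so you should first consider only files of the same size. Files of different sizes are not duplicates.

You can skip empty txt files, since they are all duplicates of each other :)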

So the final command could be:

find . -not -empty -type f -name \*.txt -printf "%s\n" | sort -rn | uniq -d |\
xargs -I% find . -type f -name \*.txt -size %c -print0 | xargs -0 md5sum |\
sort | uniq -w32 --all-repeated=separate

With comments:

# find all non-empty regular files with the txt extension and print their size (in bytes)
find . -not -empty -type f -name \*.txt -printf "%s\n" |\

# sort the sizes numerically and keep only the sizes that occur more than once
sort -rn | uniq -d |\

# for each duplicated size, find all files of exactly that size and print their names (NUL-terminated)
xargs -I% find . -type f -name \*.txt -size %c -print0 |\

# compute an md5 checksum for each of them
xargs -0 md5sum |\

# sort by checksum and print only files whose checksum repeats, one group per block separated by an empty line
sort | uniq -w32 --all-repeated=separate

Now you only need to go through the output (for example, saved to a file) and decide which files to delete and which to keep.
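For the "if you don't trust checksums" step, here is a sketch that reads the output of the pipeline above (lines of the form "<32-hex-digit md5>  <path>", groups separated by empty lines) and asks cmp -s to confirm each file against the first file of its group. It assumes bash and GNU md5sum output; confirmed_duplicates is a hypothetical name. GNU md5sum escapes names that contain backslashes or newlines, and such names are not handled here.

```shell
#!/usr/bin/env bash
# Sketch: confirm md5sum groups byte by byte and print the files that can be
# deleted. confirmed_duplicates is a hypothetical helper name.
# Input: "<32 hex digits><space><space or *><path>" lines, groups separated by
# empty lines, as produced by "uniq -w32 --all-repeated=separate".

confirmed_duplicates() {
    local line path first=''
    while IFS= read -r line; do
        if [[ -z $line ]]; then
            first=''                 # an empty line starts a new group
            continue
        fi
        path=${line:34}              # skip the 32-char hash and 2 separator chars
        if [[ -z $first ]]; then
            first=$path              # keep the first file of every group
        elif cmp -s -- "$first" "$path"; then
            printf '%s\n' "$path"    # byte-for-byte identical to $first
        fi
    done
}
```

Usage: append | confirmed_duplicates to the pipeline and review the list before deleting anything. A file whose checksum matches but whose content differs (a hash collision) is simply not printed, so it is kept.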

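Credit for this answer goes to @kobame: it is really a comment, but I posted it as an answer for the formatting.
You don't need to call find twice; print both the size and the file name in a single find command.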
find . -not -empty -type f -name \*.txt -printf "%8s %p\n" |
# find the files that have duplicate sizes
sort -n | uniq -Dw 8 | 
# strip off the size and get the md5 sum
cut -c 10- | xargs md5sum 

An example:
$ cat a.txt
this is file a
$ cat b.txt
this is file b
$ cat c.txt
different contents 
$ cp a.txt d.txt
$ cp b.txt e.txt
$ find . -not -empty -type f -name \*.txt -printf "%8s %p\n" |
sort -n | uniq -Dw 8 | cut -c 10- | xargs md5sum 

To keep one copy and delete the rest, I would pipe the output into:
...  | awk '++seen[$1] > 1 {print $2}' | xargs echo rm

Once the test output looks right, remove the echo.
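To see what the awk filter does before letting anything near rm, you can feed it a few hand-written lines in md5sum's format. The hashes below are placeholders, not real MD5 values. ++seen[$1] counts how many times each hash has appeared, so only the second and later files with a given hash are printed.

```shell
#!/bin/sh
# Placeholder hashes stand in for real md5sum output.
sample='aaaa  ./a.txt
bbbb  ./b.txt
aaaa  ./d.txt
bbbb  ./e.txt'

# Print the path of every file whose hash has already been seen once.
printf '%s\n' "$sample" | awk '++seen[$1] > 1 {print $2}'
# prints ./d.txt and ./e.txt, one per line

# Dry run of the deletion step.
printf '%s\n' "$sample" | awk '++seen[$1] > 1 {print $2}' | xargs echo rm
# prints: rm ./d.txt ./e.txt
```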
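Like many complex pipelines, this one breaks on filenames that contain whitespace: xargs and awk split on spaces, not only on newlines.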
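If you need to survive such names, here is a sketch of a whitespace-safe variant, assuming bash and GNU find (for -printf and -not -empty, which the commands above already rely on). It swaps the checksum step for a direct cmp -s between files of equal size, so it is a different technique from the md5sum pipelines rather than a drop-in rewrite of them; find_duplicate_txt is a hypothetical name. Every file identical to one seen earlier is printed as a NUL-terminated name, and whichever copy find reaches first is kept.

```shell
#!/usr/bin/env bash
# Sketch: print NUL-terminated names of non-empty *.txt files that are
# byte-for-byte copies of a file found earlier. find_duplicate_txt is a
# hypothetical helper; it needs bash and GNU find (-printf, -not -empty).

find_duplicate_txt() {
    local dir=$1 entry size path i found
    local -a rep_path=() rep_size=()   # one representative per distinct content

    while IFS= read -r -d '' entry; do
        size=${entry%% *}    # text before the first space: size in bytes
        path=${entry#* }     # text after it: the path (may contain spaces)
        found=0
        for i in "${!rep_path[@]}"; do
            # Files of different sizes cannot be identical, so cmp runs
            # only when the sizes match.
            if [[ ${rep_size[i]} == "$size" ]] &&
               cmp -s -- "${rep_path[i]}" "$path"; then
                printf '%s\0' "$path"    # duplicate of an earlier file
                found=1
                break
            fi
        done
        if (( ! found )); then
            rep_path+=("$path")          # first file with this content
            rep_size+=("$size")
        fi
    done < <(find "$dir" -type f -name '*.txt' -not -empty -printf '%s %p\0')
}
```

Usage: find_duplicate_txt . | xargs -0 echo rm -- for a dry run, then drop the echo. Each file is checked against the representatives kept so far, so the number of size comparisons grows with files times distinct contents; cmp itself only runs for equal sizes, which should keep it reasonable for a folder of text files.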
All good answers, so just a short suggestion: you can install and use
fdupes -r .

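From the man page:
Searches the given path for duplicate files. Such files are found by comparing file sizes and MD5 signatures, followed by a byte-by-byte comparison.
Added by @Francesco: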
fdupes -rf . | xargs rm -f

to delete the duplicates. (The -f option of fdupes omits the first file of each set, so only the duplicates are listed.)
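xargs splits its input on blanks and treats quotes specially, so a name such as "it's a copy.txt" breaks the xargs rm -f step. Here is a sketch of a line-by-line alternative, assuming bash; remove_listed is a hypothetical helper. It skips the empty lines fdupes prints between sets; names containing newlines still cannot be represented in fdupes' line-based output.

```shell
#!/usr/bin/env bash
# Sketch: delete the paths listed on stdin, one per line.
# remove_listed is a hypothetical helper name.

remove_listed() {
    local f
    while IFS= read -r f; do
        [[ -n $f ]] || continue   # skip the blank lines between duplicate sets
        rm -f -- "$f"             # "--" protects names that start with "-"
    done
}

# Usage (try it with "echo rm" in place of "rm" first):
#   fdupes -rf . | remove_listed
```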
Besides the shebang, your code has some other potential bugs (unquoted variables, parsing ls, etc.). If you paste your code in there (it will also catch the &> error), you can see them for yourself.
Thanks :)