Bash: How to pick up new files that arrive in a folder while a shell script is processing files in that same folder

bash shell

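At the beginning of my shell script I have a for loop that scans a folder to see whether it contains any files; if it does, I need to process each one. Processing each file takes some time, so depending on how many files are in the folder the whole pass can take a few minutes.

The problem is that new files may arrive in the folder while each file is being processed, but my tests show that those new files are not picked up and processed. Is there a way to detect new files that arrive while the for loop is running?

I thought about checking the folder for new files periodically, but I don't want to reprocess files that were already handled. More importantly, since this is only the beginning of the script, I don't want the for loop to repeat too many times. Thanks.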

for aFile in "$mydir"/*
do
    # some tasks that may take 30 secs or so to finish for each file
    :
done
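The behaviour described in the question follows from how the shell evaluates a for loop: the glob is expanded into a fixed word list once, before the first iteration, so files that appear later are not in that list. A minimal sketch that reproduces this, assuming a throwaway directory created with mktemp (the names demo_dir, seen, and late_exists are illustrative only and not part of the original script):

```bash
#!/bin/bash
# Reproduce the behaviour: the glob is expanded once, before iteration starts.
demo_dir=$(mktemp -d)                 # hypothetical scratch directory
touch "$demo_dir/a" "$demo_dir/b"

seen=""
for aFile in "$demo_dir"/*; do
    # Record the base name of every file the loop actually visits.
    seen="${seen:+$seen }${aFile##*/}"
    # Simulate a file arriving while the loop is still running.
    touch "$demo_dir/late"
done

late_exists=no
[ -e "$demo_dir/late" ] && late_exists=yes

echo "processed: $seen"               # the late file is not in this list
echo "late file exists: $late_exists"
rm -rf "$demo_dir"
```

The loop visits only a and b even though late exists by the time it finishes, which is why some form of rescan is needed.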

How about something like this:

#!/bin/bash -xe
# bash 4+ is required: the script uses an associative array (declare -A)

# create some dummy files to start with
touch filea
touch fileb

analyzeFile() {
    echo "analyzing $1"
    sleep 10    # dummy for the real stuff you need to do
}

declare stillGettingSomething
declare -A alreadyAnalyzed

stillGettingSomething=true
# compare explicitly: a bare [ $var ] is true for any non-empty string, even "false"
while [ "$stillGettingSomething" = true ]; do
    stillGettingSomething=false    # prevent endless looping

    for i in ./file*; do
        # idea: see also http://superuser.com/questions/195598/test-if-element-is-in-array-in-bash

        if [[ ${alreadyAnalyzed[$i]} ]]; then
            echo "$i was already analyzed before; skipping it immediately"
            continue
        fi

        alreadyAnalyzed[$i]=true    # Memorize the file which we visited
        stillGettingSomething=true  # We found some new file; we have to run another scan iteration later on

        analyzeFile "$i"

        # create some new files for the purpose of demonstration
        echo "creating file $i-latecreate"
        touch "$i-latecreate"
    done

done

The result of this script is:

+ declare stillGettingSomething
+ declare -A alreadyAnalyzed
+ stillGettingSomething=true
+ '[' true = true ']'
+ stillGettingSomething=false
+ for i in './file*'
+ [[ -n '' ]]
+ alreadyAnalyzed[$i]=true
+ stillGettingSomething=true
+ analyzeFile ./filea
+ echo 'analyzing ./filea'
analyzing ./filea
+ sleep 10
+ echo 'creating file ./filea-latecreate'
creating file ./filea-latecreate
+ touch ./filea-latecreate
+ for i in './file*'
+ [[ -n '' ]]
+ alreadyAnalyzed[$i]=true
+ stillGettingSomething=true
+ analyzeFile ./fileb
+ echo 'analyzing ./fileb'
analyzing ./fileb
+ sleep 10
+ echo 'creating file ./fileb-latecreate'
creating file ./fileb-latecreate
+ touch ./fileb-latecreate
+ '[' true = true ']'
+ stillGettingSomething=false
+ for i in './file*'
+ [[ -n true ]]
+ echo './filea was already analyzed before; skipping it immediately'
./filea was already analyzed before; skipping it immediately
+ continue
+ for i in './file*'
+ [[ -n '' ]]
+ alreadyAnalyzed[$i]=true
+ stillGettingSomething=true
+ analyzeFile ./filea-latecreate
+ echo 'analyzing ./filea-latecreate'
analyzing ./filea-latecreate
+ sleep 10

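The idea behind it is to use an associative array that stores the files which have already been processed. If a file has been processed before, we skip it the next time we come across it. We keep rescanning as long as the previous scan pass found at least one new file.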
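The bookkeeping described here can be tried in isolation, without touching the file system. The sketch below is only an illustration under assumptions: scan_once and processed_order are hypothetical names, plain strings stand in for file names, and bash 4 or later is assumed for associative arrays.

```bash
#!/bin/bash
# Requires bash 4+ for associative arrays.
declare -A alreadyAnalyzed
processed_order=()

# Visit each argument once; return 0 if this pass saw anything new, else 1.
scan_once() {
    local name found_new=1
    for name in "$@"; do
        if [[ -n ${alreadyAnalyzed[$name]} ]]; then
            continue                    # seen on an earlier pass
        fi
        alreadyAnalyzed[$name]=1        # remember the name
        processed_order+=("$name")
        found_new=0
    done
    return "$found_new"
}

scan_once filea fileb       && echo "pass 1 found new files"
scan_once filea fileb filec && echo "pass 2 found new files"
scan_once filea fileb filec || echo "pass 3 found nothing new"
echo "processing order: ${processed_order[*]}"
```

Each name is handled exactly once, and the first pass that finds nothing new is the signal to stop rescanning.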

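Edit (code cleanup): here is an improved version of the code above, with the demonstration-only parts removed, that tries to stay as close as possible to the original requirement.

#!/bin/bash
# bash 4+ is required: the script uses an associative array (declare -A)

analyzeFile() {
    echo "analyzing $1"
    sleep 10    # dummy for the real stuff you need to do
}

declare stillGettingSomething
declare -A alreadyAnalyzed

stillGettingSomething=true
# compare explicitly: a bare [ $var ] is true for any non-empty string, even "false"
while [ "$stillGettingSomething" = true ]; do
    stillGettingSomething=false    # prevent endless looping

    for i in "$mydir"/*; do
        # an empty folder leaves the pattern unexpanded; skip that literal word
        [ -e "$i" ] || continue

        if [[ ${alreadyAnalyzed[$i]} ]]; then
            echo "$i was already analyzed before; skipping it immediately"
            continue
        fi

        alreadyAnalyzed[$i]=true    # Memorize the file which we visited
        stillGettingSomething=true  # We found some new file; we have to run another scan iteration later on

        analyzeFile "$i"
    done
done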
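A loop driven by a flag variable such as stillGettingSomething depends on how the flag is tested. With a single argument, `[ word ]` only checks that the word is non-empty, so the string false passes that test as well. The short check below (the variable names flag, one_arg, and compare are illustrative) shows the difference between the one-argument test and an explicit string comparison:

```bash
#!/bin/bash
flag=false

# One-argument test: succeeds for any non-empty string, including "false".
if [ "$flag" ]; then one_arg=succeeds; else one_arg=fails; fi

# Explicit comparison: succeeds only when the flag is literally "true".
if [ "$flag" = true ]; then compare=succeeds; else compare=fails; fi

echo "one-argument test:   $one_arg"
echo "explicit comparison: $compare"
```

Only the explicit comparison lets the loop stop once a scan pass sets the flag to false.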

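This is an interesting problem, and there are many ways to solve it. One way is to keep track somehow of which files are done, and then process the first file that is not yet done in each loop iteration, for example: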

cd "$mydir" || exit 1
# make a donedir to put placeholder dummy files
mkdir -p donedir

while true; do

  # find first file with no corresponding dummy file in donedir
  newfile=
  for f in *; do
    if [ -f "$f" ] && [ ! -e "donedir/$f" ]; then
      newfile=$f
      break
    fi
  done

  # break out of the loop if there aren't any
  [ "$newfile" = "" ] && break

  # do your thing with $newfile...

  # record that you're done with $newfile
  touch "donedir/$newfile"
done

A more efficient strategy is simply to move each file into donedir once it has been processed:

cd "$mydir" || exit 1
mkdir -p donedir

while true; do

  # find first file (donedir itself is a directory, so -f skips it)
  newfile=
  for f in *; do
    [ -f "$f" ] && { newfile=$f; break; }
  done

  # break out of the loop if there aren't any
  [ "$newfile" = "" ] && break

  # do your thing with $newfile...

  # done with $newfile...
  mv -- "$newfile" donedir/
done

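One could also keep track of which files are done with an associative array, but that approach has two drawbacks: (1) unnecessary complexity, and (2) the record of completed files is not automatically preserved across separate runs of the script.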
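If a record that survives across runs is wanted without moving files or creating placeholder files, one option is a plain-text log of processed names. This is only a sketch under stated assumptions: work_dir, done_log, and process_new_files are hypothetical names, calling the function twice stands in for two separate runs of the script, and file names are assumed not to contain newlines.

```bash
#!/bin/bash
# Sketch: remember processed file names in a log so a later run can skip them.
work_dir=$(mktemp -d)                  # hypothetical scratch directory
done_log="$work_dir/.done.log"         # hidden, so the * glob does not match it
touch "$work_dir/a" "$work_dir/b" "$done_log"

process_new_files() {
    count=0
    for f in "$work_dir"/*; do
        [ -f "$f" ] || continue
        name=${f##*/}
        # -F fixed string, -x whole line, -q quiet: exact-name lookup in the log
        grep -Fxq -- "$name" "$done_log" && continue
        echo "processing $name"
        printf '%s\n' "$name" >> "$done_log"
        count=$((count + 1))
    done
}

process_new_files; first_run=$count    # handles a and b
touch "$work_dir/c"
process_new_files; second_run=$count   # handles only c
echo "first run: $first_run, second run: $second_run"
rm -rf "$work_dir"
```

The lookup cost grows with the size of the log, so for very large folders the move-to-donedir strategy is likely the simpler choice.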

Thanks. But when does the while loop end? It looks like an indefinite loop to me.

It is an endless loop only for the purpose of the demonstration. Just remove the two lines echo "creating file $i-latecreate" and touch $i-latecreate from the loop; it will still work if you add more files.

Thanks. But the while loop seems to be an infinite loop. As I said, this is only the beginning of the script, and I can't have an infinite loop there.

[ "$newfile" = "" ] && break

breaks out of the loop as soon as there are no new files, and

# do your thing with $newfile...

is where all the processing for each new file goes.

Have you considered restructuring this to use fsnotifywait? I have a bash script sitting in an infinite loop that traps signals. I have another bash script run fsnotifywait on the given directory and send a signal to the first script when the right file-system event occurs.
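The signal-based arrangement from the last comment could look roughly like the sketch below. It shows only the trap-and-signal half and makes several assumptions: SIGUSR1 is an arbitrary choice of signal, a background subshell stands in for the separate watcher script (which in the commenter's setup would run fsnotifywait and send the signal on a matching event), and the wait is bounded to a few seconds so the example terminates. The names pending, handled, main_pid, and watcher_pid are illustrative.

```bash
#!/bin/bash
# Main script: waits for USR1 and rescans the folder when it arrives.
pending=0
handled=0
trap 'pending=1' USR1                  # the handler only sets a flag

main_pid=$BASHPID
# Stand-in watcher: sends one notification after a short delay.
( sleep 0.3; kill -USR1 "$main_pid" ) &
watcher_pid=$!

# Poll the flag for up to about 5 seconds instead of looping forever.
for _ in $(seq 1 50); do
    if [ "$pending" -eq 1 ]; then
        pending=0
        handled=$((handled + 1))
        echo "notification received; rescanning folder"
        break
    fi
    sleep 0.1
done

wait "$watcher_pid" 2>/dev/null
echo "handled $handled notification(s)"
```

Keeping the handler down to setting a flag means the actual rescan runs in the main loop, between files, rather than in the middle of processing one.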