Bash: How do I get the latest valid partition date from HDFS when running a shell script for a specific job?

bash shell apache-spark hadoop

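My task covers all the tables assigned to a specific Spark job. I need to write a script that prints the timestamp and path for every table assigned to the job, so I need to collect the timestamps of all the tables associated with that job.

Here is the script I developed: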

#!/usr/bin/env bash
JOB_NAME=${1}
 inputDirListings=$(awk -F: -v key="$1" '$1==key {print $2}' test_paths.txt)
for dir in  $(echo $inputDirListings | tr "," "\n");
do
    path=$dir
    echo "dir is $path"
    cmd2='hdfs dfs -du -h $path'
    ev1=`eval $cmd2 | tail -1`
    echo "ev1 value is $ev1"

    hdfsPath=`echo $ev1 | cut -d";" -f3- `
    echo "partition is $hdfsPath"

    latestPartition=`echo $hdfsPath | grep -Eo '[[:digit:]]{4}-[[:digit:]]{2}-[[:digit:]]{2}'`
    echo "latest partition is $latestPartition"

    dt1="$(echo $ev1 | cut -d'=' -f2)"
    arr[i]=`date -d $dt1 +%Y%m%d`

    #---Getting minimum date from array---------
    max=${arr[0]}
    min=${arr[0]}

    for i in ${arr[@]}
    do
    if [[ $i > $max ]] ; then                           
    max=$i-1
    fi
    if [[ $i < $min ]] ; then
    min=$i
    fi
    echo "dt1"
    for (( c=$dt1; c<=$currDate; c++ ))
    do
        echo -n "$c "
        sleep 1
    done 
done
 echo "Max value is $max  , minimal value is $min"
dt2=`date -d $min +%Y-%m-%d`
done

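Your code only stores the partition value from the last directory in the array, because the array is overwritten on every pass through the loop.
The array needs to be defined outside the loop, i needs to be incremented inside the loop, and the inner loop needs to be moved out of the directory loop, as follows: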
#!/usr/bin/env bash
JOB_NAME=${1}
arr=()   # declared once, outside the loop, so earlier entries are not lost
i=0

inputDirListings=$(awk -F: -v key="$JOB_NAME" '$1==key {print $2}' test_paths.txt)
for dir in $(echo "$inputDirListings" | tr "," "\n")
do
    path=$dir
    echo "dir is $path"
    # Last line of the listing; the script treats it as the newest partition
    ev1=$(hdfs dfs -du -h "$path" | tail -n 1)
    echo "ev1 value is $ev1"

    hdfsPath=$(echo "$ev1" | cut -d";" -f3-)
    echo "partition is $hdfsPath"

    latestPartition=$(echo "$hdfsPath" | grep -Eo '[[:digit:]]{4}-[[:digit:]]{2}-[[:digit:]]{2}')
    echo "latest partition is $latestPartition"

    # The text after the first "=" is taken as the partition date
    dt1=$(echo "$ev1" | cut -d'=' -f2)
    arr[i]=$(date -d "$dt1" +%Y%m%d)

    i=$((i + 1))
done

#---Getting minimum and maximum date from array---------
max=${arr[0]}
min=${arr[0]}

# YYYYMMDD strings have a fixed width, so string comparison orders them by date
for d in "${arr[@]}"
do
    if [[ $d > $max ]] ; then
        max=$d
    fi
    if [[ $d < $min ]] ; then
        min=$d
    fi
done

echo "Max value is $max  , minimal value is $min"
dt2=$(date -d "$min" +%Y-%m-%d)
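
The date extraction and the min/max step can also be factored into small helpers that are testable without a cluster. The following is only a sketch under stated assumptions: partition directories are assumed to be named `<key>=YYYY-MM-DD` (suggested by the `cut -d'=' -f2` call above, but not confirmed in the question), and the function names `latest_partition_date` and `earliest_date` are hypothetical, not part of the original script.

```shell
#!/usr/bin/env bash

# Hypothetical helper: read a directory listing on stdin and print the newest
# partition date found in a "<key>=YYYY-MM-DD" path component.
# Matching on the leading "=" skips any other dates that appear in the
# listing, such as modification times.
latest_partition_date() {
    grep -Eo '=[0-9]{4}-[0-9]{2}-[0-9]{2}' | tr -d '=' | sort | tail -n 1
}

# Hypothetical helper: print the earliest of the dates passed as arguments.
# ISO dates (YYYY-MM-DD) sort chronologically under a plain text sort.
earliest_date() {
    printf '%s\n' "$@" | sort | head -n 1
}
```

Inside the directory loop this could be called as `arr+=("$(hdfs dfs -ls "$path" | latest_partition_date)")`, and after the loop `earliest_date "${arr[@]}"` would give the date that every table has reached. Sorting the whole listing avoids relying on the order in which HDFS prints entries.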
