Modifying this script in shell


For example, I have a script that needs to read all the fields in a column and validate them before it moves on to the second column.

Name, City

Joe, Orlando
Sam, Copper Town
Mike, Atlanta

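So the script should read the whole Name column (top to bottom) and check it for null values before moving on to the second column. It should not read line by line. Please add some pointers on how to modify or correct it.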

# Read all files. No file has spaces in its name.

# Date pattern (MM-DD-YYYY, separated by "-", space, "/" or ".").
# bash uses POSIX ERE: no ~ delimiters and no \d.
date_regex='^(0[1-9]|1[012])[- /.](0[1-9]|[12][0-9]|3[01])[- /.](19|20)[0-9][0-9]$'

for file in /export/home/*.csv ; do
  # init two variables before processing a new file
  FILESTATUS=GOOD
  FIRSTROW=true
  # process the file one line at a time, splitting each line on the
  # Internal Field Separator ","
  while IFS=, read -r field1 field2 field3 field4; do
    # Skip first line, the header row
    if [ "${FIRSTROW}" = "true" ]; then
      FIRSTROW=false
      # skip processing of this line, continue with next record
      continue
    fi

    # different validations
    # field1 must not be empty
    if [[ -z "$field1" ]]; then
      FILESTATUS=BAD
      # Stop inner loop
      break
    fi
    # field2 must not be empty and must look like a date
    if [[ -z "$field2" || ! "$field2" =~ $date_regex ]]; then
      FILESTATUS=BAD
      break
    fi
    # field3 must be either S or E
    if [[ "$field3" != "S" && "$field3" != "E" ]]; then
      FILESTATUS=BAD
      break
    fi
    # field4 must be 9 to 11 characters long
    if (( ${#field4} < 9 || ${#field4} > 11 )); then
      FILESTATUS=BAD
      break
    fi
  # redirect instead of "cat | while": a pipeline runs the loop in a
  # subshell, so FILESTATUS would be lost after the loop
  done < "${file}"

  if [ "${FILESTATUS}" = "GOOD" ] ; then
    mv "${file}" /export/home/goodFile
  else
    mv "${file}" /export/home/badFile
  fi
done


Here is an attempt at an awk script that does what the original script is trying to do:

#!/usr/bin/awk -f

# fields separated by commas
BEGIN { FS = "," }

# skip first line
NR == 1 { next }

# check for empty fields
$1 == "" || $2 == "" || $3 == "" || $4 == "" { exit 1 }

# check for "valid" date (urk... doing this with a regex is horrid)
# it would be better to split it into components and validate each sub-field,
# but I'll leave that as a learning exercise for the reader
$2 !~ /^(0[1-9]|1[012])[- /.](0[1-9]|[12][0-9]|3[01])[- /.](19|20)[0-9][0-9]$/ { exit 1 }

# third field should be either S or E
$3 !~ /^[SE]$/ { exit 1 }

# check the length of the fourth field is between 9 and 11
length($4) < 9 || length($4) > 11 { exit 1 }

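# if we haven't found problems up to here, awk exits with status 0 by default
# (no END { exit 0 } here: an exit with a value in END would override the
# status set by the exit 1 statements above)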
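
The exit status of a validator like the one above can drive the file moves from the original script. What follows is only a minimal sketch under some assumptions: the rules are embedded inline instead of living in a separate script file, the fields are separated by a bare comma, `goodFile` and `badFile` are directories, and the function names `csv_is_valid` and `sort_csv_files` are hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: returns 0 if every data row passes, 1 otherwise.
csv_is_valid() {
    awk -F',' '
        # the date pattern is kept in a string so "/" needs no escaping
        BEGIN { date_re = "^(0[1-9]|1[012])[- /.](0[1-9]|[12][0-9]|3[01])[- /.](19|20)[0-9][0-9]$" }
        NR == 1 { next }                                  # skip the header row
        $1 == "" || $2 == "" || $3 == "" || $4 == "" { bad = 1; exit }
        $2 !~ date_re                                { bad = 1; exit }
        $3 !~ /^[SE]$/                               { bad = 1; exit }
        length($4) < 9 || length($4) > 11            { bad = 1; exit }
        END { exit bad + 0 }                              # 0 when nothing failed
    ' "$1"
}

# Hypothetical helper: move each CSV in the given directory into
# goodFile/ or badFile/ (assumed here to be subdirectories of it).
sort_csv_files() {
    local dir=$1 file
    mkdir -p "$dir/goodFile" "$dir/badFile"
    for file in "$dir"/*.csv; do
        [ -e "$file" ] || continue        # the glob matched nothing
        if csv_is_valid "$file"; then
            mv "$file" "$dir/goodFile/"
        else
            mv "$file" "$dir/badFile/"
        fi
    done
}
```

Calling `sort_csv_files /export/home` would then mirror the loop in the question. The status is carried in the `bad` variable and returned from `END`, because an `exit` with a value inside `END` decides the final exit status.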

This awk script reads the whole file first; you can then do the validation in the END block:

for file in /export/home/*.csv ; do
    awk -F', ' '
        # skip the header and blank lines
        NR == 1 || NF == 0 {next}

        # save the data (count the row once, then store each of its fields)
        { nr++; for (i=1; i<=NF; i++) data[nr,i] = $i }

        END {
            status = "OK"

            # verify column 1
            for (lineno=1; lineno <= nr; lineno++) {
                if (length(data[lineno,1]) == 0) {
                    status = "BAD" 
                    break
                }
            }
            printf "file: %s, verify column 1, status: %s\n", FILENAME, status

            # verify other columns ...
        }
    ' "$file"
done
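
The `# verify other columns ...` placeholder can be filled in the same way as column 1. The sketch below assumes the four-column layout and the rules from the question (non-empty first field, MM-DD-YYYY date, `S` or `E`, length 9 to 11) and a comma followed by optional spaces as separator; the function name `verify_columns` and the report wording are hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: checks each column top to bottom before moving on
# to the next one, and prints one status line per column checked.
verify_columns() {
    awk -F', *' '
        BEGIN { date_re = "^(0[1-9]|1[012])[- /.](0[1-9]|[12][0-9]|3[01])[- /.](19|20)[0-9][0-9]$" }

        # returns 1 if value v is acceptable for column col, else 0
        function ok(col, v) {
            if (col == 1) return (v != "")
            if (col == 2) return (v ~ date_re)
            if (col == 3) return (v ~ /^[SE]$/)
            return (length(v) >= 9 && length(v) <= 11)
        }

        NR == 1 || NF == 0 { next }                  # skip header and blank lines
        { nr++; for (i = 1; i <= NF; i++) data[nr, i] = $i }   # store every cell

        END {
            for (col = 1; col <= 4; col++) {
                status = "OK"
                for (lineno = 1; lineno <= nr; lineno++) {
                    if (!ok(col, data[lineno, col])) {
                        status = "BAD"
                        break                        # leaves only the row loop
                    }
                }
                printf "column %d: %s\n", col, status
                if (status == "BAD") exit 1          # stop at the first bad column
            }
        }
    ' "$1"
}
```

Because the whole file is held in `data`, memory use grows with file size; for very large files a single pass that records the first bad row per column may be a better trade-off.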

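Unless you store the data from all the fields somewhere, processing by column will require reading the entire file N times (once per column). Is there a reason you need to do it this way?

If the script has to check millions of rows, the time factor matters. If we check a whole column for bad entries first, we can find the error sooner and reduce the time validation takes.

Unless validating the earlier columns is orders of magnitude faster than validating the later ones, I don't believe that. The time (and work) spent reading the whole file from start to finish repeatedly is very real. You can't even find the second value in the first column ("Sam") without reading the entire rest of the first row. Files are linear, unless the rows have a fixed size, in which case you can calculate where the next row starts. You can, however, extract the first column with awk -F, '{print $1}'. It still reads the whole thing, though...

@twalberg Could you integrate that into the script above, or give me some pointers on how to implement it with awk? Thank you, that would be really helpful!

How do I get to column 2 to start checking, i.e. how do I check the other columns?

Use data[lineno,2], data[lineno,3], and so on.

Thanks! :) One last question, sir: the break in the for loop. Which loop does it actually break out of?

There is only one loop there.

I was referring to for file in /export/home/*.csv ; do and ...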
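
On the last question in the comments: a `break` inside the awk program only leaves the innermost awk loop it appears in. The shell's `for file in ...` loop is a separate loop in a different language and simply carries on with the next file. A small sketch to illustrate this, assuming a comma followed by optional spaces as separator; the function name `first_empty_name` is hypothetical, and only column 1 is stored since that is all the check needs.

```shell
#!/bin/bash
# Hypothetical demo: for each file given, print the number of the first
# data row whose first column is empty, or "none".
first_empty_name() {
    local file
    for file in "$@"; do                          # shell loop: one awk run per file
        awk -F', *' '
            NR > 1 && NF > 0 { name[++nr] = $1 }  # keep column 1 only
            END {
                hit = "none"
                for (lineno = 1; lineno <= nr; lineno++) {
                    if (name[lineno] == "") {
                        hit = lineno
                        break                     # exits this awk for loop only
                    }
                }
                printf "%s: %s\n", FILENAME, hit
            }
        ' "$file"
    done
}
```

Even when the first file contains an empty name and the awk loop breaks early, the shell loop still runs awk on every remaining file.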