Shell script to validate a CSV file column by column

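How would I go about writing this in shell? I want to validate the fields in a CSV file column by column. For example, I only want to validate that column 1 is a number: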

Number,Letter

1,u
2,h
3,d
4,j
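Below is validation done row by row. What do I need to change to make it work like the pseudocode above? I'm bad at Unix and have only just started learning awk.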

#!/bin/sh

for file in /source/*.csv
do
    awk -F"," '{                       # awk -F", " {'print$2'} to get the fields.
        $date_regex = '~(0[1-9]|1[012])[- /.](0[1-9]|[12][0-9]|3[01])[- /.](19|20)\d\d~';

        if (length($1) == "")
            break
        if (length($2) == "") && (length($2) > 30)
            break
        if (length($3) == "") && ($3 !~ /$date_regex/)
            break
        if (length($4) == "") && (($4 != "S") || ($4 != "E")
            break
        if (length($5) == "") && ((length($5) < 9 || (length($5) > 11)))
            break
    }' file

    #whatever you need with "$file"
done

Assuming there is no extraneous whitespace in the file, here's how I'd do it in bash:

# validate: first field is an integer
# validate: 2nd field is a lower-case letter

for file in *.csv; do
    good=true
    while IFS=, read -ra fields; do
        if [[ ! ( 
                  ${fields[0]} =~ ^[+-]?[[:digit:]]+$
                  && ${fields[1]} == [a-z]
                ) ]]
        then
            good=false
            break
        fi
    done < "$file"
    if $good; then
        : # handle good file
    else
        : # handle bad file
    fi
done
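
The same two column checks can also be sketched in plain POSIX sh, since the question's script starts with #!/bin/sh. This is only an illustration under a few assumptions: the function name validate_csv is made up, the first line is treated as a header (the sample starts with Number,Letter) and skipped, and no field contains a quoted comma.

```shell
#!/bin/sh
# validate_csv FILE: hypothetical helper. Returns 0 when every data row has
# an integer in column 1 and exactly one lower-case letter in column 2.
validate_csv() {
    line_no=0
    # "rest" soaks up any extra columns so they do not end up in col2
    while IFS=, read -r col1 col2 rest; do
        line_no=$((line_no + 1))
        # Assumption: line 1 is a header row, so it is not validated
        if [ "$line_no" -eq 1 ]; then
            continue
        fi
        digits=${col1#[+-]}     # drop one optional leading sign
        case $digits in
            ''|*[!0-9]*)        # empty, or contains a non-digit
                echo "$1: line $line_no: bad number '$col1'" >&2
                return 1 ;;
        esac
        case $col2 in
            [a-z]) ;;           # exactly one lower-case letter
            *)
                echo "$1: line $line_no: bad letter '$col2'" >&2
                return 1 ;;
        esac
    done < "$1"
    return 0
}
```

It could be called as validate_csv some.csv && echo good || echo bad. Stopping at the first bad row mirrors the break in the bash loop above.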

I would combine two different ways of writing the loops. Lines starting with # are comments:

# Read all files. Quoting "${file}" keeps names with spaces intact
for file in /source/*.csv ; do
   # init two variables before processing a new file
   FILESTATUS=GOOD
   FIRSTROW=true
   # process file 1 line at a time, splitting the line by the
   # Internal Field Separator ","
   # (the file is redirected into the loop rather than piped from cat,
   # so FILESTATUS set inside the loop is still visible after "done")
   while IFS=, read -r field1 field2; do
      # Skip first line, the header row
      if [ "${FIRSTROW}" = "true" ]; then
         FIRSTROW=false
         # skip processing of this line, continue with next record
         continue
      fi

      # Lot of different checks possible here
      # Can google them easy (check field integer)
      if [[ "${field1}" = somestringprefix*  ]]; then
         FILESTATUS=BAD
         # Stop inner loop
         break
      fi
      # somecheckonField2 (placeholder: put a check on field2 here)
   done < "${file}"
   if [ "${FILESTATUS}" = "GOOD" ] ; then
      mv "${file}" /source/good
   else
      mv "${file}" /source/bad
   fi
done
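
A side note on how a while loop is fed, because the good/bad decision above depends on FILESTATUS surviving past done. When the loop is the last stage of a pipeline (cat file | while ...), bash with default settings and dash run it in a subshell, so assignments made inside it are lost. With done < file the loop runs in the current shell. A minimal sketch of the difference; the function names and the "starts with x" check are made up for illustration:

```shell
#!/bin/sh
# Both hypothetical functions try to set STATUS=BAD when column 1 of some
# row starts with "x", then print STATUS.

status_with_pipe() {
    STATUS=GOOD
    # Loop is the last stage of a pipeline: typically runs in a subshell
    cat "$1" | while IFS=, read -r field1 field2; do
        case $field1 in x*) STATUS=BAD; break ;; esac
    done
    echo "$STATUS"
}

status_with_redirect() {
    STATUS=GOOD
    # Loop reads from a redirection: runs in the current shell
    while IFS=, read -r field1 field2; do
        case $field1 in x*) STATUS=BAD; break ;; esac
    done < "$1"
    echo "$STATUS"
}
```

For a file containing a row such as x,h, status_with_redirect prints BAD, whereas status_with_pipe still prints GOOD in shells that put the loop in a subshell.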

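Is there any particular reason this needs to be done in a shell script rather than another language? If you're good at Python or Perl or something else, you could write the script in that.
Sadly, I have to implement it in shell :/
I don't see how these kinds of statements can ever execute: if (length($5) == "") && (length($5) < 9 … length should return 0 or a positive number, right? Yes, == "" might be true when length = 0, but then why bother testing length($5)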
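
To make the point in the last comment concrete, here is one way the five field tests from the question could be written so that they actually run in awk. It is a sketch under stated assumptions, not the asker's confirmed intent: the function name check_rows is made up, "empty" is tested with $n == "" instead of comparing length() to a string, the rules are read as "field 1 not empty, field 2 not empty and at most 30 characters, field 3 a MM-DD-YYYY style date, field 4 either S or E, field 5 between 9 and 11 characters", and the file is assumed to have no header row.

```shell
#!/bin/sh
# check_rows FILE: hypothetical helper. Prints every invalid row and
# returns non-zero if at least one row failed.
check_rows() {
    awk -F',' '
        BEGIN {
            # Same date pattern as in the question, anchored, with \d
            # spelled as [0-9] because POSIX awk has no \d
            date_re = "^(0[1-9]|1[012])[- /.](0[1-9]|[12][0-9]|3[01])[- /.](19|20)[0-9][0-9]$"
        }
        {
            ok = 1
            if ($1 == "")                          ok = 0  # field 1 empty
            if ($2 == "" || length($2) > 30)       ok = 0  # field 2 empty or too long
            if ($3 !~ date_re)                     ok = 0  # field 3 not a date
            if ($4 != "S" && $4 != "E")            ok = 0  # field 4 neither S nor E
            if (length($5) < 9 || length($5) > 11) ok = 0  # field 5 wrong length
            if (!ok) {
                printf "line %d invalid: %s\n", NR, $0
                bad = 1
            }
        }
        END { exit bad + 0 }
    ' "$1"
}
```

Note that ($4 != "S") || ($4 != "E") from the question is true for every value, so the sketch uses && there. Inside the for loop, check_rows "$file" could then decide whether the file is moved to a good or a bad directory.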