How to find out column types in shell and get the differences

perl shell python-2.7 python-3.x awk

I have a sample file that looks like this:

emp_id(int),name(string),age(int)
1,hasa,34
2,dafa,45
3,fasa,12
8f,123Rag,12
8,fafl,12

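Requirement: the column data types are declared as string and int. emp_id should be an integer, not a string. The same conditions apply to the name and age columns.

My output should look like # ... and so on.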

Here is my code, a shell script:

read -r input
# -eq only succeeds when both operands are integers; the error message is discarded
if [ "$input" -eq "$input" ] 2>/dev/null
then
    echo "$input is an integer"
else
    echo "$input is not an integer"
fi

In Python, I tried isinstance(obj, type), but it did not serve the purpose.

If you could give me some guidance on this, help in the form of a shell/python/perl script is welcome.
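One likely reason `isinstance` does not help is that every field read from a text file is already a `str`, so the real question is whether the text parses as the declared type. A minimal sketch of that check follows; the helper name `is_int` is an assumption made for illustration, not something from the question.

```python
def is_int(text):
    """Return True if int() can parse text as a base-10 integer."""
    try:
        int(text, 10)
    except ValueError:
        return False
    return True


# isinstance() is False for every value because each one is a str.
for value in ["8", "8f", "1.1"]:
    print(value, is_int(value), isinstance(value, int))
```

Note that `int()` also tolerates surrounding whitespace and a leading sign, so a stricter check may be needed if those should be rejected.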

Here is an awk solution (it uses arrays of arrays, so it needs GNU awk 4.0 or later):

awk -F"," 'NR==1{for(i=1; i <= NF; i++){
                        # each header cell looks like name(type)
                        split($i,a,"(");
                        name[i]=a[1];
                        # test the type part only, so a name containing "int" is not misread
                        type[i] = (a[2] ~ /^int/ ? "INT" : "String")}next}
           {for(i=1; i <= NF; i++){
               # INT column: the value must equal its own integer part
               if($i != int($i) && type[i] == "INT"){error[i][NR] = $i}
               # String column: any digit in the value is reported
               if($i ~ /[0-9]+/ && type[i] == "String"){error[i][NR] = $i}
           }}
           END{for(i in error){
                       for(key in error[i]){
                            print "Actual column "name[i]" type is "type[i]\
                                  " but string was found at the position "key-1\
                                  ", value is "error[i][key]}}}' inputFile

However, in my opinion, 123Rag is a string and should not be reported as an incorrect entry in the second column.

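Using perl, I would tackle it like this:

Define some regex patterns that match (or fail to match) the string content.
Pick out the header row and split it into names and types. (Optionally, report if a type is not recognized.)
Iterate the fields, matching by column, to look up the type and apply the regex to validate.

Something like: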

#!/usr/bin/env perl

use strict;
use warnings;
use Data::Dumper;

#define regex to apply for a given data type
my %pattern_for = (
    int    => qr/^\d+$/,
    string => qr/^[A-Z]+$/i,
);

print Dumper \%pattern_for;

#read the first line. 
# <> is a magic filehandle, that reads files specified as arguments 
# or piped input - like grep/sed do. 
my $header_row = <>;
#extract just the names, in order. 
my @headers = $header_row =~ m/(\w+)\(/g;
#create a type lookup for the named headers. 
my %type_for = $header_row =~ m|(\w+)\((\w+)\)|g;

print Dumper \@headers;
print Dumper \%type_for;

#iterate input again
while (<>) {
    #remove trailing linefeed
    chomp;

    #parse incoming data into named fields based on ordering. 
    my %fields;
    @fields{@headers} = split /,/;
    #print for diag
    print Dumper \%fields;

    #iterate the headers, applying the looked up 'type' regex
    foreach my $field_name (@headers) {
        if ( $fields{$field_name} =~ m/$pattern_for{$type_for{$field_name}}/ ) {
            print
                "$field_name => $fields{$field_name} is valid, $type_for{$field_name} matching $pattern_for{$type_for{$field_name}}\n";
        }
        else {
            print "$field_name $fields{$field_name} not valid $type_for{$field_name} matching $pattern_for{$type_for{$field_name}}\n";
        }
    }
}

Note: it only supports "simple" CSV style (no embedded commas or quotes), but it could easily be adapted to use the Text::CSV module.
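Since the question also welcomes Python, the same approach can be sketched with the standard `csv` module, which handles quoted fields and embedded commas. This is a minimal sketch, assuming Python 3, well-formed `name(type)` header cells, and only the two type names used above; `find_mismatches`, `PATTERN_FOR` and `HEADER_CELL` are names invented for illustration.

```python
import csv
import io
import re

# Patterns mirror the Perl example: digits only for int, letters only for string.
PATTERN_FOR = {
    "int": re.compile(r"[0-9]+"),
    "string": re.compile(r"[A-Za-z]+"),
}
# Assumption: every header cell has the form name(type).
HEADER_CELL = re.compile(r"(\w+)\((\w+)\)")


def find_mismatches(lines):
    """Yield (row_number, column_name, declared_type, value) for invalid fields."""
    reader = csv.reader(lines)
    header = next(reader)
    columns = [HEADER_CELL.fullmatch(cell.strip()).groups() for cell in header]
    for row_number, row in enumerate(reader, start=1):
        for (name, type_name), value in zip(columns, row):
            if not PATTERN_FOR[type_name].fullmatch(value):
                yield row_number, name, type_name, value


sample = io.StringIO(
    "emp_id(int),name(string),age(int)\n"
    "1,hasa,34\n"
    "8f,123Rag,12\n"
    '8,"fa,fl",12\n'
)
for mismatch in find_mismatches(sample):
    print(mismatch)
```

The quoted value `"fa,fl"` stays in the name column as a single field, so the age value after it is still checked against the right type. An unknown type name or a malformed header cell raises an exception in this sketch rather than being reported.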

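Your code has nothing to do with your requirement. At least show an honest attempt.

What is wrong with having digits in a string field?

Your INT test is wrong: the value 1.1 will pass. This is better: $i != int($i).

@glenn jackman: Yes, of course you are right! $i==$i+0 tests whether the value is numeric (int or double does not matter). I somehow forgot the int restriction.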
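The distinction drawn in these comments, "numeric" versus "strictly integer", can be sketched in Python too. The helper names `is_numeric` and `is_integer` are assumptions for illustration, and the mapping to the two awk tests is only approximate.

```python
def is_numeric(text):
    """True if text parses as a number at all (roughly the $i == $i+0 idea)."""
    try:
        float(text)
    except ValueError:
        return False
    return True


def is_integer(text):
    """True only if the parsed number equals its own integer part."""
    try:
        number = float(text)
        return number == int(number)
    except (ValueError, OverflowError):
        # ValueError: not a number (or NaN); OverflowError: infinity
        return False


for value in ["8", "1.1", "8f"]:
    print(value, is_numeric(value), is_integer(value))
```

Here `1.1` counts as numeric but not as an integer, which is the case the first test let through.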

For the sample file, the awk solution reports:

Actual column emp_id type is INT but string was found at the position 4, value is 8f
Actual column name type is String but string was found at the position 4, value is 123Rag

For the sample file, the Perl script flags these two fields (the rest of its output is omitted):

name 123Rag not valid string matching (?^i:^[A-Z]+$)
emp_id 8f not valid int matching (?^:^\d+$)