How to remove redundant (similar) entries from a two-dimensional array in Perl


arrays list perl awk


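I am fairly new to Perl, but so far I have found it to be a very powerful language. Every month I pull an extract from the license registrations of the products I manage; the data is in CSV format. I have finished the code that produces a sorted list, ordered the way I want it. The list has about 1200 rows.

The list has the following format (I have kept only the important parts):

For the case above, the required output looks like this: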

Customer;CustomerID;ProductLine;Platform;Version
operatorx;1234;XX;Linux;15
operatory;2345;YY;x86;8
operatory;2345;ZZ;x86;7.2

The list in my code does not contain any ";"; the values are stored in an array like this:

@sortedlist = ([Customer,customerID,ProductLine,Platform,Version])

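So any customer can have many rows in my original list, but if the product is XX, only the first occurrence of that row should be kept, and none of the rows for products YY or ZZ may be kept. If the customer does not have product XX, the first occurrence of product YY and the first occurrence of product ZZ should be kept.

The list is sorted so that the "best" entry is always the first entry for each customerID.

I tried a very simple piece of code that checks whether the current customerID != prevCustomerID and then pushes the row onto a new list, but that made me miss the case where a customer has both YY and ZZ products... I also tried nesting a lot of if statements to keep track of the current and previous rows... but the code grew a lot and still did not give me the expected result :-(

I am starting to think that I am approaching this from the wrong angle. I tried digging into hashes, but since a customer can actually have one or two entries in the final list, I think a hash is not suitable, because the key would have to be the customerID, and in a hash each customerID should occur only once.

Does anyone know how to solve this? Start from the top and push the first element onto a new list; then, for each subsequent row, check whether it is already in the new list and which product the new list contains. If product == XX, discard the remaining elements for the same customerID; or, if the product in the new list == YY, discard the remaining elements until a row with product == ZZ is found for the same customerID. Then repeat the same procedure until a new customerID is found.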

---UPDATE--- I managed to solve my problem with awk

./myperlscript.pl input.csv | awk -F ';' '!array[$1,$2,$3]++'| awk -F ';' '{ {if ($2 != prev) {print $0; prev = $2; prevprod = $3}} {if ($2 = prev && prevprod != "XX") { prev =$2}}}' > output.csv

But if anyone knows how to achieve the same effect in standard Perl, that would be great.
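The first awk stage, `!array[$1,$2,$3]++`, keeps a row only the first time its first three fields are seen. A minimal sketch of the same idiom in core Perl, applied to an array of arrays like `@sortedlist`, might look like the following. The sample rows are hypothetical and only illustrate the layout described in the question.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical sample rows in the question's layout:
# [Customer, CustomerID, ProductLine, Platform, Version]
my @sortedlist = (
    [ 'operatory', 2345, 'YY', 'x86', '8'   ],
    [ 'operatory', 2345, 'YY', 'x86', '7'   ],
    [ 'operatory', 2345, 'ZZ', 'x86', '7.2' ],
);

# Keep a row only the first time its (Customer, CustomerID, ProductLine)
# combination is seen. A comma-separated hash subscript is joined with $;
# internally, much like awk joins array[$1,$2,$3] with SUBSEP.
my %seen;
my @unique = grep { !$seen{ $_->[0], $_->[1], $_->[2] }++ } @sortedlist;

print join(';', @$_), "\n" for @unique;
```

This covers only the de-duplication step; the rule that XX suppresses YY and ZZ still has to be applied afterwards.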

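Here is a simple Perl implementation that achieves the same result using state variables. If you actually have more than three product lines to consider (here XX, YY, ZZ), you can generalize it to a state array and a function that updates the array and decides what to do based on the array's state.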

filter.pl

#!/usr/bin/env perl

use warnings;
use strict;

my $last_customer = '';
my ($seen_xx, $seen_yy, $seen_zz);

while (my $line = <>) {
    # Header
    if ($. == 1) {
        print $line;
        next;
    }

    # Data
    my ($customer_name, $customer_id, $product_line, $platform, $version) = split /;/, $line;
    die "Unable to parse line : $line"
        unless defined $customer_name;

    if ($customer_name ne $last_customer) {
        $last_customer = $customer_name;
        ($seen_xx, $seen_yy, $seen_zz) = (0,0,0); # Reset
    }

    if (not $seen_xx and $product_line eq 'XX') {
        # Print first XX
        print $line;
        ($seen_xx, $seen_yy, $seen_zz) = (1,1,1); # Ignore the others
    }

    if (not $seen_yy and $product_line eq 'YY') {
        # Print first YY if no XX
        print $line;
        $seen_yy = 1;
    }

    if (not $seen_zz and $product_line eq 'ZZ') {
        # Print first ZZ if no XX
        print $line;
        $seen_zz = 1;
    }
}

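You will get better answers if you post the code you have so far. Also explain why you are artificially limiting the solution space to exclude modules. That said, you can have a hash of arrays, and an array of hashes. Or a hash of hashes. Or composite hash keys.

Welcome to SO! Please read this and follow the guidelines to include enough information to describe your problem.

Sobrique: the reason for not wanting to use modules is that I want to share my final code with colleagues, and we cannot use CPAN in our existing setup... Our main environment is Windows-based, we have to use Cygwin to run this, and the Cygwin setup we are required to use does not allow adding modules :-(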
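As a rough illustration of the hash-of-hashes idea mentioned in the comments, the sketch below keys the outer hash on the customer ID and the inner hash on the product line, so one customer can still end up with two rows. It uses only core Perl. The sample rows and the names `%first`, `@order` and `@filtered` are assumptions made for this example, not part of the original code.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical sample rows: [Customer, CustomerID, ProductLine, Platform, Version]
my @sortedlist = (
    [ 'operatorx', 1234, 'XX', 'Linux', '15'  ],
    [ 'operatorx', 1234, 'YY', 'Linux', '9'   ],
    [ 'operatory', 2345, 'YY', 'x86',   '8'   ],
    [ 'operatory', 2345, 'YY', 'x86',   '7'   ],
    [ 'operatory', 2345, 'ZZ', 'x86',   '7.2' ],
);

my %first;   # $first{$customer_id}{$product} = first row seen for that pair
my @order;   # customer IDs in order of first appearance

for my $row (@sortedlist) {
    my ($id, $product) = @{$row}[1, 2];
    push @order, $id unless exists $first{$id};
    $first{$id}{$product} ||= $row;   # remember only the first occurrence
}

my @filtered;
for my $id (@order) {
    my $by_product = $first{$id};
    # XX wins outright; otherwise keep the first YY and the first ZZ.
    my @wanted = exists $by_product->{XX} ? ('XX') : ('YY', 'ZZ');
    push @filtered,
        map { exists $by_product->{$_} ? $by_product->{$_} : () } @wanted;
}

print join(';', @$_), "\n" for @filtered;
```

Because the decision is made per customer after all rows have been collected, this variant does not depend on XX rows being sorted ahead of YY and ZZ rows, although it still takes the first row per product as the "best" one.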

cat input | perl filter.pl
Customer;CustomerID;ProductLine;Platform;Version
operatorx;1234;XX;Linux;15
operatory;2345;YY;x86;8
operatory;2345;ZZ;x86;7.2
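
The generalization mentioned in the answer (a state structure plus a function that updates it and decides what to do) could be sketched roughly as follows. The rule table `%suppresses` and the helper names `keep_row` and `filter_rows` are hypothetical, the sample rows are invented for illustration, and rows are grouped by customer ID here rather than by customer name as in filter.pl.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Assumed rule table: for each product line, the product lines whose later
# rows it suppresses for the same customer (including itself).
my %suppresses = (
    XX => [qw(XX YY ZZ)],
    YY => [qw(YY)],
    ZZ => [qw(ZZ)],
);

# Returns true if a row for $product should be kept, and updates the
# per-customer state hash in place.
sub keep_row {
    my ($state, $product) = @_;
    return 0 unless exists $suppresses{$product};   # unknown product: drop
    return 0 if $state->{$product};                 # already suppressed
    $state->{$_} = 1 for @{ $suppresses{$product} };
    return 1;
}

# Filters rows given as array refs [Customer, CustomerID, ProductLine, ...],
# assuming the rows of one customer are adjacent and sorted best-first.
sub filter_rows {
    my @rows    = @_;
    my $last_id = '';
    my %state;
    my @kept;
    for my $row (@rows) {
        my ($id, $product) = @{$row}[1, 2];
        if ($id ne $last_id) {    # new customer: reset the state
            $last_id = $id;
            %state   = ();
        }
        push @kept, $row if keep_row(\%state, $product);
    }
    return @kept;
}

my @kept = filter_rows(
    [ 'operatorx', 1234, 'XX', 'Linux', '15'  ],
    [ 'operatorx', 1234, 'ZZ', 'Linux', '3'   ],
    [ 'operatory', 2345, 'YY', 'x86',   '8'   ],
    [ 'operatory', 2345, 'ZZ', 'x86',   '7.2' ],
    [ 'operatory', 2345, 'ZZ', 'x86',   '7.0' ],
);
print join(';', @$_), "\n" for @kept;
```

Adding a further product line then only means adding an entry to the rule table instead of another `if` block.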