Linux CSV: remove duplicates based on older dates in a large file (over 100k records)

We have the following CSV file, which contains:

DCR_Path,Direction for Translation,Date & Time
data1,Send for Translation To CTM,Sep 30 2014 03:22
data2,Send for Translation To CTM,Sep 30 2014 02:21
data1,Send for Translation To CTM,Sep 30 2014 03:23
data1,Send for Translation To CTM,Sep 30 2013 03:24
data3,Send for Translation To CTM,Sep 30 2014 03:10
data2,Send for Translation To CTM,Sep 30 2014 02:22
data1,Send for Translation To CTM,Sep 30 2014 02:20

I need to keep only the latest version of each DCR_Path and remove the other duplicates. The output should be:

DCR_Path,Direction for Translation,Date & Time
data1,Send for Translation To CTM,Sep 30 2014 03:23
data2,Send for Translation To CTM,Sep 30 2014 02:22
data3,Send for Translation To CTM,Sep 30 2014 03:10

I tried the command below, but with a large number of records it does not correctly remove the rows with older dates:

awk -F ',' '{ if (Z) { "(date --date=\""$3"\" +\"%s\")" | getline X ; if (Y[$1] < X) {     Y[$1] = X; C[$1] = $0 } } else { Z = $0 } } END { print Z ; for (V in C) { print C[V] } }' < _YOUR_FILE_
The file I am working with is available here:

https://drive.google.com/file/d/0B-v5SOZ1TWo-TEFGV05ZZFFwcXM/view?usp=sharing

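With that many date subprocesses, you seem to be hitting some limit on open file descriptors. Perl looks like a better candidate, since it can do everything in a single process.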
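The descriptor limit mentioned above most likely comes from awk keeping every `cmd | getline` pipe open until `close(cmd)` is called or the program ends. The following is a minimal sketch of the question's command with that call added, not a tested fix for the linked file. It assumes GNU date (for `--date`), as the original command does, and the function name `dedup_latest_awk` is hypothetical.

```shell
# Hypothetical helper: reads the CSV on stdin, writes the de-duplicated CSV to stdout.
# Assumes GNU date; field 3 is passed to the shell unescaped, so use trusted input only.
dedup_latest_awk() {
  awk -F ',' '
    NR == 1 { print; next }                  # pass the header line through
    {
      cmd = "date --date=\"" $3 "\" +%s"     # one date process per record
      if ((cmd | getline ts) > 0 && (!($1 in best) || ts + 0 > best[$1])) {
        best[$1] = ts + 0                    # newest epoch seconds seen for this key
        row[$1] = $0                         # the row that carries it
      }
      close(cmd)                             # release the pipe before the next record
    }
    END { for (k in row) print row[k] }      # order of keys is unspecified
  '
}
```

This still forks one date process per record, so on 100k+ rows the single-process Perl script that follows should remain considerably faster.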

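#!/usr/bin/perl -nl
# Pass the header line through unchanged.
if ($. == 1) { print; next }
my ($key, $action, $date) = split /,/;
# "Sep 30 2014 03:22" -> month name, day, year, hour, minute
my ($mo, $d, $y, $h, $m) = split / |:/, $date;
$mo = {Jan=>0,Feb=>1,Mar=>2,Apr=>3,May=>4,Jun=>5,Jul=>6,Aug=>7,Sep=>8,Oct=>9,Nov=>10,Dec=>11}->{$mo};
# Comparison key in minutes; every month counts as 31 days, which keeps the ordering correct.
my $m_cmp = $m + 60*$h + 24*60*$d + 31*24*60*$mo + 12*31*24*60*$y;
# Keep the row with the largest key seen so far for each DCR_Path.
$dcr{$key} = [ $action, $date, $m_cmp ] if !$dcr{$key} || $m_cmp > $dcr{$key}->[2];
END {
    # Print the surviving rows, sorted by DCR_Path.
    print join(",", $_, @{$dcr{$_}}[0,1] ) foreach (sort keys %dcr);
}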
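For readers who would rather stay with awk, the same single-process idea can be sketched without calling date at all. This is an illustration under assumptions, not part of the original answer: it assumes every date has the form "Mon DD YYYY HH:MM" as in the sample, and the function name `dedup_latest` is hypothetical. It reuses the Perl script's minute-based comparison key.

```shell
# dedup_latest is a hypothetical helper name: CSV in on stdin, result on stdout.
dedup_latest() {
  # Copy the header line through unchanged; awk then sees only data rows.
  IFS= read -r header && printf '%s\n' "$header"
  awk -F ',' '
    BEGIN {
      # Month abbreviation -> 0..11, mirroring the hash in the Perl script.
      n = split("Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec", name, " ")
      for (i = 1; i <= n; i++) month[name[i]] = i - 1
    }
    {
      # "Sep 30 2014 03:22" -> t[1]=Sep t[2]=30 t[3]=2014 t[4]=03 t[5]=22
      split($3, t, /[ :]/)
      # Same comparison key as the Perl script, in minutes (31-day months).
      stamp = t[5] + 60 * t[4] + 1440 * t[2] + 44640 * month[t[1]] + 535680 * t[3]
      if (!($1 in best) || stamp > best[$1]) { best[$1] = stamp; row[$1] = $0 }
    }
    END { for (k in row) print row[k] }
  ' | LC_ALL=C sort -t ',' -k 1,1
}
```

With hypothetical file names, it would be called as `dedup_latest < input.csv > output.csv`. The final sort orders rows by DCR_Path, as the Perl script's `sort keys` does.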

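Hi SZG, do I need to specify the input file and the output file in the script above?

Hi SZG, please ignore my previous question. I figured out how to supply input and output with Perl. I am testing the script above and will share my output.

Hi SZG, the solution above works well. Thanks for your help.

I hope it is still somewhat readable... How about an upvote? :-)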