Bash: How can I remove the lines common to two files without sorting?

Tags: bash, sorting, optimization, sed, comm

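I have two unsorted files that share some lines.

file1.txt

      Z
      B
      A
      H
      L

file2.txt

      S
      L
      W
      Q
      A

This is the approach I currently use to remove the common lines:

      sort -u file1.txt > file1_sorted.txt
      sort -u file2.txt > file2_sorted.txt

      comm -23 file1_sorted.txt file2_sorted.txt > file_final.txt

Output:

      B
      H
      Z

The problem is that I want to keep the original order of file1.txt, that is:

Desired output:

      Z
      B
      H

One solution I thought of is to loop over all the lines of file2.txt and, for each line, run:

      sed -i "/^${line_file2}\$/d" file1.txt

But if the files are large, the performance may be poor.

  • Do you like my idea?
  • Do you have an alternative?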
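For reference, the loop idea from the question could be sketched roughly as follows. This is only an illustration under some assumptions: the file names in the temporary directory are made up, the lines contain no regex metacharacters or slashes, and a temporary file is used instead of `sed -i` so that the original file stays untouched. Note the double quotes around the sed expression: inside single quotes the shell would not expand the variable.

```shell
#!/usr/bin/env bash
# Recreate the sample files from the question in a temporary directory.
workdir=$(mktemp -d)
printf '%s\n' Z B A H L > "$workdir/file1.txt"
printf '%s\n' S L W Q A > "$workdir/file2.txt"

# Work on a copy so that file1.txt itself is not modified.
cp "$workdir/file1.txt" "$workdir/out.txt"

# One full sed pass over the output file for every line of file2.txt.
# "\$" keeps a literal "$" (end-of-line anchor) inside the double quotes.
while IFS= read -r line; do
    sed "/^${line}\$/d" "$workdir/out.txt" > "$workdir/out.tmp"
    mv "$workdir/out.tmp" "$workdir/out.txt"
done < "$workdir/file2.txt"

result=$(cat "$workdir/out.txt")
printf '%s\n' "$result"
rm -r "$workdir"
```

Each line of file2.txt triggers a complete rewrite of the output file, which is why this approach scales poorly when both files are large.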
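You can just use grep (-v inverts the match, -f reads the patterns from a file). To print the lines of input1 that do not match any line in input2:

      grep -vf input2 input1

With the sample files from the question (file1.txt as input1, file2.txt as input2), this gives:

      Z
      B
      H
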
Either grep or awk works; the awk version:

      awk 'NR==FNR{a[$0]=1;next}!a[$0]' file2 file1
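To make the awk one-liner easier to follow, here is a small self-contained sketch that rebuilds the sample files from the question and runs an equivalent command. The temporary directory and the array name `seen` are choices made for this illustration, and `!($0 in seen)` is a minor variation on `!a[$0]` that avoids creating array entries during the lookup.

```shell
#!/usr/bin/env bash
# Recreate the sample files from the question in a temporary directory.
workdir=$(mktemp -d)
printf '%s\n' Z B A H L > "$workdir/file1.txt"
printf '%s\n' S L W Q A > "$workdir/file2.txt"

# First pass: NR==FNR holds only while awk reads the first file (file2.txt),
#   so every line of file2.txt is stored as a key of the array "seen".
# Second pass (file1.txt): a line is printed only if it is not a key of "seen".
result=$(awk 'NR==FNR { seen[$0] = 1; next } !($0 in seen)' \
    "$workdir/file2.txt" "$workdir/file1.txt")

printf '%s\n' "$result"   # lines of file1.txt in their original order
rm -r "$workdir"
```

Because file1.txt is streamed line by line, its order is preserved and no sorting is needed. One caveat of this idiom: if the first file is empty, NR==FNR is also true for the second file, so nothing is printed.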
      

I have written a Perl script that I use for this kind of thing. It can do more than what you asked for, but it also covers what you need:

      #!/usr/bin/env perl
      use strict;
      use warnings; ## replaces -w, which env cannot pass on a Linux shebang line
      use Getopt::Std;
      my %opts;
      getopts('hvfcmdk:', \%opts);
      my $missing=$opts{m}||undef;
      my $column=$opts{k}||undef;
      my $common=$opts{c}||undef;
      my $verbose=$opts{v}||undef;
      my $fast=$opts{f}||undef;
      my $dupes=$opts{d}||undef;
      $missing=1 unless $common || $dupes;;
      &usage() unless $ARGV[1];
      &usage() if $opts{h};
      my (%found,%k,%fields);
      if ($column) {
          die("The -k option only works in fast (-f) mode\n") unless $fast;
          $column--; ## So I don't need to count from 0
      }
      
      open(my $F1,"$ARGV[0]")||die("Cannot open $ARGV[0]: $!\n");
      while(<$F1>){
          chomp;
          if ($fast){
              my @aa=split(/\s+/,$_);
              $k{$aa[0]}++;
              $found{$aa[0]}++;
          }
          else {
              $k{$_}++;
              $found{$_}++;
          }
      }
      close($F1);
      my $n=0;
      open(F2,"$ARGV[1]")||die("Cannot open $ARGV[1]: $!\n");
      my $size=0;
      if($verbose){
          while(<F2>){
              $size++;
          }
      }
      close(F2);
      open(F2,"$ARGV[1]")||die("Cannot open $ARGV[1]: $!\n");
      
      while(<F2>){
          next if /^\s+$/;
          $n++;
          chomp;
          print STDERR "." if $verbose && $n % 10==0;
          print STDERR "[$n of $size lines]\n" if $verbose && $n % 800==0;
          if($fast){
              my @aa=split(/\s+/,$_);
              $k{$aa[0]}++ if defined($k{$aa[0]});
              $fields{$aa[0]}=\@aa if $column;
          }
          else{
              foreach my $key(keys(%found)){
                  if (/\Q$key/){
                      $k{$key}++;
                      $found{$key}=undef unless $dupes;
                  }
              }
          }
      }
      close(F2);
      print STDERR "[$n of $size lines]\n" if $verbose;
      
      if ($column) {
          $missing && do map{my @aa=@{$fields{$_}}; print "$aa[$column]\n" unless $k{$_}>1}keys(%k);
          $common &&  do map{my @aa=@{$fields{$_}}; print "$aa[$column]\n" if $k{$_}>1}keys(%k);
          $dupes &&   do map{my @aa=@{$fields{$_}}; print "$aa[$column]\n" if $k{$_}>2}keys(%k);
      }
      else {
          $missing && do map{print "$_\n" unless $k{$_}>1}keys(%k);
          $common &&  do map{print "$_\n" if $k{$_}>1}keys(%k);
          $dupes &&   do map{print "$_\n" if $k{$_}>2}keys(%k);
      }
      sub usage{
          print STDERR <<EndOfHelp;
      
        USAGE: compare_lists.pl FILE1 FILE2
      
            This script will compare FILE1 and FILE2, searching for the 
            contents of FILE1 in FILE2 (and NOT vice versa). FILE one must 
            be one search pattern per line, the search pattern need only be 
            contained within one of the lines of FILE2.
      
          OPTIONS: 
            -c : Print patterns COMMON to both files
            -f : Search only the first characters of each line of FILE2
            for the search pattern given in FILE1
            -d : Print duplicate entries     
            -m : Print patterns MISSING in FILE2 (default)
            -h : Print this help and exit
      EndOfHelp
            exit(0);
      }
      

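The -f option makes the script compare only the first whitespace-delimited word of each line, which speeds things up considerably. To compare entire lines, remove -f.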

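@JohnDoe We only need a non-zero number; 7 and 1 make no difference. I changed it to 1, if that makes you feel more comfortable. :-)

Yes, I feel much better now. Actually, this is the best approach. It is faster than the grep method. Thanks :D

@JohnDoe It should be faster than grep, because arrays in awk are hash tables: checking a key uses the hash function, which is O(1), whereas grep needs O(n^2). However, my awk line keeps file2 in memory, so its space complexity is higher than that of the grep command.

This is a lifesaver. awk rocks!

Wouldn't it be better to use the options -F, -w or -x, e.g. for the substring case?

This compares entire lines for equality: grep -vxf input2 input1. Also, the grep on macOS (grep version 2.5.1) behaved oddly and gave no results, so I had to use the Homebrew grep, i.e. GNU grep.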
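The substring concern raised in the comments can be illustrated with a small sketch. The sample data here is an assumption made for the demonstration: it is the data from the question with the line A in the first file changed to AB, so that a pattern from the second file matches only part of a line.

```shell
#!/usr/bin/env bash
# Sample data: "AB" contains the pattern "A" but is not equal to it.
workdir=$(mktemp -d)
printf '%s\n' Z B AB H L > "$workdir/input1"
printf '%s\n' S L W Q A  > "$workdir/input2"

# Plain -f: every line of input2 is a regex that may match anywhere in a line,
# so the pattern "A" also removes the line "AB".
loose=$(grep -vf "$workdir/input2" "$workdir/input1")

# -F treats the patterns as fixed strings; -x requires a whole-line match.
strict=$(grep -vxFf "$workdir/input2" "$workdir/input1")

printf 'loose:  %s\n' "$(printf '%s\n' "$loose" | paste -sd ' ' -)"
printf 'strict: %s\n' "$(printf '%s\n' "$strict" | paste -sd ' ' -)"
rm -r "$workdir"
```

With this data the loose variant drops AB along with L, while the strict variant keeps AB and removes only L. Adding -F also prevents lines containing characters such as `.` or `*` from being interpreted as regular expressions.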
The Perl script above is invoked like this:

      list_compare.pl -cf file1.txt file2.txt