Comparing lines with awk instead of a while read loop

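I have two files, one with 17k lines and the other with 4k lines. For each line of the first file, I want to compare the text at positions 115 to 125 against every line of the second file, and if there is a match, write the whole line from the first file to a new file. I came up with a solution that reads the file with "cat $filename" in a "while read" loop, but it takes about 8 minutes to finish. Is there another way, for example using awk, to cut down the processing time?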

My code:

cat "$filename" | while IFS= read -r LINE
do
  # read columns 115 to 125, then remove leading/trailing spaces and leading zeroes
  vid=$(printf '%s\n' "$LINE" | cut -c 115-125 | sed 's/^ *//; s/ *$//; s/^0*//')
  # match vid against entire lines of id.txt (fixed string; -q only tests for a match)
  if grep -qxF -- "$vid" "$file_dir/id.txt"; then
    printf '%s\n' "$LINE" >> "$dest_dir/id.txt"
  fi
done
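Each iteration of this loop starts several external processes (at least cut, sed and grep), so 17k lines means on the order of tens of thousands of process launches, which is where most of the 8 minutes goes. Before moving the work into awk, it helps to check that awk extracts the same key. Note that cut -c 115-125 selects 11 characters, so the awk equivalent is substr($0, 115, 11). The sketch below is a hedged illustration: key_cut and key_awk are hypothetical helper names, and the sample line is made up.

```shell
#!/bin/sh
# Sketch: compare the question's cut/sed key extraction with an awk version.
# key_cut and key_awk are hypothetical helper names used only for this demo.

# Question's approach: columns 115-125, trim spaces, strip leading zeroes.
key_cut() {
    printf '%s\n' "$1" | cut -c 115-125 | sed 's/^ *//; s/ *$//; s/^0*//'
}

# awk approach: substr(s, 115, 11) covers the same 11 columns as cut -c 115-125.
key_awk() {
    printf '%s\n' "$1" | awk '{
        k = substr($0, 115, 11)
        sub(/^[ \t]+/, "", k)    # leading whitespace
        sub(/[ \t]+$/, "", k)    # trailing whitespace
        sub(/^0+/, "", k)        # leading zeroes
        print k
    }'
}

# Sample line: 114 filler characters, the 11-column field "  00012345 ", a tail.
prefix=$(printf '%114s' '' | tr ' ' x)
line="${prefix}  00012345 TAIL"

echo "cut: $(key_cut "$line")"    # cut: 12345
echo "awk: $(key_awk "$line")"    # awk: 12345
```

Both helpers should print 12345 for this line; the awk version does all of its work inside a single process per input file rather than per line.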

How about this:

FNR == NR {                      # FNR == NR is only true in the first file
    s = substr($0, 115, 11)      # Columns 115-125 (same as cut -c 115-125)
    sub(/^[ \t]+/, "", s)        # Remove leading whitespace (portable; \s is gawk-only)
    sub(/[ \t]+$/, "", s)        # Remove trailing whitespace
    lines[s] = $0                # Map key -> line (a later line with the same key overwrites)
    next                         # Get next line in first file
}
{                                # Now in the second file
    for (i in lines)             # For each key in the array
        if (i ~ $0) {            # If the current line, used as a regex, matches the key
            print lines[i]       # Print the matching line from file1
            next                 # Get next line in second file
        }
}

Save it to a file named script.awk and run it like this:

$ awk -f script.awk "$filename" "${file_dir}/id.txt" > "${dest_dir}/id.txt"
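Because this script tests i ~ $0, each line of id.txt acts as a regular expression that can match anywhere inside a key, not only the whole key. That is what allows partial matches, and it is also why every key has to be scanned for every ID. A minimal sketch with made-up sample data (the file names, IDs and variable names are hypothetical) runs an equivalent copy of the script, with portable [ \t] in place of \s, and shows an ID of 23 selecting the line whose key is 12345:

```shell
#!/bin/sh
# Sketch: the regex test (i ~ $0) treats each id.txt line as a pattern,
# so partial matches are accepted. All sample data below is made up.
tmp=$(mktemp -d) || exit 1
trap 'rm -rf "$tmp"' EXIT

pad=$(printf '%114s' '')    # 114 spaces, so each key starts at column 115
printf '%s%-11s%s\n' "$pad" 12345 rowA "$pad" 67890 rowB > "$tmp/data.txt"
printf '%s\n' 23 > "$tmp/id.txt"    # "23" is not a complete key

matches=$(awk '
FNR == NR {
    s = substr($0, 115, 11)
    sub(/^[ \t]+/, "", s); sub(/[ \t]+$/, "", s)
    lines[s] = $0
    next
}
{
    for (i in lines)
        if (i ~ $0) { print lines[i]; next }
}' "$tmp/data.txt" "$tmp/id.txt")

printf '%s\n' "$matches"    # the rowA line, because "12345" contains "23"
```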

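This will still be slow, because for every line in the second file it has to look through about 50% of the unique lines from the first file (assuming most lines do match). If you can confirm that the lines in the second file match the substring exactly, this can be improved significantly.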
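To put rough numbers on this, assume n unique keys from the first file (at most 17,000) and m = 4,000 lines in the second file. Scanning half the keys per ID costs about m·n/2 regex tests, whereas an exact match can use awk's associative arrays, costing one insertion per key plus one lookup per ID. These are order-of-magnitude estimates, not measurements:

```latex
\begin{aligned}
\text{regex scan:}\quad & m \cdot \tfrac{n}{2} \le 4000 \cdot \tfrac{17000}{2} = 3.4 \times 10^{7} \ \text{comparisons} \\
\text{hash lookup:}\quad & n + m \le 17000 + 4000 = 2.1 \times 10^{4} \ \text{array operations}
\end{aligned}
```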

For whole-line matches, this should be faster:

FNR == NR {                      # FNR == NR is only true in the first file
    s = substr($0, 115, 11)      # Columns 115-125 (same as cut -c 115-125)
    sub(/^[ \t]+/, "", s)        # Remove leading whitespace (portable; \s is gawk-only)
    sub(/[ \t]+$/, "", s)        # Remove trailing whitespace
    sub(/^0+/, "", s)            # Remove leading zeroes, as the original loop does
    lines[s] = $0                # Map key -> line
    next                         # Get next line in first file
}
($0 in lines) {                  # Now in the second file: exact key lookup
    print lines[$0]              # Print the matching line from file1
}
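Indexing the first file by key means that if several of its lines share a key only the last one is kept, and the output follows the order of id.txt rather than the original file. A variation, sketched below under the assumption that each line of id.txt holds one exact ID with no leading zeroes, reads id.txt first and then streams the big file. It keeps every matching line in its original order and applies the same leading-zero stripping as the question's loop. The function name filter_by_id and the sample data are hypothetical.

```shell
#!/bin/sh
# filter_by_id IDFILE DATAFILE  (hypothetical helper name)
# Prints each DATAFILE line whose columns 115-125, trimmed and stripped of
# leading zeroes, exactly equal some line of IDFILE. Output keeps DATAFILE
# order, and lines that share a key are all printed.
# Caveat: if IDFILE is empty, NR == FNR also holds for DATAFILE, so nothing prints.
filter_by_id() {
    awk '
    NR == FNR { ids[$0]; next }     # first file: remember every ID
    {
        k = substr($0, 115, 11)     # same columns as cut -c 115-125
        sub(/^[ \t]+/, "", k)
        sub(/[ \t]+$/, "", k)
        sub(/^0+/, "", k)
        if (k in ids) print         # one hash lookup per line, no inner loop
    }' "$1" "$2"
}

# Demo with made-up data.
tmp=$(mktemp -d) || exit 1
trap 'rm -rf "$tmp"' EXIT
pad=$(printf '%114s' '')
printf '%s%-11s%s\n' "$pad" 00042 first "$pad" 7 second "$pad" 42 third > "$tmp/data.txt"
printf '%s\n' 42 > "$tmp/id.txt"

result=$(filter_by_id "$tmp/id.txt" "$tmp/data.txt")
printf '%s\n' "$result"    # the "first" and "third" lines, in file order
```

With the question's variables this would be called as filter_by_id "${file_dir}/id.txt" "$filename" > "${dest_dir}/id.txt".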

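With awk, you can use NR as the line number. That might save you some time.

I can confirm that the lines in the second file exactly match the substring in the first file.

@user37774 I added a script that should be faster for exact matches.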