Unix file join: how do I preserve line order?


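Joining two files on a common column in Unix requires (as far as I know) first sorting both files on that column. If that is correct, the original line order is lost. I want to keep the line order of the first file. To do this, I add a ghost column to the first file containing each line's number. I then sort both files on the common column, join them, re-sort the output on the ghost column, and finally remove that column. I have included a script that does this. Is there a better or faster way?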

#!/bin/bash

input_file1=
input_file2=
output_file='/dev/stdout'
key1=
key2=
ifs_tab=0
rand=$$
key_field_type="" # flag passed to sort: -n (numeric), -g (general numeric), or empty for plain text order
appname=$(basename "$0")

function print_help_and_exit {
    echo "Usage : $appname -1 key1 -2 key2 [-t] [-n|-g] [-o output] file1 file2 [>output]"
    echo "key1: the join column from the first input file (column numbers start from 1)"
    echo "key2: the join column from the second input file"
    echo "optional flag -t uses a single tab as a field separator as opposed to a sequence of white space (which is the default)"
    echo "-n or -g : flags to be passed to sort: -n sort in numeric order, -g sort in general numeric order, default: text, leave empty"
    echo "script by Andreas Hadjiprocopis / Institute of Cancer Research, 2011"
    exit 1
}
# 'g' must be listed here, otherwise the -g branch below is never reached
while getopts "1:2:o:tngh" OPTION; do
    case $OPTION in
            1)
                    key1="${OPTARG}"
                    ;;
            2)
                    key2="${OPTARG}"
                    ;;
            o)
                    output_file="${OPTARG}"
                    ;;
            t)
                    ifs_tab=1
                    ;;
            n)
                    key_field_type="-n"
                    ;;
            g)
                    key_field_type="-g"
                    ;;
            h)
                    print_help_and_exit
                    ;;
    esac
done
shift $(($OPTIND - 1))
input_file1=$1; shift
input_file2=$1; shift

if [ "$key1" == "" ] || [ "$key2" == "" ] || [ "$input_file1" == "" ] || [ "$input_file2" == "" ]; then
    echo "$appname : incorrect number of parameters" >&2
    print_help_and_exit
fi
if [ ${ifs_tab} -eq 1 ]; then ifs1="-t$'\t'"; ifs2="-F $'\t'"; else ifs1=""; ifs2=""; fi
# note: when you do a join the output file contains the common column first, then all the columns of the first file, then all from second file

# add a new column to the beginning of the input_file1 and increment its join-column number (key1)
# then we will sort the two input files as required by join
# then we will join the two input files on the specified column numbers (key1 and key2)
# then we will sort the output according to the new column we added
# and then delete that column, output to STDOUT

let key1++
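# The here-document is expanded by this shell and then executed by a second bash,
# which is what lets the literal $'\t' stored in ifs1/ifs2 be parsed as a tab.
# join prints the key first, so the line-number column ends up as field 2:
# the final sort must use field 2, and awk then drops that field.
cat << EOC | bash
awk ${ifs2} '{print NR"\t"\$0}' "${input_file1}" | sort ${ifs1} -k ${key1},${key1} ${key_field_type} > /tmp/${rand}.1
sort ${ifs1} -k ${key2},${key2} ${key_field_type} "${input_file2}" > /tmp/${rand}.2
join ${ifs1} -1 ${key1} -2 ${key2} /tmp/${rand}.1 /tmp/${rand}.2 | sort ${ifs1} -k 2,2n | awk ${ifs2} '{str=\$1;for(i=3;i<=NF;i++) str=str"\t"\$i; print str}' > "${output_file}"
EOC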

rm -f /tmp/${rand}.*
exit 0

Here are a few suggestions:

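You don't need to create temporary files. Use process substitution, `<(...)`, instead.
You don't need `cat` either: the pipeline can be run directly rather than fed to a second shell through a here-document.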
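# join prints the key first, so the line-number column is field 2: sort on it, then drop it.
# Caveat: without the here-document, a literal $'\t' stored in ifs1/ifs2 is no longer
# re-parsed by a shell, so the -t case needs the separator passed differently.
join ${ifs1} -1 ${key1} -2 ${key2} \
   <(awk ${ifs2} '{print NR"\t"$0}' "${input_file1}" | sort ${ifs1} -k ${key1},${key1} ${key_field_type}) \
   <(sort ${ifs1} -k ${key2},${key2} ${key_field_type} "${input_file2}") \
| sort ${ifs1} -k 2,2n \
| awk ${ifs2} '{str=$1;for(i=3;i<=NF;i++) str=str"\t"$i; print str}' > "${output_file}"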
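
To try the idea on a small input, here is a minimal sketch of the same pipeline wrapped in a function. It assumes bash (process substitution is not available in plain sh) and whitespace-separated fields. The function name `join_keep_order` and the sample files are made up for illustration and are not part of the original script.

```shell
#!/bin/bash
# Sketch: join two whitespace-separated files on a key, keeping file1's line order.
# join_keep_order and the sample data are hypothetical names used only for this demo.
join_keep_order() {
    local key1=$1 key2=$2 file1=$3 file2=$4
    local k1=$((key1 + 1))   # the key moves one column right once the line number is prepended
    join -1 "$k1" -2 "$key2" \
        <(awk '{print NR, $0}' "$file1" | sort -k "$k1,$k1") \
        <(sort -k "$key2,$key2" "$file2") \
    | sort -k 2,2n \
    | awk '{out = $1; for (i = 3; i <= NF; i++) out = out "\t" $i; print out}'
}

tmpdir=$(mktemp -d)
printf '%s\n' 'cherry red' 'apple green' 'banana yellow' > "$tmpdir/fruits.txt"
printf '%s\n' 'apple 3' 'banana 7' 'cherry 5'            > "$tmpdir/counts.txt"

# Join on column 1 of both files; rows come back in the order of fruits.txt.
result=$(join_keep_order 1 1 "$tmpdir/fruits.txt" "$tmpdir/counts.txt")
printf '%s\n' "$result"
rm -rf "$tmpdir"
```

With this sample data the rows are printed in the order cherry, apple, banana, which is the order of the first file rather than the sorted key order. As in the original script, the join key is moved to the first output column.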