Language agnostic: join two tab-separated files on a column with the same identifier in one step (command)?
I often want to join two ASCII files that are both tables, in the sense that they consist of tab-separated columns, like this:
File 1:
FRUIT ID
apple alpha
banana beta
cherry gamma
File 2:
ID FOOBAR
alpha cat
beta dog
delta airplane
I want to join them with an inner join:
FRUIT ID FOOBAR
apple alpha cat
banana beta dog
or with a left join:
FRUIT ID FOOBAR
apple alpha cat
banana beta dog
cherry gamma n/a
(The identifiers used for the join are not necessarily unique.)
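To try the commands from the answers, the two example tables can be recreated with `printf`, which expands each `\t` into a real tab character. This is a minimal sketch; the file names `file1` and `file2` match the ones the answers use, and the final `awk` line is only a sanity check that prints the number of tab-separated fields on each line (it should print 2 for every line).

```shell
#!/bin/bash
# Recreate the two example tables from the question.
# printf turns every \t into a tab and every \n into a newline,
# so the columns are genuinely tab-separated.
printf 'FRUIT\tID\napple\talpha\nbanana\tbeta\ncherry\tgamma\n' > file1
printf 'ID\tFOOBAR\nalpha\tcat\nbeta\tdog\ndelta\tairplane\n' > file2

# Sanity check: with tab as the field separator, each line has 2 fields.
awk -F'\t' '{ print NF }' file1 file2
```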
This is what I am doing so far:
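Can anyone suggest a simpler way? Ideally one that does not require sorting, and where the column can be specified by name rather than by number? Something like "joincommand ID file1 file2 > result".
You can automate the task with a bash script, without using temporary files, as in this example: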
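#!/bin/bash
id="$1"
file1="$2"
file2="$3"

# get_pos takes a filename as its parameter and reads only the header
# (first) line of that file to find the column number of $id
get_pos() {
    awk -F'\t' -v id="$id" 'NR == 1 {
        for (i = 1; i <= NF; i++)
            if ($i == id) {
                print i
                exit
            }
        exit
    }' "$1"
}

# get $id positions from headers of the two files
pos1=$(get_pos "$file1")
pos2=$(get_pos "$file2")

# print header in join's output order:
# $id first, then the other columns of file1, then the other columns of file2
{ head -n1 "$file1"; head -n1 "$file2"; } |
    awk -F'\t' -v OFS='\t' -v p1="$pos1" -v p2="$pos2" '
        NR == 1 { out = $p1; for (i = 1; i <= NF; i++) if (i != p1) out = out OFS $i }
        NR == 2 { for (i = 1; i <= NF; i++) if (i != p2) out = out OFS $i; print out }'

# print data: join requires both inputs sorted on the join field (not on the
# whole line), using the same locale for sort and join.
# Add -a1 for a left join (with GNU join, also -e 'n/a' -o auto to fill the
# missing fields).
export LC_ALL=C
join -t$'\t' -1 "$pos1" -2 "$pos2" \
    <(tail -n+2 "$file1" | sort -t$'\t' -k"$pos1,$pos1") \
    <(tail -n+2 "$file2" | sort -t$'\t' -k"$pos2,$pos2")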
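The script above still sorts both inputs, because `join` requires sorted input. If the second file fits in memory, a hash join in `awk` avoids sorting entirely and addresses the key column by name. This is a swapped-in technique, not part of the original answer: `hashjoin` and its `-l` flag are hypothetical names, and the sketch assumes both files start with a header line that contains the key column name. The output follows the question's layout: all columns of file 1, then the columns of file 2 without the key.

```shell
#!/bin/bash
# hashjoin [-l] COLUMN FILE1 FILE2   (hypothetical helper, not a standard tool)
# Joins two tab-separated files with header lines on the column named COLUMN.
# No sorting: FILE2 is loaded into an awk array, then FILE1 is streamed.
# -l gives a left join: unmatched FILE1 rows get "n/a" for FILE2's columns.
hashjoin() {
    local left=0
    if [ "$1" = "-l" ]; then left=1; shift; fi
    awk -F'\t' -v OFS='\t' -v id="$1" -v left="$left" '
        NR == FNR {                          # first pass: FILE2
            if (FNR == 1) {                  # header: locate the key column
                for (i = 1; i <= NF; i++) if ($i == id) k2 = i
            }
            rest = ""                        # this row without its key column
            for (i = 1; i <= NF; i++) if (i != k2) rest = rest OFS $i
            if (FNR == 1) {
                hdr2 = rest                  # header part appended to FILE1 header
                for (i = 2; i <= NF; i++) na = na OFS "n/a"
            } else {
                cnt[$k2]++                   # keys need not be unique
                rows[$k2, cnt[$k2]] = rest
            }
            next
        }
        FNR == 1 {                           # second pass: FILE1 header
            for (i = 1; i <= NF; i++) if ($i == id) k1 = i
            if (!k1 || !k2) { print "hashjoin: no column " id > "/dev/stderr"; exit 1 }
            print $0 hdr2
            next
        }
        ($k1 in cnt) {                       # matched: one row per FILE2 match
            for (j = 1; j <= cnt[$k1]; j++) print $0 rows[$k1, j]
            next
        }
        left == 1 { print $0 na }            # unmatched FILE1 row, left join only
    ' "$3" "$2"
}
```

With the example files, `hashjoin ID file1 file2 > result` produces the inner join and `hashjoin -l ID file1 file2` the left join. Rows keep the order of file 1, and a key that occurs several times in file 2 yields one output row per match. The trade-off is memory: file 2 is held entirely in the `awk` array, whereas `join` streams both sorted inputs.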
A completely different approach is to use a lightweight SQL tool such as sqlite.
You can create two tables:
$ sqlite3
SQLite version 3.7.2
Enter ".help" for instructions
Enter SQL statements terminated with a ";"
sqlite> create table fruit (fruit varchar(20), id varchar(20));
sqlite> create table foobar (id varchar(20), foobar varchar(20));
Set tab as the separator and load the files:
sqlite> .separator "\t"
sqlite> .import file1 fruit
sqlite> .import file2 foobar
Because the tables already exist, .import loads the header lines as ordinary rows, so delete them:
sqlite> delete from fruit where id = 'ID';
sqlite> delete from foobar where id = 'ID';
Then run whatever queries you need:
sqlite> select fruit.id, fruit, foobar from fruit, foobar where fruit.id = foobar.id;
alpha apple cat
beta banana dog
sqlite> .quit
$
With the help of a bash here-doc, the task can also be automated:
#!/bin/bash
sqlite3 <<-EOF
create table fruit (fruit varchar(20), id varchar(20));
create table foobar (id varchar(20), foobar varchar(20));
.separator "\t"
.import file1 fruit
.import file2 foobar
delete from fruit where id = 'ID';
delete from foobar where id = 'ID';
select fruit.id, fruit, foobar from fruit, foobar where fruit.id = foobar.id;
.quit
EOF
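The query above is an inner join. The left join from the question can be expressed with SQL's `left join`, using `coalesce` to print `n/a` where SQLite would otherwise return NULL for a missing match. This is a minimal sketch wrapped in a function so the file names can be passed in; `sqlite_left_join` is a hypothetical name, and it reuses the answer's two-column table layout, so it assumes each file has exactly two columns and a header row whose `id` column reads `ID`.

```shell
#!/bin/bash
# sqlite_left_join FILE1 FILE2   (hypothetical helper name)
# Loads both tab-separated files into an in-memory SQLite database and prints
# a left join on id: every fruit row is kept; rows without a match in FILE2
# get 'n/a' instead of NULL. The tables are created before .import, so the
# header lines are imported as data and deleted afterwards.
sqlite_left_join() {
    sqlite3 <<EOF
create table fruit (fruit varchar(20), id varchar(20));
create table foobar (id varchar(20), foobar varchar(20));
.separator "\t"
.import "$1" fruit
.import "$2" foobar
delete from fruit where id = 'ID';
delete from foobar where id = 'ID';
select fruit.fruit, fruit.id, coalesce(foobar.foobar, 'n/a')
  from fruit left join foobar on fruit.id = foobar.id
  order by fruit.rowid;
EOF
}
```

For the example files, `sqlite_left_join file1 file2` should list all three fruits, with `n/a` in the last column for cherry. Ordering by `fruit.rowid` keeps the rows in the order of file 1.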