Awk: matching entries in multiple input files

My FileA contains:

 LetterA     LetterM  12
 LetterB     LetterC  45
 LetterB     LetterG  23

and FileB contains:

 LetterA    23   43    LetterZ
 LetterB    21   71    LetterC

If FileA $1 == FileB $1 && FileA $2 == FileB $4, I want output like this (the last column is FileB $2 minus $3):

 LetterB     LetterC  45   -50

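I can do this with a bash loop:

 while read -r ENTRY
 do
     # split the FileA line on whitespace to get the two key columns
     read -r COLUMN1 COLUMN2 _ <<< "$ENTRY"
     # scan FileB for rows whose columns 1 and 4 match those keys
     awk -v COLUMN1="$COLUMN1" -v COLUMN2="$COLUMN2" -v ENTRY="$ENTRY" \
         '($1==COLUMN1 && $4==COLUMN2) {print ENTRY, $2-$3}' FileB
 done < FileA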

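However, this loop is too slow, because it reruns awk over FileB once for every line of FileA. Is there a way to do this in awk without the loop?

In short: take multiple input files -> match their contents -> print the desired output.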

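This can be solved with an awk one-liner:

 awk 'NR==FNR{a[$1":"$2]=$0; next}
      NR>FNR && $1":"$4 in a{print a[$1":"$4], $2-$3}' fileA fileB

Or even more concisely (thanks to @JS웃):

 awk 'NR==FNR{a[$1$2]=$0;next}$1$4 in a{print a[$1$4],$2-$3}' file{A,B}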
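For readers new to the two-file idiom, here is a commented sketch of the same approach. It assumes the sample data from the question, written to a temporary directory (the `tmp` and `result` names are illustrative). The comma subscript `a[$1, $2]` is a swapped-in variant, not what the answer uses: it joins the key parts with awk's SUBSEP, so keys such as "foo" "bar" and "fo" "obar" cannot collide the way they could with plain concatenation.

```shell
# Sketch only: recreate the sample files from the question in a temp directory.
tmp=$(mktemp -d)
printf '%s\n' 'LetterA LetterM 12' 'LetterB LetterC 45' 'LetterB LetterG 23' > "$tmp/fileA"
printf '%s\n' 'LetterA 23 43 LetterZ' 'LetterB 21 71 LetterC' > "$tmp/fileB"

result=$(awk '
    # First file (NR == FNR holds only while reading it): remember each line,
    # keyed by columns 1 and 2 joined with SUBSEP.
    NR == FNR { a[$1, $2] = $0; next }
    # Second file: if columns 1 and 4 form a known key, print the stored
    # fileA line followed by column 2 minus column 3.
    ($1, $4) in a { print a[$1, $4], $2 - $3 }
' "$tmp/fileA" "$tmp/fileB")

echo "$result"
rm -rf "$tmp"
```

Both files are read exactly once, which is where the speed-up over the shell loop comes from: the loop starts a new awk process and rescans FileB for every line of FileA.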

I decided to try Python and NumPy for a slightly unorthodox but hopefully fast solution:

import numpy as np

# load the files into structured arrays with automatically determined types per column
a = np.genfromtxt("fileA", dtype=None, encoding="utf-8")
b = np.genfromtxt("fileB", dtype=None, encoding="utf-8")

# concatenate the string columns (n.b. assumes no "foo" "bar" and "fo" "obar")
aText = np.char.add(a['f0'], a['f1'])
bText = np.char.add(b['f0'], b['f3'])

# find the rows of A whose key also occurs in B, and print the values
for index in np.flatnonzero(np.isin(aText, bText)):
    aRow = a[index].item()
    bRow = b[bText == aText[index]][0]
    print('{1} {2} {3} {0}'.format(bRow[1] - bRow[2], *aRow))

Edit: once it gets going it is fast, but unfortunately loading the files takes longer than @anubhava's excellent awk solution.

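Thanks, it works like a charm! Just one question: why does

 $1":"$4 in a

work, but when I tried only $1 in a, it didn't?

I don't know whether this still counts as a one-liner, but it's a great demonstration of how awk is far more powerful than is commonly understood.

@JohnZwinck Here is your one-liner:

 awk 'NR==FNR{a[$1$2]=$0;next}$1$4 in a{print a[$1$4],$2-$3}' file{A,B}

@JS웃: As always, your one-liner is even smaller :P

@JS웃: I have added your solution to my answer as an alternative. Many thanks.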
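On the question in the first comment: `in` tests whether a string is an index of the array, and the indices stored from FileA are whole keys such as `LetterB:LetterC`, not the first column on its own. A small sketch, assuming the sample FileA lines are fed on standard input (the variable names `full` and `partial` are illustrative):

```shell
result=$(printf '%s\n' 'LetterA LetterM 12' 'LetterB LetterC 45' 'LetterB LetterG 23' |
awk '
    { a[$1 ":" $2] = $0 }                   # same keys as the answer builds
    END {
        full    = ("LetterB:LetterC" in a)  # 1: this exact index exists
        partial = ("LetterB" in a)          # 0: no index is just column 1
        print full, partial
    }')
echo "$result"
```

So `$1 in a` can only succeed if the array was also filled with `a[$1]`, which would no longer distinguish `LetterB LetterC` from `LetterB LetterG`.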