Linux AWK: how do I modify this code so that it prints the number of instances instead of 0 and 1?

I have a bash script containing some AWK that I am using to solve a problem:

<targets.txt xargs -n1 -P4 bash -c "
awk 'NR==FNR{a[\$0];next} 
{
  if (\$0 in a) 
  {
    printf \"1,\"
  } 
  else 
  {
    printf \"0,\"
  }
}' \"\$1\" values.txt | sed $'s\x01$\x01'\"\$(<<<\"\$1\" cut -d/ -f3)\"'\n'$'\x01'
Example ./dataset/tallperson/file1.txt

LOL
Lol
Hel
lo.
Example ./dataset/tallperson/file2.txt

LOL
LOL
Wei
rd.
Example ./dataset/tallperson/file3.txt

Lol
Lol
Example ./dataset/shortperson/file4.txt

hah
a t
hat
was
fun
ny.
LOL
LOL
Example values.txt

LOL
Lol
Hel
lo.
Wei
rd.
hah
a t
hat
was
fun
ny.
Expected output

1,1,1,1,0,0,0,0,0,0,0,0,tallperson
2,0,0,0,1,1,0,0,0,0,0,0,tallperson
0,2,0,0,0,0,0,0,0,0,0,0,tallperson
2,0,0,0,0,0,1,1,1,1,1,1,shortperson
Unwanted output from my script

1,1,1,1,0,0,0,0,0,0,0,0,tallperson
1,0,0,0,1,1,0,0,0,0,0,0,tallperson
0,1,0,0,0,0,0,0,0,0,0,0,tallperson
1,0,0,0,0,0,1,1,1,1,1,1,shortperson

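I have values.txt, which contains the list of unique 3-character values for every file listed in targets.txt. No file.txt contains a value that is not in targets.txt. I just want to go through each file in targets.txt and count how many times that file contains each value from values.txt.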
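The script prints 0 and 1 because `a[$0]` only records that a line was seen, and the `if`/`else` only tests membership. A minimal sketch of the smallest change that yields counts, assuming the same argument order as the question (target file first, values.txt second); the helper name `count_values` is hypothetical and the directory-name suffix added by sed is left out:

```shell
#!/bin/sh
# Sketch: count how often each line of a values file occurs in one target file.
# count_values is a hypothetical helper name, not part of the original script.
count_values() {
    # $1 = target file (read first), $2 = values file (read second)
    awk '
        NR == FNR { count[$0]++; next }   # first file: tally every line
        { printf "%d,", count[$0] }       # second file: print the tally, 0 if unseen
    ' "$1" "$2"
}

# Demo with the contents of file2.txt and values.txt from the question.
tmp=$(mktemp -d)
printf '%s\n' LOL LOL Wei rd. > "$tmp/file2.txt"
printf '%s\n' LOL Lol Hel lo. Wei rd. hah 'a t' hat was fun ny. > "$tmp/values.txt"
count_values "$tmp/file2.txt" "$tmp/values.txt"; echo   # prints 2,0,0,0,1,1,0,0,0,0,0,0,
rm -rf "$tmp"
```

As in the original, the `NR == FNR` test assumes the target file is not empty; with an empty first file, values.txt itself would be treated as the file to tally.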

You don't need anything other than awk to do this, e.g. using GNU awk for gensub(), ARGIND, and ENDFILE:

$ cat tst.awk
BEGIN { OFS="," }
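ARGIND == 1 {                   # 1st file (targets.txt): one path per line
    ARGV[ARGC] = $0             # queue that path as another input file
    ARGC++
    next
}
ARGIND == 2 {                   # 2nd file (values.txt): remember the values in order
    strings[++numStrings] = $0
    next
}
{ cnt[$0]++ }                   # any later file: count each line
ENDFILE {
    if ( ARGIND > 2 ) {         # at the end of each queued target file
        for (stringNr=1; stringNr<=numStrings; stringNr++) {
            string = strings[stringNr]
            printf "%d%s", cnt[string], OFS
        }
        # print the name of the directory that contains the file
        print gensub(/(.*\/)?([^/]+)\/[^/]+$/,"\\2",1,FILENAME)
        delete cnt              # reset the counts for the next file
    }
}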
$ awk -f tst.awk targets.txt values.txt
1,1,1,1,0,0,0,0,0,0,0,0,tallperson
2,0,0,0,1,1,0,0,0,0,0,0,tallperson
0,2,0,0,0,0,0,0,0,0,0,0,tallperson
2,0,0,0,0,0,1,1,1,1,1,1,shortperson
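Of course, you don't actually need the values.txt file at all unless you need a specific order of output fields that can't be determined from the input: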

$ cat tst.awk
BEGIN { OFS="," }
ARGIND == 1 {
    ARGV[ARGC] = $0
    ARGC++
    next
}
{
    if ( !seen[$0]++ ) {
        strings[++numStrings] = $0
    }
    cnt[ARGIND,$0]++
}
END {
    for (stringNr=1; stringNr<=numStrings; stringNr++) {
        string = strings[stringNr]
        printf "%s%s", string, OFS
    }
    print "directory"

    for (fileNr=2; fileNr<=ARGIND; fileNr++) {
        for (stringNr=1; stringNr<=numStrings; stringNr++) {
            string = strings[stringNr]
            printf "%d%s", cnt[fileNr,string], OFS
        }
        print gensub(/(.*\/)?([^/]+)\/[^/]+$/,"\\2",1,ARGV[fileNr])
    }
}

$ awk -f tst.awk targets.txt
LOL,Lol,Hel,lo.,Wei,rd.,hah,a t,hat,was,fun,ny.,directory
1,1,1,1,0,0,0,0,0,0,0,0,tallperson
2,0,0,0,1,1,0,0,0,0,0,0,tallperson
0,2,0,0,0,0,0,0,0,0,0,0,tallperson
2,0,0,0,0,0,1,1,1,1,1,1,shortperson
I added a header line to the output of that second script; if you don't want it, just leave it out.
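The second script keys its counts on `cnt[ARGIND,$0]`. awk has no true multi-dimensional arrays; a comma-separated subscript is joined into a single string using SUBSEP. A small sketch of that behaviour, using made-up input lines of the form "file-number value" and a hypothetical function name `demo_composite_keys`:

```shell
#!/bin/sh
# Sketch: how awk stores a composite key such as cnt[fileNr,string].
# demo_composite_keys and its input lines are illustrative only.
demo_composite_keys() {
    printf '%s\n' '1 LOL' '2 LOL' '2 LOL' '2 Wei' |
    awk '
        { cnt[$1, $2]++ }              # key is $1 SUBSEP $2 joined into one string
        END {
            if ((2, "LOL") in cnt)     # membership test on a composite key
                print "file 2 / LOL:", cnt[2, "LOL"]
            print "file 1 / Wei:", cnt[1, "Wei"] + 0   # never set, so 0
        }
    '
}

demo_composite_keys
```

This is why the END loop in that script can look up `cnt[fileNr,string]` for every file and value pair, including pairs that never occurred, and still print 0 through `%d`.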

If you really don't care about the output order, then all you need is:

$ cat tst.awk
BEGIN { OFS="," }
ARGIND == 1 {
    ARGV[ARGC] = $0
    ARGC++
    next
}
{
    strings[$0]
    cnt[ARGIND,$0]++
}
END {
    for (string in strings) {
        printf "%s%s", string, OFS
    }
    print "directory"

    for (fileNr=2; fileNr<=ARGIND; fileNr++) {
        for (string in strings) {
            printf "%d%s", cnt[fileNr,string], OFS
        }
        print gensub(/(.*\/)?([^/]+)\/[^/]+$/,"\\2",1,ARGV[fileNr])
    }
}

$ awk -f tst.awk targets.txt
was,rd.,Lol,ny.,LOL,Wei,hat,hah,lo.,fun,a t,Hel,directory
0,0,1,0,1,0,0,0,1,0,0,1,tallperson
0,1,0,0,2,1,0,0,0,0,0,0,tallperson
0,0,2,0,0,0,0,0,0,0,0,0,tallperson
1,0,0,1,2,0,1,1,0,1,1,0,shortperson

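Add targets.txt and the desired output to your question. There is no explanation of that sample input, and no comments.

I get errors on the print gensub(...) line:
awk: tst.awk: line 20: regular expression compile failed (bad class -- [], [^] or [)
awk: tst.awk: line 20: syntax error at or near ]
awk: tst.awk: line 20: regular expression compile failed (bad class -- [], [^] or [)
awk: tst.awk: line 20: runaway regular expression /,\\2,1,F…

What does awk --version tell you about which version of gawk you are running? I suspect you aren't actually running gawk, maybe mawk? Either way, you could try adding a \ before each / inside the bracket expressions to see if that helps. Or just get the directory name some other way, e.g. n=split(FILENAME,p,"/"); print p[n-1].

Yes. I am running mawk, not gawk.

OK, then run gawk as I mentioned in my answer. You tagged your question linux, and AFAIK gawk is the default awk on Linux.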
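For readers who, like the asker, only have mawk or another non-GNU awk, here is a hedged sketch of the first script rewritten without gensub(), ARGIND, or ENDFILE. It uses the split() suggestion from the comments for the directory name, `FILENAME == ARGV[n]` in place of ARGIND, and an `FNR == 1` check in place of ENDFILE. The names `count_portable`, `report`, and `prev` are hypothetical, and unlike the ENDFILE version this sketch silently skips empty target files:

```shell
#!/bin/sh
# Sketch: per-file counts in values.txt order, using only POSIX awk features.
# count_portable, report, and prev are hypothetical names.
count_portable() {
    # $1 = file listing the target paths, $2 = values file
    awk '
        function report(    i, n, p) {
            for (i = 1; i <= numStrings; i++)
                printf "%d,", cnt[strings[i]]
            n = split(prev, p, "/")       # directory name is the second-to-last path part
            print p[n-1]
            split("", cnt)                # portable way to empty an array
        }
        FILENAME == ARGV[1] {             # list file: queue each path as an input file
            ARGV[ARGC] = $0
            ARGC++
            next
        }
        FILENAME == ARGV[2] {             # values file: remember the values in order
            strings[++numStrings] = $0
            next
        }
        FNR == 1 && prev != "" { report() }   # a new target file began: report the previous one
        { prev = FILENAME; cnt[$0]++ }
        END { if (prev != "") report() }      # report the last target file
    ' "$1" "$2"
}
```

It would be called as `count_portable targets.txt values.txt`, mirroring `awk -f tst.awk targets.txt values.txt` above.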