How do I use a file containing multiple format strings with printf in awk?

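I have a case where I want to use input from a file as the format for printf() in awk. The formatting works when I set the format in a string in the code, but not when I load it from the input.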

Here is a small example:

$ # putting the format in a variable works just fine:
$ echo "" | awk -vs="hello:\t%s\n\tfoo" '{printf(s "bar\n", "world");}'
hello:  world
        foobar
$ # But getting the format from an input file does not.
$ echo "hello:\t%s\n\tfoo" | awk '{s=$0; printf(s "bar\n", "world");}'
hello:\tworld\n\tfoobar
$

$ echo "" | awk '{s="a\t%s"; printf s"\n","b"}'
a       b

$ echo "a\t%s" | awk '{s=$0; printf s"\n","b"}'
a\tb

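So... format substitution ("%s") works, but special characters such as tab and newline do not. Any idea why this happens? Is there a way to "do something" to the input data so that it can be used as a format string?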

Update #1:

As another example, consider the following, which uses bash here-strings:

[me@here ~]$ awk -vs="hello: %s\nworld: %s\n" '{printf(s, "foo", "bar");}' <<<""
hello: foo
world: bar
[me@here ~]$ awk '{s=$0; printf(s, "foo", "bar");}' <<<"hello: %s\nworld: %s\n"
hello: foo\nworld: bar\n[me@here ~]$

Sample input:

## fmtstrings:
1 ID:%04d Name:%s\nAddress: %s\n\n
2 CustomerID:\t%-4d\t\tName: %s\n\t\t\t\tAddress: %s\n
3 Customer: %d / %s (%s)\n

## sampledata:
5 Companyname 123 Somewhere Street
12 Othercompany 234 Elsewhere

What I was hoping for is to build something like this, doing the whole job with a single call to awk instead of nested loops in the shell:

awk '

  NR==FNR { fmts[$1]=$2; next; }

  {
    for(fmtid in fmts) {
      outputfile=sprintf("/path/%d/%d", fmtid, custid);
      printf(fmts[fmtid], $1, $2) > outputfile;
    }
  }

' /path/to/fmtstrings /path/to/sampledata

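Obviously this doesn't work, both because of the actual subject of this question and because I haven't yet figured out how to elegantly get awk to join $2..$n into one variable. (But that's the subject of a possible future question.)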

FWIW, I'm using FreeBSD 9.2 and its built-in awk, but I'm willing to use gawk if a solution can be found with it.

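The problem is that echo does not interpret the special characters \t and \n: it passes them on as literal backslash sequences instead of a tab and a newline. In bash, this behavior is controlled by the -e flag to echo, so you don't need to change your awk script at all:
echo -e "hello:\t%s\n\tfoo" | awk '{s=$0; printf(s "bar\n", "world");}'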

Tada! :)
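One caveat, offered as a hedged sketch: once echo -e turns \n into a real newline, awk reads the text as two records and formats each one separately, so the first line comes out as "hello:", a tab and "worldbar" rather than the layout in the question. Assuming the whole input is meant to be a single format, the records can be joined back together and printed once in an END block. The function name format_whole_input is hypothetical, and printf(1) is used in place of echo -e because it expands \t and \n the same way in every POSIX shell.

```shell
# Hypothetical helper: rebuild the whole input as one string, then use it as one format.
format_whole_input() {
  awk '{ s = (NR > 1 ? s "\n" : "") $0 }   # put back the newline that split the records
       END { printf(s "bar\n", "world") }'
}

# printf(1) expands \t and \n portably, unlike echo -e
printf 'hello:\t%%s\n\tfoo\n' | format_whole_input
```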
Edit:
OK, since Chrono rightly pointed this out, here is another answer that matches the original request of reading the pattern from a file:
echo "hello:\t%s\n\tfoo" > myfile
# double quotes around $(cat myfile) keep a format that contains spaces in one word
awk 'BEGIN {s="'"$(cat myfile)"'" ; printf(s "bar\n", "world")}'

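Of course, we have to be careful with the quoting above, because awk never sees $(cat myfile): it is expanded by the shell first, so the contents of the file become part of the awk program text.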
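A safer variant of the same idea, sketched under the assumption that the file holds a single format: pass the contents as a -v value instead of splicing them into the program text. POSIX awk processes backslash escapes in -v (and command-line) assignments as if the value were a string literal, so \t and \n still become real characters, but nothing in the file can run as awk code. format_from_file is a hypothetical name.

```shell
# Hypothetical helper: $1 is a file whose contents are one printf format.
# The -v value gets string-literal escape processing, so \t and \n become
# real characters; the contents are data only and never executed as awk code.
format_from_file() {
  awk -v s="$(cat "$1")" 'BEGIN { printf(s "bar\n", "world") }'
}
```

Usage, with the myfile created above: format_from_file myfile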
Why such a long, complicated example? This demonstrates the problem:

$ echo "" | awk '{s="a\t%s"; printf s"\n","b"}'
a       b

$ echo "a\t%s" | awk '{s=$0; printf s"\n","b"}'
a\tb

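In the first case, the string "a\t%s" is a string literal, so it is effectively interpreted twice: once when awk reads the script and again when it executes it. The \t is expanded on the first pass, so at execution time awk has a literal tab character in the format string.
In the second case, awk still has the characters backslash and t in the format string, hence the different behavior.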
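You need something to interpret those escape sequences, and one way to do that is to call the shell's printf and read its result (corrected per @EtanReisner's excellent observation that I had used double quotes where I should have used single quotes, implemented here as \047, to avoid shell expansion).
If you don't need the result in a variable, you can just call system().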
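A minimal sketch of both variants, assuming the input never contains a single quote (which would end the shell quoting early). The helper names are hypothetical; \047 is the octal escape for a single quote, which cannot appear literally inside a single-quoted awk program.

```shell
# Variant 1: run the shell printf and read its output back into an awk variable.
shell_printf_getline() {
  awk '{
    cmd = "printf \047" $0 "\047 b"   # e.g. printf (quote)a\t%s(quote) b
    cmd | getline s                   # read the first line of the command output
    close(cmd)                        # so the same command can run again on a later record
    print s
  }'
}

# Variant 2: no variable needed, so system() writes straight to stdout.
shell_printf_system() {
  awk '{ system("printf \047" $0 "\047 b") }'
}

printf '%s\n' 'a\t%s' | shell_printf_getline
```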

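If you just want the escape sequences expanded, so that you don't need to supply the %s arguments in the shell printf call, then you just need to escape all the % characters (watching out for % characters that are already escaped).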
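A sketch of that idea, with a hypothetical helper name: double every %, let the shell printf expand the escapes, and hand the result back to awk's printf as the format. Because an expanded \n splits the command's output into several lines, the loop reads them all and rejoins them; the final trailing newline of the expanded format is dropped along the way, and input containing a single quote is not handled.

```shell
# Hypothetical helper: shell printf only expands escapes; awk does the formatting.
expand_then_awk_printf() {
  awk '{
    fmt = $0
    gsub(/%/, "%%", fmt)                 # shell printf turns each %% back into %;
                                         # an input %% comes back as %% and awk prints %
    cmd = "printf \047" fmt "\047"
    s = ""; n = 0
    while ((cmd | getline line) > 0)     # an expanded \n splits the output into lines,
      s = (n++ ? s "\n" : "") line       # so stitch them back together
    close(cmd)
    printf(s "bar\n", "world")
  }'
}

printf '%s\n' 'hello:\t%s\n\tfoo' | expand_then_awk_printf
```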
If you prefer, you can call awk instead of the shell printf.

Note that this approach, while clumsy, is much safer than calling eval, which might simply execute an input line such as rm -rf /*.*.

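With help from Arnold Robbins (the maintainer of gawk) and Manuel Collado (another well-known awk expert), here is a script that expands single-character escape sequences: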
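$ cat tst2.awk
# Needs GNU awk: the 4-argument split() that saves the separators in escs is a gawk extension.
function expandEscapes(old,     segs, segNr, escs, idx, new) {
    split(old,segs,/\\./,escs)          # segs = text between escapes, escs = the escapes
    for (segNr=1; segNr in segs; segNr++) {
        # map the letter after the backslash to the character it stands for;
        # any other escape is kept unchanged
        if ( idx = index( "abfnrtv", substr(escs[segNr],2,1) ) )
            escs[segNr] = substr("\a\b\f\n\r\t\v", idx, 1)
        new = new segs[segNr] escs[segNr]
    }
    return new
}

{
    s = expandEscapes($0)
    printf s, "foo", "bar"
}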
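The script above relies on the 4-argument split(), a gawk extension, which matters for the FreeBSD awk mentioned in the question. A rough POSIX-awk equivalent, sketched here with a hypothetical function name, walks the string one character at a time instead. Unlike the script above, it also turns \\ into a single backslash; unrecognized escapes are left untouched.

```shell
# Hypothetical POSIX-awk escape expander (no 4-argument split needed).
format_with_escapes() {
  awk '
  function expand_escapes(str,    out, i, n, c, idx) {
    out = ""
    n = length(str)
    for (i = 1; i <= n; i++) {
      c = substr(str, i, 1)
      if (c == "\\" && i < n) {
        c = substr(str, ++i, 1)          # look at the character after the backslash
        if ((idx = index("abfnrtv", c)) > 0)
          c = substr("\a\b\f\n\r\t\v", idx, 1)
        else if (c != "\\")
          c = "\\" c                     # unknown escape: keep the backslash
      }
      out = out c
    }
    return out
  }
  { printf(expand_escapes($0), "foo", "bar") }'
}

printf '%s\n' 'hello: %s\nworld: %s\n' | format_with_escapes
```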


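If you like, you can change the split() RE to also match longer escape sequences and then convert them. For hex values following the \\:
c = sprintf("%c", strtonum("0x" rest_of_str))

For octal values:
c = sprintf("%c", strtonum("0" rest_of_str))

(rest_of_str stands for the digits after the backslash; strtonum() is also a gawk extension.)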
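For awks without strtonum(), a small arithmetic loop can do the same conversion. This is a sketch with a hypothetical helper that assumes the digits were already validated by the RE and are non-empty. With gawk in a UTF-8 locale, %c on values above 127 may produce a multibyte character rather than a single byte.

```shell
# Hypothetical helper: $1 = digits, $2 = base (8 or 16); prints the character.
digits_to_char() {
  awk -v digits="$1" -v base="$2" '
  function to_char(d, b,    v, i) {
    v = 0
    for (i = 1; i <= length(d); i++)     # accumulate the value digit by digit
      v = v * b + index("0123456789abcdef", tolower(substr(d, i, 1))) - 1
    return sprintf("%c", v)              # numeric %c gives the character with that code
  }
  BEGIN { printf("%s", to_char(digits, base + 0)) }'
}

digits_to_char 101 8    # octal 101 = 65, the letter A
digits_to_char 41 16    # hex 41 = 65, the letter A
```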

This is a cool question. I don't know the answer in awk, but in perl you can use eval:
echo '%10s\t:\t%-10s\n' |  perl -ne ' chomp; eval "printf (\"$_\", \"hi\", \"hello\")"'
        hi  :   hello  

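Note: be aware of the danger of code injection whenever you use eval, in any language; and not only eval: no system call should be made blindly.
An example in awk: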
echo '$(whoami)' | awk '{"printf \"" $0 "\" " "b" | getline s; print s}'
tiago

What if the input were $(rm -rf /)? You can guess what would happen :)

ikegami adds:
Why would you even consider using eval to convert \n into a newline and \t into a tab?
echo '%10s\t:\t%-10s\n' | perl -e'
   my %repl = (
      n => "\n",
      t => "\t",
   );

   while (<>) {
      chomp;
      s{\\(?:(\w)|(\W))}{
         if (defined($2)) {
            $2
         }
         elsif (exists($repl{$1})) {
            $repl{$1}
         }
         else {
            warn("Unrecognized escape \\$1.\n");
            $1
         }
      }eg;

      printf($_, "hi", "hello");
   }
'

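Ed Morton has explained the problem clearly: awk's handling of string literals is what processes the escapes, and the file I/O code is not a lexer.
Here's a simple workaround: decide which escapes you want to support, and support them. If you're doing special-purpose work and don't need to handle escaped backslashes, here's a one-liner: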
awk '{ gsub(/\\n/,"\n"); gsub(/\\t/,"\t"); printf($0 "bar\n", "world"); }' <<\EOD
hello:\t%s\n\tfoo
EOD
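If escaped backslashes do need handling, one hedged extension of the same gsub() approach is to hide each \\ behind a placeholder first, so that \\t is not read as \t, and restore it at the end. The placeholder byte \001 is an assumption: the input must not contain it. expand_nt is a hypothetical name.

```shell
# Hypothetical helper: expand \n and \t, but keep \\ as one literal backslash.
# Assumes the input never contains the byte \001 (used as a placeholder).
expand_nt() {
  awk '{
    s = $0
    gsub(/\\\\/, "\001", s)    # hide each \\ so that \\t is not mistaken for \t
    gsub(/\\n/, "\n", s)
    gsub(/\\t/, "\t", s)
    gsub("\001", "\\\\", s)    # put each one back as a single literal backslash
    printf(s "bar\n", "world")
  }'
}

printf '%s\n' 'hello:\t%s\n\tfoo' | expand_nt
```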

This looks really ugly, but it solves this particular problem:
s=$0;
gsub(/'/, "'\\''", s);
gsub(/\\n/, "\\\\\\\\n", s);
"printf '%b' '" s "'" | getline s;
gsub(/\\\\n/, "\n", s);
gsub(/\\n/, "\n", s);
printf(s " bar\n", "world");

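1. Replace every single quote with a shell-escaped single quote ('\'').
2. Replace every escaped newline sequence, which normally appears as \n, with a sequence that appears as \\\n. Using \\\\n as the actual replacement string should be enough (meaning \\n would be printed if you printed it), but the version of gawk I have messes that up in POSIX mode.
3. Invoke the shell to run printf '%b' 'escape'\''d format', and use awk's getline statement to retrieve the line.
4. Unescape \\n to yield a newline. This step would not be necessary if gawk behaved correctly in POSIX mode.
5. Unescape \n to yield a newline.

Otherwise, you are stuck calling gsub for every possible escape sequence, which is terrible for \001, \002, and so on.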
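A possibly simpler variant of the same printf '%b' idea, sketched with a hypothetical helper name: rather than double-escaping \n so that getline sees a single line, read every line of the command's output and join them back with newlines. Since %b expands escapes only in its argument, % characters in the format pass through untouched. The final trailing newline of the expanded format is still dropped.

```shell
# Hypothetical helper: expand escapes with the shell printf %b, then format in awk.
expand_with_percent_b() {
  awk '{
    s = $0
    gsub("\047", "\047\\\047\047", s)      # each single quote becomes a shell-escaped one
    cmd = "printf \047%b\047 \047" s "\047"
    out = ""; n = 0
    while ((cmd | getline line) > 0)       # read every line, not just the first
      out = (n++ ? out "\n" : "") line
    close(cmd)
    printf(out "bar\n", "world")
  }'
}

printf '%s\n' 'hello:\t%s\n\tfoo' | expand_with_percent_b
```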
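The problem has been explained well. A simple workaround:

- Pass the contents of the format-strings file through an awk variable, using command substitution. This works because awk processes escape sequences in -v assignments just as it does in string literals.

- This assumes the file is not too large to be read into memory as a whole.

Using GNU awk or mawk: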
awk -v formats="$(tr '\n' '\3' <fmtStrings)" '
     # Initialize: Split the formats into array elements.
    BEGIN {n=split(formats, aFormats, "\3")}
     # For each data line, loop over all formats and print.
    { for(i=1;i<n;++i) {printf aFormats[i] "\n", $1, $2, $3} }
    ' sampleData
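Applied to files shaped like the question's fmtstrings (an ID, a space, then the format) and sampledata (ID, name, and an address that may contain spaces), the same -v idea needs the leading ID stripped and the address fields re-joined. A sketch under those assumptions, with a hypothetical function name and output sent to stdout instead of per-ID files:

```shell
# Hypothetical helper: $1 = format-strings file, $2 = data file.
format_all() {
  awk -v formats="$(tr '\n' '\3' < "$1")" '
    BEGIN {
      n = split(formats, lines, "\3")
      for (i = 1; i <= n; i++) {
        if (lines[i] == "") continue            # e.g. the piece after the last newline
        sp = index(lines[i], " ")
        fmt[++nf] = substr(lines[i], sp + 1)    # drop the leading ID
      }
    }
    {
      address = $3                               # re-join $3..$NF into one string
      for (i = 4; i <= NF; i++) address = address " " $i
      for (k = 1; k <= nf; k++)
        printf(fmt[k], $1, $2, address)          # every format for every data line
    }' "$2"
}
```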

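A more compact Perl version of the same substitution, which maps \n and \t and reduces any other backslash escape to the escaped character: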
echo '%10s\t:\t%-10s\n' | perl -nle'
   s/\\(?:(n)|(t)|(.))/$1?"\n":$2?"\t":$3/seg;
   printf($_, "hi", "hello");
'


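Another Perl option evaluates each run of escapes as a double-quoted string with the /ee modifier; piping through cat -A shows the resulting tab (^I), carriage return (^M) and end of line ($):

echo '%10s\t:\t%10s\r\n' | perl -lne 's/((?:\\[a-zA-Z\\])+)/qq[qq[$1]]/eeg; printf "$_","hi","hello"'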
        hi  :        hello

echo '%10s\t:\t%10s\r\n' | perl -lne 's/((?:\\[a-zA-Z\\])+)/qq[qq[$1]]/eeg; printf "$_","hi","hello"'   | cat -A
        hi^I:^I     hello^M$

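For the multi-format case in Update #1, one awk-only approach rereads the data file once per format line: each format line appends a fmt=... and a fmtid=... command-line assignment plus the data file name to ARGV. Command-line assignments get the same escape processing as -v, so \t and \n in the formats become real characters. The data/<fmtid>/ directories must already exist.

awk 'BEGIN  {if(ARGC!=3)exit(1);       # expects: fmtfile datafile
             fn=ARGV[2];ARGC=2}        # read only fmtfile at first
     NR==FNR{ARGV[ARGC++]="fmt="substr($0,length($1)+2);   # queue a fmt=... assignment,
             ARGV[ARGC++]="fmtid="$1;                     # an fmtid=... assignment
             ARGV[ARGC++]=fn;                             # and another pass over the data
             next}
     {match($0,/^ *[^ ]+[ ]+[^ ]+[ ]+/);                  # skip the ID and name fields
      printf fmt,$1,$2,substr($0,RLENGTH+1) > ("data/"fmtid"/"$1)
     }' fmtfile sampledata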

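Printing the string one character at a time shows the difference. Read from input, "a\tb\n" is six characters, including a backslash followed by t:

$ echo "a\tb\n" | awk '{ s=$0; for (i=1;i<=length(s);i++) {printf("%d\t%c\n",i,substr(s,i,1));} }'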
1       a
2       \
3       t
4       b
5       \
6       n

As a string literal, the escapes have already been converted to a real tab and newline:

$ awk 'BEGIN{s="a\tb\n"; for (i=1;i<=length(s);i++) {printf("%d\t%c\n",i,substr(s,i,1));} }'
1       a
2               
3       b
4       

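So when only \t and \n matter, converting them yourself before using the string as a format is enough:

s=$0;
gsub(/\\t/,"\t",s);
gsub(/\\n/,"\n",s);
printf(s "bar\n", "world");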