next command in gawk not producing the expected results


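I'm trying to skip the first section of a bunch of tab-delimited text files. (I converted the sample data to comma-separated.) I can't figure out why this doesn't work:

Code

Data

Expected output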

"Country Of Sale","Total","Total  Units1","Total Units2","Total C_F","SPCU","PCUT","CPS","USPS","Total  Share","EffSUBS","ActSUBS"
"AU","0","139851331","139851331","195833.36","0.001400297","1170","1.36","","1.36","91704.63","99430"
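Also, I'd like to make the "Country Of Sale" line the header for all the files. But NR and FNR both start counting from the first line. How do I do that if "Country Of Sale" appears on a different line number in each file?

Thanks for your help.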

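[…]
is a bracket expression, which contains a list of characters, character classes, or character ranges. It does not contain a string or the negation of a string.

[^Country Of Sale]
=
[^ aCeflnoOrStuy]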
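A minimal sketch of that equivalence, assuming any POSIX awk is available as awk. The sample lines and the helper name sample are made up for illustration; the second program spells out the same set with the duplicates removed.

```shell
# Hypothetical input lines, chosen to exercise the bracket expression.
sample() {
  printf '%s\n' 'Country Of Sale' 'Sale Of Country' 'eel' 'AU,0'
}

# Each program prints the lines that contain at least one character
# that is NOT in the listed set. The two sets are identical, so the
# two commands select the same lines.
sample | awk '/[^Country Of Sale]/'
sample | awk '/[^ COSaeflnortuy]/'
```

Only the AU,0 line is selected by either command, because the other three lines are built entirely from characters inside the set, regardless of their order.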

You probably meant:

!/Country Of Sale/
That still isn't what you really need. Try this:

gawk '
  BEGIN { FS=OFS="\t" }
  /Country Of Sale/ { f=1 }
  /Cloud Total/ { f=0; nextfile }
  f { print FILENAME, $0 }
' RAW/iTunes/iTunesMatch/*.txt > munched/iTunesMatch_TEST.txt

If you have multiple input files and only want the Country Of Sale line to appear once, one approach is:

$ gawk '
   BEGIN { FS=OFS="\t" }
   /Country Of Sale/ { f=1; if (NR==FNR) print FILENAME, $0; next}
   /Cloud Total/ { f=0; nextfile }
   f { print FILENAME, $0 }
' file file file
file    "Country Of Sale","Total","Total  Units1","Total Units2","Total C_F","SPCU","PCUT","CPS","USPS","Total  Share","EffSUBS","ActSUBS"
file    "AU","0","139851331","139851331","195833.36","0.001400297","1170","1.36","","1.36","91704.63","99430"
file    "AU","0","139851331","139851331","195833.36","0.001400297","1170","1.36","","1.36","91704.63","99430"
file    "AU","0","139851331","139851331","195833.36","0.001400297","1170","1.36","","1.36","91704.63","99430"
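
To see how the flag approach copes with the header sitting on a different line in each file, here is a small sketch under stated assumptions: the two sample files and the helper name extract are hypothetical, plain awk is used instead of gawk, and nextfile is replaced by simply clearing the flag so the sketch does not depend on a gawk extension.

```shell
# Two hypothetical sample files with the header on different lines.
dir=$(mktemp -d)
printf '%s\n' 'skip' 'Country Of Sale,Total' 'AU,1' 'Cloud Total,1' 'tail' > "$dir/a.txt"
printf '%s\n' 'skip' 'skip' 'Country Of Sale,Total' 'NZ,2' 'Cloud Total,2' > "$dir/b.txt"

extract() {
  awk '
    # Header line: start printing; emit the header itself for the first file only.
    /Country Of Sale/ { f = 1; if (NR == FNR) print; next }
    # Footer line: stop printing (portable stand-in for nextfile).
    /Cloud Total/     { f = 0 }
    # Print only while the flag is set.
    f                 { print }
  ' "$@"
}

extract "$dir/a.txt" "$dir/b.txt"
```

The flag is set by content rather than by line number, so it does not matter that NR and FNR differ between files; NR == FNR is only used to recognise the first file.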

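As I noted in a comment,
/[^Country Of Sale]/
probably isn't doing what you think it should. Hint: one of the repeated spaces is redundant. (As it happens, the blank is the only repeated character in the negated character class.)

It actually looks for any character other than one of the characters in
[ COSaeflnortuy]
(the square brackets are metacharacters) and, if it finds one, skips to the next line. For example, if the line contains a double quote or a comma, it skips to the next line of input (because neither the double quote nor the comma appears inside the square brackets).

Note that in the CSV data, "Cloud Total" is not a line that starts with
C
; it starts with a double quote. Unfortunately, the regex you use to search for it insists that
C
must be the first character.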
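
A quick way to check the anchoring point, as a sketch assuming any POSIX awk. The helper name probe and the labels it prints are invented for illustration, and the anchored pattern is the one quoted later in the thread.

```shell
# Report which of three patterns match the single line passed as $1.
probe() {
  printf '%s\n' "$1" | awk '
    /^Cloud Total/   { print "anchored-C" }       # C must be the first character
    /^"Cloud Total"/ { print "anchored-quote" }   # line must start with a double quote
    /Cloud Total/    { print "unanchored" }       # text may appear anywhere
  '
}

probe '"Cloud Total","1.36"'   # quoted CSV form, as in the sample data
probe 'Cloud Total 1.36'       # unquoted form
```

The quoted line is reported as anchored-quote and unanchored only, while the unquoted line is reported as anchored-C and unanchored, so an anchored pattern only works when it matches how the real data begins.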

I think you need something like this:

gawk 'FNR==1,/Country Of Sale/ { next }
      /Cloud Total/ { nextfile }
      { print }' data
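This lists only the AU line from the given data (if you list the same file 3 times on one command line, you get 3 lines starting with AU, so it works correctly across files, partly because of the range
FNR==1,/…/
).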
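
To make the range pattern concrete, here is a small sketch assuming any POSIX awk. The helper name label, the skip/keep prefixes and the four input lines are made up for illustration.

```shell
# Label each input line according to whether the range pattern covers it.
label() {
  awk 'FNR==1,/Country Of Sale/ { print "skip: " $0; next }   # first line through header, inclusive
       { print "keep: " $0 }' "$@"                            # everything after the header
}

printf '%s\n' 'Start Date' 'Row Count' 'Country Of Sale' 'AU' | label
```

The first three lines come out prefixed with skip: and only the AU line with keep:, which shows that the range includes the header line itself. Because the range starts on FNR==1, it restarts at the top of every file.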

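You should be able to get there from that. If you like, you can make the patterns more restrictive (
/^"Country Of Sale",/
etc.). You can use
{ print FILENAME OFS $0 }
to print each line prefixed with the file name and the output field separator (a tab in your command line).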
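
As a sketch of the file-name prefix, assuming any POSIX awk: the temporary file, its two lines and the helper name prefix are hypothetical.

```shell
# Hypothetical one-column sample file.
tmp=$(mktemp)
printf '%s\n' 'AU' 'NZ' > "$tmp"

# Prefix every line with the name of the file it came from, then a tab.
prefix() {
  awk 'BEGIN { OFS = "\t" } { print FILENAME OFS $0 }' "$@"
}

prefix "$tmp"
```

Writing FILENAME OFS $0 concatenates the three pieces explicitly; print FILENAME, $0 gives the same result because the comma inserts OFS.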


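This, and @Ed's suggestion, both give all the data lines, not just the data between "Country Of Sale" and "Cloud Total".

This is what I get (on a Mac running macOS Sierra 10.12.6, using a home-built
GNU Awk 4.1.3, API: 1.1
):

Given that I gave it the file to process 3 times, that is what I expect, and it seems to be what you want.

If you want the "Country Of Sale" header line in the output, that is easily added: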

gawk 'FNR==1,/Country Of Sale/ { if ($0 ~ /Country Of Sale/) print; next }
      /Cloud Total/ { nextfile }
      { print }' data
If you want the header to appear only once, even though it occurs in many files, then:

gawk 'FNR==1,/Country Of Sale/ { if ($0 ~ /Country Of Sale/ && hdr_count++ == 0) print; next }
      /Cloud Total/ { nextfile }
      { print }' data
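
The print-once behaviour rests on the post-increment test. A stripped-down sketch, assuming any POSIX awk, with a made-up marker hdr, counter n and helper name once:

```shell
# Print the first line matching /hdr/ and drop later matches;
# all other lines pass through unchanged.
once() {
  awk '/hdr/ { if (n++ == 0) print; next }   # n++ yields 0 only on the first match
       { print }'
}

printf '%s\n' 'hdr' 'x' 'hdr' 'y' | once
```

An unset awk variable compares equal to 0, and n++ returns the value before incrementing, so the condition is true exactly once for the whole run, not once per file.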

Thanks to @EdMorton and @JonathanLeffler for giving me the clues I needed. What finally worked was using
/^Country Of Sale/{next}
and
/^Cloud Total/{nextfile}
. Next, I'm off to figure out exactly why this works.

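What do you think
/[^Country Of Sale]/
is doing? It probably isn't doing what you think it should. Hint: one of the repeated spaces is redundant. (As it happens, the blank is the only repeated character in the negated character class.)

That gives me more clues. I've only just started learning (g)awk, and only for one specific project, so I'm a complete novice. Thanks.

Another hint is to remove the
[]
pair.

...and capturing the header on an unknown line number?

I'd process the file twice;
/Country Of Sale/{hdr=$0}
is what you need on the first pass. Good luck.

Note that in your CSV data,
"Cloud Total"
is not a line that starts with
C
; it starts with a double quote. I think you need something like:
gawk 'FNR==1,/Country Of Sale/ { next } /Cloud Total/ { nextfile } { print }' data
. This lists only the AU line from the given data (and processing the same file 3 times gives 3 lines starting with AU).

My actual data is tab-delimited, with no double quotes. I added those when I changed the tabs to commas, just to make it easier to read in the block quote.

Thanks, Ed. But this seems to give me all the lines, not just those between "Country Of Sale" and "Cloud Total".

Then you copy/pasted the code wrongly, or your input isn't what you showed us, because the code in the answer cannot do that with the input you provided. I updated the question to show the script producing the output you want from the input you posted.

This must be harder than I thought! This, and @Ed's suggestion, both give all the data lines, not just the data lines between "Country Of Sale" and "Cloud Total". I'll keep playing with permutations of what you've both taught me. The lesson about what […] means was valuable on its own. (That's my awk level.)

Curious. See my update showing what I get. I think it means you must have mis-copied something from the script. From Ed's script, I get three lines for each copy of the data file, with the "Country Of Sale" line and the "Cloud Total" line also included in the output from his script. If you want the "Country Of Sale" header line in the output, there's a second update that gives that.

Hmmm. I just tried again. I cut and pasted the script directly (and added my own "data") after the gawk program. I got all the lines.

Very, very curious! What happens if you copy the data from my answer into another file and process that? Or the data from Ed's answer? Have you checked which version of GNU Awk you're using? What platform? I'm not sure how line endings would affect this, but have you checked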
$ cat data
"Start Date","End Date","UPC" "4/2/17","5/6/17","SKIP THIS LINE"
"4/2/17","5/6/17","SKIP THIS LINE" "4/2/17","5/6/17","SKIP THIS LINE"
"4/2/17","5/6/17","SKIP THIS LINE" "4/2/17","5/6/17","SKIP THIS LINE"
"Row Count","447","SKIP THIS LINE" 
"Country Of Sale","Total","Total  Units1","Total Units2","Total C_F","SPCU","PCUT","CPS","USPS","Total  Share","EffSUBS","ActSUBS"
"AU","0","139851331","139851331","195833.36","0.001400297","1170","1.36","","1.36","91704.63","99430"
"Cloud Total","1.36" "Sales Total","243.18" "Total Amount","244.54"
$ gawk 'FNR==1,/Country Of Sale/{next} /Cloud Total/ {nextfile} { print }' data data data
"AU","0","139851331","139851331","195833.36","0.001400297","1170","1.36","","1.36","91704.63","99430"
"AU","0","139851331","139851331","195833.36","0.001400297","1170","1.36","","1.36","91704.63","99430"
"AU","0","139851331","139851331","195833.36","0.001400297","1170","1.36","","1.36","91704.63","99430"
$