Awk - how to improve a regular expression?


I have a file:

@Book{gjn2011ske, 
  author =   {Grzegorz J. Nalepa},
  title =    {Semantic Knowledge Engineering. A Rule-Based Approach},
  publisher =    {Wydawnictwa AGH},
  year =     2011,
  address =  {Krak\'ow}
}

@article{gjn2010jucs,
  Author =   {Grzegorz J. Nalepa},
  Journal =  {Journal of Universal Computer Science},
  Number =   7,
  Pages =    {1006-1023},
  Title =    {Collective Knowledge Engineering with Semantic Wikis},
  Volume =   16,
  Year =     2010
}
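I want to improve the regular expression so that it removes only the first line of each entry. Note: the record separator cannot be changed; it has to stay
RS="}\n"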

I tried:

awk 'BEGIN{ RS="}\n" } {gsub(/@.*,/,"") ; print }' file
I would like the output to be:

  author =   {Grzegorz J. Nalepa},
  title =    {Semantic Knowledge Engineering. A Rule-Based Approach},
  publisher =    {Wydawnictwa AGH},
  year =     2011,
  address =  {Krak\'ow}

  Author =   {Grzegorz J. Nalepa},
  Journal =  {Journal of Universal Computer Science},
  Number =   7,
  Pages =    {1006-1023},
  Title =    {Collective Knowledge Engineering with Semantic Wikis},
  Volume =   16,
  Year =     2010
Thanks for your help.
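Why the attempt gsub(/@.*,/,"") removes too much: in awk, "." also matches a newline, and sub/gsub always use the leftmost-longest match, so @.*, runs from the @ to the last comma of the whole record rather than stopping at the end of the first line. The sketch below contrasts it with a pattern that cannot cross a newline. It assumes GNU awk (which accepts \n inside a bracket expression), and the record string is a shortened, hypothetical excerpt of the sample.

```shell
# Shortened, hypothetical record: the key line plus three fields.
out=$(awk 'BEGIN {
  rec = "@Book{gjn2011ske,\n  author = {Grzegorz J. Nalepa},\n  year = 2011,\n  publisher = {Wydawnictwa AGH}"

  # Greedy: "." matches newlines too, so ".*," stretches to the LAST comma.
  greedy = rec
  sub(/@.*,/, "", greedy)

  # Bounded: "[^\n]*" cannot cross a newline, so only the key line is removed.
  bounded = rec
  sub(/@[^\n]*\n/, "", bounded)

  print "greedy:";  print greedy
  print "bounded:"; print bounded
}')
printf '%s\n' "$out"
```

The greedy version keeps only the publisher line, while the bounded version keeps every field line. On the real input the bounded pattern still cannot bring back a } that RS="}\n" has already consumed.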

Edit:

My proposed solution:

awk 'BEGIN{ RS="}\n" }{sub(",","@"); sub(/@.*@/,""); print }' file 
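How the proposed solution works: sub(",","@") turns the first comma of each record (the one after the citation key) into a marker, and sub(/@.*@/,"") deletes everything from the entry's @ through that marker. Because .* is greedy, this relies on the record containing no other @. The sketch below runs it on a trimmed copy of the sample (some fields dropped) and assumes GNU awk, since a multi-character RS is not POSIX. It also shows a side effect of RS="}\n": the } after Krak\'ow is lost and extra empty lines appear.

```shell
# Trimmed copy of the sample from the question.
cat > sample.bib <<'EOF'
@Book{gjn2011ske,
  author =   {Grzegorz J. Nalepa},
  year =     2011,
  address =  {Krak\'ow}
}

@article{gjn2010jucs,
  Author =   {Grzegorz J. Nalepa},
  Volume =   16,
  Year =     2010
}
EOF

# Step 1: first comma of the record -> "@".  Step 2: delete "@Type{key@".
out=$(awk 'BEGIN{ RS="}\n" }{sub(",","@"); sub(/@.*@/,""); print }' sample.bib)
printf '%s\n' "$out"
```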

With the given RS setting it is hard to get the result you want, because the line address =  {Krak\'ow} produces an extra end of record. I would rather use:

awk '$0 !~ "^@" && $0 !~ "^} *$" { print }' FILE 
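You can see the extra end of record by printing each record between brackets. This sketch assumes GNU awk (or mawk), which treats a multi-character RS as a regular expression. POSIX awk only uses the first character of RS. The input is a trimmed copy of the question's file.

```shell
# Trimmed copy of the sample from the question.
cat > sample.bib <<'EOF'
@Book{gjn2011ske,
  author =   {Grzegorz J. Nalepa},
  year =     2011,
  address =  {Krak\'ow}
}

@article{gjn2010jucs,
  Author =   {Grzegorz J. Nalepa},
  Volume =   16,
  Year =     2010
}
EOF

# Print every record awk sees with RS="}\n"; the brackets make the record
# boundaries (and any empty record) visible.
records=$(awk 'BEGIN { RS = "}\n" } { printf "record %d: [%s]\n", NR, $0 }' sample.bib)
printf '%s\n' "$records"
```

On this input awk reports three records. The first stops at {Krak\'ow because its closing } and the following newline were taken as a separator. The second is empty: it is the real closing brace of the @Book entry. The third starts with the blank line between the two entries.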

Edit: I don't understand why it has to be a regexp solution. Could you explain?

Anyway, here is another solution that does use regexps, though not in the way you expect:

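awk 'BEGIN{ RS="}\n" }
{
  if (NF == 0) next                    # skip the empty record between the two entries
  n = split($0, a, "\n")               # one array element per line of the entry
  for (e = 1; e <= n; e++) {
      # RS swallowed the closing brace of a line such as address = {Krak...}
      if (a[e] ~ /[{]/ && a[e] !~ /[}]/) {
          sub(/$/, "}", a[e])          # put the brace back
      }
      if (a[e] ~ /=/) { print a[e] }   # keep only the "key = value" lines
  }
  printf("\n")                         # blank line after each entry
}' INPUTFILE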

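A way that does not use regular expressions: set the field separator to a newline, so that each key of the entry becomes a field. Then loop over the fields and print those that do not start with @: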

awk '
    BEGIN { 
        RS="}\n"; 
        FS=OFS="\n"; 
    } 
    { 
        for (i=1; i<=NF; i++) { 
            # skip the @Type{key, line and the empty field after the last newline
            if ( substr($i, 1, 1) != "@" && !(i == NF && $i == "") ) { 
                printf "%s%s", $i, (i == NF) ? RS : OFS; 
            } 
        } 
    }
' file
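With FS set to a newline, both the blank line between the entries and the newline just before the closing } become empty fields. The sketch below lists every field of every record on a trimmed copy of the sample, assuming GNU awk. Note the empty $1 and the empty last field of the second entry. A loop that prints RS after the last field would turn that empty last field into a lone } line.

```shell
# Trimmed copy of the sample from the question.
cat > sample.bib <<'EOF'
@Book{gjn2011ske,
  author =   {Grzegorz J. Nalepa},
  year =     2011,
  address =  {Krak\'ow}
}

@article{gjn2010jucs,
  Author =   {Grzegorz J. Nalepa},
  Volume =   16,
  Year =     2010
}
EOF

# List every field of every record, tagged with record and field number.
fields=$(awk 'BEGIN { RS = "}\n"; FS = "\n" }
{ for (i = 1; i <= NF; i++) printf "NR=%d $%d=[%s]\n", NR, i, $i }' sample.bib)
printf '%s\n' "$fields"
```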

I would use GNU sed for this:

sed '/^@/,/^}$/ { //d }' file.txt
Result:

  author =   {Grzegorz J. Nalepa},
  title =    {Semantic Knowledge Engineering. A Rule-Based Approach},
  publisher =    {Wydawnictwa AGH},
  year =     2011,
  address =  {Krak\'ow}

  Author =   {Grzegorz J. Nalepa},
  Journal =  {Journal of Universal Computer Science},
  Number =   7,
  Pages =    {1006-1023},
  Title =    {Collective Knowledge Engineering with Semantic Wikis},
  Volume =   16,
  Year =     2010
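The // in that command is sed's empty regular expression, which reuses the last regex sed applied. On the first line of the range that is /^@/. On the following lines it is the end pattern /^}$/. As a result, only the @... line and the closing } are deleted, and the fields in between are kept. Here is a sketch that writes both patterns out and checks that it gives the same result. It assumes GNU sed and uses a trimmed copy of the sample.

```shell
# Trimmed copy of the sample from the question.
cat > sample.bib <<'EOF'
@Book{gjn2011ske,
  author =   {Grzegorz J. Nalepa},
  year =     2011,
  address =  {Krak\'ow}
}

@article{gjn2010jucs,
  Author =   {Grzegorz J. Nalepa},
  Volume =   16,
  Year =     2010
}
EOF

# Same range as the answer, with both patterns spelled out instead of "//".
out=$(sed '/^@/,/^}$/ { /^@/d; /^}$/d; }' sample.bib)
printf '%s\n' "$out"
```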
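Note that you can use the -i flag to make the change in place (i.e. overwrite the file contents), and the -s flag to process several files separately (with GNU sed, -i already implies -s). For example: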

sed -s -i '/^@/,/^}$/ { //d }' *.txt
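To try the in-place edit without touching real data, you can run it on throwaway copies in a temporary directory. The file names below are hypothetical, and GNU sed is assumed. Each file is rewritten on its own:

```shell
# Two small throwaway files in a temporary directory (hypothetical names).
dir=$(mktemp -d)
printf '@Book{a,\n  year = 2011\n}\n' > "$dir/one.txt"
printf '@article{b,\n  Year = 2010\n}\n' > "$dir/two.txt"

# Edit both files in place; each file is treated as its own stream.
sed -s -i '/^@/,/^}$/ { //d }' "$dir"/*.txt

one=$(cat "$dir/one.txt")
two=$(cat "$dir/two.txt")
printf '%s\n' "$one" "$two"
rm -r "$dir"
```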
Tested like this:

> awk '{if($0!~/@/&&$0!~/^}/)print}' temp
  author =       {Grzegorz J. Nalepa},
  title =        {Semantic Knowledge Engineering. A Rule-Based Approach},
  publisher =    {Wydawnictwa AGH},
  year =         2011,
  address =      {Krak\'ow}

  Author =       {Grzegorz J. Nalepa},
  Journal =      {Journal of Universal Computer Science},
  Number =       7,
  Pages =        {1006-1023},
  Title =        {Collective Knowledge Engineering with Semantic Wikis},
  Volume =       16,
  Year =         2010
>
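One caveat with this filter: $0!~/@/ drops every line that contains an @ anywhere, not only the @Type{key, line. A field value holding an e-mail address would therefore disappear as well. Anchoring the pattern with ^@, as the first answer does, avoids this. Here is a sketch with a hypothetical entry:

```shell
# Hypothetical entry with an "@" inside a field value.
cat > temp <<'EOF'
@misc{key,
  note =   {contact: someone@example.com},
  year =   2010
}
EOF

loose=$(awk '{if($0!~/@/&&$0!~/^}/)print}' temp)   # "@" anywhere on the line
anchored=$(awk '$0 !~ /^@/ && $0 !~ /^}/' temp)     # "@" only at line start
printf 'loose:\n%s\nanchored:\n%s\n' "$loose" "$anchored"
```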

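Thanks for the solution. Your example leaves "}" at the end of the last line. Please see my edit and proposed solution.
Thanks for the solution, but I am still hoping for something like a regular expression. Please see my edit and proposed solution.
@Tedee12345: Not being able to change awk's record separator causes more problems than it solves. Coding around such problems is never a good idea. You should consider explaining why you think keeping RS="}\n" is a good idea; if it is, please provide more sample data. Good luck.
Thanks for the solution, but I am still hoping for something like a regular expression. Please see my edit and proposed solution. Thanks again for your reply. The first solution you proposed works well for me.
This answer is almost identical to Zsolt's answer from 20 hours earlier. You should consider upvoting that one, as I did.
Thanks for the example solution.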