Bash: parsing an already-parsed column in awk

I am trying to use awk to parse a text file that looks like this:

001  data   John    Smith   address "London" | occupation "Driver" | exercise_level "Medium"
002  data   Rob Edward  address "Cardiff" | occupation "Physiotherapist" | exercise_level "High"
003  data   Dara    Pronk   address "Groningen" | country "Holland" | occupation "Teacher" | exercise_level "Low"
004  data   Marina  Francesca   address "Lugano" | country "Switzerland" | occupation "Chef" | exercise_level "High"
The first 4 columns are tab-separated, and the 5th column holds some metadata separated by pipes.

I want to get the "value" of the occupation "key" as my fifth column. My expected output looks like this:

001  data   John    Smith   Driver
002  data   Rob Edward  Physiotherapist
003  data   Dara    Pronk   Teacher
004  data   Marina  Francesca   Chef
I can get the occupation with the following command:

awk -F'[\t|]' '{for(i=5;i<=NF;i++){if($i~/^ occupation/){c=$i}} print $1, $2, $3, $4, c}' my_file
Using GNU awk:

$ awk '{match($0,/occupation "([^"]*)"/,arr);print $1,$2,$3,$4,arr[1]}' infile
001 data John Smith Driver
002 data Rob Edward Physiotherapist
003 data Dara Pronk Teacher
004 data Marina Francesca Chef
Other awk:

$ awk '{
         match($0,/occupation "([^"]*)"/); 
         s=substr($0,RSTART,RLENGTH); 
         gsub(/.* "|"/,"",s); 
         print $1,$2,$3,$4,s
}' infile
001 data John Smith Driver
002 data Rob Edward Physiotherapist
003 data Dara Pronk Teacher
004 data Marina Francesca Chef
Input:

$ cat infile
001  data   John    Smith   address "London" | occupation "Driver" | exercise_level "Medium"
002  data   Rob Edward  address "Cardiff" | occupation "Physiotherapist" | exercise_level "High"
003  data   Dara    Pronk   address "Groningen" | country "Holland" | occupation "Teacher" | exercise_level "Low"
004  data   Marina  Francesca   address "Lugano" | country "Switzerland" | occupation "Chef" | exercise_level "High"
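-- Edit to address the comment --

Just wondering, in the second option (other awk), is it possible to store other variables (e.g. occupation in var s and exercise level in var e)?

Modify the variable search="..." to suit your needs. The values are printed in the same order as the keys you list.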

awk -v search="occupation,exercise_level,address" '
BEGIN{
    split(search, arr, /,/) 
}
{
    str = "";
    for(i=1; i in arr; i++)
    {
          regexp = arr[i]" \"([^\"]*)\"";
          if(match($0,regexp)){ 
            s=substr($0,RSTART,RLENGTH); 
            gsub(/.* "|"/,"",s);
            str = (str ? str OFS : "") s 
           }
     }
         print $1,$2,$3,$4,str
}' infile
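As a quick check of the ordering claim, here is a minimal sketch that wraps the same logic in a shell function and feeds it one record from the question. The function name extract_keys and the array name keys are made up for this illustration; they are not part of the original answer.

```shell
# extract_keys is a hypothetical wrapper name; it reads records on stdin
# and takes the comma-separated key list as its first argument.
extract_keys() {
  awk -v search="$1" '
  BEGIN { split(search, keys, /,/) }
  {
    str = ""
    for (i = 1; i in keys; i++) {
      regexp = keys[i] " \"([^\"]*)\""          # e.g. occupation "([^"]*)"
      if (match($0, regexp)) {
        s = substr($0, RSTART, RLENGTH)         # the matched key "value" pair
        gsub(/.* "|"/, "", s)                   # keep only the value
        str = (str ? str OFS : "") s
      }
    }
    print $1, $2, $3, $4, str
  }'
}

printf '001\tdata\tJohn\tSmith\taddress "London" | occupation "Driver" | exercise_level "Medium"\n' |
  extract_keys "exercise_level,occupation"
# prints: 001 data John Smith Medium Driver
```

Listing exercise_level before occupation puts Medium before Driver in the output, even though occupation appears first in the input line.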
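With any old awk (GNU works too, but isn't required):

Broken out for easier reading (and commenting):

While the extra steps of split() and the for loop may look cumbersome, they have the advantage of making all the embedded data available by name in a convenient array. (This addresses the request you made in a comment on 3161993's answer.)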


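Note that, as written, split() breaks on whitespace, so if you want to handle data that contains spaces (i.e. inside the quotes), you'll need to do a bit more work. If you want the output displayed without quotes, you can either run gsub() on the data after assigning it in the for loop (removing all quotes), or use a pair of sub() commands to remove the leading and trailing quotes.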
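One possible way to do that extra work, as a sketch only: instead of splitting each item on whitespace, cut it at the first space with index(), then strip the surrounding quotes with a pair of sub() calls. The wrapper name parse_meta and the sample record (Ann Lee, "Bus Driver") are invented for this illustration and do not come from the question's data. It also assumes each item has the shape key "value" with no leading space before the key.

```shell
# parse_meta is a hypothetical wrapper name; it reads tab-separated records on stdin.
parse_meta() {
  awk '
  BEGIN { FS = OFS = "\t" }
  {
    split("", d)                      # clear values left over from the previous record
    n = split($5, a, / *[|] */)       # split the embedded list on vertical bars
    for (i = 1; i <= n; i++) {
      p = index(a[i], " ")            # the first space separates key from value
      if (p == 0) continue            # skip items that have no value part
      key = substr(a[i], 1, p - 1)
      val = substr(a[i], p + 1)
      sub(/^ *"/, "", val)            # drop the leading quote
      sub(/" *$/, "", val)            # drop the trailing quote
      d[key] = val
    }
    print $1, $2, $3, $4, d["occupation"]
  }'
}

# Made-up record whose values contain spaces:
printf '005\tdata\tAnn\tLee\taddress "New York" | occupation "Bus Driver"\n' | parse_meta
```

Because the value is everything after the first space, embedded spaces survive. A value that itself contains a vertical bar would still be split incorrectly.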

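This is what I wanted, thanks a lot!! Just wondering, in the second option (other awk), is it possible to store other variables (e.g. occupation in var s and exercise level in var e)?

@kaka01 With the second option you have to do them one by one: copy and paste, change occupation to the other key, change s to another variable, and make the same replacement in the gsub; otherwise you can loop thi…

@3161993 Thanks! Could you briefly explain what str = (str ? str OFS : "") s means here?

str = (str ? str OFS : "") s builds up the variable str by concatenation. Suppose you have two or more keys to print: on the first iteration of the loop str is empty, so str = s; on the second iteration str is not empty, so str = str OFS s, and so on. One iteration of the above can be written as if (str) { str = str OFS s } else { str = s }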
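To see the idiom in isolation, here is a small sketch that joins three values with OFS. The function name join_demo, the array name vals, and the literal value list are chosen only for this illustration.

```shell
# join_demo is a hypothetical name; it needs no input file.
join_demo() {
  awk 'BEGIN {
    n = split("Driver,Medium,London", vals, /,/)
    str = ""
    for (i = 1; i <= n; i++) {
      s = vals[i]
      # first pass: str is empty, so nothing is prepended;
      # later passes: the separator goes between the old str and s
      str = (str ? str OFS : "") s
    }
    print str
  }'
}

join_demo
# prints: Driver Medium London
```

The separator only ever appears between values, so the result has no leading or trailing OFS.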
The commented script for the any-awk answer above:

BEGIN {
  OFS = FS = "\t"         # set the input and output field separators to tab
}

{
  split("", d)            # clear data left over from the previous record
  split($5, a, / *\| */)  # split your embedded array by vertical bar
  for (i in a) {          # step through the array,
    split(a[i], b, " ")   # splitting as you go
    #gsub(/"/, "", b[2])  # optionally remove quotes
    d[b[1]] = b[2]        # and assigning indices in a new data array
  }
  print $1 OFS $2 OFS $3 OFS $4 OFS d["occupation"]     # and print the result
}