Bash: parsing an already-parsed column in awk
I am trying to parse a text file like the following with awk:
001 data John Smith address "London" | occupation "Driver" | exercise_level "Medium"
002 data Rob Edward address "Cardiff" | occupation "Physiotherapist" | exercise_level "High"
003 data Dara Pronk address "Groningen" | country "Holland" | occupation "Teacher" | exercise_level "Low"
004 data Marina Francesca address "Lugano" | country "Switzerland" | occupation "Chef" | exercise_level "High"
The first 4 columns are tab-separated, and the 5th column holds metadata separated by pipes.
I want to get the "value" of the occupation "key" as my fifth column. My expected output looks like this:
001 data John Smith Driver
002 data Rob Edward Physiotherapist
003 data Dara Pronk Teacher
004 data Marina Francesca Chef
I can get the occupation with the following command:
awk -F'[\t|]' '{for(i=5;i<=NF;i++){if($i~/^ occupation/){c=$i}} print $1, $2, $3, $4, c}' my_file
Using GNU awk:
$ awk '{match($0,/occupation "([^"]*)"/,arr);print $1,$2,$3,$4,arr[1]}' infile
001 data John Smith Driver
002 data Rob Edward Physiotherapist
003 data Dara Pronk Teacher
004 data Marina Francesca Chef
Other awk:
$ awk '{
match($0,/occupation "([^"]*)"/);
s=substr($0,RSTART,RLENGTH);
gsub(/.* "|"/,"",s);
print $1,$2,$3,$4,s
}' infile
001 data John Smith Driver
002 data Rob Edward Physiotherapist
003 data Dara Pronk Teacher
004 data Marina Francesca Chef
Input:
$ cat infile
001 data John Smith address "London" | occupation "Driver" | exercise_level "Medium"
002 data Rob Edward address "Cardiff" | occupation "Physiotherapist" | exercise_level "High"
003 data Dara Pronk address "Groningen" | country "Holland" | occupation "Teacher" | exercise_level "Low"
004 data Marina Francesca address "Lugano" | country "Switzerland" | occupation "Chef" | exercise_level "High"
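-- Edit to address the comment --
Just wondering, in the second option (other awk), is it possible to store other variables (e.g. occupation in var s and exercise level in var e)?
Modify the variable search="..." to suit your needs; the results are printed in the same order in which the keys are given.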
awk -v search="occupation,exercise_level,address" '
BEGIN{
split(search, arr, /,/)
}
{
str = "";
for(i=1; i in arr; i++)
{
regexp = arr[i]" \"([^\"]*)\"";
if(match($0,regexp)){
s=substr($0,RSTART,RLENGTH);
gsub(/.* "|"/,"",s);
str = (str ? str OFS : "") s
}
}
print $1,$2,$3,$4,str
}' infile
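If separate variables are preferred over one joined string, as the comment asks, one possible approach is to wrap the match()/substr()/gsub() steps in a user-defined function and call it once per key. The following is only a sketch: getval and extract_fields are hypothetical names, and it assumes every value is enclosed in double quotes and that keys contain no regular-expression metacharacters.

```shell
extract_fields() {
    awk '
    # getval (hypothetical helper): return the quoted value that follows key, or "" if the key is absent
    function getval(line, key,    v) {
        if (!match(line, key " \"[^\"]*\"")) return ""
        v = substr(line, RSTART, RLENGTH)   # e.g. occupation "Driver"
        gsub(/.* "|"/, "", v)               # strip the key and both quotes
        return v
    }
    {
        s = getval($0, "occupation")        # occupation in s
        e = getval($0, "exercise_level")    # exercise level in e
        print $1, $2, $3, $4, s, e
    }' "$@"
}

printf '001 data John Smith address "London" | occupation "Driver" | exercise_level "Medium"\n' | extract_fields
```

Called as `extract_fields infile`, it reads the file instead of standard input. Only POSIX awk features are used, so GNU awk is not required.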
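That is exactly what I wanted, thanks a lot!! Just wondering, in the second option (other awk), is it possible to store other variables (e.g. occupation in var s and exercise level in var e)?
@kaka01 For the second option you have to do it one by one: copy and paste, change occupation to the other key, change s to another variable, and make the same replacement in gsub; or else you can use a loop.
@3161993 Thanks! Could you briefly explain what str = (str ? str OFS : "") s means here?
str = (str ? str OFS : "") s builds up the variable str by concatenation. Suppose you have two or more keys to print: in the first iteration of the loop str is empty, so str = s; in the second iteration str is not empty, so str = str OFS s, and so on. For a single iteration, the above can be written as if (str) { str = str OFS s } else { str = s }
Using any old awk (GNU works too, but is not required).
While the extra steps of split() and the for loop may look cumbersome, they have the advantage of making all the embedded data available by name in one handy array. (This addresses the request you made in a comment on 3161993's answer.)
Note that, currently, split() breaks on whitespace, so if you want to handle data containing spaces (i.e. inside the quotes), more work is needed. If you want the output displayed without quotes, you can run gsub() on the data after assigning it in the for loop (removing all quotes), or use a pair of sub() commands to remove the leading and trailing quotes.
Broken out for easier reading (and commenting):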
BEGIN {
    OFS = FS = "\t"            # set the input and output field separators
}
{
    split("", d)               # clear values left over from the previous line
    split($5, a, / *\| */)     # split your embedded array by vertical bar
    for (i in a) {             # step through the array,
        split(a[i], b, " ")    # splitting as you go
        #gsub(/"/, "", b[2])   # optionally remove quotes
        d[b[1]] = b[2]         # and assigning indices in a new data array
    }
    print $1 OFS $2 OFS $3 OFS $4 OFS d["occupation"]   # and print the result
}
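The caveat about values that contain spaces can be handled by splitting each key/value pair on the double quote instead of on whitespace, which also removes the quotes in the same step. This is a rough sketch under the assumption that every value is wrapped in exactly one pair of double quotes; the function name occupation_column and the sample record are made up for illustration.

```shell
occupation_column() {
    awk '
    BEGIN { OFS = FS = "\t" }
    {
        split("", d)                       # forget keys from the previous line
        n = split($5, a, / *[|] */)        # one key/value pair per element
        for (i = 1; i <= n; i++) {
            split(a[i], q, "\"")           # q[1] = key plus a blank, q[2] = unquoted value
            key = q[1]
            gsub(/^ +| +$/, "", key)       # trim blanks around the key
            d[key] = q[2]
        }
        print $1, $2, $3, $4, d["occupation"]
    }' "$@"
}

# Hypothetical record: tab-separated, with quoted values that contain spaces
printf '005\tdata\tAnn\tLee\taddress "New York" | occupation "Taxi Driver"\n' | occupation_column
```

Because the value is taken from between the quotes, no separate gsub() or sub() call is needed to strip them.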