Awk: how to extract data from application log files into a CSV file
I have an application.log file with the following content:
2020-04-14 01:04:06 INFO app-proxy:53 - Params : {"file_list":["USER_INFO_1234567.tar.md5"],"conn_id":"6adfda","Token":"vishal.kg","proxy_name":"India-Noida","hash":"sdadjsasdkj"}
2020-04-14 01:24:44 INFO app-proxy:53 - Params : {"file_list":["ATT_SAMPLE_TERT.tar.md5"],"conn_id":"adfdsfed","Token":"venkat.raj","proxy_name":"India-Noida","hash":"qieuadsjkasdjk"}
2020-04-14 03:16:06 INFO app-proxy:53 - Params : {"file_list":["KER_SAMPLE_TERT.tar.md5"],"conn_id":"kajdfldk","Token":"ankit.ys","proxy_name":"India-Noida","hash":"asdkfjds"}
2020-04-14 03:18:15 INFO app-proxy:53 - Params : {"file_list":["DU_SAMPLE_TERT.tgz"],"conn_id":"u9sdf7ds9","Token":"shitha.a","proxy_name":"India-Noida","hash":"wqeirdasjk"}
2020-04-14 04:30:02 INFO app-proxy:53 - Params : {"file_list":["MBU_SAMAPLE.tar.md5"],"conn_id":"a8df7dsd","Token":"karthi.v","proxy_name":"India-Noida","hash":"odisfdjda"}
2020-04-14 05:22:06 INFO app-proxy:53 - Params : {"file_list":["PCL_SAMPLE-15637481.tar.md5"],"conn_id":"8adf8das","Token":"b.venkat","proxy_name":"India-Noida","hash":"adfjkds"}
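I am trying to find the Params : section, from which I need to extract the date, time, file list, token, and proxy name into a CSV file.
I also have another file, user-session.log, which contains the following: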
2020-04-14 01:04:07 REMOTE_START null null USER_INFO_1234567.tar.md5 0 0/0
2020-04-14 01:15:18 REMOTE_END null null USER_INFO_1234567.tar.md5 672 7219067/7209967
2020-04-14 01:24:45 REMOTE_START null null ATT_SAMPLE_TERT.tar.md5 0 0/0
2020-04-14 01:34:12 REMOTE_END null null ATT_SAMPLE_TERT.tar.md5 568 52401769/50176769
2020-04-14 03:16:08 REMOTE_START null null KER_SAMPLE_TERT.tar.md5 0 0/0
2020-04-14 03:16:22 REMOTE_END null null KER_SAMPLE_TERT.tar.md5 16 1059346/70840514
2020-04-14 03:18:17 REMOTE_START null null DU_SAMPLE_TERT.tgz 0 0/0
2020-04-14 03:18:18 REMOTE_END null null DU_SAMPLE_TERT.tgz 2 949685/949685
2020-04-14 04:30:04 REMOTE_START null null MBU_SAMAPLE.tar.md55 0 0/0
2020-04-14 04:30:05 REMOTE_END null null MBU_SAMAPLE.tar.md5 2 2857069/2857069
2020-04-14 05:22:12 REMOTE_START null PCL_SAMPLE-15637481.tar.md5 0 0/0
2020-04-14 05:22:15 REMOTE_END null null PCL_SAMPLE-15637481.tar.md5 9 93829204/93829204
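Here the REMOTE_START and REMOTE_END lines hold the data for each user. I need to extract their times into the same CSV file as above. The required output is as per the screenshot below.
Any pointers would be very helpful.

EDIT: Since the OP changed the Input_file sample data, I am now adding a solution based on it.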
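awk '
# First file (application.log): collect the Params fields, keyed by file name.
FNR==NR{
  # First entry of file_list.
  match($0,/{"file_list":\["[^"]*/)
  userData=substr($0,RSTART,RLENGTH)
  sub(/{"file_list":\["/,"",userData)
  # Token value.
  match($0,/"Token":"[^"]*/)
  tokenVal=substr($0,RSTART,RLENGTH)
  sub(/.*"/,"",tokenVal)
  # proxy_name value.
  match($0,/"proxy_name":"[^"]*/)
  proxyName=substr($0,RSTART,RLENGTH)
  sub(/.*"/,"",proxyName)
  arrayId[userData]=$1 OFS $2 OFS userData OFS tokenVal OFS proxyName
  next
}
# Second file (user-session.log): remember the time of the first line seen per file name.
!firstOccur[$6]++{
  firstOccurVal[$6]=$2
}
# For known file names, build the row; the last matching line supplies the end time.
($6 in arrayId){
  Output[$6]=arrayId[$6] OFS firstOccurVal[$6] OFS $2
}
# Print all rows; the order of a for-in loop is unspecified.
END{
  for(key in Output){
    print Output[key]
  }
}' application.log user-session.log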
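If you want the output nicely aligned, you can also append | column -t to the above command.

Could you please try the following (this earlier version was written for the originally posted sample data, where the files were matched on USERDATA_TMB_ names):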
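The command above prints space-separated fields, while the question asks for a CSV file. Below is a minimal sketch of a comma-separated variant, not the answer's exact method: it sets OFS to a comma, prints each row when the REMOTE_END line is read (so rows come out in log order), and uses a hypothetical helper, jsonval, to pull values out of the Params text. The header names and the scratch directory are assumptions made for illustration.

```shell
# Work in a scratch directory so the sample files do not overwrite real logs.
tmpdir=$(mktemp -d)

# One entry from each log, copied from the samples in the question.
cat > "$tmpdir/application.log" <<'EOF'
2020-04-14 01:04:06 INFO app-proxy:53 - Params : {"file_list":["USER_INFO_1234567.tar.md5"],"conn_id":"6adfda","Token":"vishal.kg","proxy_name":"India-Noida","hash":"sdadjsasdkj"}
EOF
cat > "$tmpdir/user-session.log" <<'EOF'
2020-04-14 01:04:07 REMOTE_START null null USER_INFO_1234567.tar.md5 0 0/0
2020-04-14 01:15:18 REMOTE_END null null USER_INFO_1234567.tar.md5 672 7219067/7209967
EOF

csv=$(awk '
# Hypothetical helper: return the first string value stored under a key.
function jsonval(line, key,    pat, s) {
    pat = "\"" key "\":\\[?\"[^\"]*"   # matches "key":"value and "key":["value
    if (!match(line, pat)) return ""
    s = substr(line, RSTART, RLENGTH)
    sub(/.*"/, "", s)                   # drop everything through the last quote
    return s
}
BEGIN {
    OFS = ","
    print "date", "time", "file", "token", "proxy_name", "remote_start", "remote_end"
}
# First file: remember the Params fields, keyed by file name.
FNR == NR {
    if ($0 ~ /Params :/) {
        file = jsonval($0, "file_list")
        info[file] = $1 OFS $2 OFS file OFS jsonval($0, "Token") OFS jsonval($0, "proxy_name")
    }
    next
}
# Second file: record the start time, then print one row per REMOTE_END.
$3 == "REMOTE_START" { start[$6] = $2 }
$3 == "REMOTE_END" && ($6 in info) { print info[$6], start[$6], $2 }
' "$tmpdir/application.log" "$tmpdir/user-session.log")

printf '%s\n' "$csv"
rm -rf "$tmpdir"
```

With the two sample entries this prints the header line followed by 2020-04-14,01:04:06,USER_INFO_1234567.tar.md5,vishal.kg,India-Noida,01:04:07,01:15:18. Redirect the awk output to a file instead of a variable to get the CSV on disk. The sketch does no CSV quoting, so it assumes none of the values contain a comma.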
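awk '
# Runs on every line of both files: capture the USERDATA_TMB_ string
# and use its third "_"-separated part as the lookup key.
match($0,/USERDATA_TMB_[^"]*/){
  userd=substr($0,RSTART,RLENGTH)
  split(userd,array,"_")
  val=array[3]
}
# First file (application.log): store date, time, name, token and proxy name per key.
FNR==NR{
  match($0,/"Token":"[^"]*/)
  tVal=substr($0,RSTART,RLENGTH)
  sub(/.*"/,"",tVal)
  match($0,/"proxy_name":"[^"]*/)
  pname=substr($0,RSTART,RLENGTH)
  sub(/.*"/,"",pname)
  arrayId[val]=$1 OFS $2 OFS userd OFS tVal OFS pname
  next
}
# Second file (user-session.log): remember the time of the first line seen per key.
!firstOccur[val]++{
  firstOccurVal[val]=$2
}
# For known keys, build the row; the last matching line supplies the end time.
(val in arrayId){
  Output[val]=arrayId[val] OFS firstOccurVal[val] OFS $2
}
# Print all rows; the order of a for-in loop is unspecified.
END{
  for(key in Output){
    print Output[key]
  }
}
' application.log user-session.log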
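
Both solutions rely on the same idiom: match() sets RSTART and RLENGTH, substr() cuts out the matched text, and sub(/.*"/, "") strips the key prefix. Here is a minimal sketch of that idiom in isolation, run on one line from the sample application.log; the shell variables line and token are names chosen for illustration.

```shell
line='2020-04-14 03:18:15 INFO app-proxy:53 - Params : {"file_list":["DU_SAMPLE_TERT.tgz"],"conn_id":"u9sdf7ds9","Token":"shitha.a","proxy_name":"India-Noida","hash":"wqeirdasjk"}'

token=$(printf '%s\n' "$line" | awk '
match($0, /"Token":"[^"]*/) {
    # RSTART is where the match begins, RLENGTH how many characters it spans.
    tok = substr($0, RSTART, RLENGTH)   # now holds "Token":"shitha.a
    sub(/.*"/, "", tok)                 # greedy: remove through the last quote
    print tok
}')

printf '%s\n' "$token"
```

This prints shitha.a. Because [^"]* stops at the next double quote, the match ends just before the closing quote of the value, which is why only the leading part has to be removed afterwards.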
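Comments:
I suggest you add an example of the desired output. It is hard to understand exactly what you want.
Hi Luis, I have updated the question.
@Kishore, could you explain the logic for getting the expected output? Is USERDATA_TMB what gets matched between the two files? Please explain a bit more and let us know.
@Ravindersingh, yes, USERDATA_TMB and the timestamp from the two logs are used to get REMOTE_START and REMOTE_END.
@Kishore, this is completely different data; let me rework the solution now.
@Ed Morton, hello sir! If you get to see this post, how do the array names look now?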