Extracting fields from a file in shell
I have a lot of data in a file, as shown below:
alert tcp any any -> any any (msg: "test1"; sid:16521; rev:1;created_at 2010_07_30, updated_at 2016_07_01;)
alert tcp any any -> any any (msg: "test2"; nocase; sid :23476;distance:0; rev:1;created_at 2010_10_30, updated_at 2013_07_11;)
alert tcp any any -> any any (msg: "test3"; sid:236487; file_data; content:"clsid"; nocase; distance:0; created_at 2008_08_03, updated_at 2016_05_01;)
I want to extract sid, msg, created_at and updated_at from the file, with output like this:
test1 | 16521 | 2010_07_30 | 2016_07_01
test2 | 23476 | 2010_10_30 | 2013_07_11
test3 | 236487| 2008_08_03 | 2016_05_01
The script I am using is:
cat $file | grep -v "^#" | grep "^alert" | sed 's/\"//g' | awk -F ';' '
{
    for(i=1;i<=NF;i++)
    {
        if (match($i,"sid:")>0)
        {
            split($i, array1, ":")
            Rule_sid=array1[2]
        }
        if(match($i,"msg:")>0)
        {
            split($i, array, "(")
            split(array[2], array2, ":")
            message=array2[2]
        }
        if(match($i,/metadata:/)>0 )
        {
            split($i, array3,/created_at/)
            create_date=array3[2]
        }
        if(match($i,/metadata:/)>0 )
        {
            split($i, array4, ", updated_at ")
            update_date=array4[2]
        }
    }
    print Rule_sid "|" message "|" create_date "|" update_date
}' >> Rule_Files/$file
For starters, you can use sed to extract the fields with a regular expression:
sed '/^alert/s/^.*msg[: ]*"\([^"]*\)".*sid[: ]*\([0-9][0-9]*\);.*created_at *\([^,]*\),.*updated_at *\([0-9_][0-9_]*\).*$/\1|\2|\3|\4/' $file
This gives you output like:
test1|16521|2010_07_30|2016_07_01
test2|23476|2010_10_30|2013_07_11
test3|236487|2008_08_03|2016_05_01
Now, if you want that nicely lined up in columns, you will have to pipe it into something else, perhaps awk:
sed '/^alert/s/^.*msg[: ]*"\([^"]*\)".*sid[: ]*\([0-9][0-9]*\);.*created_at *\([^,]*\),.*updated_at *\([0-9_][0-9_]*\).*$/\1|\2|\3|\4/' $file |
awk -F\| 'BEGIN { OFS="| " } {$2=sprintf("%6d",$2)}1'
That gives you:
test1|  16521| 2010_07_30| 2016_07_01
test2|  23476| 2010_10_30| 2013_07_11
test3| 236487| 2008_08_03| 2016_05_01
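The width of 6 in the command above is hard-coded, so it only suits sids of up to six digits. For values of unknown width, the rows have to be read before any of them is printed. The following is only a minimal sketch of that idea, not the answerer's code: align_columns is a hypothetical helper name, and it assumes its input is the pipe-separated output of the sed command. It left-aligns every column instead of right-aligning the sid.

```shell
# align_columns: hypothetical helper, reads "a|b|c|d" rows on stdin.
align_columns() {
  awk -F'|' '
    {
      # First pass: buffer every row and remember the widest value per column.
      for (i = 1; i <= NF; i++) {
        cell[NR, i] = $i
        if (length($i) > width[i]) width[i] = length($i)
      }
      nf[NR] = NF
    }
    END {
      # Second pass: pad every cell except the last one to its column width.
      for (r = 1; r <= NR; r++) {
        row = ""
        for (i = 1; i <= nf[r]; i++) {
          val = (i < nf[r]) ? sprintf("%-" width[i] "s", cell[r, i]) : cell[r, i]
          row = row (i > 1 ? " | " : "") val
        }
        print row
      }
    }'
}

# Usage (assumed): sed '...' "$file" | align_columns
```

Because the whole input is buffered in memory, this sketch is meant for rule files of modest size.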
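If you have to deal with arbitrarily wide column values and still need vertical alignment, you will first have to write something that processes all the rows to find the widest value in each column before printing anything. That is an exercise I leave to the reader.
Using awk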
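Adjust -v OFS=" | " and -v extract="msg,sid,created_at,updated_at" to suit your needs. OFS is the output field separator, and the variable extract holds the comma-separated list of fields to parse; if a field is not found, Null is printed for it.
The program assumes that a field's value sits right after the token that matches the field name: if the field sid is found at j=4, its value is taken from j+1, i.e. j=5.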
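To see the j / j+1 assumption on real data, here is a small sketch that applies the same gsub to one line from the question and lists every resulting token with its index. The names show_tokens and line are made up for this illustration.

```shell
# One sample rule copied from the question's data.
line='alert tcp any any -> any any (msg: "test1"; sid:16521; rev:1;created_at 2010_07_30, updated_at 2016_07_01;)'

# show_tokens: hypothetical helper that prints "index token" pairs.
show_tokens() {
  printf '%s\n' "$1" | awk '{
    gsub(/[:";,()]/, " ")   # changing $0 makes awk split the fields again
    for (j = 1; j <= NF; j++) print j, $j
  }'
}

show_tokens "$line"
```

For this line, msg is token 8 with test1 at 9, and sid is token 10 with 16521 at 11. Since only $(j+1) is taken, a msg that contains spaces would be cut down to its first word.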
Input
$ cat file
alert tcp any any -> any any (msg: "test1"; sid:16521; rev:1;created_at 2010_07_30, updated_at 2016_07_01;)
alert tcp any any -> any any (msg: "test2"; nocase; sid :23476;distance:0; rev:1;created_at 2010_10_30, updated_at 2013_07_11;)
alert tcp any any -> any any (msg: "test3"; sid:236487; file_data; content:"clsid"; nocase; distance:0; created_at 2008_08_03, updated_at 2016_05_01;)
$ awk -v OFS=" | " -v extract="msg,sid,created_at,updated_at" '
BEGIN{
    split(extract,Fields,/,/)
}
{
    gsub(/[:";,()]/," ");
    s="";
    for(i=1; i in Fields; i++)
    {
        f = 1
        for(j=1; j<=NF; j++)
        {
            if($j==Fields[i])
            {
                f = 0
                s = ( s ? s OFS :"") $(j+1)
                break
            }
        }
        if(f){
            s = (s ? s OFS:"") "Null"
        }
    }
    print s
}' file
Output
test1 | 16521 | 2010_07_30 | 2016_07_01
test2 | 23476 | 2010_10_30 | 2013_07_11
test3 | 236487 | 2008_08_03 | 2016_05_01
So, you have provided the code you tried. What is wrong with it? How does what it outputs differ from what you want?
I am unable to extract created_at and updated_at correctly, as shown above.
Sorry, I have made all the required changes.
Unlike the other fields, created_at and updated_at?
Where did the post go?