Json: Extracting multiple matching patterns from a text file
I need to extract all K[A-Z]{4} and US[C,W][0-9]{8} values from each line of a text file. I am using the code below to try to do this. However, I need to extract the values only when both of them appear on a given line (i.e., the last three lines of the data below).
Attempted code:
#Filters out any values matching K[A-Z]{4}
grep -Po '"\K[A-Z]{4}\b' usc.matched > out.1
#Filters out any values matching US[C,W][0-9]{8}
grep -Po '\bUS\w*' usc.matched > out.2
#Pastes two datasets together, separated by a comma
paste -d',' out.1 out.2 > stations.filtered
#Removes any lines that do not lead with "K"
sed -i '/^[^K]/d' stations.filtered
JSON data:
{"sids": ["94737 1", "RUT 3", "KRUT 5"], "name": "RUTLAND STATE AP"},
{"sids": ["54740 1", "VSF 3", "KVSF 5", "USW00054740 6"], "name": "SPRINGFIELD HARTNESS AP"},
{"sids": ["94601 1", "RKD 3", "KRKD 5"], "name": "ROCKLAND KNOX CO RGNL AP"},
{"sids": ["20B 3"], "name": "ROCKLAND STN"},
{"sids": ["177250 2", "USC00177250 6"], "name": "ROCKLAND"},
{"sids": ["177255 2", "USC00177255 6", "RCKM1 7"], "name": "ROCKLAND"},
{"sids": ["177260 2"], "name": "ROCKLAND MOORING LBS"},
{"sids": [], "name": "ROCKLAND"},
{"sids": ["14612 1"], "name": "ROCKLAND"},
{"sids": ["274380 2", "USC00274380 6"], "name": "KEARSARGE"},
{"sids": ["192770 2", "USC00192770 6"], "name": "FISKDALE"},
{"sids": ["US1CTNL0005 6", "CTNL0005 10"], "name": "OAKDALE 2.6 WNW"},
{"sids": ["063989 2", "USC00063989 6"], "name": "LAKE KONOMOC"},
{"sids": ["14740 1", "14721 1", "063456 2", "069704 2", "BDL 3", "72508 4", "KBDL 5", "USW00014740 6", "BDL 7"], "name": "HARTFORD-BRADLEY INTL AP"},
{"sids": ["94702 1", "060806 2", "BDR 3", "72504 4", "KBDR 5", "USW00094702 6", "BDR 7"], "name": "IGOR I SIKORSKY MEMORI AP"},
{"sids": ["54734 1", "DXR 3", "KDXR 5", "USW00054734 6"], "name": "DANBURY MUNI AP"},
Current output:
KRUT,
KVSF,USW00054740
KRKD
USC00177250
USC00177255
USC00274380
USC00192770
US1CTNL0005
USC00063989
KBDL,USW00014740
KBDR,USW00094702
KDXR,USW00054734
Expected output:
KVSF,USW00054740
KBDL,USW00014740
KBDR,USW00094702
KDXR,USW00054734
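In awk. Adjust the regexes to your liking:
$ awk -v OFS=, '
# only lines that contain both a K code and a US code
# note: [C,W] also matches a literal comma; [CW] is the stricter form
/K[A-Z]{3} / && /US[C,W][0-9]{8}/ {
    b=""
    # take the leftmost match, append it to b, then continue after it
    while(match($0,/K[A-Z]{3} |US[C,W][0-9]{8}/)) {
        b=b (b==""?"":OFS) substr($0,RSTART,RLENGTH)
        $0=substr($0,RSTART+RLENGTH)
    }
    print b
}' file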
KVSF ,USW00054740
KBDL ,USW00014740
KBDR ,USW00094702
KDXR ,USW00054734
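Each K code in the output above carries a trailing space, because the pattern uses that space as its delimiter and so includes it in the match. A minimal sketch of one way to trim it, assuming the same input layout. The helper name extract_pairs is hypothetical, and the US pattern is loosened to [0-9]+ with the {3} interval spelled out, so that awk builds without interval support should also accept it:

```shell
#!/bin/sh
# Sketch: the same match loop, with the trailing space trimmed from each token.
# extract_pairs is a hypothetical helper name used only for this example.
extract_pairs() {
    awk -v OFS=, '
    # [A-Z][A-Z][A-Z] spells out {3}; [0-9]+ is looser than {8} (assumption)
    /K[A-Z][A-Z][A-Z] / && /US[CW][0-9]+/ {
        rest = $0
        out = ""
        while (match(rest, /K[A-Z][A-Z][A-Z] |US[CW][0-9]+/)) {
            tok = substr(rest, RSTART, RLENGTH)
            sub(/ $/, "", tok)                     # drop the delimiter space, if any
            out = out (out == "" ? "" : OFS) tok
            rest = substr(rest, RSTART + RLENGTH)  # continue after this match
        }
        print out
    }' "$1"
}
```

It would be called as extract_pairs usc.matched. Working on a copy of the line (rest) instead of $0 is a design choice here, so the original record stays available if more processing is added later.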
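You can use:
awk -F '[][" \t{},:]+' '{
    a=b=""
    # scan the fields; remember the K code and the US code if present
    for(i=2; i<=NF; i++)
        if ($i ~ /^K[A-Z]{3}$/)
            a=$i
        else if ($i ~ /^US[CW][0-9]+/)
            b=$i
    # print only when both were found on this line
    if (a != "" && b != "")
        print a, b
}' OFS=, file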
KVSF,USW00054740
KBDL,USW00014740
KBDR,USW00094702
KDXR,USW00054734
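To see why this works, it can help to look at the fields that the -F separator regex produces. The sketch below only prints each field with its index; show_fields is a hypothetical helper name, not part of the answer. For a line such as {"sids": ["177250 2", "USC00177250 6"], "name": "ROCKLAND"}, the first field is empty (the line starts with separator characters), followed by sids, 177250, 2, USC00177250, 6, name and ROCKLAND, which is presumably why the loop above starts at i=2:

```shell
#!/bin/sh
# Sketch: print every field produced by the separator regex from the answer,
# one "index=value" pair per line. show_fields is a hypothetical helper name.
show_fields() {
    awk -F '[][" \t{},:]+' '{
        for (i = 1; i <= NF; i++)
            printf "%d=%s\n", i, $i
    }' "$1"
}
```

It would be called as show_fields file. Because quotes, brackets, braces, commas, colons and whitespace are all treated as separators, each station id ends up as a bare field that the anchored patterns ^K[A-Z]{3}$ and ^US[CW][0-9]+ can test directly.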
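If perl is okay (assuming the K string comes before the US string on the same line):
- if /"(K[A-Z]{3})\b.*"(US[CW]\d{8}\b)/ : print only when this condition matches
- print "$1,$2" : print the two captured groups
- "(K[A-Z]{3})\b : match K followed by three uppercase letters, only if preceded by " and ending at a word boundary
- "(US[CW]\d{8}\b) : match US, followed by C or W, followed by eight digits, only if preceded by " and ending at a word boundary
- See the Perl documentation on command-line switches for details on the -lne options.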
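The command that these notes describe did not survive in the text. A minimal sketch that assembles the pieces exactly as they are described (the condition, the print statement and the -lne options); the wrapper name perl_pairs and the way the file is passed are assumptions, and this is not necessarily the command as originally posted:

```shell
#!/bin/sh
# Sketch: print "K code,US code" for lines where a quoted K code is followed
# later on the same line by a quoted US code.
# perl_pairs is a hypothetical wrapper name used only for this example.
perl_pairs() {
    # -n loops over input lines, -l strips the newline and re-adds it on print,
    # -e supplies the code to run
    perl -lne 'print "$1,$2" if /"(K[A-Z]{3})\b.*"(US[CW]\d{8}\b)/' "$1"
}
```

It would be called as perl_pairs usc.matched. Lines that contain only one of the two ids produce no output, because the single regex has to match both parts.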
Yes, I have jq installed. I have also attached the current output and the expected output to help further define what I am trying to do.