Extract key-value pairs from a JSON file with a custom output format
Tags: json, awk, sed, grep, jq
I want to grep a combination of two words from a huge log file; the words are scattered and appear in no particular order. Sample log:
{"1a":"2017-01-28 00:00:00","2a":"sample","a":"12345","b":"2017-02-06","c":"2017-02-06T17:51:02.454-08:00","d":"Mozilla/5.0
; en-US; rv:1.9.0.1) Gecko/2008070208 Firefox/3.0.1","e":"2017-02-06
","f":"03","g":"example","h":"logA","i":"IFX","j":"a85","k":"12345678"},
{"1a":"2017-01-28 00:00:11","2a":"sample","a":"12345","b":"2017-02-06","c":"2017-02-06T17:51:02.454-08:00","d":"Mozilla/5.0
; en-US; rv:1.9.0.1) Gecko/2008070208 Firefox/3.0.1","e":"2017-02-06
","f":"03","g":"example","h":"logB","i":"IFX","j":"a85","k":"12345678"}
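In this file, I want to grep "1a":"<value>" and "h":"<value>", whatever the values are, and the result should not contain any duplicates.
Expected output: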
"1a":"2017-01-28 00:00:00" "h":"logA"
"1a":"2017-01-28 00:00:11" "h":"logB"
I tried using egrep this way, but it returns the complete line:
egrep -oE '1a\|"h"' but this does not give the required output.
awk /pattern1/ && /pattern2/ filename #no use
Thanks in advance for your help.
awk
$ awk -F, -v RS={ 'NR>1 {for(i=1;i<=NF;i++)
{if($i~/"1a":/) printf "%s", $i OFS
if($i~/"h":"log(A|B)"/) printf "%s\n", $i}}' file
"1a":"2017-01-28 00:00:00" "h":"logA"
"1a":"2017-01-28 00:00:11" "h":"logB"
Input
$ cat log
{"1a":"2017-01-28 00:00:00","2a":"sample","a":"12345","b":"2017-02-06","c":"2017-02-06T17:51:02.454-08:00","d":"Mozilla/5.0
; en-US; rv:1.9.0.1) Gecko/2008070208 Firefox/3.0.1","e":"2017-02-06
","f":"03","g":"example","h":"logA","i":"IFX","j":"a85","k":"12345678"},
{"1a":"2017-01-28 00:00:11","2a":"sample","a":"12345","b":"2017-02-06","c":"2017-02-06T17:51:02.454-08:00","d":"Mozilla/5.0
; en-US; rv:1.9.0.1) Gecko/2008070208 Firefox/3.0.1","e":"2017-02-06
","f":"03","g":"example","h":"logB","i":"IFX","j":"a85","k":"12345678"}
Output
$ awk -F, -v RS='[{}]' '{s=""; for(i=1;i<=NF;i++)if($i~/^"(1a|h)":/)s=(s?s OFS:"") $i; if(s)print s}' log
"1a":"2017-01-28 00:00:00" "h":"logA"
"1a":"2017-01-28 00:00:11" "h":"logB"
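Instead of the standard utilities, consider using jq, a very flexible JSON CLI, which:
- simplifies the solution
- makes it robust
- allows it to be generalized
To be self-contained, the command provides literal input via a pipeline and is formatted for readability.
To pass a file to the jq command instead, simply specify its path after the script's closing quote (jq -r … '…' file.json).
It yields: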
"1a": "2017-01-28 00:00:00" "h": "logA"
"1a": "2017-01-28 00:00:11" "h": "logB"
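- --argjson keys '[ "1a", "h" ]' defines variable $keys as a JSON-formatted array of the key (property) names to extract
- .[] enumerates all elements of the input array (the individual objects), and $keys[<index>] and .[$keys[<index>]] expand to the property name at index <index> and to that property's value, respectively (note the .[…] accessor)
- Most of the work goes into formatting the output: embedded " characters must be escaped as \", and embedded variable references must be enclosed in \(…), although building the string with + from separate tokens is also an option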
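The command that these notes describe did not survive in this copy of the answer. The following is a sketch reconstructed from the description above, assuming the same shortened sample array used in the later examples; it approximates the idea and is not necessarily the author's exact command. The variable name two_key_output is only there to capture the result.

```shell
# Literal sample input is piped to jq so that the example is self-contained.
two_key_output=$(echo '
[
  { "1a":"2017-01-28 00:00:00", "2a":"sample", "h":"logA", "i":"IFX" },
  { "1a":"2017-01-28 00:00:11", "2a":"sample", "h":"logB", "i":"IFX" }
]
' |
  jq -r --argjson keys '[ "1a", "h" ]' '
    # For each object, interpolate key name and value by explicit index.
    .[] | "\"\($keys[0])\": \"\(.[$keys[0]])\" \"\($keys[1])\": \"\(.[$keys[1]])\""
  ')

printf '%s\n' "$two_key_output"
```

The -r option makes jq print raw strings, so the only double quotes in the output are the ones written explicitly as \" in the format string.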
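Generalizing the solution
Because the array indices (0 and 1) are specified explicitly, the command above does not easily generalize to outputting an arbitrary number of key-value pairs per line.
The following variant, inspired by a suggestion, shows a simple example of defining a helper function in jq; it combines built-in and custom functions to accept an arbitrary number of keys to extract: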
echo '
[
{ "1a":"2017-01-28 00:00:00", "2a":"sample", "h":"logA", "i":"IFX" },
{ "1a":"2017-01-28 00:00:11", "2a":"sample", "h":"logB", "i":"IFX" }
]
' |
jq -r --argjson keys '[ "1a", "h", "i" ]' '
def printKv($k; $v): "\"\($k)\": \"\($v)\"";
.[] | . as $o |
reduce $keys[] as $k (""; . + if .=="" then "" else " " end + printKv($k; $o[$k]))
'
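yields (3 key-value pairs per line, because 3 keys were passed):
"1a": "2017-01-28 00:00:00" "h": "logA" "i": "IFX"
"1a": "2017-01-28 00:00:11" "h": "logB" "i": "IFX"
The built-in reduce is used to build the target string by iterating over the keys and, with the help of the custom function printKv, creating a string representation of each key-value pair.
Following another suggestion, here is a simpler, more jq-like alternative that produces the same output: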
echo '
[
{ "1a":"2017-01-28 00:00:00", "2a":"sample", "h":"logA", "i":"IFX" },
{ "1a":"2017-01-28 00:00:11", "2a":"sample", "h":"logB", "i":"IFX" }
]
' |
jq -r --argjson keys '[ "1a", "h", "i" ]' '
def printKv($k): "\"\($k)\": \"\(.[$k])\"";
.[] | [ $keys[] as $k | printKv($k) ] | join(" ")
'
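- printKv() now takes only one argument, the key $k, and relies on the pipeline input (which still contains the input object) to extract the associated value: .[$k]
- Enclosing $keys[] as $k | printKv($k) in […] passes the outputs of the multiple printKv calls through the pipeline as a single array
- That way, the built-in join function can join the array elements with spaces to form a single output line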
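This is a tweak of @mklement0's excellent answer. It reduces the annoyance of having to escape double quotes by defining a helper function, q, that wraps its input in double quotes: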
def q: "\"\(tostring)\"";
.[] | "\($keys[0]|q): \(.[$keys[0]]|q) \($keys[1]|q): \(.[$keys[1]]|q)"
Or, if you prefer:
def printKV($k): "\"\($k)\": \"\(.[$k])\"";
.[] | printKV($keys[0]) + " " + printKV($keys[1])
Generalized solution
Using printKV/1 as defined above, and assuming $keys is defined on the command line (or by other means) as an array of strings:
def printKeyValues(keys):
[keys[] as $key | printKV($key)] | join(" ");
.[] | printKeyValues($keys)
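For completeness, here is a sketch of a full invocation that combines the two definitions above. It assumes that $keys is passed with --argjson and it reuses a shortened sample array; both are borrowed from the other jq answer rather than stated here, and the variable name generalized_output is purely illustrative.

```shell
generalized_output=$(echo '
[
  { "1a":"2017-01-28 00:00:00", "2a":"sample", "h":"logA", "i":"IFX" },
  { "1a":"2017-01-28 00:00:11", "2a":"sample", "h":"logB", "i":"IFX" }
]
' |
  jq -r --argjson keys '[ "1a", "h", "i" ]' '
    # Format one key-value pair of the current input object.
    def printKV($k): "\"\($k)\": \"\(.[$k])\"";
    # Format every requested key and join the pieces with spaces.
    def printKeyValues(keys):
      [keys[] as $key | printKV($key)] | join(" ");
    .[] | printKeyValues($keys)
  ')

printf '%s\n' "$generalized_output"
```

Adding or removing names in the --argjson array changes the number of pairs per line without touching the jq program.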
Do not parse JSON with text/stream processors or editors; use a proper parser such as jq.
Format the input text as proper JSON and install jq.
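The sample log as posted is not valid JSON: some string values contain raw line breaks, and the comma-separated objects are not wrapped in an array. Below is a rough sketch of one way to normalize it before handing it to jq. It assumes that line breaks never carry meaning and that the objects are separated by commas exactly as in the sample; the shortened inline log and the variable names are illustrative.

```shell
# Shortened stand-in for the log: a value is split across lines and the
# objects are separated by commas without enclosing brackets.
log='{"1a":"2017-01-28 00:00:00","e":"2017-02-06
","h":"logA"},
{"1a":"2017-01-28 00:00:11","e":"2017-02-06
","h":"logB"}'

# Drop all newlines, wrap the result in [ ], then let jq do the parsing.
normalized_output=$(
  { printf '['; printf '%s' "$log" | tr -d '\n'; printf ']'; } |
    jq -r '.[] | "\"1a\":\"\(.["1a"])\" \"h\":\"\(.h)\""'
)

printf '%s\n' "$normalized_output"
```

If the real log contains line breaks that matter, or objects that are not comma-separated, this cleanup step would need to be adapted.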
Also, regular expressions are not design patterns. Tag removed.
Thanks for your answer. How can I print the whole JSON string after the specified keys (here 1a and h)? Sample output: "1a":"2017-01-28 00:00:00" "h":"logA" {"1a":"2017-01-28 00:00:00","2a":"sample","a":"12345","b":"2017-02-06","c":"2017-02-06T17:51:02.454-08:00","d":"Mozilla/5.0 ; en-US; rv:1.9.0.1) Gecko/2008070208 Firefox/3.0.1","e":"2017-02-06","f":"03","g":"example","h":"logA","i":"IFX","j":"a85","k":"12345678"}
@user2340345: You just need print s,$0, like this: awk -F, -v RS='[{}]' '{s=""; for(i=1;i…
Thanks for the $0 hint; it helped me use a mix of your answer and @karafka's answer.
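The awk command in the comment above is cut off. Here is a minimal sketch of how the print s,$0 hint might be completed, assuming the same loop as in the awk answer. Because RS='[{}]' removes the braces from each record, this version adds them back around $0, which is an assumption based on the requested sample output. A regular-expression RS needs an awk that supports it, such as gawk or mawk. The one-line sample record is illustrative.

```shell
record='{"1a":"2017-01-28 00:00:00","2a":"sample","h":"logA","i":"IFX"}'

with_record=$(printf '%s\n' "$record" |
  awk -F, -v RS='[{}]' '{
    s = ""
    # Collect the fields that start with the wanted keys.
    for (i = 1; i <= NF; i++)
      if ($i ~ /^"(1a|h)":/) s = (s ? s OFS : "") $i
    # $0 is the record text between the braces, so they are added back.
    if (s) print s, "{" $0 "}"
  }')

printf '%s\n' "$with_record"
```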
Thanks @mklement0 for the detailed explanation :) How can we pass a file to the above jq command and produce only unique values for these keys?
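One possible way to approach the last question, sketched under the assumption that "unique" means no repeated output lines: the file path is given after the jq program, the formatted lines are collected into an array, and the built-in unique removes repeats (it also sorts the lines). The temporary file and its contents are made up for illustration.

```shell
# Write a small sample file (illustrative data; two objects share 1a and h).
json_file=$(mktemp)
cat > "$json_file" <<'EOF'
[
  { "1a":"2017-01-28 00:00:00", "h":"logA", "i":"IFX" },
  { "1a":"2017-01-28 00:00:00", "h":"logA", "i":"XYZ" },
  { "1a":"2017-01-28 00:00:11", "h":"logB", "i":"IFX" }
]
EOF

# Format each object, gather the lines in an array, then deduplicate.
unique_output=$(jq -r --argjson keys '[ "1a", "h" ]' '
  def printKv($k): "\"\($k)\": \"\(.[$k])\"";
  [ .[] | [ $keys[] as $k | printKv($k) ] | join(" ") ] | unique | .[]
' "$json_file")
rm -f "$json_file"

printf '%s\n' "$unique_output"
```

If the original order of the lines must be preserved, piping the plain jq output through awk '!seen[$0]++' would be an alternative.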