Extract key-value pairs from a JSON file with a custom output format

I want to grep a combination of two words from a huge log file. The words are scattered through the file, in no particular order.

Sample log:

    {"1a":"2017-01-28 00:00:00","2a":"sample","a":"12345","b":"2017-02-06","c":"2017-02-06T17:51:02.454-08:00","d":"Mozilla/5.0
    ; en-US; rv:1.9.0.1) Gecko/2008070208 Firefox/3.0.1","e":"2017-02-06 
    ","f":"03","g":"example","h":"logA","i":"IFX","j":"a85","k":"12345678"},
{"1a":"2017-01-28 00:00:11","2a":"sample","a":"12345","b":"2017-02-06","c":"2017-02-06T17:51:02.454-08:00","d":"Mozilla/5.0
    ; en-US; rv:1.9.0.1) Gecko/2008070208 Firefox/3.0.1","e":"2017-02-06 
    ","f":"03","g":"example","h":"logB","i":"IFX","j":"a85","k":"12345678"}

In this file, I want to grep the "1a":"…" and "h":"…" pairs, and the output should not contain any duplicates.

Expected output:

"1a":"2017-01-28 00:00:00" "h":"logA"
"1a":"2017-01-28 00:00:11" "h":"logB"

I tried egrep like this, but it returns the whole line:

egrep -oE '1a\|"h"' but this does not give the required output.

awk /pattern1/ && /pattern2/ filename #no use

I would appreciate any help.

awk

$ awk -F, -v RS={ 'NR>1 {for(i=1;i<=NF;i++)
                         {if($i~/"1a":/) printf "%s", $i OFS
                          if($i~/"h":"log(A|B)"/) printf "%s\n", $i}}' file


"1a":"2017-01-28 00:00:00" "h":"logA"
"1a":"2017-01-28 00:00:11" "h":"logB"
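
The question also asks for no duplicates, which the command above does not address. A minimal sketch of the same idea with de-duplication added might look like the following. It assumes values contain no commas or braces; the function name extract_unique, the array name seen, and the trimmed-down sample records are illustrative only and not part of the original answer.

```shell
#!/bin/sh
# Sketch: print the "1a" and "h" pairs of every record, each combination once.
extract_unique() {
  # RS='{' starts a new record at every opening brace;
  # -F, splits each record into "key":"value" fields.
  awk -F, -v RS='{' '
    {
      line = ""
      for (i = 1; i <= NF; i++)
        if ($i ~ /^"(1a|h)":/)
          line = (line == "" ? "" : line OFS) $i
      # Print a combination only the first time it is seen.
      if (line != "" && !seen[line]++) print line
    }' "$@"
}

# Illustrative input: the first record is repeated on purpose.
printf '%s\n' \
  '{"1a":"2017-01-28 00:00:00","2a":"sample","h":"logA","i":"IFX"},' \
  '{"1a":"2017-01-28 00:00:00","2a":"sample","h":"logA","i":"IFX"},' \
  '{"1a":"2017-01-28 00:00:11","2a":"sample","h":"logB","i":"IFX"}' |
  extract_unique
```

Passing a file name instead of piping (extract_unique log) works as well, because the arguments are handed straight to awk. The seen array keeps every distinct output line in memory, which is usually fine unless the number of distinct combinations is very large.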

Input

$ cat log
    {"1a":"2017-01-28 00:00:00","2a":"sample","a":"12345","b":"2017-02-06","c":"2017-02-06T17:51:02.454-08:00","d":"Mozilla/5.0
    ; en-US; rv:1.9.0.1) Gecko/2008070208 Firefox/3.0.1","e":"2017-02-06 
    ","f":"03","g":"example","h":"logA","i":"IFX","j":"a85","k":"12345678"},
{"1a":"2017-01-28 00:00:11","2a":"sample","a":"12345","b":"2017-02-06","c":"2017-02-06T17:51:02.454-08:00","d":"Mozilla/5.0
    ; en-US; rv:1.9.0.1) Gecko/2008070208 Firefox/3.0.1","e":"2017-02-06 
    ","f":"03","g":"example","h":"logB","i":"IFX","j":"a85","k":"12345678"}

Output

$ awk -F, -v RS='[{}]' '{s=""; for(i=1;i<=NF;i++)if($i~/^"(1a|h)":/)s=(s?s OFS:"") $i; if(s)print s}'  log 
"1a":"2017-01-28 00:00:00" "h":"logA"
"1a":"2017-01-28 00:00:11" "h":"logB"

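Instead of standard utilities, consider using jq, a very flexible JSON CLI, which:

  • simplifies the solution
  • makes it robust
  • allows it to be generalized

To be self-contained, the command supplies literal input through a pipeline and is formatted for readability.
To pass a file to the jq command instead, simply specify its path after the script's closing ' (single quote):

jq -r … '…' file.json

This yields: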

"1a": "2017-01-28 00:00:00" "h": "logA"
"1a": "2017-01-28 00:00:11" "h": "logB"
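
  • --argjson keys '[ "1a", "h" ]' defines the variable $keys as a JSON array of the key (property) names to extract.

  • .[] enumerates all elements of the input array (the individual objects), while $keys[n] and .[$keys[n]] expand to the property name at index n and to that property's value, respectively (note the .[…] accessor).

  • Most of the work goes into formatting the output: embedded " characters must be escaped as \", and embedded variable references must be enclosed in \(…), although building the string from separate tokens with + is also an option.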
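
The command these notes describe is not shown above. A minimal sketch that matches the description (explicit indices 0 and 1, \" escapes, \(…) interpolation) could look like the following. The exact wording of the original command is an assumption, the function name first_two_pairs is illustrative, and the trimmed-down input objects are borrowed from the later examples in this answer.

```shell
#!/bin/sh
# Sketch only: for every object in a JSON array read from stdin, print the
# two keys listed in $keys together with their values. Needs jq 1.5 or later.
first_two_pairs() {
  jq -r --argjson keys '[ "1a", "h" ]' '
    .[] | "\"\($keys[0])\": \"\(.[$keys[0]])\" \"\($keys[1])\": \"\(.[$keys[1]])\""
  '
}

echo '
[
  { "1a":"2017-01-28 00:00:00", "2a":"sample", "h":"logA", "i":"IFX" },
  { "1a":"2017-01-28 00:00:11", "2a":"sample", "h":"logB", "i":"IFX" }
]
' | first_two_pairs
```

With this input, the sketch should print the same two lines as the output shown above. A key that is missing from an object would be printed with the value null.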

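Generalizing the solution

Because the array indices (0, 1) are specified explicitly, the command above does not readily generalize to printing an arbitrary number of key-value pairs per line.

The following variation, based on a suggestion, shows a simple example of defining a helper function in jq; it combines built-in and custom functions so that an arbitrary number of keys to extract can be passed: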

echo '
[
  { "1a":"2017-01-28 00:00:00", "2a":"sample", "h":"logA", "i":"IFX" },
  { "1a":"2017-01-28 00:00:11", "2a":"sample", "h":"logB", "i":"IFX" }
]
' |
  jq -r --argjson keys '[ "1a", "h", "i"  ]' '
    def printKv($k; $v): "\"\($k)\": \"\($v)\"";
    .[] | . as $o | 
      reduce $keys[] as $k (""; . + if .=="" then "" else " " end + printKv($k; $o[$k]))
  '

This yields (three key-value pairs per line, because three keys were passed):

"1a": "2017-01-28 00:00:00" "h": "logA" "i": "IFX"
"1a": "2017-01-28 00:00:11" "h": "logB" "i": "IFX"

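The built-in reduce construct builds the target string by iterating over the keys and, with the help of the custom function printKv, creating a string representation of each key-value pair.

Following another suggestion, here is a simpler, more jq-idiomatic alternative that produces the same output: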

echo '
[
  { "1a":"2017-01-28 00:00:00", "2a":"sample", "h":"logA", "i":"IFX" },
  { "1a":"2017-01-28 00:00:11", "2a":"sample", "h":"logB", "i":"IFX" }
]
' |
  jq -r --argjson keys '[ "1a", "h", "i"  ]' '
    def printKv($k): "\"\($k)\": \"\(.[$k])\"";
    .[] | [ $keys[] as $k | printKv($k) ] | join(" ")
  '
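
  • printKv() now takes only one argument, the key $k, and relies on the pipeline input (which still holds the input object) to extract the associated value with .[$k].

  • Wrapping $keys[] as $k | printKv($k) in […] passes the outputs of the multiple printKv calls through the pipeline as a single array.

  • This lets the built-in join function join the array elements with spaces to form a single output line.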


This is a tweak of @mklement0's excellent answer. By defining a "print me" function, it minimizes the annoyance of having to escape double quotes:

def q: "\"\(tostring)\"";

.[] | "\($keys[0]|q): \(.[$keys[0]]|q) \($keys[1]|q): \(.[$keys[1]]|q)"

Or, if you prefer:

def printKV($k): "\"\($k)\": \"\(.[$k])\""; 

.[] | printKV($keys[0]) + " " + printKV($keys[1])
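
Generalized solution

Using printKV/1 as defined above, and assuming that $keys is defined on the command line (or by some other means) as an array of strings: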

def printKeyValues(keys):
  [keys[] as $key | printKV($key)] | join(" ");

.[] | printKeyValues($keys)
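
Put together as a complete invocation, the filter above might be run as in the following sketch. The shell wrapper print_key_values and the sample input are illustrative assumptions; only the two jq function definitions come from this answer.

```shell
#!/bin/sh
# Sketch: $1 is a JSON array of key names, e.g. '["1a","h","i"]';
# the JSON array of objects is read from stdin. Needs jq 1.5 or later.
print_key_values() {
  jq -r --argjson keys "$1" '
    def printKV($k): "\"\($k)\": \"\(.[$k])\"";
    def printKeyValues(keys):
      [keys[] as $key | printKV($key)] | join(" ");

    .[] | printKeyValues($keys)
  '
}

echo '
[
  { "1a":"2017-01-28 00:00:00", "2a":"sample", "h":"logA", "i":"IFX" },
  { "1a":"2017-01-28 00:00:11", "2a":"sample", "h":"logB", "i":"IFX" }
]
' | print_key_values '[ "1a", "h", "i" ]'
```

Inside printKeyValues, the parameter named keys shadows jq's built-in keys function, so keys[] iterates over the array passed in as $keys rather than over the object's own keys.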

Don't parse JSON with text/stream processors or editors; use a proper parser such as jq.

Format your text input as proper JSON and install jq.

Also, regular expressions are not design patterns. Tag removed.

Thanks for the answer. How can I print the entire JSON string after the specified keys (here 1a and h)? Sample output: "1a":"2017-01-28 00:00:00" "h":"logA" {"1a":"2017-01-28 00:00:00","2a":"sample","a":"12345","b":"2017-02-06","c":"2017-02-06T17:51:02.454-08:00","d":"Mozilla/5.0 ; en-US; rv:1.9.0.1) Gecko/2008070208 Firefox/3.0.1","e":"2017-02-06","f":"03","g":"example","h":"logA","i":"IFX","j":"a85","k":"12345678"}

@user2340345: You just need print s,$0, like this: awk -F, -v RS='[{}]' '{s=""; for(i=1;i…

Thanks for the $0 hint; it helped me combine your answer with @karafka's answer.

Thanks @mklement0 for the detailed explanation. How can we pass a file to the jq command above and produce only unique values for these keys?
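
One possible way to do this, offered as a sketch and not as the answerer's own solution: pass the file path after the script, collect the formatted lines into an array, and let jq's built-in unique drop the repeats. The wrapper name unique_pairs and the file name sample.json are hypothetical, and the file is assumed to hold a proper JSON array of objects.

```shell
#!/bin/sh
# Sketch: $1 is a JSON array of key names, $2 the path to a JSON file
# containing an array of objects. Needs jq 1.5 or later.
unique_pairs() {
  jq -r --argjson keys "$1" '
    def printKv($k): "\"\($k)\": \"\(.[$k])\"";
    # Build one string per object, then keep each distinct string once.
    [ .[] | [ $keys[] as $k | printKv($k) ] | join(" ") ] | unique | .[]
  ' "$2"
}

# Hypothetical sample file; the first object is repeated on purpose.
cat > sample.json <<'EOF'
[
  { "1a":"2017-01-28 00:00:00", "2a":"sample", "h":"logA", "i":"IFX" },
  { "1a":"2017-01-28 00:00:00", "2a":"sample", "h":"logA", "i":"IFX" },
  { "1a":"2017-01-28 00:00:11", "2a":"sample", "h":"logB", "i":"IFX" }
]
EOF

unique_pairs '[ "1a", "h" ]' sample.json
```

Note that unique also sorts its result, so the output order can differ from the input order. For very large files, piping the plain jq output through sort -u may be the more memory-friendly choice.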