How to import complex JSON data into Hive

As input, I have a JSON file that I want to import into Hive:

[
    {
        "code": "ACPBC3P",
        "libelle": "Bon de commande Prime de satisfaction ACP",
        "libelleCourt": "Bon de commande Prime de satisfaction ACP",
        "libelleLong": "Bon de commande Prime de satisfaction ACP",
        "dureeStockage": 24,
        "dureeArchivage": 96,
        "dureeEpuration": 120,
        "dureeStockageReelle": 24,
        "dureeArchivageReelle": 96,
        "dureeEpurationReelle": 120,
        "typologie": {
            "code": "ACP",
            "libelle": "ACP - Activ'projet"
        },
        "sousTypologie": {
            "code": "ACPBC3P",
            "libelle": "BC3P - Bon de commande Prime de satisfaction"
        }
    },
    {
        "code": "ACPC1",
        "libelle": "C1 - Demande d'avoir",
        "libelleCourt": "C1 - Demande d'avoir",
        "libelleLong": "C1 - Demande d'avoir",
        "dureeStockage": 36,
        "dureeArchivage": 84,
        "dureeEpuration": 120,
        "dureeStockageReelle": 36,
        "dureeArchivageReelle": 84,
        "dureeEpurationReelle": 120,
        "typologie": {
            "code": "ACP",
            "libelle": "ACP - Activ'projet"
        },
        "sousTypologie": {
            "code": "ACPC1",
            "libelle": "C1 - Demande d'avoir"
        }
    },
    {
        "code": "ACPC2",
        "libelle": "C2 - Relance fournisseur",
        "libelleCourt": "C2 - Relance fournisseur",
        "libelleLong": "C2 - Relance fournisseur",
        "dureeStockage": 36,
        "dureeArchivage": 84,
        "dureeEpuration": 120,
        "dureeStockageReelle": 36,
        "dureeArchivageReelle": 84,
        "dureeEpurationReelle": 120,
        "typologie": {
            "code": "ACP",
            "libelle": "ACP - Activ'projet"
        },
I tried to capture this information with the following complex type:

ARRAY <STRUCT <`code`: STRING,` libelle`: STRING, `libelleCourt`: STRING,` libelleLong`: STRING, `storage duration`: INT, `Archive duration` INT, `dureeEpuration`: INT,` dureeStockageReelle`: INT, `dureeArchivageReelle`: INT,` dureeEpurationReelle`: INT, `typologie`: STRUCT <` code` STRING, `libelle` STRING>,` sousTypologie`: STRUCT <`code`: STRING,` libelle`: STRING>, `modeCapture`: STRUCT <` code`: STRING, `libelle`: STRING>,` master`: STRING, `codeActivite`: STRING >>

but unfortunately it does not work!

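You did not mention the error you are facing. In general, there are two things to keep in mind when using the JSON SerDe:

  • org.apache.hadoop.hive.serde2.JsonSerDe does not support JSON data that starts with a square bracket "["

  • JsonSerDe is based on the text SerDe, so every newline character is treated as the start of a new record

Valid format: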

    {"world_rank": "1","country": "China","population": "1388232694","World": "0.185"},
    {"world_rank": "2","country": "India","population": "1342512706","World": "0.179"},
    {"world_rank": "3","country": "U.S.","population": "326474013","World": "0.043"},
    {"world_rank": "4","country": "Indonesia","population": "263510146","World": "0.035"}
    
Invalid format 1:

    [
    {"world_rank": "1","country": "China","population": "1388232694","World": "0.185"},
    {"world_rank": "2","country": "India","population": "1342512706","World": "0.179"},
    {"world_rank": "3","country": "U.S.","population": "326474013","World": "0.043"},
    {"world_rank": "4","country": "Indonesia","population": "263510146","World": "0.035"}
    ]
    
Invalid format 2:

      {
        "world_rank": "1",
        "country": "China",
        "population": "1388232694",
        "World": "0.185"
      },
      {
        "world_rank": "2",
        "country": "India",
        "population": "1342512706",
        "World": "0.179"
      },
      {
        "world_rank": "3",
        "country": "U.S.",
        "population": "326474013",
        "World": "0.043"
      },
      {
        "world_rank": "4",
        "country": "Indonesia",
        "population": "263510146",
        "World": "0.035"
      }
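
The one-record-per-line rule described above can be checked before loading a file. The following is a minimal sketch, not part of the original answer: `find_bad_lines` is a hypothetical helper name, and it tolerates a trailing comma only because the valid sample above shows one (whether a given SerDe version accepts it is an assumption worth verifying).

```python
import json
from typing import List


def find_bad_lines(text: str) -> List[int]:
    """Return the 1-based numbers of non-blank lines that are not
    a complete JSON object on their own."""
    bad = []
    for number, line in enumerate(text.splitlines(), start=1):
        # Assumption: a trailing comma after the object is tolerated,
        # mirroring the "valid format" sample.
        candidate = line.strip().rstrip(",")
        if not candidate:
            continue  # skip blank lines
        try:
            value = json.loads(candidate)
        except json.JSONDecodeError:
            bad.append(number)  # fragment such as "[", "{" or a lone key/value pair
            continue
        if not isinstance(value, dict):
            bad.append(number)  # valid JSON, but not an object
    return bad


if __name__ == "__main__":
    wrapped = '[\n{"world_rank": "1"},\n{"world_rank": "2"}\n]\n'
    print(find_bad_lines(wrapped))
```

On a file wrapped in square brackets, only the bracket lines are reported; on pretty-printed input, every line is reported because no single line holds a whole object.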
    
The input data should be preprocessed into the following format before it is loaded into the Hive table:

    {"code":"ACPBC3P","libelle":"Bon de commande Prime de satisfaction ACP","libelleCourt":"Bon de commande Prime de satisfaction ACP","libelleLong":"Bon de commande Prime de satisfaction ACP","dureeStockage":24,"dureeArchivage":96,"dureeEpuration":120,"dureeStockageReelle":24,"dureeArchivageReelle":96,"dureeEpurationReelle":120,"typologie":{"code":"ACP","libelle":"ACP - Activ'projet"},"sousTypologie":{"code":"ACPBC3P","libelle":"BC3P - Bon de commande Prime de satisfaction"}},
    {"code":"ACPC1","libelle":"C1 - Demande d'avoir","libelleCourt":"C1 - Demande d'avoir","libelleLong":"C1 - Demande d'avoir","dureeStockage":36,"dureeArchivage":84,"dureeEpuration":120,"dureeStockageReelle":36,"dureeArchivageReelle":84,"dureeEpurationReelle":120,"typologie":{"code":"ACP","libelle":"ACP - Activ'projet"},"sousTypologie":{"code":"ACPC1","libelle":"C1 - Demande d'avoir"}}
    {"code":"ACPC2","libelle":"C2 - Relance fournisseur","libelleCourt":"C2 - Relance fournisseur","libelleLong":"C2 - Relance fournisseur","dureeStockage":36,"dureeArchivage":84,"dureeEpuration":120,"dureeStockageReelle":36,"dureeArchivageReelle":84,"dureeEpurationReelle":120,"typologie":{"code":"ACP","libelle":"ACP - Activ'projet"}}
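
One way to do this preprocessing is with a short script. The sketch below is an illustration rather than the answerer's method: `array_to_json_lines` is a hypothetical helper, and it assumes the whole file fits in memory and contains a single top-level JSON array of objects.

```python
import json


def array_to_json_lines(text: str) -> str:
    """Convert a JSON document holding a top-level array into
    one compact JSON object per line, without the enclosing brackets."""
    records = json.loads(text)
    if not isinstance(records, list):
        raise ValueError("expected a top-level JSON array")
    # Compact separators and no indent keep each record on a single line;
    # ensure_ascii=False leaves accented characters readable.
    return "".join(
        json.dumps(record, ensure_ascii=False, separators=(",", ":")) + "\n"
        for record in records
    )


if __name__ == "__main__":
    sample = """
    [
        {"code": "ACPC1", "dureeStockage": 36,
         "typologie": {"code": "ACP", "libelle": "ACP - Activ'projet"}},
        {"code": "ACPC2", "dureeStockage": 36,
         "typologie": {"code": "ACP", "libelle": "ACP - Activ'projet"}}
    ]
    """
    print(array_to_json_lines(sample), end="")
```

The output has no trailing commas, so each line is a standalone JSON document. Nested objects such as `typologie` stay nested and can still be mapped to a struct column.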
    
DDL:

    -- One column per top-level JSON key; nested objects map to STRUCT columns
    CREATE TABLE data (
        code STRING,
        libelle STRING,
        libelleCourt STRING,
        libelleLong STRING,
        dureeStockage INT,
        dureeArchivage INT,
        dureeEpuration INT,
        dureeStockageReelle INT,
        dureeArchivageReelle INT,
        dureeEpurationReelle INT,
        typologie STRUCT<code: STRING, libelle: STRING>,
        sousTypologie STRUCT<code: STRING, libelle: STRING>
    )
    ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.JsonSerDe'
    STORED AS TEXTFILE;

    -- Nested struct fields are read with dot notation
    SELECT sousTypologie.code FROM data;
    SELECT typologie.libelle FROM data;
    

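Thanks for your reply. I tried your suggestion, but it did not work. Error message: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. Cannot validate serde: org.apache.hadoop.hive.serde2.JsonSerDe. After that I tried this SerDe: 'org.openx.data.jsonserde.JsonSerDe'. That succeeded, but I get no data.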