How to format a TSV file in Druid


I'm trying to load a TSV file into Druid using this ingestion spec (latest version below):

{
    "type" : "index",
    "spec" : {
        "ioConfig" : {
            "type" : "index",
            "inputSpec" : {
                "type": "local",
                "baseDir": "quickstart",
                "filter": "test_data.json"
            }
        },
        "dataSchema" : {
            "dataSource" : "local",
            "granularitySpec" : {
                "type" : "uniform",
                "segmentGranularity" : "hour",
                "queryGranularity" : "none",
                "intervals" : ["2016-07-18/2016-07-22"]
            },
            "parser" : {
                "type" : "string",
                "parseSpec" : {
                    "format" : "json",
                    "dimensionsSpec" : {
                        "dimensions" : ["name", "email", "age"]
                    },
                    "timestampSpec" : {
                        "format" : "yyyy-MM-dd HH:mm:ss",
                        "column" : "date"
                    }
                }
            },
            "metricsSpec" : [
                {
                    "name" : "count",
                    "type" : "count"
                },
                {
                    "type" : "doubleSum",
                    "name" : "age",
                    "fieldName" : "age"
                }
            ]
        }
    }
}

If my schema looks like this:

name    email    age

and the actual dataset looks like this:

name    email    age
Bob    Jones    23
Billy    Jones    45

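In the TSV dataset above, should the columns be formatted like ^^? That is, should

name email age

appear first (as the column names), followed by the actual data? I'm confused about how Druid knows how to map the columns to the actual data in a TSV file.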

TSV stands for tab-separated values, so it looks the same as CSV except that fields are separated by tabs instead of commas, e.g.:

Name<TAB>Age<TAB>Address
Paul<TAB>23<TAB>1115 W Franklin
Bessy the Cow<TAB>5<TAB>Big Farm Way
Zeke<TAB>45<TAB>W Main St
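To make the tab-versus-comma point concrete, here is a small Python sketch (standard library only, not Druid code) that reads the sample above, with real tab characters in place of `<TAB>`, and pairs each value with the header name in the same position. The names `SAMPLE` and `read_tsv` are hypothetical.

```python
import csv
import io

# The sample above, with literal tab characters instead of <TAB>.
SAMPLE = (
    "Name\tAge\tAddress\n"
    "Paul\t23\t1115 W Franklin\n"
    "Bessy the Cow\t5\tBig Farm Way\n"
    "Zeke\t45\tW Main St\n"
)

def read_tsv(text):
    """Split each line on tabs and map values to header names by position."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    header = next(reader)  # first row holds the column names
    return [dict(zip(header, row)) for row in reader]

rows = read_tsv(SAMPLE)
for row in rows:
    print(row)
```

Values containing spaces, such as `Bessy the Cow` or `1115 W Franklin`, stay intact because only tabs separate fields.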

Your spec file should look like this:

{
    "type" : "index",
    "spec" : {
        "ioConfig" : {
            "type" : "index",
            "inputSpec" : {
                "type": "local",
                "baseDir": "path_to_folder",
                "filter": "name_of_the_file(s)"
            }
        },
        "dataSchema" : {
            "dataSource" : "local",
            "granularitySpec" : {
                "type" : "uniform",
                "segmentGranularity" : "hour",
                "queryGranularity" : "none",
                "intervals" : ["2016-07-01/2016-07-28"]
            },
            "parser" : {
                "type" : "string",
                "parseSpec" : {
                    "format" : "tsv",
                    "dimensionsSpec" : {
                        "dimensions" : [
                            "position",
                            "age",
                            "office"
                        ]
                    },
                    "timestampSpec" : {
                        "format" : "auto",
                         "column" : "start_date"
                    }
                }
            },
            "metricsSpec" : [
                {
                    "name" : "count",
                    "type" : "count"
                },
                {
                    "name" : "sum_salary",
                    "type" : "longSum",
                    "fieldName" : "salary"
                }
            ]
        }
    }
}
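The spec above does not by itself say which tab-separated value belongs to which name. In the Druid versions I'm aware of, the `tsv` parseSpec takes a `columns` array listing the field names in the same order as the values in each row (the delimiter defaults to a tab), and Druid maps values to names purely by position; check your version's data-format documentation, since newer releases may also offer header-row options. A minimal sketch, assuming that positional behaviour: it takes a header line and builds a parseSpec fragment with `columns` filled in. The function name `tsv_parse_spec` and the header order are hypothetical, chosen only to cover the fields the spec above references.

```python
import json

def tsv_parse_spec(header_line, dimensions, timestamp_column):
    """Build a Druid-style 'tsv' parseSpec whose 'columns' follow the header order.

    Assumption: the target Druid version maps each value to a name purely by
    its position, using the 'columns' array.
    """
    columns = header_line.rstrip("\n").split("\t")
    # Every dimension and the timestamp column must appear in the header.
    missing = [c for c in dimensions + [timestamp_column] if c not in columns]
    if missing:
        raise ValueError(f"not present in header: {missing}")
    return {
        "format": "tsv",
        "columns": columns,  # positional: 1st value -> columns[0], and so on
        "dimensionsSpec": {"dimensions": dimensions},
        "timestampSpec": {"format": "auto", "column": timestamp_column},
    }

# Hypothetical header for a file containing the fields used in the spec above.
header = "start_date\tposition\tage\toffice\tsalary\n"
spec = tsv_parse_spec(header, ["position", "age", "office"], "start_date")
print(json.dumps(spec, indent=2))
```

Under this assumption the header line itself is not needed in the data file; if it stays there, it may be read as an ordinary row, so either remove it or use a header-skipping option if your Druid version provides one.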