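Hive: create a view over a Hive table by defining a schema for a column that contains JSON

I am storing raw JSON strings from a Kafka stream to HDFS as Parquet. I have created an external table in Hive over the HDFS folder. Now I want to create a view over the raw data stored in that Hive table.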


Kafka stream to HDFS

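import org.apache.spark.SparkContext;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SQLContext;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.streaming.Trigger;
import org.apache.spark.sql.types.DataTypes;

public class EventKafkaToParquet {

    public static void main(String[] args) throws Exception {

        String brokers = "quickstart:9092";
        String topics = "simple_topic_6";
        String master = "local[*]";

        SparkSession sparkSession = SparkSession
                .builder().appName(EventKafkaToParquet.class.getName())
                .master(master).getOrCreate();
        SQLContext sqlContext = sparkSession.sqlContext();
        SparkContext context = sparkSession.sparkContext();
        context.setLogLevel("ERROR");

        // Read the topic as a streaming Dataset; Kafka delivers key/value as binary.
        Dataset<Row> rawDataSet = sparkSession.readStream()
                .format("kafka")
                .option("kafka.bootstrap.servers", brokers)
                .option("subscribe", topics).load();
        rawDataSet.printSchema();

        // Cast the binary value to a string and expose it as the "employee" column.
        rawDataSet = rawDataSet.withColumn("employee", rawDataSet.col("value").cast(DataTypes.StringType));
        rawDataSet.createOrReplaceTempView("basicView");
        Dataset<Row> writeDataset = sqlContext.sql("select employee from basicView");

        // Write the raw JSON strings to HDFS as Parquet, one micro-batch every 5 seconds.
        writeDataset
                .repartition(1)
                .writeStream()
                .option("path", "/user/cloudera/employee/")
                .option("checkpointLocation", "/user/cloudera/employee.checkpoint/")
                .format("parquet")
                .trigger(Trigger.ProcessingTime(5000))
                .start()
                .awaitTermination();
    }
}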
Now I want to create a Hive view on top of the employee_raw table that outputs the following columns:

firstName, lastName, street, city, state, zip
The employee_raw table currently returns:

hive> select * from employee_raw;
OK
{"employee":{"firstName":"Ganesh","lastName":"Kumar","address":{"street":"1400 Dakota Dr","city":"Princeton","state":"NJ","zip":"09800"}}}
{"employee":{"firstName":"Ganesh","lastName":"Kumar","address":{"street":"1400 Dakota Dr","city":"Princeton","state":"NJ","zip":"09800"}}}
{"employee":{"firstName":"Ganesh","lastName":"Kumar","address":{"street":"1400 Dakota Dr","city":"Princeton","state":"NJ","zip":"09800"}}}
{"employee":{"firstName":"Ganesh","lastName":"Kumar","address":{"street":"1400 Dakota Dr","city":"Princeton","state":"NJ","zip":"09800"}}}
{"employee":{"firstName":"Ganesh","lastName":"Kumar","address":{"street":"1400 Dakota Dr","city":"Princeton","state":"NJ","zip":"09800"}}}
Time taken: 0.123 seconds, Fetched: 5 row(s)

Any input would be much appreciated.

From your description, I think this is essentially what you want to do, so you can find the answer in [link not preserved].
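One common way to get the flattened columns is Hive's built-in get_json_object function, which extracts a value from a JSON string by path. The following is only a minimal sketch under two assumptions: that employee_raw exposes the JSON in a single STRING column named employee (the column name written by the Spark job in the question), and that the view name employee_view, which is made up here for illustration, is acceptable.

```sql
-- Assumption: employee_raw has one STRING column named `employee`
-- holding documents shaped like {"employee": {..., "address": {...}}}.
-- `employee_view` is a hypothetical name chosen for this example.
CREATE VIEW IF NOT EXISTS employee_view AS
SELECT
  get_json_object(employee, '$.employee.firstName')      AS firstName,
  get_json_object(employee, '$.employee.lastName')       AS lastName,
  get_json_object(employee, '$.employee.address.street') AS street,
  get_json_object(employee, '$.employee.address.city')   AS city,
  get_json_object(employee, '$.employee.address.state')  AS state,
  get_json_object(employee, '$.employee.address.zip')    AS zip
FROM employee_raw;

-- Query the view like any other table.
SELECT * FROM employee_view;
```

get_json_object returns every value as a string, so zip keeps its leading zero, and a missing field comes back as NULL. Hive treats column names case-insensitively, so firstName and lastName will be listed in lowercase.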
