Hadoop: Processing semi-structured data with Pig/Hive - Fatal编程技术网


hadoop hive apache-pig


I have semi-structured data like the following:

col1 col2 col3 col4
1    2    3    [name#aa, address#[perminentaddress#abc,currentaddress#xyg]]
5    9    8    [address#[perminentaddress#dev,currentaddress#pqr],name#bb]
3    4    9    [name#cc,mobile#111,id#66 address#[perminentaddress#abc,currentaddress#xyg]]

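The first three columns are fixed, while the fourth column can contain arbitrary, unknown data made up of key-value pairs. The key-value pairs can be nested, as shown in the example above. Most importantly, the position of a key within the fourth column is not fixed, and there can be any number of keys.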
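To make the format of column 4 concrete, here is a minimal Python sketch of a recursive-descent parser that turns one row into its three fixed columns plus a nested dict. It is not Pig or Hive code; it only illustrates the parsing logic a custom loader or UDF would need. It rests on assumptions the question does not state: keys and scalar values never contain #, comma, [, ] or whitespace, and top-level pairs may be separated by commas and/or whitespace (the third sample row has "id#66 address#..." with no comma). The names parse_map and parse_row are hypothetical.

```python
# Hypothetical sketch: parse the col4 format from the question into nested dicts.
# Assumes keys/scalar values contain no '#', ',', '[', ']' or whitespace, and that
# pairs may be separated by commas and/or whitespace (row 3 lacks a comma).


def parse_map(text):
    """Parse '[k#v, k#[k2#v2, ...], ...]' into a nested dict of strings."""
    text = text.strip()
    pos = 0

    def skip_separators():
        nonlocal pos
        while pos < len(text) and (text[pos] == "," or text[pos].isspace()):
            pos += 1

    def read_token():
        nonlocal pos
        start = pos
        while pos < len(text) and text[pos] not in "#,[]" and not text[pos].isspace():
            pos += 1
        return text[start:pos]

    def parse_bracketed():
        nonlocal pos
        if pos >= len(text) or text[pos] != "[":
            raise ValueError(f"expected '[' at position {pos}")
        pos += 1  # consume '['
        result = {}
        skip_separators()
        while pos < len(text) and text[pos] != "]":
            key = read_token()
            if pos >= len(text) or text[pos] != "#":
                raise ValueError(f"expected '#' after key {key!r} at position {pos}")
            pos += 1  # consume '#'
            if pos < len(text) and text[pos] == "[":
                result[key] = parse_bracketed()  # nested map, e.g. address#[...]
            else:
                result[key] = read_token()  # scalar value, kept as a string
            skip_separators()
        if pos >= len(text):
            raise ValueError("unterminated '['")
        pos += 1  # consume ']'
        return result

    parsed = parse_bracketed()
    if pos != len(text):
        raise ValueError(f"unexpected trailing text at position {pos}")
    return parsed


def parse_row(line):
    """Split off the three fixed columns, then parse the rest as col4."""
    col1, col2, col3, col4 = line.split(maxsplit=3)
    return col1, col2, col3, parse_map(col4)
```

Because keys are stored in a dict, their position inside the brackets no longer matters: for every sample row, the current address is parse_row(line)[3]["address"]["currentaddress"].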

Is it possible to process this data with Pig/Hive?

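For example, how can I get the currentaddress value from all of the rows above? (Note that key positions are not fixed, and the address key has nested keys.)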

Thanks.

You can represent column 4 as nested maps (see …).
You will then be able to access currentaddress as col4#'address'#'currentaddress'.

To represent your data this way, you may need to write a …
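Once column 4 is loaded as a nested map, the lookup col4#'address'#'currentaddress' resolves one key at a time, so where a key appeared in the original text no longer matters. The Python sketch below imitates that chained lookup on plain dicts. The name get_path is hypothetical, the third row is made up to show a missing key, and returning None for a missing key mirrors the null Pig yields for an absent map key; returning None when an intermediate value is not a map is only this sketch's choice.

```python
# Hypothetical helper mirroring Pig's chained map lookup col4#'address'#'currentaddress':
# each key is looked up in turn and a missing key yields None (standing in for Pig's null).
# Returning None for a non-dict intermediate value is a choice made for this sketch.


def get_path(value, *keys):
    """Follow keys through nested dicts, returning None if any step is missing."""
    for key in keys:
        if not isinstance(value, dict):
            return None
        value = value.get(key)
    return value


# col4 values from the first two sample rows, already converted to nested dicts,
# plus one hypothetical row that has no address key.
rows = [
    {"name": "aa", "address": {"perminentaddress": "abc", "currentaddress": "xyg"}},
    {"address": {"perminentaddress": "dev", "currentaddress": "pqr"}, "name": "bb"},
    {"name": "zz"},
]
current = [get_path(col4, "address", "currentaddress") for col4 in rows]
```

Rows without the key simply produce None instead of failing, which matches how a Pig script would see null for those rows and could filter them out afterwards.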
Yes, this can be processed with Pig, but you haven't asked a very specific question.
I have edited the question above to make it more specific.


Copyright © 2024. All Rights Reserved by - Fatal编程技术网