Data issue with Hive SerDe

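Goal: use the SerDe feature to parse log data and load it into Hive. We are running into a problem when retrieving the data with a SELECT statement.

We created a table and were able to load the data successfully. However, the SELECT statement returns only NULL values.

Sample log data: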

2013-02-21 00:13:48,916 INFO org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: Verification succeeded for blk_5729677439273359430_1495
The regular expression we use to parse the log line above is:

([^ ]*) ([^ ]{8})[^ ]* ([A-Z]*) ([^ ]*): ([[^ ]*\s]*)
Creating the table:

CREATE EXTERNAL TABLE log (
dt STRING,
time STRING,
loglevel STRING,
check STRING,
status STRING )
ROW FORMAT
SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
"input.regex"([^ ]*) ([^ ]{8})[^ ]* ([A-Z]*) ([^ ]*): ([[^ ]*\s]*)",
"output.format.string"="%1$s %2$s %3$s %4$s %5$s")
STORED AS TEXTFILE LOCATION '/tmp/log/';
We added the jar:

add jar /usr/lib/hive/lib/hive-contrib-0.7.1-cdh3u4.jar;
Loading the data:

load data local inpath "/tmp/logdata.txt" into table log;
Retrieving the data:

Select * from log LIMIT 1;
Output:

NULL NULL NULL NULL NULL
Sample log data:

2013-02-21 00:13:48,916 INFO org.apache.hadoop.hdfs.server.datanode.DataBlockScanner:
Verification succeeded for blk_5729677439273359430_1495

2013-02-21 00:15:39,929 INFO org.apache.hadoop.hdfs.server.datanode.DataBlockScanner:
Verification succeeded for blk_-4787916211671845946_1464
Thanks in advance.

Please try this:

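It looks like you should add "=" after "input.regex" in the CREATE TABLE statement (the opening quote of the regex value is missing there as well).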

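Usually, this kind of error (every column coming back as NULL) is caused by the regular expression not matching the input line in full.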
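
The full-match requirement can be checked outside Hive before creating the table. The following is a minimal Java sketch, assuming (as the RegexSerDe documentation describes) that a row which does not match the regex yields NULL for every column. The class name `RegexCheck`, the helper `parse`, and the pattern in `CANDIDATE_REGEX` are illustrative assumptions, not taken from the original post; the pattern is only one possible way to split the sample line into five fields.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexCheck {
    // Sample line copied from the question.
    static final String SAMPLE_LINE =
        "2013-02-21 00:13:48,916 INFO "
        + "org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: "
        + "Verification succeeded for blk_5729677439273359430_1495";

    // Hypothetical candidate pattern (an assumption, not the poster's regex):
    // date, time without milliseconds, log level, class name, message.
    static final String CANDIDATE_REGEX =
        "([^ ]*) ([^ ,]*),[^ ]* ([A-Z]*) ([^ ]*): (.*)";

    // Returns the capture groups if the regex matches the WHOLE line,
    // or null otherwise (the case that would surface as NULL columns).
    static List<String> parse(String regex, String line) {
        Matcher m = Pattern.compile(regex).matcher(line);
        if (!m.matches()) {
            return null;
        }
        List<String> groups = new ArrayList<>();
        for (int i = 1; i <= m.groupCount(); i++) {
            groups.add(m.group(i));
        }
        return groups;
    }

    public static void main(String[] args) {
        // Full match: prints the five captured fields.
        System.out.println(parse(CANDIDATE_REGEX, SAMPLE_LINE));
        // Pattern that covers only part of the line: prints null.
        System.out.println(parse("([^ ]*) ([^ ]*)", SAMPLE_LINE));
    }
}
```

When a pattern that passes this check is put into the `input.regex` property, keep in mind that Hive string literals process backslash escapes, so sequences such as `\s` generally need to be written as `\\s` in the DDL.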