Records in Hive are incomplete when using the Flume Hive sink
I want to use Flume to collect data into a Hive database. The data has been stored in Hive, but the records are incomplete. I want to insert records like the following:
1201,Gopal
1202,Manisha
1203,Masthanvali
1204,Kiran
1205,Kranthi
When I run Flume, bucket_00000 and bucket_00000_flush_length appear in HDFS under /user/hive/warehouse/test2.db/employee12/delta_0000501_0000600. The database is test2 and the table name is employee12.
When I run select * from employee12, it shows the following:
--------------------------------------------------------------------
hive> select * from employee12;
OK
(two blank lines here)
1201 Gopal
1202
Time taken: 0.802 seconds, Fetched: 1 row(s)
----------------------------------------------------------------------
Can anyone help me figure out:
Why does it show only two rows?
Why does the second row show only 1202?
How do I set the correct configuration?
Flume configuration:
Hive create table statement:
Try using an external table. I found this article while working on a similar setup. Has anyone else had the same problem?
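To illustrate the external-table suggestion, here is a minimal sketch of a plain (non-ACID) external table over delimited text files. The location path is an assumption, not from the thread, and this variant would pair with an HDFS sink rather than the Hive sink, since the Hive sink requires a transactional bucketed ORC table:

```sql
-- Sketch only: external table over CSV files written to HDFS,
-- e.g. by a Flume HDFS sink. The path below is assumed.
create external table if not exists employee12_ext (
  eid int,
  name string
)
row format delimited
fields terminated by ','
lines terminated by '\n'
stored as textfile
location '/user/flume/employee12';
```

With an external table, Hive reads whatever files land in the directory, so there is no streaming/transaction machinery to misconfigure.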
agenthive.sources = spooldirSource
agenthive.channels = memoryChannel
agenthive.sinks = hiveSink
agenthive.sources.spooldirSource.type=spooldir
agenthive.sources.spooldirSource.deserializer=org.apache.flume.sink.solr.morphline.BlobDeserializer$Builder
agenthive.sources.spooldirSource.spoolDir=/home/flume/flume_test_home/spooldir
agenthive.sources.spooldirSource.channels=memoryChannel
agenthive.sources.spooldirSource.basenameHeader=true
agenthive.sources.spooldirSource.basenameHeaderKey=basename
agenthive.sinks.hiveSink.type=hive
agenthive.sinks.hiveSink.hive.metastore = thrift://127.0.0.1:9083
agenthive.sinks.hiveSink.hive.database = test2
agenthive.sinks.hiveSink.hive.table = employee12
agenthive.sinks.hiveSink.round = true
agenthive.sinks.hiveSink.roundValue = 10
agenthive.sinks.hiveSink.roundUnit = second
agenthive.sinks.hiveSink.serializer = DELIMITED
agenthive.sinks.hiveSink.serializer.delimiter = ","
agenthive.sinks.hiveSink.serializer.serdeSeparator = ','
agenthive.sinks.hiveSink.serializer.fieldnames =eid,name
agenthive.sinks.hiveSink.channel=memoryChannel
agenthive.channels.memoryChannel.type=memory
agenthive.channels.memoryChannel.capacity=100
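One likely culprit in the configuration above (my reading, not confirmed in the thread): the spooldir source uses BlobDeserializer, which emits each spooled file as a single event, so the DELIMITED serializer of the Hive sink parses the whole file as one record instead of one record per line. A sketch of the source using the default LINE deserializer, so each newline-terminated CSV line becomes its own event:

```properties
# Sketch: one event per line instead of one event per file.
agenthive.sources.spooldirSource.type = spooldir
agenthive.sources.spooldirSource.spoolDir = /home/flume/flume_test_home/spooldir
# Drop the BlobDeserializer line; LINE is the spooldir default and
# emits each newline-terminated record as a separate event.
agenthive.sources.spooldirSource.deserializer = LINE
agenthive.sources.spooldirSource.channels = memoryChannel
```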
create table if not exists employee12 (eid int,name string)
comment 'this is comment'
clustered by(eid) into 1 buckets
row format delimited
fields terminated by ','
lines terminated by '\n'
stored as orc
tblproperties('transactional'='true');
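The Hive sink streams into ACID tables through the Hive transaction machinery, so transactions and the compactor must be enabled on the Hive side; otherwise reads over delta files can look incomplete. A sketch of the settings commonly required for transactional tables (these are the usual values from the Hive documentation; verify them against your Hive version):

```sql
-- Sketch: settings typically required for Hive transactional tables.
set hive.support.concurrency = true;
set hive.txn.manager = org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
set hive.compactor.initiator.on = true;    -- on the metastore side
set hive.compactor.worker.threads = 1;     -- at least one worker
-- Optionally force a compaction so deltas are merged:
alter table employee12 compact 'major';
```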