Hive table to view Avro records from Twitter streamed using Flume: "Block size invalid or too large for this implementation: -40"
I am creating a Hive SerDe external table to view Twitter records that are streamed using Flume.
My properties file:
# Naming the components on the current agent.
TwitterAgent.sources = Twitter
TwitterAgent.channels = MemChannel
TwitterAgent.sinks = HDFS
# Describing/Configuring the source
TwitterAgent.sources.Twitter.type = org.apache.flume.source.twitter.TwitterSource
TwitterAgent.sources.Twitter.consumerKey = xxx
TwitterAgent.sources.Twitter.consumerSecret = xxx
TwitterAgent.sources.Twitter.accessToken = xxx
TwitterAgent.sources.Twitter.accessTokenSecret = xxx
TwitterAgent.sources.Twitter.keywords = kafka
# Describing/Configuring the sink
TwitterAgent.sinks.HDFS.type = hdfs
TwitterAgent.sinks.HDFS.hdfs.path = hdfs://xxx:8000/topics/flumedata
TwitterAgent.sinks.HDFS.hdfs.fileType = DataStream
TwitterAgent.sinks.HDFS.hdfs.writeFormat = Text
TwitterAgent.sinks.HDFS.hdfs.batchSize = 10000
TwitterAgent.sinks.HDFS.hdfs.rollSize = 0
TwitterAgent.sinks.HDFS.hdfs.rollCount = 100000
TwitterAgent.sinks.hdfs.serializer=Text
# Describing/Configuring the channel
TwitterAgent.channels.MemChannel.type = memory
TwitterAgent.channels.MemChannel.capacity = 100000
TwitterAgent.channels.MemChannel.transactionCapacity = 1000
TwitterAgent.channels.MemChannel.byteCapacity = 6912212
# Binding the source and sink to the channel
TwitterAgent.sources.Twitter.channels = MemChannel
TwitterAgent.sinks.HDFS.channel = MemChannel
Query to create the Hive external table:
CREATE EXTERNAL TABLE twitter_tweets
COMMENT "just drop the schema right into the HQL"
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
TBLPROPERTIES (
'avro.schema.literal'='{
"type" : "record",
"name" : "Doc",
"doc" : "adoc",
"fields" : [ {
"name" : "id",
"type" : "string"
}, {
"name" : "user_friends_count",
"type" : [ "int", "null" ]
}, {
"name" : "user_location",
"type" : [ "string", "null" ]
}, {
"name" : "user_description",
"type" : [ "string", "null" ]
}, {
"name" : "user_statuses_count",
"type" : [ "int", "null" ]
}, {
"name" : "user_followers_count",
"type" : [ "int", "null" ]
}, {
"name" : "user_name",
"type" : [ "string", "null" ]
}, {
"name" : "user_screen_name",
"type" : [ "string", "null" ]
}, {
"name" : "created_at",
"type" : [ "string", "null" ]
}, {
"name" : "text",
"type" : [ "string", "null" ]
}, {
"name" : "retweet_count",
"type" : [ "long", "null" ]
}, {
"name" : "retweeted",
"type" : [ "boolean", "null" ]
}, {
"name" : "in_reply_to_user_id",
"type" : [ "long", "null" ]
}, {
"name" : "source",
"type" : [ "string", "null" ]
}, {
"name" : "in_reply_to_status_id",
"type" : [ "long", "null" ]
}, {
"name" : "media_url_https",
"type" : [ "string", "null" ]
}, {
"name" : "expanded_url",
"type" : [ "string", "null" ]
} ]
}');
LOAD DATA INPATH '/topics/flumedata/FlumeData.*' OVERWRITE INTO TABLE twitter_tweets;
After creating the table, when I run select * from twitter_tweets;
it returns no data and throws this error:
org.apache.hive.service.cli.HiveSQLException: java.io.IOException: org.apache.avro.AvroRuntimeException: java.io.IOException: Block size invalid or too large for this implementation: -40
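Where am I going wrong? I don't understand why I am running into this block size problem. Can anyone give me some pointers?
@Rakesh Gupta have you faced the above issue?
@Farrukhmuneer can you help me with this?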
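For anyone trying to narrow this down, here is a minimal Python sketch of two checks on the raw bytes of one of the FlumeData files. It is not a confirmed diagnosis. It relies on two facts from the Avro specification: an object container file begins with the four magic bytes `Obj\x01`, and longs such as the block size are stored as zigzag-encoded varints. The helper names `zigzag_decode_single_byte` and `magic_offsets` are made up for this illustration. The idea that the reader tripped over a second container header in the middle of a file is an assumption, not something the stack trace proves.

```python
from typing import List

# Per the Avro spec, every object container file starts with these four bytes.
AVRO_MAGIC = b"Obj\x01"


def zigzag_decode_single_byte(b: int) -> int:
    """Decode a one-byte Avro varint (value below 0x80) as a zigzag-encoded long."""
    if not 0 <= b < 0x80:
        raise ValueError("only single-byte varints are handled in this sketch")
    return (b >> 1) ^ -(b & 1)


def magic_offsets(data: bytes) -> List[int]:
    """Return every offset at which the Avro magic bytes occur in data."""
    offsets = []
    pos = data.find(AVRO_MAGIC)
    while pos != -1:
        offsets.append(pos)
        pos = data.find(AVRO_MAGIC, pos + 1)
    return offsets


# The byte for the letter "O" (0x4F), read as a zigzag long, gives the value
# reported in the error message.
print(zigzag_decode_single_byte(ord("O")))
```

To use it, copy one FlumeData file out of HDFS, read it with `open(path, "rb").read()`, and pass the bytes to `magic_offsets`. If the first offset is not 0, or if more than one offset comes back, the file may not be a single clean Avro container, which is what `AvroContainerInputFormat` expects. The four magic bytes can also turn up by chance inside record data, so treat a match as a hint only.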