
Hadoop: How to load a CSV (comma-separated) file into an HBase table using Flume?

Tags: hadoop, hbase, flume, flume-ng, flume-twitter

I want to load a CSV (comma-separated) file into my HBase table. I have tried this with the help of a few articles found on Google, and I can now load a whole line (or row) into HBase as a value; that is, all of the values in a single row are stored in one column. What I want instead is to split each row on the comma delimiter and store the values in different columns within a column family of the HBase table.

Please help me resolve this issue. Any suggestions would be greatly appreciated.

Below are the input file, the agent configuration file, and the HBase output I am currently working with.

1) Input file

8600000US00601,00601,006015-DigitZCTA,0063-DigitZCTA,11102
8600000US00602,00602,006025-DigitZCTA,0063-DigitZCTA,12869
8600000US00603,00603,006035-DigitZCTA,0063-DigitZCTA,12423
8600000US00604,00604,006045-DigitZCTA,0063-DigitZCTA,33548
8600000US00606,00606,006065-DigitZCTA,0063-DigitZCTA,10603
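
Note that HBaseSink does not create the target table or column family; both must already exist before the agent starts. A minimal HBase shell sketch, assuming the table name (sample) and column family (s1) used in this setup:

create 'sample', 's1'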

2) Agent configuration file

agent.sources  = spool
agent.channels = fileChannel2
agent.sinks    = sink2

agent.sources.spool.type = spooldir
agent.sources.spool.spoolDir = /home/cloudera/Desktop/flume
agent.sources.spool.fileSuffix = .completed
agent.sources.spool.channels = fileChannel2
#agent.sources.spool.deletePolicy = immediate

agent.sinks.sink2.type = org.apache.flume.sink.hbase.HBaseSink
agent.sinks.sink2.channel = fileChannel2
agent.sinks.sink2.table = sample
agent.sinks.sink2.columnFamily = s1
agent.sinks.sink2.serializer = org.apache.flume.sink.hbase.RegexHbaseEventSerializer
agent.sinks.sink1.serializer.regex = "\"([^\"]+)\""
agent.sinks.sink2.serializer.regexIgnoreCase = true
agent.sinks.sink1.serializer.colNames =col1,col2,col3,col4,col5
agent.sinks.sink2.batchSize = 100
agent.channels.fileChannel2.type=memory
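
For reference, an agent defined this way is typically started with the flume-ng launcher (the paths below are hypothetical placeholders):

flume-ng agent --conf /etc/flume-ng/conf \
    --conf-file /path/to/agent.conf \
    --name agent \
    -Dflume.root.logger=INFO,console

The --name argument must match the component prefix used in the configuration file (agent, in this case).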

3) HBase output

hbase(main):009:0> scan 'sample'
ROW                                         COLUMN+CELL                                                                                                                 
 1431064328720-0LalKGmSf3-1                 column=s1:payload, timestamp=1431064335428, value=8600000US00602,00602,006025-DigitZCTA,0063-DigitZCTA,12869                
 1431064328720-0LalKGmSf3-2                 column=s1:payload, timestamp=1431064335428, value=8600000US00603,00603,006035-DigitZCTA,0063-DigitZCTA,12423                
 1431064328720-0LalKGmSf3-3                 column=s1:payload, timestamp=1431064335428, value=8600000US00604,00604,006045-DigitZCTA,0063-DigitZCTA,33548                
 1431064328721-0LalKGmSf3-4                 column=s1:payload, timestamp=1431064335428, value=8600000US00606,00606,006065-DigitZCTA,0063-DigitZCTA,10603                
4 row(s) in 0.0570 seconds

hbase(main):010:0> 

I am having the same problem. Any ideas?
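
A likely cause, judging from the configuration above: the serializer.regex and serializer.colNames properties are addressed to sink1, while the sink is actually named sink2, so those settings are never applied. RegexHbaseEventSerializer then falls back to its defaults, matching the whole event body with (.*) and writing it to a single column named payload, which is exactly what the scan shows. The quoted-field regex "\"([^\"]+)\"" would also never match these unquoted comma-separated lines. Below is a corrected sketch of the sink section (untested; one capture group per CSV field, and the column names col1 through col5 are arbitrary):

agent.sinks.sink2.type = org.apache.flume.sink.hbase.HBaseSink
agent.sinks.sink2.channel = fileChannel2
agent.sinks.sink2.table = sample
agent.sinks.sink2.columnFamily = s1
agent.sinks.sink2.serializer = org.apache.flume.sink.hbase.RegexHbaseEventSerializer
# five unquoted, comma-separated fields -> five capture groups
agent.sinks.sink2.serializer.regex = ([^,]*),([^,]*),([^,]*),([^,]*),([^,]*)
agent.sinks.sink2.serializer.colNames = col1,col2,col3,col4,col5
agent.sinks.sink2.batchSize = 100

With this in place, each field should land in its own column (s1:col1 through s1:col5). The row key remains the auto-generated one seen in the scan above unless a custom serializer is used (newer Flume releases also allow a ROW_KEY entry in colNames to promote a capture group to the row key).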