
Java: reading a text file from the system into HBase with MapReduce


I need to load data from a text file into MapReduce. I've searched online, but I haven't found a solution that fits my job.

Is there a method or class that reads a text/csv file from the system and stores the data into an HBase table?

To read from a text file, the file first has to be in HDFS. You then need to specify the input and output formats for the job:
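Since the input has to live in HDFS before the job runs, the local file can be copied in first. A minimal sketch using the Hadoop FileSystem API (the paths here are placeholders for illustration, and this requires a running Hadoop installation configured via core-site.xml):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class CopyToHdfs {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Connects to the filesystem named in the cluster configuration
        FileSystem fs = FileSystem.get(conf);
        // Hypothetical source and destination paths
        fs.copyFromLocalFile(new Path("/local/path/input.txt"),
                             new Path("/user/hadoop/input/input.txt"));
        fs.close();
    }
}
```

The same step can be done from the command line with `hdfs dfs -put`.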

Job job = Job.getInstance(conf, "example");
FileInputFormat.addInputPath(job, new Path("PATH to text file"));
job.setInputFormatClass(TextInputFormat.class);
job.setMapperClass(YourMapper.class);
job.setMapOutputKeyClass(Text.class);
// the mapper below emits IntWritable values, so declare IntWritable here
job.setMapOutputValueClass(IntWritable.class);
TableMapReduceUtil.initTableReducerJob("hbase_table_name", YourReducer.class, job);
job.waitForCompletion(true);
YourReducer should extend org.apache.hadoop.hbase.mapreduce.TableReducer.

Example reducer code:

import java.io.IOException;
import java.util.Date;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.mapreduce.TableReducer;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;

public class YourReducer extends TableReducer<Text, IntWritable, Text> {
    private byte[] rawUpdateColumnFamily = Bytes.toBytes("colName");

    /**
     * Called once at the beginning of the task.
     */
    @Override
    protected void setup(Context context) throws IOException, InterruptedException {
        // anything that needs to be done at the start of the reducer
    }

    @Override
    public void reduce(Text keyin, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
        // aggregate counts
        int valuesCount = 0;
        for (IntWritable val : values) {
            valuesCount += 1;
            // put data into the table
            Put put = new Put(keyin.toString().getBytes());
            long explicitTimeInMs = new Date().getTime();
            put.add(rawUpdateColumnFamily, Bytes.toBytes("colName"), explicitTimeInMs, val.toString().getBytes());
            context.write(keyin, put);
        }
    }
}
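One detail worth noting in the reducer: `keyin.toString().getBytes()` uses the JVM's platform-default charset, so the row-key bytes can differ between machines. Passing an explicit charset makes the keys deterministic. A small self-contained sketch of the difference:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class RowKeyBytes {
    public static void main(String[] args) {
        String key = "row-1";
        // Platform-default charset: may vary across JVMs and locales
        byte[] defaultBytes = key.getBytes();
        // Explicit UTF-8: always produces the same bytes for the same string
        byte[] utf8Bytes = key.getBytes(StandardCharsets.UTF_8);
        System.out.println(utf8Bytes.length); // 5 bytes for "row-1"
        System.out.println(Arrays.equals(defaultBytes, utf8Bytes));
    }
}
```

HBase's `Bytes.toBytes(String)`, already used for the column family above, encodes as UTF-8 internally, so using it for the row key as well would give the same stable behavior.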
Example mapper class:

public static class YourMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    @Override
    public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        String line = value.toString();
        StringTokenizer tokenizer = new StringTokenizer(line);
        while (tokenizer.hasMoreTokens()) {
            word.set(tokenizer.nextToken());
            // emit one (word, 1) pair per token
            context.write(word, one);
        }
    }
}
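The splitting step in the mapper can be exercised with plain `java.util.StringTokenizer`, outside Hadoop entirely. This sketch tokenizes one input line exactly as the mapper would:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

public class TokenizeDemo {
    public static void main(String[] args) {
        // One input line as the mapper would receive it
        String line = "load text into hbase";
        StringTokenizer tokenizer = new StringTokenizer(line);
        List<String> words = new ArrayList<>();
        while (tokenizer.hasMoreTokens()) {
            words.add(tokenizer.nextToken());
        }
        // The mapper emits one (word, 1) pair per token
        System.out.println(words); // [load, text, into, hbase]
    }
}
```

By default `StringTokenizer` splits on whitespace; for a CSV input you would pass a delimiter, e.g. `new StringTokenizer(line, ",")`, or use `String.split` instead.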