Java: org.apache.hadoop.io.Text cannot be cast to org.apache.hadoop.io.NullWritable

I want to convert a sequence file to an ORC file in MapReduce. The key/value input types are Text/Text. My program looks like:
public class ANR extends Configured implements Tool {

    public static void main(String[] args) throws Exception {
        int res = ToolRunner.run(new Configuration(), new ANR(), args);
        System.exit(res);
    }

    public int run(String[] args) throws Exception {
        Logger log = Logger.getLogger(ANRmap.class.getName());
        Configuration conf = getConf();
        String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
        conf.set("orc.create.index", "true");

        Job job = Job.getInstance(conf);
        job.setJobName("ORC Output");
        job.setJarByClass(ANR.class);
        job.setInputFormatClass(SequenceFileInputFormat.class);
        SequenceFileInputFormat.addInputPath(job, new Path(args[0]));
        job.setMapperClass(ANRmap.class);
        job.setNumReduceTasks(0);
        job.setOutputFormatClass(OrcNewOutputFormat.class);
        OrcNewOutputFormat.setCompressOutput(job, true);
        OrcNewOutputFormat.setOutputPath(job, new Path(args[1]));
        return job.waitForCompletion(true) ? 0 : 1;
    }
The mapper:
public class ANRmap extends Mapper<Text, Text, NullWritable, Writable> {
    private final OrcSerde serde = new OrcSerde();

    public void map(Text key, Text value,
                    OutputCollector<NullWritable, Writable> output)
            throws IOException {
        output.collect(NullWritable.get(), serde.serialize(value, null));
    }
}
The output key for OrcNewOutputFormat is NullWritable. How can I convert Text to NullWritable, or fix this exception some other way?

**Answer:** Try using Context instead of OutputCollector:
public class ReduceTask extends Reducer<Text, Text, Text, NullWritable> {
    public void reduce(Text key, Iterable<Text> values, Context context) {
        for (Text value : values) {
            try {
                context.write(key, NullWritable.get());
            } catch (IOException e) {
                e.printStackTrace();
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
        }
    }
}
Thanks @Aman, that helped, but now I have a new question about OrcSerde in `context.write(NullWritable.get(), serde.serialize(value, null));` — how do I represent an ORC row from the value? I don't know which struct I should use when converting from the sequence file.
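For the follow-up: OrcSerde needs an ObjectInspector describing the row; passing `null` won't work. A minimal sketch of a new-API mapper that builds a struct inspector from a type string and serializes each record as a row — the `struct<key:string,value:string>` schema is an assumption here, adjust it to match your actual data:

```java
import java.io.IOException;
import java.util.Arrays;

import org.apache.hadoop.hive.ql.io.orc.OrcSerde;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.typeinfo.TypeInfo;
import org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.mapreduce.Mapper;

public class ANRmap extends Mapper<Text, Text, NullWritable, Writable> {
    private final OrcSerde serde = new OrcSerde();
    // Assumed schema: two string columns; change the type string to fit your records.
    private final TypeInfo typeInfo =
            TypeInfoUtils.getTypeInfoFromTypeString("struct<key:string,value:string>");
    private final ObjectInspector inspector =
            TypeInfoUtils.getStandardJavaObjectInspectorFromTypeInfo(typeInfo);

    @Override
    protected void map(Text key, Text value, Context context)
            throws IOException, InterruptedException {
        // The standard struct inspector expects the row as a List of column values.
        context.write(NullWritable.get(),
                serde.serialize(Arrays.asList(key.toString(), value.toString()), inspector));
    }
}
```

Because this overrides the new-API `map(Text, Text, Context)` signature, the identity mapper no longer runs, so Text keys are never passed to OrcNewOutputFormat and the ClassCastException should disappear.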