Java Hadoop single-node job freezes silently
I have a MapReduce tool that freezes on the first mapper without any visible output. Because this is a single-node install, I can't use the job tracker web interface to debug it. I get this behavior regardless of the input file size. I've spent an entire day on this and I'm ready to pull my hair out. The output looks like this:
13/09/12 15:12:14 INFO util.NativeCodeLoader: Loaded the native-hadoop library
13/09/12 15:12:14 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
13/09/12 15:12:14 INFO input.FileInputFormat: Total input paths to process : 1
13/09/12 15:12:14 INFO mapred.JobClient: Running job: job_local1132137425_0001
13/09/12 15:12:14 INFO mapred.LocalJobRunner: Waiting for map tasks
13/09/12 15:12:14 INFO mapred.LocalJobRunner: Starting task: attempt_local1132137425_0001_m_000000_0
13/09/12 15:12:14 INFO util.ProcessTree: setsid exited with exit code 0
13/09/12 15:12:14 INFO mapred.Task: Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@339c98d3
13/09/12 15:12:14 INFO mapred.MapTask: Processing split: file:/home/axelmagn/EclipseWorkspace/AxelMagnusonCoursework/assign-2/data/in/input.csv:0+33554432
13/09/12 15:12:14 WARN snappy.LoadSnappy: Snappy native library not loaded
13/09/12 15:12:14 INFO mapred.MapTask: io.sort.mb = 100
13/09/12 15:12:14 INFO mapred.MapTask: data buffer = 79691776/99614720
13/09/12 15:12:14 INFO mapred.MapTask: record buffer = 262144/327680
13/09/12 15:12:15 INFO mapred.JobClient: map 0% reduce 0%
13/09/12 15:12:15 INFO mapred.MapTask: Starting flush of map output
13/09/12 15:12:15 INFO mapred.MapTask: Starting flush of map output
13/09/12 15:12:20 INFO mapred.LocalJobRunner:
13/09/12 15:12:21 INFO mapred.JobClient: map 20% reduce 0%
Then it hangs indefinitely.
The tool (abridged):
Job:
public class VisitorCountJob extends Job {
    public static final String TAB = "\t";
    public VisitorCountJob(Path inputPath, Path outputPath)
            throws IOException {
        super();
        this.setJarByClass(VisitorCountJob.class);
        this.setJobName("Visitor Count");
        this.setInputFormatClass(VisitInputFormat.class);
        VisitInputFormat.setInputPaths(this, inputPath);
        FileOutputFormat.setOutputPath(this, outputPath);
        this.setMapperClass(VisitorCountMapper.class);
        this.setReducerClass(VisitorCountReducer.class);
        this.setOutputKeyClass(Person.class);
        this.setOutputValueClass(IntWritable.class);
        this.setOutputFormatClass(SequenceFileOutputFormat.class);
    }
}
Mapper:
public class VisitorCountMapper extends
        Mapper<LongWritable, Visit, Person, IntWritable> {
    @Override
    public void map(LongWritable key, Visit value, Context context)
            throws IOException, InterruptedException {
        try {
            Person visitor = value.getVisitor();
            context.write(visitor, new IntWritable(1));
        } catch (IOException e) {
            e.printStackTrace();
            throw e;
        } catch (InterruptedException e) {
            e.printStackTrace();
            throw e;
        }
    }
}
Reducer:
public class VisitorCountReducer extends
        Reducer<Person, IntWritable, Person, IntWritable> {
    @Override
    public void reduce(Person visitor, Iterable<IntWritable> values,
            Context context) throws IOException, InterruptedException {
        int count = 0;
        for (IntWritable value : values) {
            count += value.get();
        }
        context.write(visitor, new IntWritable(count));
    }
}
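A side note on how keys reach this reducer: with the default `HashPartitioner`, a key goes to reduce task `(key.hashCode() & Integer.MAX_VALUE) % numReduceTasks`. The `Person` key shown further down doesn't override `hashCode()` or `equals()`, so with more than one reducer, two equal names could be routed to different reducers. Local mode typically runs a single reducer, so this probably isn't what causes the hang. The stdlib-only sketch below reproduces that partitioning arithmetic; `NameKey` and `partitionFor` are hypothetical stand-ins, not Hadoop classes.

```java
import java.util.Objects;

// Stdlib-only model of the default HashPartitioner routing.
// NameKey is a hypothetical stand-in for the question's Person key.
final class PartitionSketch {

    static final class NameKey {
        final String first;
        final String last;

        NameKey(String first, String last) {
            this.first = first;
            this.last = last;
        }

        // Value-based equality: keys with the same names are equal...
        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof NameKey)) return false;
            NameKey other = (NameKey) o;
            return first.equals(other.first) && last.equals(other.last);
        }

        // ...so they must also hash identically, or the partitioner may split them.
        @Override
        public int hashCode() {
            return Objects.hash(first, last);
        }
    }

    // Same arithmetic as HashPartitioner.getPartition: clear the sign bit, then take the modulus.
    static int partitionFor(Object key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }
}
```

In the real `Person`, the same idea would apply to the two `Text` fields, whose `hashCode()` is based on their contents.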
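I also wrote an InputFormat and a RecordReader that produce Visit objects from the raw text, but I'll leave them out for brevity unless someone thinks they're relevant.
I'm really at my wits' end, so any help is greatly appreciated.
EDIT: Since there was interest, here are some of my data type implementations:
Person: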
public class Person implements WritableComparable<Person> {
    public Text firstName;
    public Text lastName;
    public Person() {}
    public Person(Text firstName, Text lastName) {
        this.firstName = firstName;
        this.lastName = lastName;
    }
    public Person(String firstName, String lastName) {
        this(new Text(firstName), new Text(lastName));
    }
    public void readFields(DataInput in) throws IOException {
        firstName.readFields(in);
        lastName.readFields(in);
    }
    public void write(DataOutput out) throws IOException {
        firstName.write(out);
        lastName.write(out);
    }
    public int compareTo(Person other) {
        int out;
        // give sorting preference to first name
        out = firstName.compareTo(other.firstName);
        if (out != 0)
            return out;
        return lastName.compareTo(other.lastName);
    }
}
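One thing worth checking in `Person`: as far as I know, Hadoop creates key objects through the no-arg constructor (for example, the default `WritableComparator` used for the map-side sort instantiates keys reflectively and calls `readFields()` on them). Since `Person()` leaves `firstName` and `lastName` null, `firstName.readFields(in)` would throw a `NullPointerException` at that point. The log doesn't confirm that this is what stalls the job, so treat it as a hypothesis. The Hadoop-free sketch below mimics the Writable round trip with a no-arg constructor that allocates its fields. `TextLike` is a simplified, hypothetical stand-in for `Text` (it uses `writeUTF`/`readUTF`, which is not `Text`'s real wire format), and `roundTrip` only imitates what the framework does.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

final class WritableSketch {

    // Hypothetical mutable string holder standing in for Hadoop's Text.
    static final class TextLike implements Comparable<TextLike> {
        private String value = "";
        TextLike() {}
        TextLike(String value) { this.value = value; }
        void write(DataOutput out) throws IOException { out.writeUTF(value); }
        void readFields(DataInput in) throws IOException { value = in.readUTF(); }
        @Override public int compareTo(TextLike o) { return value.compareTo(o.value); }
        @Override public boolean equals(Object o) {
            return o instanceof TextLike && value.equals(((TextLike) o).value);
        }
        @Override public int hashCode() { return value.hashCode(); }
    }

    // Person-shaped key: the no-arg constructor allocates both fields,
    // so readFields() has something to fill after reflective instantiation.
    static final class PersonKey implements Comparable<PersonKey> {
        final TextLike firstName;
        final TextLike lastName;

        PersonKey() { this(new TextLike(), new TextLike()); }
        PersonKey(String first, String last) { this(new TextLike(first), new TextLike(last)); }
        private PersonKey(TextLike first, TextLike last) {
            this.firstName = first;
            this.lastName = last;
        }

        void write(DataOutput out) throws IOException { firstName.write(out); lastName.write(out); }
        void readFields(DataInput in) throws IOException { firstName.readFields(in); lastName.readFields(in); }

        // First name takes priority, as in the question's compareTo.
        @Override public int compareTo(PersonKey o) {
            int c = firstName.compareTo(o.firstName);
            return c != 0 ? c : lastName.compareTo(o.lastName);
        }
        @Override public boolean equals(Object o) {
            if (!(o instanceof PersonKey)) return false;
            PersonKey p = (PersonKey) o;
            return firstName.equals(p.firstName) && lastName.equals(p.lastName);
        }
        @Override public int hashCode() { return 31 * firstName.hashCode() + lastName.hashCode(); }
    }

    // Imitates the framework: serialize, create an empty key via the no-arg constructor, readFields.
    static PersonKey roundTrip(PersonKey original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            original.write(out);
            out.flush();
            PersonKey copy = new PersonKey();
            copy.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Applied to the real class, the equivalent change would be a no-arg constructor that does `this(new Text(), new Text())`.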
VisitInputFormat:
public class VisitInputFormat extends FileInputFormat<LongWritable, Visit> {
    public RecordReader<LongWritable, Visit> createRecordReader(
            InputSplit split, TaskAttemptContext context)
            throws IOException, InterruptedException {
        VisitRecordReader reader = new VisitRecordReader();
        reader.initialize(split, context);
        return reader;
    }
}
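About `createRecordReader`: the new-API `InputFormat.createRecordReader` javadoc states that the framework calls `RecordReader.initialize(InputSplit, TaskAttemptContext)` before the split is used, so calling `reader.initialize(split, context)` here most likely initializes the underlying `LineRecordReader` twice. The Hadoop-free model below only counts initializations to make that contract concrete; `CountingReader`, `createEagerly`, `createLazily` and `runTask` are hypothetical names, and `runTask` is a rough approximation of the map task, not Hadoop code.

```java
// Hadoop-free model of the new-API reader lifecycle. All names are hypothetical.
final class ReaderLifecycleSketch {

    // Stand-in for a RecordReader that counts how often it is initialized.
    static final class CountingReader {
        int initCount = 0;

        void initialize(String split) {
            initCount++; // a real LineRecordReader opens and seeks the file here
        }
    }

    // Mirrors the posted VisitInputFormat: initializes inside createRecordReader.
    static CountingReader createEagerly(String split) {
        CountingReader reader = new CountingReader();
        reader.initialize(split);
        return reader;
    }

    // Alternative shape: construct only and leave initialization to the framework.
    static CountingReader createLazily(String split) {
        return new CountingReader();
    }

    // Approximates the framework: create the reader, then initialize it once before use.
    static CountingReader runTask(boolean eager, String split) {
        CountingReader reader = eager ? createEagerly(split) : createLazily(split);
        reader.initialize(split);
        return reader;
    }
}
```

In the real class, that would mean `createRecordReader` simply returns `new VisitRecordReader()`.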
VisitRecordReader:
public class VisitRecordReader extends RecordReader<LongWritable, Visit> {
    private LineRecordReader lineReader;
    private LongWritable lineKey;
    private Text lineValue;
    public VisitRecordReader() {
        lineReader = new LineRecordReader();
    }
    public void initialize(InputSplit genericSplit, TaskAttemptContext context)
            throws IOException {
        lineReader.initialize(genericSplit, context);
    }
    public boolean nextKeyValue() throws IOException {
        return lineReader.nextKeyValue();
    }
    public LongWritable getCurrentKey() {
        return lineReader.getCurrentKey();
    }
    public Visit getCurrentValue() {
        String raw = lineReader.getCurrentValue().toString();
        return new Visit(raw);
    }
    public float getProgress() throws IOException {
        return lineReader.getProgress();
    }
    public void close() throws IOException {
        lineReader.close();
    }
}
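How is Person implemented? Your input format and record reader would also be interesting to see.
I've updated the post with that information.
Why do you need your own input format at all? Just use the plain TextInputFormat and create your Visits in the map method. That shouldn't be the problem in your case, though. Can you run a profiler/debugger to see where it hangs? Usually this is a GC problem, in which case you would see heavy CPU usage or GC activity.
Mainly because I'm new to the toolset and wanted to try writing my own InputFormat. I haven't mastered the art of Hadoop profiling yet, but a quick look at top shows both CPU and memory usage below 15%. The problem also persists if I instantiate the Visit in the mapper and drop the custom InputFormat.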
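Since this is local mode (note the `job_local...` id and `LocalJobRunner` in the log), the map task runs as a thread inside the client JVM, so a thread dump of that JVM should show where it is stuck, and GC counters show whether the collector is busy. Running the JDK's `jstack` against the process id (found with `jps`) does this without code changes. The sketch below does the same from inside the JVM using only standard APIs; `HangDiagnostics` and its methods are hypothetical helpers, not part of Hadoop.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.Map;

// Hypothetical helper for diagnosing a hang in the same JVM as the job.
final class HangDiagnostics {

    // Stack traces of every live thread, similar in spirit to `jstack <pid>`.
    static String threadDump() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
            Thread t = e.getKey();
            sb.append('"').append(t.getName()).append("\" state=").append(t.getState()).append('\n');
            for (StackTraceElement frame : e.getValue()) {
                sb.append("    at ").append(frame).append('\n');
            }
        }
        return sb.toString();
    }

    // Cumulative collection counts and times; rapidly rising numbers suggest GC thrashing.
    static String gcSummary() {
        StringBuilder sb = new StringBuilder();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            sb.append(gc.getName())
              .append(": count=").append(gc.getCollectionCount())
              .append(", timeMs=").append(gc.getCollectionTime())
              .append('\n');
        }
        return sb.toString();
    }

    // Starts a daemon thread that prints both snapshots to stderr after the given delay.
    static Thread startWatchdog(long delayMillis) {
        Thread watchdog = new Thread(() -> {
            try {
                Thread.sleep(delayMillis);
            } catch (InterruptedException e) {
                return;
            }
            System.err.println(gcSummary());
            System.err.println(threadDump());
        }, "hang-watchdog");
        watchdog.setDaemon(true);
        watchdog.start();
        return watchdog;
    }
}
```

For example, calling `HangDiagnostics.startWatchdog(60_000)` just before submitting the job would print both snapshots a minute later; the driver isn't shown in the question, so where exactly to call it is an assumption.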