Java: NullPointerException from Hadoop's JobSplitWriter / SerializationFactory when calling InputSplit's getClass()
Tags: java, hadoop, nullpointerexception, mapreduce

When I launch a MapReduce job, I get a NullPointerException. It is thrown by SerializationFactory's getSerializer() method. I am using a custom InputSplit, InputFormat, RecordReader, and MapReduce value class.

I know the error is thrown some time after my InputFormat class creates the splits, but before the RecordReader is created. As far as I can tell, it occurs directly after the "Cleaning up the staging area" message.

Checking the Hadoop source at the places indicated by the stack trace, the error appears to occur when getSerialization() receives a null Class<T> pointer. JobClient's writeNewSplits() calls that method as follows:
Serializer<T> serializer = factory.getSerializer((Class<T>) split.getClass());
Thanks.

Edit: my code for the custom InputSplit is as follows:
import . . .

/**
 * A document directory within the input directory.
 * Returned by DirectoryInputFormat.getSplits()
 * and passed to DirectoryInputFormat.createRecordReader().
 *
 * Represents the data to be processed by an individual Map process.
 */
public class DirectorySplit extends InputSplit {
    /**
     * Constructs a DirectorySplit object
     * @param docDirectoryInHDFS The location (in HDFS) of this
     *          document's directory, complete with all annotations.
     * @param fs The filesystem associated with this job
     */
    public DirectorySplit( Path docDirectoryInHDFS, FileSystem fs )
            throws IOException {
        this.inputPath = docDirectoryInHDFS;
        hash = FileSystemHandler.getFileNameFromPath(inputPath);
        this.fs = fs;
    }

    /**
     * Get the size of the split so that the input splits can be sorted by size.
     * Here, we calculate the size to be the number of bytes in the original
     * document (i.e., ignoring all annotations).
     *
     * @return The number of bytes in the original document
     */
    @Override
    public long getLength() throws IOException, InterruptedException {
        Path origTxt = new Path( inputPath, "original.txt" );
        HadoopInterface.logger.log( msg );
        return FileSystemHandler.getFileSizeInBytes( origTxt, fs);
    }

    /**
     * Get the list of nodes where the data for this split would be local.
     * This list includes all nodes that contain any of the required data;
     * it's up to Hadoop to decide which one to use.
     *
     * @return An array of the nodes for whom the split is local
     * @throws IOException
     * @throws InterruptedException
     */
    @Override
    public String[] getLocations() throws IOException, InterruptedException {
        FileStatus status = fs.getFileStatus(inputPath);
        BlockLocation[] blockLocs = fs.getFileBlockLocations( status, 0,
                                                              status.getLen() );
        HashSet<String> allBlockHosts = new HashSet<String>();
        for( BlockLocation blockLoc : blockLocs ) {
            allBlockHosts.addAll( Arrays.asList( blockLoc.getHosts() ) );
        }
        // toArray() without an argument returns Object[], and casting that to
        // String[] fails at runtime, so pass a typed array instead.
        return allBlockHosts.toArray( new String[0] );
    }

    /**
     * @return The hash of the document that this split handles
     */
    public String toString() {
        return hash;
    }

    private Path inputPath;
    private String hash;
    private FileSystem fs;
}
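InputSplit does not extend Writable; you need to explicitly declare that your input split implements Writable.

Can you post the code for your custom InputSplit? Does it extend Writable? My guess is that it does not.

Code added. I assumed that because I extend InputSplit, which implements Writable, I did not need to implement Writable directly. Is that not the case?

I would also add that your InputSplit implementation must have a default constructor; Hadoop uses it to instantiate the class.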
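To make the two points above concrete (implement Writable, and provide a default constructor), here is a minimal sketch that runs without Hadoop on the classpath. Everything in it is an assumption for illustration: the local Writable interface is only a stand-in with the same two methods as org.apache.hadoop.io.Writable, and DirectorySplitSketch and SplitRoundTrip are hypothetical names, not the asker's classes. In the real job the class would presumably be declared as "extends InputSplit implements Writable". The sketch serializes only the path as a string; a FileSystem handle is not something you would write to the stream, so it would have to be obtained again after deserialization.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

/** Stand-in (assumption) for org.apache.hadoop.io.Writable, so this compiles without Hadoop. */
interface Writable {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

/** Hypothetical, trimmed-down split: only the directory path is serialized. */
class DirectorySplitSketch implements Writable {
    private String inputPath = "";

    /** No-argument constructor, needed so the class can be instantiated reflectively. */
    public DirectorySplitSketch() { }

    public DirectorySplitSketch(String inputPath) {
        this.inputPath = inputPath;
    }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeUTF(inputPath);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        inputPath = in.readUTF();
    }

    @Override
    public String toString() {
        return inputPath;
    }
}

class SplitRoundTrip {
    /** Serializes a split, then rebuilds it reflectively, roughly as a framework would. */
    static <T extends Writable> T roundTrip(T split, Class<T> type) throws Exception {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            split.write(out);
        }
        // This call fails with NoSuchMethodException if there is no no-arg constructor.
        T copy = type.getDeclaredConstructor().newInstance();
        try (DataInputStream in =
                 new DataInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            copy.readFields(in);
        }
        return copy;
    }

    public static void main(String[] args) throws Exception {
        DirectorySplitSketch original = new DirectorySplitSketch("/input/doc-0001");
        DirectorySplitSketch copy = roundTrip(original, DirectorySplitSketch.class);
        System.out.println(copy);
    }
}
```

The round trip shows why both pieces matter: without write()/readFields() there is nothing to serialize the split with, and without the no-argument constructor the receiving side cannot create an empty instance to fill in.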