Java: NullPointerException from Hadoop's JobSplitWriter/SerializationFactory when calling InputSplit's getClass()

When launching a MapReduce job, I get a NullPointerException. It is thrown by SerializationFactory's getSerializer() method. I am using a custom InputSplit, InputFormat, RecordReader, and MapReduce value class.

I know the error is thrown some time after my InputFormat class creates its splits, but before the RecordReader is created. As far as I can tell, it occurs directly after the "cleaning up the staging area" message.

By checking the Hadoop source at the location indicated by the stack trace, it looks like the error occurs when getSerialization() receives a null Class pointer. JobClient's writeNewSplits() calls that method like this:

Serializer<T> serializer = factory.getSerializer((Class<T>) split.getClass());

Thanks.

Edit: My code for the custom InputSplit is below:

import . . .

/**
 * A document directory within the input directory. 
 * Returned by DirectoryInputFormat.getSplits()
 * and passed to DirectoryInputFormat.createRecordReader().
 *
 * Represents the data to be processed by an individual Map process.
 */
public class DirectorySplit extends InputSplit {
    /**
     * Constructs a DirectorySplit object
     * @param docDirectoryInHDFS The location (in HDFS) of this
     *            document's directory, complete with all annotations.
     * @param fs The filesystem associated with this job
     */
    public  DirectorySplit( Path docDirectoryInHDFS, FileSystem fs )
            throws IOException {
        this.inputPath = docDirectoryInHDFS;
        hash = FileSystemHandler.getFileNameFromPath(inputPath);
        this.fs = fs;
    }

    /**
     * Get the size of the split so that the input splits can be sorted by size.
     * Here, we calculate the size to be the number of bytes in the original
     * document (i.e., ignoring all annotations).
     *
     * @return The number of characters in the original document
     */
    @Override
    public long getLength() throws IOException, InterruptedException {
        Path origTxt = new Path( inputPath, "original.txt" );
        HadoopInterface.logger.log( "Getting length of split at " + origTxt );
        return FileSystemHandler.getFileSizeInBytes( origTxt, fs);
    }

    /**
     * Get the list of nodes where the data for this split would be local.
     * This list includes all nodes that contain any of the required data---it's
     * up to Hadoop to decide which one to use.
     *
     * @return An array of the nodes for whom the split is local
     * @throws IOException
     * @throws InterruptedException
     */
    @Override
    public String[] getLocations() throws IOException, InterruptedException {
        FileStatus status = fs.getFileStatus(inputPath);

        BlockLocation[] blockLocs = fs.getFileBlockLocations( status, 0,
                                                              status.getLen() );

        HashSet<String> allBlockHosts = new HashSet<String>();
        for( BlockLocation blockLoc : blockLocs ) {
            allBlockHosts.addAll( Arrays.asList( blockLoc.getHosts() ) );
        }

        // toArray() with no argument returns Object[], which cannot be cast to String[]
        return allBlockHosts.toArray( new String[allBlockHosts.size()] );
    }

    /**
     * @return The hash of the document that this split handles
     */
    public String toString() {
        return hash;
    }

    private Path inputPath;
    private String hash;
    private FileSystem fs;
}

InputSplit does not extend Writable. You need to explicitly declare that your input split implements Writable.
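To see why a missing Writable implementation surfaces as a NullPointerException here: Hadoop's SerializationFactory asks each serialization registered under io.serializations (by default essentially WritableSerialization) whether it accepts the split's class; if none accepts it, the lookup comes back null, and the follow-up call on that null result is what blows up. The following is a simplified, self-contained model of that lookup, not the actual Hadoop source; the *Model names and the WritableMarker interface are stand-ins made up for illustration.

import java.util.Arrays;
import java.util.List;

/** Stand-in for org.apache.hadoop.io.Writable, only to keep this example self-contained. */
interface WritableMarker {}

/** Simplified model of a Hadoop Serialization: can it handle a given class? */
interface SerializationModel {
    boolean accept(Class<?> c);
}

/** Simplified model of WritableSerialization: accepts only Writable implementations. */
class WritableSerializationModel implements SerializationModel {
    public boolean accept(Class<?> c) {
        return WritableMarker.class.isAssignableFrom(c);
    }
}

public class SerializationFactoryModel {
    // Models the serializations registered via io.serializations.
    private final List<SerializationModel> serializations =
            Arrays.<SerializationModel>asList(new WritableSerializationModel());

    /** Returns the first serialization that accepts c, or null if none does. */
    SerializationModel getSerialization(Class<?> c) {
        for (SerializationModel s : serializations) {
            if (s.accept(c)) {
                return s;
            }
        }
        return null; // nothing registered can serialize c
    }

    public static void main(String[] args) {
        class PlainSplit {} // a split type that does not implement Writable

        SerializationFactoryModel factory = new SerializationFactoryModel();
        // The real getSerializer(c) effectively does getSerialization(c).getSerializer(c),
        // so the null returned here is what surfaces as the NullPointerException.
        System.out.println(factory.getSerialization(PlainSplit.class)); // prints null
    }
}

Registering an extra serialization via io.serializations would be another way out, but for an InputSplit the usual route is simply to implement Writable, as stated above.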

Can you post the code for your custom InputSplit? Does it extend Writable? My guess is that it does not.

Code added above. I assumed that because I extend InputSplit, which implements Writable, I would not need to implement Writable directly. Is that not the case?

I would also add that your version of InputSplit must have a default constructor, which Hadoop uses to instantiate the class.
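Putting the answer and the comments together, a minimal sketch of the reworked split might look like the following. It assumes the new-API org.apache.hadoop.mapreduce.InputSplit; the choice to serialize only the path and hash in write()/readFields(), replacing the asker's FileSystemHandler helpers with plain FileSystem/Path calls, and reacquiring the FileSystem from a fresh Configuration are all assumptions for illustration, not part of the original code.

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.mapreduce.InputSplit;

/**
 * Reworked DirectorySplit: still extends the new-API InputSplit, but also
 * implements Writable and provides a no-arg constructor so that Hadoop can
 * instantiate it reflectively and then call readFields().
 */
public class DirectorySplit extends InputSplit implements Writable {

    private Path inputPath;
    private String hash;
    private FileSystem fs;

    /** Required by Hadoop: the framework creates the split, then calls readFields(). */
    public DirectorySplit() {
    }

    public DirectorySplit( Path docDirectoryInHDFS, FileSystem fs ) throws IOException {
        this.inputPath = docDirectoryInHDFS;
        this.hash = docDirectoryInHDFS.getName(); // stand-in for FileSystemHandler.getFileNameFromPath()
        this.fs = fs;
    }

    /** Serialize only what is needed to rebuild the split on the other side. */
    @Override
    public void write( DataOutput out ) throws IOException {
        out.writeUTF( inputPath.toString() );
        out.writeUTF( hash );
    }

    @Override
    public void readFields( DataInput in ) throws IOException {
        inputPath = new Path( in.readUTF() );
        hash = in.readUTF();
        // A FileSystem handle cannot be serialized; reacquire one from the configuration.
        fs = FileSystem.get( new Configuration() );
    }

    @Override
    public long getLength() throws IOException, InterruptedException {
        Path origTxt = new Path( inputPath, "original.txt" );
        return fs.getFileStatus( origTxt ).getLen();
    }

    @Override
    public String[] getLocations() throws IOException, InterruptedException {
        // Same idea as the original: every host that holds any block of the file.
        FileStatus status = fs.getFileStatus( inputPath );
        BlockLocation[] blockLocs = fs.getFileBlockLocations( status, 0, status.getLen() );

        Set<String> allBlockHosts = new HashSet<String>();
        for ( BlockLocation blockLoc : blockLocs ) {
            allBlockHosts.addAll( Arrays.asList( blockLoc.getHosts() ) );
        }
        return allBlockHosts.toArray( new String[allBlockHosts.size()] );
    }

    @Override
    public String toString() {
        return hash;
    }
}

Whether rebuilding the FileSystem from a default Configuration is appropriate depends on the job setup; the essential points from this thread are only the implements Writable declaration and the no-arg constructor.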