Java Hadoop 2.5.0远程写入文件失败
在远程使用Hadoop Java API将文件放入HDFS时,我遇到了一个问题。在Hadoop系统上运行时,我能够将本地文件复制到hdfs而不会出现问题。然而,当我试图将数据放入文件时,远程出现了一个问题。我得到以下例外情况:Java Hadoop 2.5.0远程写入文件失败,java,hadoop,hdfs,Java,Hadoop,Hdfs,在远程使用Hadoop Java API将文件放入HDFS时,我遇到了一个问题。在Hadoop系统上运行时,我能够将本地文件复制到hdfs而不会出现问题。然而,当我试图将数据放入文件时,远程出现了一个问题。我得到以下例外情况: Exception in thread "main" org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /user/root/books/beowulf.txt could only be r
Exception in thread "main" org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /user/root/books/beowulf.txt could only be replicated to 0 nodes instead of minReplication (=1). There are 1 datanode(s) running and 1 node(s) are excluded in this operation.
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget(BlockManager.java:1471)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2791)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:606)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:455)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
at org.apache.hadoop.ipc.Client.call(Client.java:1411)
at org.apache.hadoop.ipc.Client.call(Client.java:1364)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
at com.sun.proxy.$Proxy14.addBlock(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
at com.sun.proxy.$Proxy14.addBlock(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:368)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1449)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1270)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:526)
我在datanode日志中没有看到任何错误,但在namenode日志中看到了相应的错误消息:
2014-11-04 14:19:26,111 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Number of transactions: 3 Total time for transactions(ms): 13 Number of transactions batched in Syncs: 0 Number of syncs: 2 SyncTimes(ms): 10
2014-11-04 14:19:26,801 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Rescanning after 30000 milliseconds
2014-11-04 14:19:26,802 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Scanned 0 directive(s) and 0 block(s) in 1 millisecond(s).
2014-11-04 14:19:27,136 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* allocateBlock: /user/root/books/beowulf.txt. BP-342727372-10.0.0.17-1414068411758 blk_1073741852_1028{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[[DISK]DS-511723cb-ff72-4585-bb81-90a2e1f154a3:NORMAL|RBW]]}
2014-11-04 14:19:50,859 WARN org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to place enough replicas, still in need of 1 to reach 1. For more information, please enable DEBUG log level on org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy
2014-11-04 14:19:50,860 INFO org.apache.hadoop.ipc.Server: IPC Server handler 3 on 9000, call org.apache.hadoop.hdfs.protocol.ClientProtocol.addBlock from 192.168.56.1:3805 Call#4 Retry#0
java.io.IOException: File /user/root/books/beowulf.txt could only be replicated to 0 nodes instead of minReplication (=1). There are 1 datanode(s) running and 1 node(s) are excluded in this operation.
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget(BlockManager.java:1471)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2791)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:606)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:455)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
据我所知,只有在我关闭FSDataOutputStream之后才会出现异常
下面是我正在使用的代码,它产生了这个问题:
import com.spectralogic.ds3.hadoop.HadoopConstants;
import org.apache.commons.io.IOUtils;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.security.UserGroupInformation;
import java.io.IOException;
import java.io.InputStream;
import java.security.PrivilegedExceptionAction;
public class HdfsPutFile {
public static void main(final String[] args) throws IOException, InterruptedException {
final Configuration conf = new Configuration();
final UserGroupInformation usgi = UserGroupInformation.createRemoteUser("root");
usgi.doAs(new PrivilegedExceptionAction<Object>() {
@Override
public Object run() throws Exception {
conf.set(HadoopConstants.FS_DEFAULT_NAME, "hdfs://192.168.56.102:9000");
conf.set(HadoopConstants.HADOOP_JOB_UGI, "root");
try (final FileSystem hdfs = FileSystem.get(conf)) {
System.out.printf("Total Used Hdfs Storage: %d\n", hdfs.getStatus().getUsed());
final String resourceName = "books/beowulf.txt";
final Path path = new Path("/user/root", resourceName);
try (final InputStream inputStream = HdfsPutFile.class.getClassLoader().getResourceAsStream(resourceName);
final FSDataOutputStream outputStream = hdfs.create(path, true)) {
IOUtils.copy(inputStream, outputStream);
}
}
return null;
}
});
}
}
import com.spectralogic.ds3.hadoop.hadoop常量;
导入org.apache.commons.io.IOUtils;
导入org.apache.hadoop.conf.Configuration;
导入org.apache.hadoop.fs.FSDataOutputStream;
导入org.apache.hadoop.fs.FileSystem;
导入org.apache.hadoop.fs.Path;
导入org.apache.hadoop.security.UserGroupInformation;
导入java.io.IOException;
导入java.io.InputStream;
导入java.security.PrivilegedExceptionAction;
公共类HdfsPutFile{
公共静态void main(最终字符串[]args)引发IOException、InterruptedException{
最终配置conf=新配置();
final UserGroupInformation usgi=UserGroupInformation.createRemoteUser(“根”);
usgi.doAs(新的PrivilegedExceptionAction(){
@凌驾
公共对象run()引发异常{
conf.set(HadoopConstants.FS\u默认\u名称,“hdfs://192.168.56.102:9000");
conf.set(HadoopConstants.HADOOP_JOB_UGI,“根”);
try(final FileSystem hdfs=FileSystem.get(conf)){
System.out.printf(“使用的Hdfs存储总量:%d\n”,Hdfs.getStatus().getUsed());
最后一个字符串resourceName=“books/beowulf.txt”;
最终路径路径=新路径(“/user/root”,resourceName);
try(final InputStream InputStream=HdfsPutFile.class.getClassLoader().getResourceAsStream(resourceName);
final FSDataOutputStream outputStream=hdfs.create(路径,true)){
复制(inputStream,outputStream);
}
}
返回null;
}
});
}
}
结果表明,失败的原因是因为我的代码无法到达datanode,因为它位于Docker容器内部,而它的IP地址是内部Docker容器的IP地址。如果我在容器内并运行代码,那么我就能够成功地放置文件。
因此,当Hadoop在docker中并且您希望远程使用它时,您需要使用-p将一些Hadoop的端口发布到主机
为了告诉Hadoop您希望使用主机名而不是IP地址,您必须在客户端的hdfs-site.xml中添加以下块
设置dfs.client.use.datanode.hostname为true。这里的类似问题:我已经按照您所说的进行了配置:所有必要的端口都已公开,并且
dfs.client.use.datanode.hostname
和dfs.datanode.use.datanode.hostname
都是true
,但我仍然遇到这个问题,我相信它是正确的连接问题。您是否设法将整个docker网络暴露到您的计算机上?或者一些解决方法?我最终在容器中运行了我的代码。另一个选项是在自己的容器中运行代码,并使用docker compose之类的工具使您的容器和hadoop容器使用相同的容器网络运行。我尝试了第二个选项,我在具有相同网络的另一个容器上运行我的应用程序,但出现了相同的错误(如上所述)。无论如何,这仍然不是一个令人满意的选择。但是谢谢你!