Scala: moving a file from the local filesystem to HDFS
Tags: Scala, Hadoop, Apache Spark, Hive, Apache Pig

My environment uses Spark, Pig and Hive. I'm having trouble writing code in Scala (or any other language compatible with my environment) that can copy a file from the local filesystem to HDFS.
Does anyone have a suggestion on how I should proceed?

You can write a Scala job using the Hadoop API, and use IOUtils.copy from Apache Commons IO to copy the data from the InputStream to the OutputStream:
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.commons.io.IOUtils

val hadoopConf = new Configuration()
val hdfs = FileSystem.get(hadoopConf)
val localFs = FileSystem.getLocal(hadoopConf)

// Create an output stream to the HDFS file
val outFileStream = hdfs.create(new Path("hdfs://<namenode>:<port>/<filename>"))
// Create an input stream from the local file
val inStream = localFs.open(new Path("file://<input_file>"))

IOUtils.copy(inStream, outFileStream)

// Close both streams
inStream.close()
outFileStream.close()
The other answers didn't work for me, so here is another one. Try the following Scala code:
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

val hadoopConf = new Configuration()
val hdfs = FileSystem.get(hadoopConf)
// srcFilePath is a path string on the local filesystem,
// destFilePath the target path string in HDFS
val srcPath = new Path(srcFilePath)
val destPath = new Path(destFilePath)
hdfs.copyFromLocalFile(srcPath, destPath)
You should also check whether Spark has the HADOOP_CONF_DIR variable set in its conf/spark-env.sh file. This makes sure that Spark can find the Hadoop configuration settings.
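As a minimal sketch of that setting (the /etc/hadoop/conf path below is an assumption; use the directory that actually holds your cluster's configuration files):

```shell
# conf/spark-env.sh
# Point Spark at the directory containing core-site.xml and hdfs-site.xml
export HADOOP_CONF_DIR=/etc/hadoop/conf
```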
Dependencies for the build.sbt file:
libraryDependencies += "org.apache.hadoop" % "hadoop-common" % "2.6.0"
libraryDependencies += "org.apache.commons" % "commons-io" % "1.3.2"
libraryDependencies += "org.apache.hadoop" % "hadoop-hdfs" % "2.6.0"
Here is something that works for S3 as well (modified from the above).