Scala: Moving a file from local to HDFS

scala hadoop apache-spark hive apache-pig

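My environment uses Spark, Pig, and Hive.

I am having some trouble writing code in Scala (or any other language compatible with my environment) that copies a file from the local file system to HDFS.

Does anyone have any suggestions on how I should proceed?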

You can write a Scala job using the Hadoop API, and use IOUtils from Apache Commons to copy the data from the InputStream to the OutputStream.

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path

import org.apache.commons.io.IOUtils

val hadoopconf = new Configuration()

// Create output stream to HDFS file
// (the file system is resolved from the path's hdfs:// scheme)
val outPath = new Path("hdfs://<namenode>:<port>/<filename>")
val outFileStream = outPath.getFileSystem(hadoopconf).create(outPath)

// Create input stream from local file
// (resolved from the file:// scheme, so the file is not looked up on HDFS)
val inPath = new Path("file://<input_file>")
val inStream = inPath.getFileSystem(hadoopconf).open(inPath)

// Copy all bytes from the local file to the HDFS file
IOUtils.copy(inStream, outFileStream)

// Close both files
inStream.close()
outFileStream.close()
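
The snippet above leaves both streams open if the copy throws. A minimal sketch of the same idea with the streams closed in `finally` blocks follows. It assumes Hadoop's own `org.apache.hadoop.io.IOUtils` (part of hadoop-common) in place of the Apache Commons class; the object name `LocalToHdfs`, the function `copyLocalToHdfs`, and its parameters are hypothetical names chosen for illustration.

```scala
import java.io.{BufferedInputStream, FileInputStream}

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import org.apache.hadoop.io.IOUtils

object LocalToHdfs {
  // Hypothetical helper: stream a local file to a destination URI.
  def copyLocalToHdfs(localFile: String, dest: String, conf: Configuration): Unit = {
    val destPath = new Path(dest)
    // Resolve the file system from the destination URI scheme (hdfs://, file://, ...).
    val destFs = destPath.getFileSystem(conf)
    val in = new BufferedInputStream(new FileInputStream(localFile))
    try {
      // create(Path) overwrites an existing file by default.
      val out = destFs.create(destPath)
      try {
        // 4096-byte buffer; `false` leaves closing to the finally blocks.
        IOUtils.copyBytes(in, out, 4096, false)
      } finally {
        out.close()
      }
    } finally {
      in.close()
    }
  }
}
```

A call might look like `LocalToHdfs.copyLocalToHdfs("<input_file>", "hdfs://<namenode>:<port>/<filename>", new Configuration())`, with the placeholders filled in for your cluster.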

The other answers didn't work for me, so I am writing another one here.

Try the following Scala code:

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.FileSystem
import org.apache.hadoop.fs.Path

val hadoopConf = new Configuration()
val hdfs = FileSystem.get(hadoopConf)

val srcPath = new Path(srcFilePath)
val destPath = new Path(destFilePath)

hdfs.copyFromLocalFile(srcPath, destPath)

You should also check that Spark has the HADOOP_CONF_DIR variable set in the conf/spark-env.sh file. This ensures that Spark can find the Hadoop configuration settings.
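
One way to see whether the Hadoop settings were actually picked up is to print the file system that `FileSystem.get` resolves to. The sketch below is only an illustration; the object name `DefaultFsCheck` and the function `resolvedFs` are hypothetical. If the result is a `file:` URI rather than an `hdfs:` one, the configuration files were probably not found, and `copyFromLocalFile` would write to the local disk instead of HDFS.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.FileSystem

object DefaultFsCheck {
  // Returns the URI of the file system that FileSystem.get would use
  // for the given configuration (driven by the fs.defaultFS setting).
  def resolvedFs(conf: Configuration): String =
    FileSystem.get(conf).getUri.toString
}
```

In spark-shell, `println(DefaultFsCheck.resolvedFs(new Configuration()))` shows the result. As a fallback, assuming you know the NameNode address, the setting can also be supplied in code with `conf.set("fs.defaultFS", "hdfs://<namenode>:<port>")` before calling `FileSystem.get(conf)`.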

Dependencies for the build.sbt file:

libraryDependencies += "org.apache.hadoop" % "hadoop-common" % "2.6.0"
libraryDependencies += "org.apache.commons" % "commons-io" % "1.3.2"
libraryDependencies += "org.apache.hadoop" % "hadoop-hdfs" % "2.6.0"
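
The code in this answer copies the file, while the question title asks about moving it. A minimal sketch of a move, assuming the overload `copyFromLocalFile(delSrc, src, dst)` of Hadoop's `FileSystem`, which deletes the local source after the copy when `delSrc` is true. The object name `MoveToHdfs`, the function `moveLocalFile`, and its parameters are hypothetical.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

object MoveToHdfs {
  // Hypothetical helper: copy src to dest on the default file system,
  // then remove the local source file.
  def moveLocalFile(src: String, dest: String, conf: Configuration): Unit = {
    val fs = FileSystem.get(conf)
    // delSrc = true deletes the local source once the copy has succeeded.
    fs.copyFromLocalFile(true, new Path(src), new Path(dest))
  }
}
```

`FileSystem` also offers `moveFromLocalFile(src, dst)`, which has the same effect; the explicit flag is used here only to make the delete step visible.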

Here is something that works for S3 (modified from the above)

Thanks! Dependencies:

libraryDependencies += "org.apache.hadoop" % "hadoop-common" % "2.6.0"
libraryDependencies += "org.apache.commons" % "commons-io" % "1.3.2"
libraryDependencies += "org.apache.hadoop" % "hadoop-hdfs" % "2.6.0"