Apache Spark: How to change the file names of Spark output [Spark Java]


I want to change the Spark output file names; is there any way to do that? Spark is writing to AWS S3.

Check the code below; it works for both HDFS and S3. I created a rename function that takes the path and the new name; if the directory contains multiple files, it simply appends a sequence number to the end of each file name, e.g. json_data_1.json.

import org.apache.hadoop.fs.{FileSystem, Path, RemoteIterator}

// Implicit conversion: expose Hadoop's RemoteIterator as a scala Iterator
implicit def convertToScalaIterator[T](remoteIterator: RemoteIterator[T]): Iterator[T] = {
    case class wrapper(remoteIterator: RemoteIterator[T]) extends Iterator[T] {
      override def hasNext: Boolean = remoteIterator.hasNext
      override def next(): T = remoteIterator.next()
    }
    wrapper(remoteIterator)
}
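To illustrate the adapter pattern used above without pulling in the Hadoop dependency, here is a minimal, self-contained sketch. FakeRemoteIterator is a hypothetical stub standing in for Hadoop's RemoteIterator; the wrapper has the same shape as the implicit conversion in the answer.

```scala
// Stub trait standing in for Hadoop's RemoteIterator (assumption: a stand-in,
// used here only so the adapter can be demonstrated without Hadoop on the classpath).
trait FakeRemoteIterator[T] {
  def hasNext: Boolean
  def next(): T
}

// Same adapter shape as the implicit conversion above: wrap the
// remote-iterator interface in a standard scala Iterator.
def asScalaIterator[T](r: FakeRemoteIterator[T]): Iterator[T] =
  new Iterator[T] {
    def hasNext: Boolean = r.hasNext
    def next(): T = r.next()
  }

// Helper to build a stub iterator from a List, for demonstration.
def fromList[T](xs: List[T]): FakeRemoteIterator[T] = new FakeRemoteIterator[T] {
  private var rest = xs
  def hasNext: Boolean = rest.nonEmpty
  def next(): T = { val h = rest.head; rest = rest.tail; h }
}
```

Once wrapped, all the usual collection methods (toList, filter, map, zipWithIndex) become available, which is exactly what the rename function relies on.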


This description is too short. Please provide more details and, if possible, a code snippet.
import java.net.URI

// Get a FileSystem handle for the given path (resolves the right scheme for HDFS or S3)
def fs(path: String) = FileSystem.get(URI.create(path), spark.sparkContext.hadoopConfiguration)

// Rename files: every non-_SUCCESS file under `path` becomes <name>_<index>.<extension>
def rename(path: String, name: String) = {
    fs(path)
      .listFiles(new Path(path), true)
      .toList
      .filter(_.isFile)
      .map(_.getPath)
      .filterNot(_.toString.contains("_SUCCESS"))
      .zipWithIndex
      .map { case (p, i) =>
        val ext = p.getName.split("\\.").last // keep the original extension
        fs(p.toString).rename(p, new Path(s"${p.getParent}/${name}_${i}.${ext}"))
      }
}
scala> import sys.process._ // needed for the shell commands below
import sys.process._

scala> val path = "/tmp/samplea"
path: String = /tmp/samplea

scala> df.repartition(5).write.format("json").mode("overwrite").save(path)

scala> s"ls -ltr ${path}".!
total 8
-rw-r--r--  1 sriniva  wheel    0 Jun  6 13:57 part-00000-607ffd5e-7d28-4331-9a69-de36254c80b1-c000.json
-rw-r--r--  1 sriniva  wheel  282 Jun  6 13:57 part-00001-607ffd5e-7d28-4331-9a69-de36254c80b1-c000.json
-rw-r--r--  1 sriniva  wheel    0 Jun  6 13:57 _SUCCESS

scala> rename(path,"json_data")
res193: List[Boolean] = List(true, true)

scala> s"ls -ltr ${path}".!
total 8
-rw-r--r--  1 sriniva  wheel    0 Jun  6 13:57 json_data_0.json
-rw-r--r--  1 sriniva  wheel  282 Jun  6 13:57 json_data_1.json
-rw-r--r--  1 sriniva  wheel    0 Jun  6 13:57 _SUCCESS
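The same sequence-numbered renaming logic can be sketched against the local filesystem with plain java.nio, with no Spark or Hadoop dependency. renameLocal is a hypothetical analogue of the rename function above, useful for testing the naming scheme locally (note that on S3 a "rename" is really a copy plus delete, so the Hadoop version can be slow for large files):

```scala
import java.nio.file.{Files, Path, Paths}
import scala.jdk.CollectionConverters._

// Hypothetical local-filesystem analogue of rename():
// every non-_SUCCESS regular file in `dir` becomes <name>_<index>.<extension>.
def renameLocal(dir: String, name: String): List[Path] = {
  val files = Files.list(Paths.get(dir)).iterator().asScala.toList
    .filter(Files.isRegularFile(_))
    .filterNot(_.getFileName.toString.contains("_SUCCESS"))
    .sortBy(_.getFileName.toString) // deterministic ordering for the sequence numbers
  files.zipWithIndex.map { case (p, i) =>
    val ext = p.getFileName.toString.split("\\.").last
    Files.move(p, p.resolveSibling(s"${name}_${i}.${ext}"))
  }
}
```

Pointing this at a directory containing part-00000.json and part-00001.json plus a _SUCCESS marker produces json_data_0.json and json_data_1.json while leaving _SUCCESS untouched, matching the listing above.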