
Spark Scala input/output directories

Tags: scala, maven, apache-spark

I am new to Spark/Scala programming. I was able to set up the project with Maven and run the sample word count program.

For running both in a Spark environment and locally on Windows, I have two questions:
1. How does the Scala program locate its input?
2. How do I write the output to a text file?

Here is my code:

import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD.rddToPairRDDFunctions

object WordCount {
  def main(args: Array[String]) = {

    // Start the Spark context
    val conf = new SparkConf()
      .setAppName("WordCount")
      .setMaster("local")
    val sc = new SparkContext(conf)

    // Read an example file into an RDD
    val textFile = sc.textFile("file:/home/root1/Avinash/data.txt")

    val counts = textFile.flatMap(line => line.split(" "))
      .map(word => (word, 1))
      .reduceByKey(_ + _)
    counts.foreach(println)
    counts.collect()
    counts.saveAsTextFile("file:/home/root1/Avinash/output")
  }
}

When I place the file at file:/home/root1/Avinash/data.txt and try to run it, it does not work. It only picks up the input when I put data.txt in /home/root1/softs/spark-1.6.1/bin or in the project folder of my workspace.
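The behavior described above follows from how bare paths resolve: a path with no scheme and no leading slash is resolved against the JVM's working directory, which is spark-1.6.1/bin when the job is launched from there. A minimal sketch of that resolution rule (InputResolution is a hypothetical helper, not part of Spark):

```scala
import java.nio.file.Paths

object InputResolution {
  // An absolute path stays as-is; a relative path is resolved against
  // the current working directory -- which is why a bare "data.txt"
  // was only found next to the spark launcher.
  def resolve(path: String): String =
    Paths.get(path).toAbsolutePath.normalize.toString
}
```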

Similarly, when I try to write the output with counts.saveAsTextFile("file:/home/root1/Avinash/output"), nothing is written; instead it throws an error: Exception in thread "main" java.io.IOException: No FileSystem for scheme: D


Please help me resolve this.

You should use two slashes after file: (three in total, as in file:///path) so the URI has an explicit scheme. Here is an example:

val textFile = sc.textFile("file:///home/root1/Avinash/data.txt")

val counts = textFile.flatMap(line => line.split(" "))
  .map(word => (word, 1))
  .reduceByKey(_ + _).cache()

counts.foreach(println)
//counts.collect()
counts.saveAsTextFile("file:///home/root1/Avinash/output")
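One detail worth knowing (an assumption not stated in the answer): saveAsTextFile creates the output path as a directory of part-* files and fails if that directory already exists, so a previous run's output must be cleared first. A sketch of a small local-filesystem helper for that (OutputDir is hypothetical, not a Spark API):

```scala
import java.io.File

object OutputDir {
  // Recursively delete a local directory tree, e.g. the output folder
  // left behind by an earlier saveAsTextFile run.
  def deleteRecursively(f: File): Unit = {
    if (f.isDirectory) f.listFiles.foreach(deleteRecursively)
    f.delete()
  }
}
```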

If the file is big, use cache() so the RDD is not recomputed every time you perform an action on it.
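As for the "No FileSystem for scheme: D" error seen on Windows: a path like D:\... makes the URI parser treat the drive letter "D" as a scheme. A sketch of the string-level fix, assuming local files only (PathFix is a hypothetical helper, not part of Spark):

```scala
object PathFix {
  // Turn a local path into a file URI Spark's Hadoop layer can parse.
  // Without the file:/// prefix, "D:" in a Windows path is read as a
  // URI scheme, producing "No FileSystem for scheme: D".
  def toFileUri(path: String): String = {
    val normalized = path.replace('\\', '/')
    if (normalized.matches("^[A-Za-z]:/.*")) "file:///" + normalized
    else if (normalized.startsWith("/")) "file://" + normalized
    else normalized
  }
}
```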

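For a quick sanity check without any Spark setup, the same pipeline can be mirrored on a plain Scala collection, with groupBy plus a sum standing in for reduceByKey. A sketch:

```scala
object LocalWordCount {
  // Mirrors the RDD word-count pipeline on an ordinary Seq:
  // flatMap and map are identical; groupBy + sum replaces reduceByKey.
  def wordCount(lines: Seq[String]): Map[String, Int] =
    lines.flatMap(_.split(" "))
      .map(word => (word, 1))
      .groupBy(_._1)
      .map { case (word, pairs) => (word, pairs.map(_._2).sum) }

  def main(args: Array[String]): Unit =
    // prints (be,2), (not,1), (or,1), (to,2), one pair per line
    wordCount(Seq("to be or not to be")).toList.sorted.foreach(println)
}
```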