
Scala InvalidJobConfException: Output directory not set

Tags: scala, apache-spark, dataframe, rdd, google-cloud-bigtable

I am trying to write data to Bigtable using a SparkSession:

import org.apache.hadoop.hbase.client.Put
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.util.Bytes
import org.apache.spark.sql.{DataFrame, Row, SparkSession}

val spark = SparkSession
  .builder
  .config(conf)
  .appName("my-job")
  .getOrCreate()

val hadoopConf = spark.sparkContext.hadoopConfiguration

import spark.implicits._
case class BestSellerRecord(skuNbr: String, slsQty: String, slsDollar: String, dmaNbr: String, productId: String)

val seq: DataFrame = Seq(("foo", "1", "foo1"), ("bar", "2", "bar1")).toDF("key", "value1", "value2")

// Convert each row into an HBase Put keyed by the first column
// (columnFamily is defined elsewhere in the job).
val bigtablePuts = seq.toDF.rdd.map((row: Row) => {
  val put = new Put(Bytes.toBytes(row.getString(0)))
  put.addColumn(Bytes.toBytes(columnFamily), Bytes.toBytes("nbr"), Bytes.toBytes(row.getString(0)))
  (new ImmutableBytesWritable(), put)
})

bigtablePuts.saveAsNewAPIHadoopDataset(hadoopConf)
But this gives me the following exception:

Exception in thread "main" org.apache.hadoop.mapred.InvalidJobConfException: Output directory not set.
at org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:138)
at org.apache.spark.internal.io.HadoopMapReduceWriteConfigUtil.assertConf(SparkHadoopWriter.scala:391)
at org.apache.spark.internal.io.SparkHadoopWriter$.write(SparkHadoopWriter.scala:71)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1.apply$mcV$sp(PairRDDFunctions.scala:1083)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1.apply(PairRDDFunctions.scala:1081)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1.apply(PairRDDFunctions.scala:1081)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:363)
at org.apache.spark.rdd.PairRDDFunctions.saveAsNewAPIHadoopDataset(PairRDDFunctions.scala:1081)
which comes from this line:

bigtablePuts.saveAsNewAPIHadoopDataset(hadoopConf)

I have also tried setting different configurations with hadoopConf.set, for example conf.set("spark.hadoop.validateOutputSpecs", "false"), but that gives me a NullPointerException.

How can I fix this?

Could you try upgrading to the mapreduce API, since mapred has been deprecated?
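A minimal sketch of that approach, assuming the writes go through the HBase-compatible Bigtable client so the new-API org.apache.hadoop.hbase.mapreduce.TableOutputFormat can be used (the table name "my-table" is a placeholder, and the Bigtable project/instance settings are assumed to already be on hadoopConf):

import org.apache.hadoop.hbase.client.Put
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.mapreduce.TableOutputFormat
import org.apache.hadoop.mapreduce.Job

// Without an explicit OutputFormat, Hadoop falls back to a file-based
// output format that requires an output directory, which is what raises
// the InvalidJobConfException.
hadoopConf.set(TableOutputFormat.OUTPUT_TABLE, "my-table")

val job = Job.getInstance(hadoopConf)
job.setOutputKeyClass(classOf[ImmutableBytesWritable])
job.setOutputValueClass(classOf[Put])
job.setOutputFormatClass(classOf[TableOutputFormat[ImmutableBytesWritable]])

bigtablePuts.saveAsNewAPIHadoopDataset(job.getConfiguration)

With the output table and output format set on the configuration, saveAsNewAPIHadoopDataset no longer looks for an output directory.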

This question shows an example of rewriting this snippet:

Hope this helps.