Scala: why doesn't the implicit conversion of Writable work?


SparkContext defines implicit conversions between Writable types and their primitive counterparts, such as LongWritable <-> Long and Text <-> String.

  • Test case 1:
I use the following code to combine small files:

  @Test
  def testCombineSmallFiles(): Unit = {
    val path = "file:///d:/logs"
    val rdd = sc.newAPIHadoopFile[LongWritable,Text, CombineTextInputFormat](path)
    println(s"rdd partition number is ${rdd.partitions.length}")
    println(s"lines is :${rdd.count()}")
  }
The above code works fine, but if I use the following line to get the RDD, it leads to a compile error:

val rdd = sc.newAPIHadoopFile[Long,String, CombineTextInputFormat](path)
It looks like the implicit conversion does not take effect. I would like to know what is wrong here and why it doesn't work.
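
A workaround sketch (not from the original post; it reuses sc and path from test case 1): read with the Writable types that CombineTextInputFormat actually produces, then convert to primitives with an explicit map.

// Same imports as test case 1 (assumed).
import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapreduce.lib.input.CombineTextInputFormat

// Read as (LongWritable, Text) -- the key/value types CombineTextInputFormat emits --
// and convert each record to primitives explicitly; no implicit conversion involved.
val primitiveRdd = sc
  .newAPIHadoopFile[LongWritable, Text, CombineTextInputFormat](path)
  .map { case (k, v) => (k.get, v.toString) }   // RDD[(Long, String)]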

  • Test case 2:
For the following code using sequenceFile, the implicit conversions look like they work (Text is converted to String and IntWritable to Int).

Comparing these two test cases, I cannot see the key difference that makes one work and the other not.
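
The test case 2 code is not reproduced here; according to the comment thread at the end of this post, the call was:

val rdd = sc.sequenceFile(outputDir + "/part-00000", classOf[String], classOf[Int])

where outputDir presumably points at SequenceFile output written earlier.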

  • Note:
The SparkContext#sequenceFile method I use in test case 2 is:

  def sequenceFile[K, V](
      path: String,
      keyClass: Class[K],
      valueClass: Class[V]): RDD[(K, V)] = withScope {
    assertNotStopped()
    sequenceFile(path, keyClass, valueClass, defaultMinPartitions)
  }
This sequenceFile method calls another sequenceFile overload, which in turn calls the hadoopFile method to read the data:

  def sequenceFile[K, V](path: String,
      keyClass: Class[K],
      valueClass: Class[V],
      minPartitions: Int
      ): RDD[(K, V)] = withScope {
    assertNotStopped()
    val inputFormatClass = classOf[SequenceFileInputFormat[K, V]]
    hadoopFile(path, inputFormatClass, keyClass, valueClass, minPartitions)
  }

To use the implicit conversions, a WritableConverter is needed. For example:

   def sequenceFile[K, V]
       (path: String, minPartitions: Int = defaultMinPartitions)
       (implicit km: ClassTag[K], vm: ClassTag[V],
        kcf: () => WritableConverter[K], vcf: () => WritableConverter[V]): RDD[(K, V)] = {...}
In sc.newAPIHadoopFile, I cannot see this being used anywhere, so it is not possible there.
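
For comparison, the single-argument newAPIHadoopFile overload used in test case 1 looks roughly like this (paraphrased from the Spark source, where NewInputFormat is org.apache.hadoop.mapreduce.InputFormat); its implicit parameters are only ClassTags, so no WritableConverter is ever looked up, and K and V are pinned to the InputFormat's own key/value types:

   def newAPIHadoopFile[K, V, F <: NewInputFormat[K, V]]
       (path: String)
       (implicit km: ClassTag[K], vm: ClassTag[V], fm: ClassTag[F]): RDD[(K, V)] = {...}

With [Long, String, CombineTextInputFormat] the bound F <: NewInputFormat[K, V] cannot be satisfied, because CombineTextInputFormat is an InputFormat[LongWritable, Text], not an InputFormat[Long, String]; hence the compile error.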

Also, please verify whether you have used import SparkContext._ (I cannot see the imports in your post).

Please look at WritableConverter, which has the code below:

/**
 * A class encapsulating how to convert some type `T` from `Writable`. It stores both the `Writable`
 * class corresponding to `T` (e.g. `IntWritable` for `Int`) and a function for doing the
 * conversion.
 * The getter for the writable class takes a `ClassTag[T]` in case this is a generic object
 * that doesn't know the type of `T` when it is created. This sounds strange but is necessary to
 * support converting subclasses of `Writable` to themselves (`writableWritableConverter()`).
 */
private[spark] class WritableConverter[T](
    val writableClass: ClassTag[T] => Class[_ <: Writable],
    val convert: Writable => T)
  extends Serializable

object WritableConverter {

  // Helper objects for converting common types to Writable
  private[spark] def simpleWritableConverter[T, W <: Writable: ClassTag](convert: W => T)
  : WritableConverter[T] = {
    val wClass = classTag[W].runtimeClass.asInstanceOf[Class[W]]
    new WritableConverter[T](_ => wClass, x => convert(x.asInstanceOf[W]))
  }

  // The following implicit functions were in SparkContext before 1.3 and users had to
  // `import SparkContext._` to enable them. Now we move them here to make the compiler find
  // them automatically. However, we still keep the old functions in SparkContext for backward
  // compatibility and forward to the following functions directly.

  implicit def intWritableConverter(): WritableConverter[Int] =
    simpleWritableConverter[Int, IntWritable](_.get)

  implicit def longWritableConverter(): WritableConverter[Long] =
    simpleWritableConverter[Long, LongWritable](_.get)

  implicit def doubleWritableConverter(): WritableConverter[Double] =
    simpleWritableConverter[Double, DoubleWritable](_.get)

  implicit def floatWritableConverter(): WritableConverter[Float] =
    simpleWritableConverter[Float, FloatWritable](_.get)

  implicit def booleanWritableConverter(): WritableConverter[Boolean] =
    simpleWritableConverter[Boolean, BooleanWritable](_.get)

  implicit def bytesWritableConverter(): WritableConverter[Array[Byte]] = {
    simpleWritableConverter[Array[Byte], BytesWritable] { bw =>
      // getBytes method returns array which is longer then data to be returned
      Arrays.copyOfRange(bw.getBytes, 0, bw.getLength)
    }
  }

  implicit def stringWritableConverter(): WritableConverter[String] =
    simpleWritableConverter[String, Text](_.toString)

  implicit def writableWritableConverter[T <: Writable](): WritableConverter[T] =
    new WritableConverter[T](_.runtimeClass.asInstanceOf[Class[T]], _.asInstanceOf[T])
}
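
Because these implicit defs live in the WritableConverter companion object, the compiler can find them automatically whenever a method declares an implicit () => WritableConverter[K] parameter, which is exactly what the implicit sequenceFile overload shown earlier does. A minimal usage sketch (assuming a SequenceFile with Text keys and IntWritable values at a hypothetical path):

// The primitive type parameters alone drive the implicit lookup: the compiler
// supplies stringWritableConverter() and intWritableConverter() from the
// WritableConverter companion object, and the overload converts
// Text -> String and IntWritable -> Int for us.
val seqPath = "file:///d:/seq-output"   // hypothetical path
val rdd: org.apache.spark.rdd.RDD[(String, Int)] = sc.sequenceFile[String, Int](seqPath)
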
Thanks @ram ghadiyaram for the reply. I have updated my question with two test cases, one that works and one that doesn't, but I cannot figure out the difference between them.

Thanks @ram ghadiyaram. The SparkContext#sequenceFile I use has 3 overloaded methods, none of which takes anything implicit. Please see my updated question (at the end).

This method is calling another method where the implicit conversion is done.

Thanks @ram ghadiyaram, please see the end of my updated question. If I read the method correctly, it does not call the method with the implicit conversion.

In test case 2 you are using
val rdd = sc.sequenceFile(outputDir + "/part-00000", classOf[String], classOf[Int])
If that is the case, then you have passed the direct class, e.g. classOf[String] (which is what the method expects, and that is why there is no error), rather than the primitive.