Scala: why doesn't the implicit conversion of Writable work?


SparkContext defines implicit conversions between Writable types and their primitive counterparts, such as LongWritable <-> Long and Text <-> String.

  • Test case 1:
I use the following code to combine small files:

  @Test
  def testCombineSmallFiles(): Unit = {
    val path = "file:///d:/logs"
    val rdd = sc.newAPIHadoopFile[LongWritable,Text, CombineTextInputFormat](path)
    println(s"rdd partition number is ${rdd.partitions.length}")
    println(s"lines is :${rdd.count()}")
  }
The above code works fine, but if I use the following line to get the RDD, it leads to a compile error:

val rdd = sc.newAPIHadoopFile[Long,String, CombineTextInputFormat](path)
It looks like the implicit conversion does not take effect. I would like to know what is wrong here and why it doesn't work.
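
A workaround sketch (not from the original post; it reuses sc and path from test case 1): read with the Writable types that CombineTextInputFormat actually produces, then convert to primitives with an explicit map.

// Same imports as test case 1 (assumed).
import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapreduce.lib.input.CombineTextInputFormat

// Read as (LongWritable, Text) -- the key/value types CombineTextInputFormat emits --
// and convert each record to primitives explicitly; no implicit conversion involved.
val primitiveRdd = sc
  .newAPIHadoopFile[LongWritable, Text, CombineTextInputFormat](path)
  .map { case (k, v) => (k.get, v.toString) }   // RDD[(Long, String)]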

  • Test case 2:
For the following code using sequenceFile, the implicit conversions look like they work (Text is converted to String and IntWritable to Int).

Comparing these two test cases, I cannot see the key difference that makes one work and the other not.
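
The test case 2 code is not reproduced here; according to the comment thread at the end of this post, the call was:

val rdd = sc.sequenceFile(outputDir + "/part-00000", classOf[String], classOf[Int])

where outputDir presumably points at SequenceFile output written earlier.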

  • Note:
The SparkContext#sequenceFile method I use in test case 2 is:

  def sequenceFile[K, V](
      path: String,
      keyClass: Class[K],
      valueClass: Class[V]): RDD[(K, V)] = withScope {
    assertNotStopped()
    sequenceFile(path, keyClass, valueClass, defaultMinPartitions)
  }
This sequenceFile method calls another sequenceFile overload, which in turn calls the hadoopFile method to read the data:

  def sequenceFile[K, V](path: String,
      keyClass: Class[K],
      valueClass: Class[V],
      minPartitions: Int
      ): RDD[(K, V)] = withScope {
    assertNotStopped()
    val inputFormatClass = classOf[SequenceFileInputFormat[K, V]]
    hadoopFile(path, inputFormatClass, keyClass, valueClass, minPartitions)
  }

To use the implicit conversions, a WritableConverter is needed. For example:

   def sequenceFile[K, V]
       (path: String, minPartitions: Int = defaultMinPartitions)
       (implicit km: ClassTag[K], vm: ClassTag[V],
        kcf: () => WritableConverter[K], vcf: () => WritableConverter[V]): RDD[(K, V)] = {...}
In sc.newAPIHadoopFile, I cannot see this being used anywhere, so it is not possible there.
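
For comparison, the single-argument newAPIHadoopFile overload used in test case 1 looks roughly like this (paraphrased from the Spark source, where NewInputFormat is org.apache.hadoop.mapreduce.InputFormat); its implicit parameters are only ClassTags, so no WritableConverter is ever looked up, and K and V are pinned to the InputFormat's own key/value types:

   def newAPIHadoopFile[K, V, F <: NewInputFormat[K, V]]
       (path: String)
       (implicit km: ClassTag[K], vm: ClassTag[V], fm: ClassTag[F]): RDD[(K, V)] = {...}

With [Long, String, CombineTextInputFormat] the bound F <: NewInputFormat[K, V] cannot be satisfied, because CombineTextInputFormat is an InputFormat[LongWritable, Text], not an InputFormat[Long, String]; hence the compile error.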

Also, please verify whether you have used import SparkContext._ (I cannot see the imports in your post).

Please look at WritableConverter, which has the code below:

/**
 * A class encapsulating how to convert some type `T` from `Writable`. It stores both the `Writable`
 * class corresponding to `T` (e.g. `IntWritable` for `Int`) and a function for doing the
 * conversion.
 * The getter for the writable class takes a `ClassTag[T]` in case this is a generic object
 * that doesn't know the type of `T` when it is created. This sounds strange but is necessary to
 * support converting subclasses of `Writable` to themselves (`writableWritableConverter()`).
 */
private[spark] class WritableConverter[T](
    val writableClass: ClassTag[T] => Class[_ <: Writable],
    val convert: Writable => T)
  extends Serializable

object WritableConverter {

  // Helper objects for converting common types to Writable
  private[spark] def simpleWritableConverter[T, W <: Writable: ClassTag](convert: W => T)
  : WritableConverter[T] = {
    val wClass = classTag[W].runtimeClass.asInstanceOf[Class[W]]
    new WritableConverter[T](_ => wClass, x => convert(x.asInstanceOf[W]))
  }

  // The following implicit functions were in SparkContext before 1.3 and users had to
  // `import SparkContext._` to enable them. Now we move them here to make the compiler find
  // them automatically. However, we still keep the old functions in SparkContext for backward
  // compatibility and forward to the following functions directly.

  implicit def intWritableConverter(): WritableConverter[Int] =
    simpleWritableConverter[Int, IntWritable](_.get)

  implicit def longWritableConverter(): WritableConverter[Long] =
    simpleWritableConverter[Long, LongWritable](_.get)

  implicit def doubleWritableConverter(): WritableConverter[Double] =
    simpleWritableConverter[Double, DoubleWritable](_.get)

  implicit def floatWritableConverter(): WritableConverter[Float] =
    simpleWritableConverter[Float, FloatWritable](_.get)

  implicit def booleanWritableConverter(): WritableConverter[Boolean] =
    simpleWritableConverter[Boolean, BooleanWritable](_.get)

  implicit def bytesWritableConverter(): WritableConverter[Array[Byte]] = {
    simpleWritableConverter[Array[Byte], BytesWritable] { bw =>
      // getBytes method returns array which is longer then data to be returned
      Arrays.copyOfRange(bw.getBytes, 0, bw.getLength)
    }
  }

  implicit def stringWritableConverter(): WritableConverter[String] =
    simpleWritableConverter[String, Text](_.toString)

  implicit def writableWritableConverter[T <: Writable](): WritableConverter[T] =
    new WritableConverter[T](_.runtimeClass.asInstanceOf[Class[T]], _.asInstanceOf[T])
}
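
Because these implicit defs live in the WritableConverter companion object, the compiler can find them automatically whenever a method declares an implicit () => WritableConverter[K] parameter, which is exactly what the implicit sequenceFile overload shown earlier does. A minimal usage sketch (assuming a SequenceFile with Text keys and IntWritable values at a hypothetical path):

// The primitive type parameters alone drive the implicit lookup: the compiler
// supplies stringWritableConverter() and intWritableConverter() from the
// WritableConverter companion object, and the overload converts
// Text -> String and IntWritable -> Int for us.
val seqPath = "file:///d:/seq-output"   // hypothetical path
val rdd: org.apache.spark.rdd.RDD[(String, Int)] = sc.sequenceFile[String, Int](seqPath)
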
Thanks @ram ghadiyaram for the reply. I have updated my question with two test cases, one that works and one that doesn't, but I cannot figure out the difference between them.

Thanks @ram ghadiyaram. The SparkContext#sequenceFile I use has 3 overloaded methods, none of which takes anything implicit. Please see my updated question (at the end).

This method is calling another method where the implicit conversion is done.

Thanks @ram ghadiyaram, please see the end of my updated question. If I read the method correctly, it does not call the method with the implicit conversion.

In test case 2 you are using
val rdd = sc.sequenceFile(outputDir + "/part-00000", classOf[String], classOf[Int])
If that is the case, then you have passed the direct class, e.g. classOf[String] (which is what the method expects, and that is why there is no error), rather than the primitive.