Scala/Spark: How to print the contents of a Dataset[Row] when the rows consist of fields of type Double
I have a model class in Scala like this:
package examples.partnerModels

import com.fasterxml.jackson.annotation.JsonProperty

case class Temparature(@JsonProperty YEAR: Double,
                       @JsonProperty MONTH: Double,
                       @JsonProperty DAY: Double,
                       @JsonProperty MAX_TEMP: Double,
                       @JsonProperty MIN_TEMP: Double) {
  def this() = this(0, 0, 0, 0, 0)

  // Returns the five fields, in declaration order, as a list of doubles.
  def getDataFields(): List[Double] = {
    productIterator.asInstanceOf[Iterator[Double]].toList
  }
}

object Temparature {
  def apply() = new Temparature(0, 0, 0, 0, 0)
}
Using this model, I created a DataFrame of temperature records, sorted it, and tried to print the contents of each record like this:
val dataRecordsTemp = sc.textFile(tempFile).map { rec =>
  val splittedRec = rec.split("\\s+")
  Temparature(
    if (isEmpty(splittedRec(0))) 0 else splittedRec(0).toDouble,
    if (isEmpty(splittedRec(1))) 0 else splittedRec(1).toDouble,
    if (isEmpty(splittedRec(2))) 0 else splittedRec(2).toDouble,
    if (isEmpty(splittedRec(3))) 0 else splittedRec(3).toDouble,
    if (isEmpty(splittedRec(4))) 0 else splittedRec(4).toDouble
  )
}.map { x => Row.fromSeq(x.getDataFields()) }
val headerFieldsForTemp = Seq("YEAR", "MONTH", "DAY", "MAX_TEMP", "MIN_TEMP")
val schemaTemp = StructType(headerFieldsForTemp.map { f => StructField(f, StringType, nullable = true) })
val dfTemp = session.createDataFrame(dataRecordsTemp, schemaTemp)
  .orderBy(desc("year"), desc("month"), desc("day"))
println("Printing temparature data ...............................")
dfTemp.show(20)
However, I get an error on the line where I try to print:
java.lang.Double is not a valid external type for schema of string
How can I print the contents of a DataFrame whose rows have fields of type Double?
Use java.lang.Double.parseDouble(splittedRec(i)) instead of splittedRec(i).toDouble.
To print the contents of a DataFrame whose rows have fields of type Double, the StructFields must be of type DoubleType:
val schemaTemp = StructType(headerFieldsForTemp.map{f => StructField(f, DoubleType, nullable=true)})
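The schema change can be tried end to end with a minimal sketch like the one below. It assumes a local SparkSession and two made-up input lines in place of the contents of tempFile. The object name PrintDoubleRows, the app name, and the sample values are hypothetical and do not come from the original post.

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.functions.desc
import org.apache.spark.sql.types.{DoubleType, StructField, StructType}

object PrintDoubleRows {
  val headerFieldsForTemp: Seq[String] = Seq("YEAR", "MONTH", "DAY", "MAX_TEMP", "MIN_TEMP")

  // Every column is declared as DoubleType so that it matches the Double values in each Row.
  val schemaTemp: StructType =
    StructType(headerFieldsForTemp.map(f => StructField(f, DoubleType, nullable = true)))

  def main(args: Array[String]): Unit = {
    // Local session for illustration only; app name and master are assumptions.
    val session = SparkSession.builder()
      .appName("print-double-rows")
      .master("local[*]")
      .getOrCreate()

    // Hypothetical sample lines standing in for the whitespace-separated input file.
    val lines = Seq("2017 1 1 30.5 20.1", "2017 1 2 31.0 19.8")

    // Split each line on whitespace and convert every token to a Double.
    val rows = session.sparkContext.parallelize(lines).map { rec =>
      Row.fromSeq(rec.trim.split("\\s+").map(_.toDouble).toSeq)
    }

    val dfTemp = session.createDataFrame(rows, schemaTemp)
      .orderBy(desc("YEAR"), desc("MONTH"), desc("DAY"))

    dfTemp.show(20)
    session.stop()
  }
}
```

The sketch skips the Temparature case class and the empty-token check from the question to stay short. The part that matters here is that the schema's declared type and the runtime type of the row values agree.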
The schema declares the columns as strings, but their values are nullable doubles (i.e. java.lang.Double). Consider changing the definition of schemaTemp so that each StructField uses DoubleType instead of StringType.
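The point about java.lang.Double can be checked with a small plain-Scala sketch that needs no Spark. It shows that a Scala Double stored where Any is expected, which is what happens to values placed in a Row, is boxed as java.lang.Double whichever way the string was parsed. The object name BoxedDoubleCheck and the sample token are made up for illustration.

```scala
object BoxedDoubleCheck {
  // Parse the same token with both approaches mentioned in this thread.
  val viaToDouble: Double = "30.5".toDouble
  val viaParseDouble: Double = java.lang.Double.parseDouble("30.5")

  // Upcasting a Scala Double to Any boxes it; this is the runtime class Spark sees.
  val boxedClassName: String = (viaToDouble: Any).getClass.getName

  def main(args: Array[String]): Unit = {
    println(viaToDouble == viaParseDouble) // prints "true"
    println(boxedClassName)                // prints "java.lang.Double"
  }
}
```

Both parsing styles produce the same boxed type, so under this sketch's assumptions it is the declared column type in the schema that has to match the values.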