Scala: how to create a Spark DataFrame from a list of data and a schema
I am trying to create a DataFrame from a list of data and want to apply a schema to it. From the Spark Scala documentation, I tried to use this createDataFrame signature, which accepts a list of Rows and a schema as a StructType:
def createDataFrame(rows: List[Row], schema: StructType): DataFrame
Below is the sample code I am trying:
import org.apache.spark.sql.types._
import org.apache.spark.sql.Row

val simpleData = List(
  Row("James", "Sales", 3000),
  Row("Michael", "Sales", 4600),
  Row("Robert", "Sales", 4100),
  Row("Maria", "Finance", 3000)
)

val schema = StructType(Array(
  StructField("name", StringType, false),
  StructField("department", StringType, false),
  StructField("salary", IntegerType, false)
))

val df = spark.createDataFrame(simpleData, schema)
But I am getting the error below:
command-3391230614683259:15: error: overloaded method value createDataFrame with alternatives:
(data: java.util.List[_],beanClass: Class[_])org.apache.spark.sql.DataFrame <and>
(rdd: org.apache.spark.api.java.JavaRDD[_],beanClass: Class[_])org.apache.spark.sql.DataFrame <and>
(rdd: org.apache.spark.rdd.RDD[_],beanClass: Class[_])org.apache.spark.sql.DataFrame <and>
(rows: java.util.List[org.apache.spark.sql.Row],schema: org.apache.spark.sql.types.StructType)org.apache.spark.sql.DataFrame <and>
(rowRDD: org.apache.spark.api.java.JavaRDD[org.apache.spark.sql.Row],schema: org.apache.spark.sql.types.StructType)org.apache.spark.sql.DataFrame <and>
(rowRDD: org.apache.spark.rdd.RDD[org.apache.spark.sql.Row],schema: org.apache.spark.sql.types.StructType)org.apache.spark.sql.DataFrame
cannot be applied to (List[org.apache.spark.sql.Row], org.apache.spark.sql.types.StructType)
val df = spark.createDataFrame(simpleData,schema)
Please tell me what I am doing wrong.

The error tells you that it expects a Java list, not a Scala list:
import scala.jdk.CollectionConverters._
val df = spark.createDataFrame(simpleData.asJava, schema)
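To see why the conversion is needed, here is a minimal Spark-free sketch: a Scala List is not a java.util.List, so the (rows: java.util.List[Row], schema: StructType) overload cannot match it, while .asJava wraps it in a java.util.List view (the AsJavaDemo object and the string data are just illustrative):

```scala
import scala.jdk.CollectionConverters._

object AsJavaDemo {
  def main(args: Array[String]): Unit = {
    val simpleData = List("James", "Michael", "Robert")

    // A Scala List does not implement java.util.List, which is why
    // the java.util.List[Row] overload of createDataFrame rejects it.
    println(simpleData.isInstanceOf[java.util.List[_]]) // false

    // .asJava produces a java.util.List view of the same elements.
    val javaList: java.util.List[String] = simpleData.asJava
    println(javaList.size()) // 3
  }
}
```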
If you are using a Scala version earlier than 2.13, use scala.collection.JavaConverters instead, since CollectionConverters was only introduced in 2.13.
Another option is to pass an RDD:
val df = spark.createDataFrame(sc.parallelize(simpleData), schema)
sc is the SparkContext object.
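Putting the pieces together, a corrected version of the original snippet might look like the sketch below. It assumes a SparkSession named spark is already in scope (as in spark-shell or a notebook) and uses the asJava fix from above:

```scala
import org.apache.spark.sql.types._
import org.apache.spark.sql.Row
import scala.jdk.CollectionConverters._

val simpleData = List(
  Row("James", "Sales", 3000),
  Row("Michael", "Sales", 4600),
  Row("Robert", "Sales", 4100),
  Row("Maria", "Finance", 3000)
)

val schema = StructType(Array(
  StructField("name", StringType, false),
  StructField("department", StringType, false),
  StructField("salary", IntegerType, false)
))

// Convert the Scala List to java.util.List so the
// (rows: java.util.List[Row], schema: StructType) overload matches.
val df = spark.createDataFrame(simpleData.asJava, schema)
df.show()
```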