
Scala: How to create a Spark DataFrame from a list of data and a schema


I am trying to create a DataFrame from a list of data and want to apply a schema to it. From the Spark Scala documentation, I tried to use this createDataFrame signature, which accepts a list of Rows and a schema as a StructType:

def createDataFrame(rows: List[Row], schema: StructType): DataFrame

Below is the sample code I am trying:

import org.apache.spark.sql.types._
import org.apache.spark.sql.Row
val simpleData = List(Row("James", "Sales", 3000),
  Row("Michael", "Sales", 4600),
  Row("Robert", "Sales", 4100),
  Row("Maria", "Finance", 3000)
)

val schema = StructType(Array(
  StructField("name", StringType, false),
  StructField("department", StringType, false),
  StructField("salary", IntegerType, false)))


val df = spark.createDataFrame(simpleData,schema)
But I am getting the error below:

command-3391230614683259:15: error: overloaded method value createDataFrame with alternatives:
  (data: java.util.List[_],beanClass: Class[_])org.apache.spark.sql.DataFrame <and>
  (rdd: org.apache.spark.api.java.JavaRDD[_],beanClass: Class[_])org.apache.spark.sql.DataFrame <and>
  (rdd: org.apache.spark.rdd.RDD[_],beanClass: Class[_])org.apache.spark.sql.DataFrame <and>
  (rows: java.util.List[org.apache.spark.sql.Row],schema: org.apache.spark.sql.types.StructType)org.apache.spark.sql.DataFrame <and>
  (rowRDD: org.apache.spark.api.java.JavaRDD[org.apache.spark.sql.Row],schema: org.apache.spark.sql.types.StructType)org.apache.spark.sql.DataFrame <and>
  (rowRDD: org.apache.spark.rdd.RDD[org.apache.spark.sql.Row],schema: org.apache.spark.sql.types.StructType)org.apache.spark.sql.DataFrame
 cannot be applied to (List[org.apache.spark.sql.Row], org.apache.spark.sql.types.StructType)
val df = spark.createDataFrame(simpleData,schema)

Please tell me what I am doing wrong.

The error is telling you that it expects a Java list, not a Scala list:

import scala.jdk.CollectionConverters._

val df = spark.createDataFrame(simpleData.asJava, schema)
If you are using a Scala version earlier than 2.13, see the alternatives to CollectionConverters.
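On Scala 2.12 and earlier, the same `asJava` conversion is available from `scala.collection.JavaConverters` (the predecessor of `CollectionConverters`, deprecated in 2.13). A sketch, reusing `spark`, `simpleData`, and `schema` from the question:

```scala
// Scala 2.12 and earlier: JavaConverters provides the same asJava extension
import scala.collection.JavaConverters._

// Convert the Scala List[Row] to a java.util.List[Row] before calling createDataFrame
val df = spark.createDataFrame(simpleData.asJava, schema)
```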

Another option is to pass an RDD:

val df = spark.createDataFrame(sc.parallelize(simpleData), schema)

sc is the SparkContext object.
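For completeness, a common alternative sidesteps both `Row` and `java.util.List` entirely: build the DataFrame from a `Seq` of tuples with `toDF` and let Spark infer the schema. This is a sketch assuming the same `spark` session is in scope; note the inferred column types (e.g. nullability) may differ slightly from an explicit StructType:

```scala
import spark.implicits._  // enables toDF on Scala collections

// Column names are supplied to toDF; types are inferred from the tuple elements
val df = Seq(
  ("James", "Sales", 3000),
  ("Michael", "Sales", 4600),
  ("Robert", "Sales", 4100),
  ("Maria", "Finance", 3000)
).toDF("name", "department", "salary")
```

This is usually the most idiomatic choice when the data is small and the types are simple; the explicit `Row` + `StructType` route remains useful when the schema must be constructed dynamically.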
