Scala: Consuming a RESTful API in Apache Spark and converting the result to a DataFrame

I am trying to convert the output of a RESTful API directly into a DataFrame in the following way:

package trials

import org.apache.spark.sql.SparkSession
import org.json4s.jackson.JsonMethods.parse
import scala.io.Source.fromURL

object DEF {
  implicit val formats = org.json4s.DefaultFormats
  case class Result(success: Boolean,
                    message: String,
                    result: Array[Markets])
  case class Markets(
                      MarketCurrency:String,
                      BaseCurrency:String,
                      MarketCurrencyLong:String,
                      BaseCurrencyLong:String,
                      MinTradeSize:Double,
                      MarketName:String,
                      IsActive:Boolean,
                      Created:String,
                      Notice:String,
                      IsSponsored:String,
                      LogoUrl:String
                    )

  def main(args: Array[String]): Unit = {
    val spark = SparkSession
      .builder()
      .appName(s"${this.getClass.getSimpleName}")
      .config("spark.sql.shuffle.partitions", "4")
      .master("local[*]")
      .getOrCreate()

    import spark.implicits._
    val parsedData = parse(fromURL("https://bittrex.com/api/v1.1/public/getmarkets").mkString).extract[Array[Result]]
    val mySourceDataset = spark.createDataset(parsedData)
    mySourceDataset.printSchema
    mySourceDataset.show()
  }

}

The error is shown below, and it is repeated for every record:

Caused by: org.json4s.package$MappingException: Expected collection but got JObject(List((success,JBool(true)), (message,JString()), (result,JArray(List(List((MarketCurrency,JString(LTC)), (MarketCurrencyLong,JString(Litecoin)), (BaseCurrencyLong,JString(Bitcoin)), (MinTradeSize,JDouble(0.01435906)), (MarketName,JString(BTC-LTC)), (IsActive,JBool(true)), (Created,JString(2014-02-13T00:00:00)), (Notice,JNull), (IsSponsored,JNull), (LogoUrl,JString())) ... and mapping Result[][Result, Result]
    at org.json4s.reflect.package$.fail(package.scala:96)

The JSON structure returned from this URL is:

{
  "success": boolean,
  "message": string,
  "result": [ ... ]
}
Therefore, the Result class should be aligned with this structure:

case class Result(success: Boolean,
                  message: String,
                  result: List[Markets])
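Update: I have also slightly refined the Markets class, so that the fields that arrive as null (Notice and IsSponsored) are declared as Option: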

case class Markets(
                    MarketCurrency: String,
                    BaseCurrency: String,
                    MarketCurrencyLong: String,
                    BaseCurrencyLong: String,
                    MinTradeSize: Double,
                    MarketName: String,
                    IsActive: Boolean,
                    Created: String,
                    Notice: Option[String],
                    IsSponsored: Option[Boolean],
                    LogoUrl: String
                  )
End of update.

The main problem, however, is in how the data is extracted from the parsed JSON:

val parsedData = parse(fromURL("{url}").mkString).extract[Array[Result]]
The root of the returned structure is not an array but a single object corresponding to Result. So it should be:

val parsedData = parse(fromURL("{url}").mkString).extract[Result]
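The difference between extracting an array and extracting the root object can be tried offline, without Spark or a network call. The following is a minimal sketch, assuming json4s-jackson is on the classpath; the object name ExtractSketch, the trimmed-down case classes, and the inline sample JSON are made up for illustration and only mirror the shape shown above.

```scala
import org.json4s.{DefaultFormats, Formats}
import org.json4s.jackson.JsonMethods.parse

object ExtractSketch {
  implicit val formats: Formats = DefaultFormats

  // Trimmed-down versions of the case classes, for illustration only.
  case class Markets(MarketName: String,
                     MinTradeSize: Double,
                     Notice: Option[String])
  case class Result(success: Boolean,
                    message: String,
                    result: List[Markets])

  // Hypothetical sample with the same shape as the API response:
  // the root is an object, and the array sits under "result".
  val sample: String =
    """{"success": true, "message": "", "result": [
      |  {"MarketName": "BTC-LTC", "MinTradeSize": 0.01435906, "Notice": null}
      |]}""".stripMargin

  // Extract the root object, not an array of it.
  lazy val parsed: Result = parse(sample).extract[Result]

  def main(args: Array[String]): Unit = {
    println(parsed.result.head.MarketName)
    // Notice is null in the sample, so the Option field comes back as None.
    println(parsed.result.head.Notice)
  }
}
```

Replacing extract[Result] with extract[Array[Result]] in this sketch should reproduce the "Expected collection but got JObject" failure from the question.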
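Then, I assume you do not need to load the wrapper into the DataFrame, but rather the Markets entries it contains. That is why it should be loaded like this: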

val mySourceDataset = spark.createDataset(parsedData.result)
This finally produces the DataFrame:

+--------------+------------+------------------+----------------+------------+----------+--------+-------------------+------+-----------+--------------------+
|MarketCurrency|BaseCurrency|MarketCurrencyLong|BaseCurrencyLong|MinTradeSize|MarketName|IsActive|            Created|Notice|IsSponsored|             LogoUrl|
+--------------+------------+------------------+----------------+------------+----------+--------+-------------------+------+-----------+--------------------+
|           LTC|         BTC|          Litecoin|         Bitcoin|  0.01435906|   BTC-LTC|    true|2014-02-13T00:00:00|  null|       null|https://bittrexbl...|
|          DOGE|         BTC|          Dogecoin|         Bitcoin|396.82539683|  BTC-DOGE|    true|2014-02-13T00:00:00|  null|       null|https://bittrexbl...|
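
Putting the pieces of this answer together, the whole program might look like the sketch below. It assumes the same Spark and json4s dependencies as the question and that the endpoint still responds with the structure shown above, which may no longer be the case. The object name MarketsToDataset and the explicit closing of the source are additions for illustration, not part of the original code.

```scala
package trials

import org.apache.spark.sql.SparkSession
import org.json4s.{DefaultFormats, Formats}
import org.json4s.jackson.JsonMethods.parse
import scala.io.Source

object MarketsToDataset {
  implicit val formats: Formats = DefaultFormats

  // Root object of the response; the markets are nested under "result".
  case class Result(success: Boolean,
                    message: String,
                    result: List[Markets])

  // Fields that can be null in the response are declared as Option.
  case class Markets(MarketCurrency: String,
                     BaseCurrency: String,
                     MarketCurrencyLong: String,
                     BaseCurrencyLong: String,
                     MinTradeSize: Double,
                     MarketName: String,
                     IsActive: Boolean,
                     Created: String,
                     Notice: Option[String],
                     IsSponsored: Option[Boolean],
                     LogoUrl: String)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession
      .builder()
      .appName("MarketsToDataset")
      .config("spark.sql.shuffle.partitions", "4")
      .master("local[*]")
      .getOrCreate()

    import spark.implicits._

    // Read the whole response body, then release the connection.
    val source = Source.fromURL("https://bittrex.com/api/v1.1/public/getmarkets")
    val body = try source.mkString finally source.close()

    // Extract the root object, then build the Dataset from the nested list.
    val parsedData = parse(body).extract[Result]
    val mySourceDataset = spark.createDataset(parsedData.result)

    mySourceDataset.printSchema()
    mySourceDataset.show()

    spark.stop()
  }
}
```

Note that the response is fetched and parsed entirely on the driver, so this approach suits payloads that fit comfortably in driver memory.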

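I am getting the following error:
Caused by:
Could you please update the types used in the case class mapping above?

Indeed, I had left out a few small changes to Markets, thinking they would not have much impact. I have updated the answer.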