Scala: consuming a RESTful API in Apache Spark and converting the output to a DataFrame
I am trying to convert the output of a RESTful API directly into a DataFrame, as follows:
package trials

import org.apache.spark.sql.SparkSession
import org.json4s.jackson.JsonMethods.parse

import scala.io.Source.fromURL

object DEF {

  implicit val formats = org.json4s.DefaultFormats

  case class Result(success: Boolean,
                    message: String,
                    result: Array[Markets])

  case class Markets(
    MarketCurrency: String,
    BaseCurrency: String,
    MarketCurrencyLong: String,
    BaseCurrencyLong: String,
    MinTradeSize: Double,
    MarketName: String,
    IsActive: Boolean,
    Created: String,
    Notice: String,
    IsSponsored: String,
    LogoUrl: String
  )

  def main(args: Array[String]): Unit = {
    val spark = SparkSession
      .builder()
      .appName(s"${this.getClass.getSimpleName}")
      .config("spark.sql.shuffle.partitions", "4")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    val parsedData = parse(fromURL("https://bittrex.com/api/v1.1/public/getmarkets").mkString).extract[Array[Result]]
    val mySourceDataset = spark.createDataset(parsedData)
    mySourceDataset.printSchema
    mySourceDataset.show()
  }
}
The error below is repeated for each record:
Caused by: org.json4s.package$MappingException: Expected collection but got JObject(List((success,JBool(true)), (message,JString()), (result,JArray(List(JObject(List((MarketCurrency,JString(LTC)), (BaseCurrency,JString(BTC)), (MarketCurrencyLong,JString(Litecoin)), (BaseCurrencyLong,JString(Bitcoin)), (MinTradeSize,JDouble(0.01435906)), (MarketName,JString(BTC-LTC)), (IsActive,JBool(true)), (Created,JString(2014-02-13T00:00:00)), (Notice,JNull), (IsSponsored,JNull), (LogoUrl,JString(...)))), ...)))) and mapping Result[][Result, Result]
at org.json4s.reflect.package$.fail(package.scala:96)

The JSON structure returned from this URL is:
{
"success": boolean,
"message": string,
"result": [ ... ]
}
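For reference, one element of the result array, reconstructed from the error output above (the LogoUrl value is truncated there), looks like this:

{
  "MarketCurrency": "LTC",
  "BaseCurrency": "BTC",
  "MarketCurrencyLong": "Litecoin",
  "BaseCurrencyLong": "Bitcoin",
  "MinTradeSize": 0.01435906,
  "MarketName": "BTC-LTC",
  "IsActive": true,
  "Created": "2014-02-13T00:00:00",
  "Notice": null,
  "IsSponsored": null,
  "LogoUrl": "..."
}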
So the Result class should be aligned with this structure:
case class Result(success: Boolean,
                  message: String,
                  result: List[Markets])
UPDATE
I have also slightly improved the Markets class:
case class Markets(
  MarketCurrency: String,
  BaseCurrency: String,
  MarketCurrencyLong: String,
  BaseCurrencyLong: String,
  MinTradeSize: Double,
  MarketName: String,
  IsActive: Boolean,
  Created: String,
  Notice: Option[String],
  IsSponsored: Option[Boolean],
  LogoUrl: String
)
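The Option fields matter because Notice and IsSponsored come back as JNull in the payload above, and json4s extracts JNull into an Option field as None instead of failing. A minimal standalone sketch of that behaviour (the Probe class is illustrative, not part of the original code):

import org.json4s._
import org.json4s.jackson.JsonMethods.parse

object OptionNullCheck {
  implicit val formats: Formats = DefaultFormats

  // Illustrative case class covering only the two nullable fields
  case class Probe(Notice: Option[String], IsSponsored: Option[Boolean])

  def main(args: Array[String]): Unit = {
    // JNull extracts to None when the target field is an Option
    val probe = parse("""{"Notice": null, "IsSponsored": null}""").extract[Probe]
    println(probe) // Probe(None,None)
  }
}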
END OF UPDATE
But the main problem is in the extraction of the main data part from the parsed JSON:
val parsedData = parse(fromURL("{url}").mkString).extract[Array[Result]]
The root of the returned structure is not an array; it corresponds to Result. So it should be:
val parsedData = parse(fromURL("{url}").mkString).extract[Result]
Then, I assume you don't need to load the wrapper into the DataFrame, but rather the markets inside it. That's why it should be loaded like this:
val mySourceDataset = spark.createDataset(parsedData.result)
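Putting the pieces together, the whole corrected program would look roughly like this (same object, URL and Spark setup as in the question; only the case classes and the two extraction lines change):

package trials

import org.apache.spark.sql.SparkSession
import org.json4s.jackson.JsonMethods.parse

import scala.io.Source.fromURL

object DEF {

  implicit val formats = org.json4s.DefaultFormats

  // The root of the response is a single Result wrapper, not an array of them
  case class Result(success: Boolean,
                    message: String,
                    result: List[Markets])

  case class Markets(
    MarketCurrency: String,
    BaseCurrency: String,
    MarketCurrencyLong: String,
    BaseCurrencyLong: String,
    MinTradeSize: Double,
    MarketName: String,
    IsActive: Boolean,
    Created: String,
    Notice: Option[String],
    IsSponsored: Option[Boolean],
    LogoUrl: String
  )

  def main(args: Array[String]): Unit = {
    val spark = SparkSession
      .builder()
      .appName(s"${this.getClass.getSimpleName}")
      .config("spark.sql.shuffle.partitions", "4")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Extract the single wrapper object, then load only its result list
    val parsedData = parse(fromURL("https://bittrex.com/api/v1.1/public/getmarkets").mkString).extract[Result]
    val mySourceDataset = spark.createDataset(parsedData.result)
    mySourceDataset.printSchema
    mySourceDataset.show()
  }
}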
It finally produces the DataFrame:
+--------------+------------+------------------+----------------+------------+----------+--------+-------------------+------+-----------+--------------------+
|MarketCurrency|BaseCurrency|MarketCurrencyLong|BaseCurrencyLong|MinTradeSize|MarketName|IsActive| Created|Notice|IsSponsored| LogoUrl|
+--------------+------------+------------------+----------------+------------+----------+--------+-------------------+------+-----------+--------------------+
| LTC| BTC| Litecoin| Bitcoin| 0.01435906| BTC-LTC| true|2014-02-13T00:00:00| null| null|https://bittrexbl...|
| DOGE| BTC| Dogecoin| Bitcoin|396.82539683| BTC-DOGE| true|2014-02-13T00:00:00| null| null|https://bittrexbl...|
I'm getting the following error: Caused by:
Could you please update the mapping types used in the case classes above? Indeed, I hadn't added some small changes to Markets, thinking they didn't have that much impact. Updated the answer.