Reading a REST API JSON response with Spark Scala
I want to hit an API by passing some parameters taken from a DataFrame, get the JSON response body, and extract all the distinct values of a particular key from that body. I then need to append those values as a new column to the first DataFrame. Suppose I have a DataFrame like the one below:
df1:
+-----+-------+--------+
| DB | User | UserID |
+-----+-------+--------+
| db1 | user1 | 123 |
| db2 | user2 | 456 |
+-----+-------+--------+
I want to hit a REST API by supplying the column values of df1 as parameters. If my URL parameters are db=db1 and User=user1 (the first record of df1), the response will be JSON in the following format:
{
  "data": [
    {
      "db": "db1",
      "User": "User1",
      "UserID": 123,
      "Query": "Select * from A",
      "Application": "App1"
    },
    {
      "db": "db1",
      "User": "User1",
      "UserID": 123,
      "Query": "Select * from B",
      "Application": "App2"
    }
  ]
}
From this JSON I want to get the distinct values of the Application key as an array or list and append it as a new column to df1. My output would look like this:
Final df:
+-----+-------+--------+-------------+
| DB | User | UserID | Apps |
+-----+-------+--------+-------------+
| db1 | user1 | 123 | {App1,App2} |
| db2 | user2 | 456 | {App3,App3} |
+-----+-------+--------+-------------+
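Building the request URL from each row's column values could be sketched as follows. This is a hypothetical helper, not part of the question's actual API; the base URL is a placeholder.

```scala
import java.net.URLEncoder

// Assemble a query string like "db=db1&User=user1" from a row's values.
// The base URL is a placeholder standing in for the real endpoint.
def buildUrl(base: String, db: String, user: String): String = {
  val query = Seq("db" -> db, "User" -> user)
    .map { case (k, v) => s"$k=${URLEncoder.encode(v, "UTF-8")}" }
    .mkString("&")
  s"$base?$query"
}
```

For example, `buildUrl("http://host/api/queries", "db1", "user1")` produces `http://host/api/queries?db=db1&User=user1`.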
I have come up with a high-level plan on how to achieve this.
I am using Spark 1.6. Check the code below; you will need to write the logic that calls the REST API. Once you get the result, the rest of the process is simple.
scala> val df = Seq(("db1","user1",123),("db2","user2",456)).toDF("db","user","userid")
df: org.apache.spark.sql.DataFrame = [db: string, user: string, userid: int]
scala> df.show(false)
+---+-----+------+
|db |user |userid|
+---+-----+------+
|db1|user1|123 |
|db2|user2|456 |
+---+-----+------+
scala> :paste
// Entering paste mode (ctrl-D to finish)
def invokeRestAPI(db: String, user: String) = {
  import org.json4s._
  import org.json4s.jackson.JsonMethods._
  implicit val formats = DefaultFormats
  // Write your invoke logic; for now I am hardcoding your sample JSON here.
  val json_data = parse("""{"data":[ {"db": "db1","User": "User1","UserID": 123,"Query": "Select * from A","Application": "App1"},{"db": "db1","User": "User1","UserID": 123,"Query": "Select * from B","Application": "App2"}]}""")
  (json_data \\ "data" \ "Application").extract[Set[String]].toList
}
// Exiting paste mode, now interpreting.
invokeRestAPI: (db: String, user: String)List[String]
scala> val fetch = udf(invokeRestAPI _)
fetch: org.apache.spark.sql.UserDefinedFunction = UserDefinedFunction(<function2>,ArrayType(StringType,true),List(StringType, StringType))
scala> df.withColumn("apps",fetch($"db",$"user")).show(false)
+---+-----+------+------------+
|db |user |userid|apps |
+---+-----+------+------------+
|db1|user1|123 |[App1, App2]|
|db2|user2|456 |[App1, App2]|
+---+-----+------+------------+
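To replace the hardcoded parse call with a real HTTP fetch, one standard-library-only sketch is below. A regex stands in for json4s here so the extraction runs without extra dependencies, and the endpoint URL is a hypothetical placeholder; the "Application" field name comes from the sample response above.

```scala
import scala.io.Source

// Extract distinct "Application" values from a JSON response body.
// A regex is used instead of a JSON parser so this stays stdlib-only
// and testable offline; swap in json4s for robust parsing.
val appPattern = """"Application"\s*:\s*"([^"]+)"""".r

def distinctApps(body: String): List[String] =
  appPattern.findAllMatchIn(body).map(_.group(1)).toList.distinct

// Inside the UDF, the body would come from the live endpoint, e.g.:
// val body = Source.fromURL(s"http://host/api/queries?db=$db&User=$user").mkString
```

Note that `Source.fromURL` performs a blocking GET per row, so for large DataFrames you would want to batch or cache calls rather than hitting the API once per record.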
Does the solution below help? @Srinivas The solution looks perfect, thanks! I haven't tried it yet (issues reaching the API); I'll update as soon as I do. If you run into any problems, let me know. Please upvote or accept if it helped :) Is it not working??