Json Spark Scala: checking fields of nested case classes
I have three case classes, as shown below:
case class Result(
    result: Seq[Signal],
    hop: Int)
case class Signal(
    rtt: Double,
    from: String)
case class Traceroute(
    dst_name: String,
    from: String,
    prb_id: BigInt,
    msm_id: BigInt,
    timestamp: BigInt,
    result: Seq[Result])
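Traceroute has a field result, which is a sequence of Result. Each Result in turn holds a sequence of Signal in its own result field.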
I am trying to check that the rtt values inside these nested result fields are non-negative.
My JSON records look like this:
{"prb_id": 4247, "result": [{"result": [{"rtt": 1.955, "ttl": 255, "from": "89.105.200.57", "size": 28}, {"rtt": 1.7, "ttl": 255, "from": "10.10.0.5", "size": 28}, {"rtt": 1.709, "ttl": 255, "from": "89.105.200.57", "size": 28}], "hop": 1}]}
{"timestamp": 1514768409, "result": [{"result": [{"rtt": 1.955, "ttl": 255, "from": "89.105.200.57", "size": 28}], "hop": 1}]}
{"timestamp": 1514768402, "result": [{"result": [{"rtt": 19.955, "ttl": 255, "from": "89.105.200.57", "size": 28}], "hop": 2}]}
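For clarity, I have included only some of the attributes in the JSON records. The result attribute corresponds to the result field of the Traceroute case class.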
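To make the two levels of nesting concrete, here is a minimal plain-Scala sketch (no Spark involved) that builds the third record above by hand and collects every rtt value. Fields that are absent from the JSON are set to null here, assuming that mirrors how they appear once loaded; the names record and allRtts are hypothetical.

```scala
case class Signal(rtt: Double, from: String)
case class Result(result: Seq[Signal], hop: Int)
case class Traceroute(
    dst_name: String,
    from: String,
    prb_id: BigInt,
    msm_id: BigInt,
    timestamp: BigInt,
    result: Seq[Result])

// Third JSON record; fields missing from the JSON are left as null (assumption).
val record = Traceroute(
  dst_name = null,
  from = null,
  prb_id = null,
  msm_id = null,
  timestamp = BigInt(1514768402L),
  result = Seq(Result(Seq(Signal(19.955, "89.105.200.57")), hop = 2)))

// Traceroute.result is a Seq[Result]; each Result.result is a Seq[Signal].
// flatMap flattens the two levels into a single list of Signals, then map extracts rtt.
val allRtts: Seq[Double] = record.result.flatMap(_.result).map(_.rtt)
println(allRtts)
```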
I used a filter to check whether the rtt of each Signal is non-negative, but I am not getting the expected result:
val checkrtts = checkError.filter(x => x.result.foreach(p => p.result.foreach(f => checkSignal(f))))
The checkSignal function is as follows:
def checkSignal(signal: Signal): Signal =
  if (signal.rtt > 0) signal else null
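The following plain-Scala sketch (no Spark) shows what a null-returning checkSignal actually does when it is applied to a list: map keeps every position and simply puts null where the rejected Signal was, so nothing is removed. The sample values are a Signal with a negative rtt and one with a positive rtt; the names signals, mapped and kept are hypothetical.

```scala
case class Signal(rtt: Double, from: String)

// Same logic as the checkSignal above: return the signal, or null if rtt <= 0.
def checkSignal(signal: Signal): Signal =
  if (signal.rtt > 0) signal else null

val signals = Seq(
  Signal(-2.5, "89.105.200.57"),
  Signal(19.955, "89.105.200.57"))

// map applies checkSignal to every element but never drops any of them:
// the first element becomes null and the list still has 2 entries.
val mapped = signals.map(checkSignal)
println(mapped)

// Dropping elements requires filter with a Boolean predicate instead.
val kept = signals.filter(_.rtt > 0)
println(kept)
```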
Here is an example with two Traceroute instances:
{"timestamp": 1514768409, "result": [{"result": [{"rtt": 1.955, "ttl": 255, "from": "89.105.200.57", "size": 28}], "hop": 1}]}
{"timestamp": 1514768402, "result": [{"result": [{"rtt": -2.5, "ttl": 255, "from": "89.105.200.57", "size": 28},{"rtt": 19.955, "ttl": 255, "from": "89.105.200.57", "size": 28}], "hop": 2}]}
For the first Traceroute, no change should be applied.
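For the second Traceroute, the result.result field contains two elements of type Signal. The first Signal has a negative rtt, so it should be removed from result.result, but the second Signal should be kept.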
So the output should look like this:
{"prb_id": 4247, "result": [{"result": [{"rtt": 1.955, "ttl": 255, "from": "89.105.200.57", "size": 28}, {"rtt": 1.7, "ttl": 255, "from": "10.10.0.5", "size": 28}, {"rtt": 1.709, "ttl": 255, "from": "89.105.200.57", "size": 28}], "hop": 1}]}
{"timestamp": 1514768409, "result": [{"result": [{"rtt": 1.955, "ttl": 255, "from": "89.105.200.57", "size": 28}], "hop": 1}]}
{"timestamp": 1514768402, "result": [{"result": [{"rtt": 19.955, "ttl": 255, "from": "89.105.200.57", "size": 28}], "hop": 2}]}
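Please help. I am new to Spark and Scala. I have tried many approaches, but the results are not what I expect.
You seem to have a slight misunderstanding of what the filter function does. It keeps or drops entire Traceroute objects: every object for which the predicate returns false is removed from the dataset. In addition, foreach returns Unit, so the lambda passed to your filter does not return a Boolean at all. What you need instead is a map function that transforms each original Traceroute object into the desired one. Below is an example of how to do this for a Dataset[Traceroute].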
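The filter behaviour can be illustrated with plain Scala collections, whose filter keeps exactly the elements for which the predicate returns true, as Dataset.filter does. This is a minimal sketch with a simplified, hypothetical case class Trace (only timestamp and result are kept); it shows that filter can only keep or drop whole records, and that a Boolean predicate over the nested lists has to be built with forall (or exists), not foreach.

```scala
case class Signal(rtt: Double, from: String)
case class Result(result: Seq[Signal], hop: Int)
// Hypothetical, simplified stand-in for Traceroute: only two fields kept.
case class Trace(timestamp: Long, result: Seq[Result])

val data = Seq(
  Trace(1514768409L, Seq(Result(Seq(Signal(1.955, "89.105.200.57")), 1))),
  Trace(1514768402L, Seq(Result(Seq(
    Signal(-2.5, "89.105.200.57"),
    Signal(19.955, "89.105.200.57")), 2))))

// A filter predicate must return Boolean. forall does; foreach returns Unit.
def allPositive(t: Trace): Boolean =
  t.result.forall(_.result.forall(_.rtt > 0))

// filter keeps or drops whole records: the second Trace is removed entirely,
// including its valid Signal with rtt 19.955.
val filtered = data.filter(allPositive)
println(filtered.map(_.timestamp))
```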
First, modify the case classes slightly, as follows:
case class Result(var result: Seq[Signal],
                  hop: Int)
case class Signal(rtt: Double,
                  from: String)
case class Traceroute(dst_name: String,
                      from: String,
                      prb_id: BigInt,
                      msm_id: BigInt,
                      timestamp: BigInt,
                      result: Seq[Result])
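As you can see, I have added var to the result field of the Result class. This allows the result field to be reassigned later, inside the custom function that we will pass to the map operation.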
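Adding var to a case class field works, but mutable case class fields are not the usual Scala style. An alternative, sketched here under the assumption that the case classes are left immutable (no var), is to build new objects with the compiler-generated copy method. The rest of this answer keeps the var version; the function name removeNegativeImmutable is hypothetical and not part of the original answer.

```scala
case class Signal(rtt: Double, from: String)
case class Result(result: Seq[Signal], hop: Int) // no var needed here
case class Traceroute(
    dst_name: String,
    from: String,
    prb_id: BigInt,
    msm_id: BigInt,
    timestamp: BigInt,
    result: Seq[Result])

// Returns a new Traceroute whose inner Signal lists contain only rtt > 0.
// The input object itself is not modified.
def removeNegativeImmutable(t: Traceroute): Traceroute =
  t.copy(result = t.result.map(r => r.copy(result = r.result.filter(_.rtt > 0))))

val input = Traceroute(null, null, null, null, BigInt(1514768402L),
  Seq(Result(Seq(Signal(-2.5, "89.105.200.57"), Signal(19.955, "89.105.200.57")), 2)))

val output = removeNegativeImmutable(input)
println(output.result)
```

On a Dataset[Traceroute], such a function would be passed to map in the same way, for example `checkError.map(removeNegativeImmutable)`, assuming `import spark.implicits._` is in scope so that an encoder for Traceroute is available.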
Then define the following two functions:
def checkSignal(signal: Signal): Boolean =
  signal.rtt > 0
def removeNegative(traceroute: Traceroute): Traceroute = {
  val outerList = traceroute.result
  for (temp <- outerList) {
    val innerList = temp.result
    // keep only the signals for which checkSignal is true (rtt > 0)
    val newInnerList = innerList.filter(checkSignal)
    // reassign the filtered list to the mutable result field
    temp.result = newInnerList
  }
  traceroute
}
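The map call that applies removeNegative is not shown above. As a minimal sketch, assuming the var-based case classes and the two functions above, the transformation can be checked on a plain Scala Seq holding the two example records from the question (missing fields set to null); on a Dataset[Traceroute] the equivalent call would presumably be `checkError.map(removeNegative)` with `import spark.implicits._` in scope. The names records and transformed are hypothetical.

```scala
case class Signal(rtt: Double, from: String)
case class Result(var result: Seq[Signal], hop: Int)
case class Traceroute(
    dst_name: String,
    from: String,
    prb_id: BigInt,
    msm_id: BigInt,
    timestamp: BigInt,
    result: Seq[Result])

def checkSignal(signal: Signal): Boolean = signal.rtt > 0

// Same logic as removeNegative above: overwrite each inner list in place.
def removeNegative(traceroute: Traceroute): Traceroute = {
  for (temp <- traceroute.result) {
    temp.result = temp.result.filter(checkSignal)
  }
  traceroute
}

// The two example records from the question.
val records = Seq(
  Traceroute(null, null, null, null, BigInt(1514768409L),
    Seq(Result(Seq(Signal(1.955, "89.105.200.57")), 1))),
  Traceroute(null, null, null, null, BigInt(1514768402L),
    Seq(Result(Seq(Signal(-2.5, "89.105.200.57"), Signal(19.955, "89.105.200.57")), 2))))

val transformed = records.map(removeNegative)
transformed.foreach(t => println(t.result))
```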
The output is as follows:
Showing 10 rows of original dataset
+--------+----+------+------+----------+-------------------------------------------------------+
|dst_name|from|prb_id|msm_id|timestamp |result |
+--------+----+------+------+----------+-------------------------------------------------------+
|null |null|null |null |1514768409|[[[[1.955, 89.105.200.57]], 1]] |
|null |null|null |null |1514768402|[[[[-2.5, 89.105.200.57], [19.955, 89.105.200.57]], 2]]|
+--------+----+------+------+----------+-------------------------------------------------------+
Showing 10 rows of transformed dataset
+--------+----+------+------+----------+--------------------------------+
|dst_name|from|prb_id|msm_id|timestamp |result |
+--------+----+------+------+----------+--------------------------------+
|null |null|null |null |1514768409|[[[[1.955, 89.105.200.57]], 1]] |
|null |null|null |null |1514768402|[[[[19.955, 89.105.200.57]], 2]]|
+--------+----+------+------+----------+--------------------------------+
Could you add what you are getting and what you expect? If necessary, add more rows of data to show the output properly.