How to list the numbers between two numbers in a Java Spark DataFrame
I have an input DataFrame and need to produce an output DataFrame in which any value containing a colon is expanded into every number between the two bounds. For example: 1:10 -> 1,2,3,4,5,6,7,8,9,10. This may involve regular expressions, but I'm not sure how. Could someone help?
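Since the question guesses that a regular expression is involved, here is a minimal Scala sketch of that idea for a plain string such as 1:10. The object and method names (`RangeRegex`, `expandRanges`) are hypothetical, and it assumes any single quotes have already been removed from the input. It uses `Regex.replaceAllIn` with a function that turns each matched `a:b` into the inclusive list `a,...,b`.

```scala
import scala.util.matching.Regex

object RangeRegex {
  // "a:b" where a and b are runs of digits; group 1 = start, group 2 = end.
  private val RangePattern: Regex = """(\d+):(\d+)""".r

  // Hypothetical helper: replaces every "a:b" in the text with a,a+1,...,b.
  // Anything that is not a range (e.g. a lone "6") is left unchanged.
  // Note: if a > b the range is empty and the match is replaced by "".
  def expandRanges(text: String): String =
    RangePattern.replaceAllIn(text, m =>
      Regex.quoteReplacement((m.group(1).toInt to m.group(2).toInt).mkString(",")))
}
```

For example, `RangeRegex.expandRanges("1:5,6,7:10")` yields `1,2,3,4,5,6,7,8,9,10`.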
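Using a UDF (Scala):
import scala.collection.mutable.ListBuffer
// Expands each "a:b" range into "a, a+1, ..., b"; single values pass through unchanged.
def myudf2 = (input: String) => {
  val out = new ListBuffer[String]()
  // Drop the single quotes, then split on commas: "'1':'5','6'" -> Array("1:5", "6")
  input.replaceAll("'", "").split(",").foreach { x =>
    if (x.matches("\\d+:\\d+")) {
      val colon = x.split(":")
      // Inclusive range, e.g. "1:5" -> "1, 2, 3, 4, 5"
      out += (colon(0).toInt to colon(1).toInt).mkString(", ")
    } else {
      out += x
    }
  }
  // Join with ", " so ranges and single values share one separator, then strip any brackets
  out.mkString(", ").replaceAll("\\[|\\]", "")
}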
val df = Seq((1,"'1':'5','6','7':'10'"),(2,"'1':'6','7','8':'12'")).toDF("id","number")
scala> df.show
+---+--------------------+
| id|              number|
+---+--------------------+
|  1|'1':'5','6','7':'10'|
|  2|'1':'6','7','8':'12'|
+---+--------------------+
import org.apache.spark.sql.functions.udf
val myCustomUdf = udf(myudf2)
scala> val outDF = df.withColumn("output", myCustomUdf(df("number")))
scala> outDF.show(5, false)
+---+--------------------+-------------------------------------+
|id |number              |output                               |
+---+--------------------+-------------------------------------+
|1  |'1':'5','6','7':'10'|1, 2, 3, 4, 5, 6, 7, 8, 9, 10        |
|2  |'1':'6','7','8':'12'|1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12|
+---+--------------------+-------------------------------------+
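The UDF above returns a comma-separated string and would fail on a null `number` value, because it calls `replaceAll` on the input. If you would rather get an integer array and treat null as empty, a minimal sketch of the parsing logic follows. `RangeExpansion` and `expandToInts` are hypothetical names; the code is plain Scala, so it can be checked without a Spark session.

```scala
object RangeExpansion {
  // Whole-token patterns: "7:10" (a range) or "6" (a single number).
  private val RangeToken  = """(\d+):(\d+)""".r
  private val SingleToken = """(\d+)""".r

  // Hypothetical helper: "'1':'5','6','7':'10'" -> List(1, 2, ..., 10).
  // A null input yields an empty list instead of a NullPointerException.
  def expandToInts(input: String): Seq[Int] =
    Option(input).toList
      .flatMap(_.replace("'", "").split(",").toList)
      .map(_.trim)
      .flatMap {
        case RangeToken(lo, hi) => (lo.toInt to hi.toInt).toList
        case SingleToken(n)     => List(n.toInt)
        case _                  => Nil // silently skip tokens that are not numbers
      }
}
```

Wrapping it as `udf(RangeExpansion.expandToInts _)` should give an `array<int>` column instead of a string; whether that suits your downstream code is a design choice.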
Please try the above. Also, please include some code you have already tried; otherwise your question is unlikely to get any answers or help.