Apache Spark: how can Spark read MongoDB data as JSON strings, without using a schema?
I need data that is compatible with every schema. The default sampleSize is 10000, and raising it costs a lot of performance and time. I want to convert each document into a complete JSON string without any schema. Is there a simple way to do this? Thanks.
import com.cd.flow.core.utils.ResourcesUtils
import org.apache.log4j.{Level, Logger}
import org.apache.spark.sql.SparkSession
object TestMogonToString {
  Logger.getRootLogger.setLevel(Level.WARN)

  def main(args: Array[String]): Unit = {
    // Project helper that returns the (uri, database) pair for the given property keys
    val m6PropValues = ResourcesUtils.getMogodbPropValues("ds6uri", "ds6database")
    val spark = SparkSession.builder.master("local[*]").appName(this.getClass.getSimpleName).getOrCreate()
    import spark.implicits._

    // Aggregation pipeline pushed down to MongoDB: one day of orders by updateTime
    val m_sql = "[{$match:{'updateTime':{'$gte':'2020-01-08 00:00:00','$lte':'2020-01-08 23:59:59'}}}]"
    println(s"m_sql: $m_sql")

    // No explicit schema is supplied, so the connector infers one by sampling documents
    val lxjStoreMongoDF = spark.read.format("com.mongodb.spark.sql.DefaultSource")
      .option("spark.mongodb.input.uri", m6PropValues._1)
      .option("spark.mongodb.input.database", m6PropValues._2)
      .option("collection", "order")
      .option("pipeline", m_sql) /*.schema(structureSchema)*/
      .load()
    lxjStoreMongoDF.show()
    lxjStoreMongoDF.printSchema()
/**
* +--------------------+-------------+--------+---------+---------------+----------+----------------+......
* | _id|activityPrice| brandId|brandName|cancelBigReason|cancelNote|cancelOperribute|......
* +--------------------+-------------+--------+---------+---------------+----------+----------------+......
* |[5e1471dd666bb700...| 0|26000252| 奈雪の茶| 0| | null fals......
*---------------------------------------------------------------split
* root
* |-- _id: struct (nullable = true)
* | |-- oid: string (nullable = true)
* |-- orderDrivers: array (nullable = true)
* | |-- element: struct (containsNull = true)
* | | |-- name: string (nullable = true)
* | | |-- phone: string (nullable = true)
* |-- orderId: integer (nullable = true)
* |-- orderNo: string (nullable = true)
* |-- orderPreferentials: array (nullable = true)
* | |-- element: struct (containsNull = true)
* | | |-- childType: integer (nullable = true)
* | | |-- pid: string (nullable = true)
* ..........
*/
// What I want instead: a single string column holding the whole document as JSON
/**
* +--------------------+---------
* | jsonstring |....
* +--------------------+---------
* |{_id:100,activityPrice:xx.....}|....
*---------------------------------------------------------------split
* root
* |-- jsonstring: string (nullable = true)
* ..........
*/
}
}
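The following works, but it still samples the collection to infer an intermediate schema first. I want to skip the sampleSize configuration entirely and go straight to the raw, complete string. I don't know whether there is a way to do that.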
import org.apache.spark.sql.functions.lit
val jsonDF = lxjStoreMongoDF.toJSON.withColumnRenamed("value", "msg").withColumn("cd_ods_src", lit("m6"))
jsonDF.printSchema()
jsonDF.show()
root
|-- msg: string (nullable = true)
|-- cd_ods_src: string (nullable = false)
+--------------------+----------+
| msg|cd_ods_src|
+--------------------+----------+
|{"_id":{"oid":"5e...| m6|
|{"_id":{"oid":"5e...| m6|
|{"_id":{"oid":"5e...| m6|
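One possible way to avoid schema inference altogether is to read through the connector's RDD API instead of the DataFrame reader, and serialize each BSON document to a string yourself. The sketch below assumes the 2.x line of mongo-spark-connector (the one that provides `com.mongodb.spark.sql.DefaultSource`). The URI, database name, and object name are placeholders rather than values from the question, and the sketch has not been run against the asker's cluster.

```scala
import com.mongodb.spark.MongoSpark
import com.mongodb.spark.config.ReadConfig
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.lit
import org.bson.Document

object MongoToJsonStrings {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder
      .master("local[*]")
      .appName("MongoToJsonStrings")
      .getOrCreate()
    import spark.implicits._

    // Placeholder connection settings (assumptions, replace with real values)
    val readConfig = ReadConfig(Map(
      "uri" -> "mongodb://localhost:27017",
      "database" -> "mydb",
      "collection" -> "order"
    ))

    // Same $match stage as in the question, expressed as a BSON document
    val pipeline = Seq(Document.parse(
      "{ $match: { updateTime: { $gte: '2020-01-08 00:00:00', $lte: '2020-01-08 23:59:59' } } }"
    ))

    // MongoSpark.load returns an RDD of org.bson.Document; no schema is inferred here
    val jsonDF = MongoSpark.load(spark.sparkContext, readConfig)
      .withPipeline(pipeline)
      .map(_.toJson)               // each document becomes one JSON string
      .toDF("msg")
      .withColumn("cd_ods_src", lit("m6"))

    jsonDF.printSchema()
    jsonDF.show(truncate = false)
    spark.stop()
  }
}
```

Two caveats. `Document.toJson` emits MongoDB extended JSON, so `_id` shows up as `{"$oid": "..."}` rather than the `{"oid": "..."}` form that `toJSON` on the inferred DataFrame produces. The connector's partitioner may still query the collection to work out partition boundaries, but that is separate from the sampleSize-driven schema inference.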