Apache Spark: How can Spark read MongoDB data as JSON strings without using a schema?


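I need data that is compatible with every document's schema, but schema inference only samples the collection, and the default sampleSize is 10000. If I increase it, inference costs a lot of performance and time. I want to convert the data into complete JSON strings without any schema. Is there a simple way to do this? Thanks.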

import com.cd.flow.core.utils.ResourcesUtils
import org.apache.log4j.{Level, Logger}
import org.apache.spark.sql.SparkSession

object TestMogonToString {
  Logger.getRootLogger.setLevel(Level.WARN)
  def main(args: Array[String]): Unit = {
    val m6PropValues = ResourcesUtils.getMogodbPropValues("ds6uri", "ds6database")
    val spark = SparkSession.builder.master("local[*]").appName(this.getClass.getSimpleName).getOrCreate()
    import spark.implicits._
    val m_sql = "[{$match:{'updateTime':{'$gte':'2020-01-08 00:00:00','$lte':'2020-01-08 23:59:59'}}}]"
    println(s"m_sql: $m_sql") // print the pipeline string itself rather than a tuple
    val lxjStoreMongoDF = spark.read.format("com.mongodb.spark.sql.DefaultSource")
      .option("spark.mongodb.input.uri", m6PropValues._1).option("spark.mongodb.input.database", m6PropValues._2)
      .option("collection", "order").option("pipeline", m_sql)/*.schema(structureSchema)*/.load()
    lxjStoreMongoDF.show()
    lxjStoreMongoDF.printSchema()
    /**
      * +--------------------+-------------+--------+---------+---------------+----------+----------------+......
      * |                 _id|activityPrice| brandId|brandName|cancelBigReason|cancelNote|cancelOperribute|......
      * +--------------------+-------------+--------+---------+---------------+----------+----------------+......
      * |[5e1471dd666bb700...|            0|26000252| 奈雪の茶|              0|          |          null   fals......
      *---------------------------------------------------------------split
      * root
      * |-- _id: struct (nullable = true)
      * |    |-- oid: string (nullable = true)
      * |-- orderDrivers: array (nullable = true)
      * |    |-- element: struct (containsNull = true)
      * |    |    |-- name: string (nullable = true)
      * |    |    |-- phone: string (nullable = true)
      * |-- orderId: integer (nullable = true)
      * |-- orderNo: string (nullable = true)
      * |-- orderPreferentials: array (nullable = true)
      * |    |-- element: struct (containsNull = true)
      * |    |    |-- childType: integer (nullable = true)
      * |    |    |-- pid: string (nullable = true)
      * ..........
      */
    // Desired result: one string column holding each document as JSON
    /**
      * +--------------------+---------
      * |   jsonstring       |....
      * +--------------------+---------
      * |{_id:100,activityPrice:xx.....}|....
      *---------------------------------------------------------------split
      * root
      * |-- jsonstring: string (nullable = true)
      * ..........
      */
  }
}

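The following works, but it still samples the collection to infer a schema as an intermediate step. I would like to skip the sampleSize configuration and go straight to the raw, complete JSON string. I don't know whether that is possible.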

    import org.apache.spark.sql.functions.lit // needed for lit(...)
    val jsonDF = lxjStoreMongoDF.toJSON.withColumnRenamed("value", "msg").withColumn("cd_ods_src", lit("m6"))
    jsonDF.printSchema()
    jsonDF.show()

root
 |-- msg: string (nullable = true)
 |-- cd_ods_src: string (nullable = false)

+--------------------+----------+
|                 msg|cd_ods_src|
+--------------------+----------+
|{"_id":{"oid":"5e...|        m6|
|{"_id":{"oid":"5e...|        m6|
|{"_id":{"oid":"5e...|        m6|
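
One possible way to avoid schema inference altogether is to bypass the DataFrame reader and load the collection as an RDD of BSON documents, then serialize each document yourself. The sketch below assumes the MongoDB Spark Connector 2.x (the version that provides `com.mongodb.spark.sql.DefaultSource`), where `MongoSpark.load` on a `SparkContext` returns an RDD of `org.bson.Document`. The object name, the `toRawJson` helper, and the connection values are placeholders for illustration and are not part of the original code.

```scala
import com.mongodb.spark.MongoSpark
import com.mongodb.spark.config.ReadConfig
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.lit
import org.bson.Document

object MongoRawJsonSketch {
  // Hypothetical helper: serialize one BSON document to a JSON string.
  def toRawJson(doc: Document): String = doc.toJson()

  def main(args: Array[String]): Unit = {
    // Placeholder connection values (assumptions); substitute your own.
    val uri = "mongodb://localhost:27017"
    val database = "test"

    val spark = SparkSession.builder
      .master("local[*]")
      .appName("MongoRawJsonSketch")
      .getOrCreate()
    import spark.implicits._

    // Read settings, without the "spark.mongodb.input." prefix.
    val readConfig = ReadConfig(Map(
      "uri" -> uri,
      "database" -> database,
      "collection" -> "order"
    ))

    // Same $match stage as in the question, expressed as a BSON document.
    val pipeline = Seq(Document.parse(
      "{$match: {'updateTime': {'$gte': '2020-01-08 00:00:00', '$lte': '2020-01-08 23:59:59'}}}"
    ))

    // Loading as an RDD[Document] involves no DataFrame schema, so no sampling step.
    val jsonDF = MongoSpark.load(spark.sparkContext, readConfig)
      .withPipeline(pipeline)
      .map(toRawJson)
      .toDF("msg")
      .withColumn("cd_ods_src", lit("m6"))

    jsonDF.printSchema()
    jsonDF.show()

    spark.stop()
  }
}
```

Note that `Document.toJson()` writes MongoDB extended JSON, so values such as `_id` may be rendered differently from the output of `DataFrame.toJSON` shown above; the exact format depends on the driver version and its JSON writer settings.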