Apache Spark: Avoiding double conversion in Spark SQL using a schema

Tags: apache-spark, apache-spark-sql, spark-dataframe

I have a simple JSON like the one below; the value node sometimes holds a STRING and sometimes a DOUBLE. I want to treat value as a string, but when Spark sees that the field is a double, it converts it to a different format using E (scientific) notation.

Input JSON

{"key" : "k1", "value": "86093351508521808.0"}
{"key" : "k2", "value": 86093351508521808.0}
Spark output CSV

k1,86093351508521808.0
k2,8.6093351508521808E16
Expected output

k1,86093351508521808.0
k2,86093351508521808.0
Please advise how to achieve the expected output. We never read the values from this field, so we will never know the precision or other details.

Below is the sample code:

public static void main(String[] args) {
    SparkSession sparkSession = SparkSession
        .builder()
        .appName(TestSpark.class.getName())
        .master("local[*]").getOrCreate();

    SparkContext context = sparkSession.sparkContext();
    context.setLogLevel("ERROR");
    SQLContext sqlCtx = sparkSession.sqlContext();
    System.out.println("Spark context established");

    List<StructField> kvFields = new ArrayList<>();
    kvFields.add(DataTypes.createStructField("key", DataTypes.StringType, true));
    kvFields.add(DataTypes.createStructField("value", DataTypes.StringType, true));
    StructType employeeSchema = DataTypes.createStructType(kvFields);

    Dataset<Row> dataset = sparkSession.read()
        .option("inferSchema", false)
        .format("json")
        .schema(employeeSchema)
        .load("D:\\dev\\workspace\\java\\simple-kafka\\key_value.json");
    dataset.createOrReplaceTempView("sourceView");
    sqlCtx.sql("select * from sourceView  ")
        .write()
        .format("csv")
        .save("D:\\dev\\workspace\\java\\simple-kafka\\output\\" + UUID.randomUUID().toString());

    sparkSession.close();

}

We can cast the column to DecimalType, as shown below:

scala> import org.apache.spark.sql.types.DecimalType;
import org.apache.spark.sql.types.DecimalType

scala> spark.read.json(sc.parallelize(Seq("""{"key" : "k1", "value": "86093351508521808.0"}""","""{"key" : "k2", "value": 86093351508521808.0}"""))).select(col("value").cast(DecimalType(28, 1))).show

+-------------------+
|              value|
+-------------------+
|86093351508521808.0|
|86093351508521808.0|
+-------------------+
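
Adapted to the Java pipeline from the question, the same cast could be applied just before writing the CSV. This is a minimal sketch under the answer's assumption that DecimalType(28, 1) is wide enough for the data; the variable and path names follow the question's code:

import static org.apache.spark.sql.functions.col;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.types.DataTypes;

// ... inside main(), after `dataset` has been loaded with the string schema:
// cast the value column to a fixed-precision decimal so the CSV writer
// keeps plain notation instead of scientific (E) notation
Dataset<Row> casted = dataset.withColumn("value",
    col("value").cast(DataTypes.createDecimalType(28, 1)));

casted.write()
    .format("csv")
    .save("D:\\dev\\workspace\\java\\simple-kafka\\output\\" + UUID.randomUUID().toString());

Note that this only keeps all digits if the chosen precision and scale actually cover the values, which is why the fixed (28, 1) is an assumption here rather than a general answer to the precision concern raised in the question.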

Create a case class, e.g. case class Person(key: String, value: String), and map the values.

As I mentioned in the question, we never read the values from this field, so we will never know the precision or other details.
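
For completeness, the case-class suggestion above translates in the question's Java setting to mapping the rows onto a bean; a minimal, hypothetical sketch (the KeyValue bean is not from the original post, and retyping the rows as strings does not by itself restore digits already lost when the double was parsed, which is the point of the reply above):

import java.io.Serializable;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;

// simple bean mirroring case class Person(key: String, value: String)
public class KeyValue implements Serializable {
    private String key;
    private String value;
    public String getKey() { return key; }
    public void setKey(String key) { this.key = key; }
    public String getValue() { return value; }
    public void setValue(String value) { this.value = value; }
}

// elsewhere, map the Row dataset onto the typed bean:
Dataset<KeyValue> typed = dataset.as(Encoders.bean(KeyValue.class));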