Scala: Spark reads a dataset wrongly and weirdly (Scala, Apache Spark, Apache Spark SQL, Dataset) - Fatal编程技术网

I'm running into a weird problem when reading files from S3. This is what I'm doing:

val previousDay = spark.read
      .option("header", "false")
      .schema(schema)
      .csv(loadPath)
      .cache()
This is the schema:

StructType(
    Array(
      StructField("location_id", DataTypes.StringType, nullable = true),
      StructField("uuid", DataTypes.StringType, nullable = true),
      StructField("country_code", DataTypes.StringType, nullable = true),
      StructField("shard", DataTypes.StringType, nullable = true),
      StructField("has_activity", DataTypes.StringType, nullable = true)
    )
  )
This is what the CSV looks like:

"location_id","uuid","country_code","shard","has_activity"
"35fb2f0XX","06d0XX","FRA","eu","t"
"9ee98XX","7cd3c7XX","DEU","eu",""
"9d193XX","128abXX","ITA","eu",""
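One part of what follows can be explained from the code alone: the read sets header to "false" while the sample file starts with a header line, so that line is parsed as an ordinary record, and the schema's field names are assigned to values by position. The plain-Scala sketch below simulates only that mechanism on the sample rows; it does not use Spark, and the object, the parseLine helper, and its simplified quote handling are hypothetical. It does not explain the renamed columns or the duplicated shard value.

```scala
// Plain-Scala simulation (no Spark involved) of reading a CSV with a fixed schema.
// Hypothetical helper: parseLine splits on commas and strips surrounding quotes;
// unlike Spark's CSV reader, it does not handle quoted commas or escaped quotes.
object HeaderFalseDemo {
  // Field names from the schema in the question, in declaration order.
  val schemaNames: Seq[String] =
    Seq("location_id", "uuid", "country_code", "shard", "has_activity")

  // The first three lines of the sample CSV from the question.
  val lines: Seq[String] = Seq(
    "\"location_id\",\"uuid\",\"country_code\",\"shard\",\"has_activity\"",
    "\"35fb2f0XX\",\"06d0XX\",\"FRA\",\"eu\",\"t\"",
    "\"9ee98XX\",\"7cd3c7XX\",\"DEU\",\"eu\",\"\""
  )

  def parseLine(line: String): Seq[String] =
    line.split(",", -1).toSeq.map(_.stripPrefix("\"").stripSuffix("\""))

  // header = false: every line, including the first, becomes a row.
  // header = true: the first line is skipped.
  // In both cases values are paired with schema field names purely by position.
  def readRows(lines: Seq[String], header: Boolean): Seq[Seq[(String, String)]] = {
    val dataLines = if (header) lines.drop(1) else lines
    dataLines.map(line => schemaNames.zip(parseLine(line)))
  }

  def main(args: Array[String]): Unit = {
    println("header = false:")
    readRows(lines, header = false).foreach(row => println(row.mkString(", ")))
    println("header = true:")
    readRows(lines, header = true).foreach(row => println(row.mkString(", ")))
  }
}
```

In Spark itself, the corresponding change would be reading with .option("header", "true") so that the header line is skipped instead of appearing as the first data row.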
However, when I call show on previousDay, this is what I get:

+-----------+--------+------------+--------+-----+
|       lid.|     uid|     country|activity|shard|
+-----------+--------+------------+--------+-----+
|location_id|    uuid|country_code|   shard|   eu|
|  35fb2f0XX|   6d0XX|         FRA|      eu|   eu|
|    9ee98XX|7cd3c7XX|         DEU|      eu|   eu|
|   9d193XX.| 128abXX|         ITA|      eu|   eu|
+-----------+--------+------------+--------+-----+
As shown, the shard value is duplicated across two columns and the activity value has disappeared completely.

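I have no idea what is going on.
I would really appreciate any input on this.

Should the schema definition include StructType?

Updated to include StructType.

I can't reproduce your output. In any case, the output DataFrame doesn't make sense: the column names should come from the schema. Why is it naming the first column lid?

That's all of it. I have no idea what is going on.

You're missing information, so nobody can answer this with certainty. Most likely your CSV is partitioned and one of the files has a corrupted header.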
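The last comment's hypothesis (a partitioned CSV where one file has a bad header) can be checked by comparing the first line of every part file with the expected header. Below is a minimal plain-Scala sketch, assuming the files under loadPath have first been copied to a local directory and that the part files end in ".csv" (compressed parts such as ".csv.gz" would be skipped). The object name, expectedHeader, and the helper functions are hypothetical, and the expected header string is taken from the sample file above.

```scala
import java.io.File
import scala.io.Source

// Sketch: report part files whose first line differs from the expected header.
// Assumes a local copy of the files under loadPath; all names are hypothetical.
object HeaderCheck {
  val expectedHeader: String =
    "\"location_id\",\"uuid\",\"country_code\",\"shard\",\"has_activity\""

  // First line of a file, or None if the file is empty.
  def firstLine(file: File): Option[String] = {
    val src = Source.fromFile(file, "UTF-8")
    try src.getLines().take(1).toList.headOption
    finally src.close()
  }

  // Given (fileName, firstLine) pairs, keep the names whose first line
  // is missing or differs from the expected header.
  def mismatches(firsts: Seq[(String, Option[String])], expected: String): Seq[String] =
    firsts.collect { case (name, line) if !line.contains(expected) => name }

  // Check every *.csv file directly inside dir, in name order.
  def checkDir(dir: File, expected: String = expectedHeader): Seq[String] = {
    val files = Option(dir.listFiles()).getOrElse(Array.empty[File])
      .filter(f => f.isFile && f.getName.endsWith(".csv"))
      .sortBy(_.getName)
      .toSeq
    mismatches(files.map(f => f.getName -> firstLine(f)), expected)
  }

  def main(args: Array[String]): Unit = {
    val dir = new File(args.headOption.getOrElse("."))
    checkDir(dir).foreach(name => println(s"unexpected first line: $name"))
  }
}
```

An exact string comparison also flags subtler differences, such as a byte-order mark or different quoting at the start of a file, which a visual inspection can miss.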