Scala Spark DataFrame union gives duplicates

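I have a base dataset in which one column contains both null and non-null values. So I split it into two datasets, nonTrained_ds and trained_ds, on that column.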

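When I print them out, I get cleanly separated rows. But when I do

val combined_ds = nonTrained_ds.union(trained_ds)

I get duplicate row records from nonTrained_ds and, strangely, the rows from trained_ds are no longer in combined_ds.

Why does this happen?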

The values of trained_ds are:

+----------+----------------+
|unique_no |      running_id|
+----------+----------------+
|0456700001|16              |
|0456700004|16              |
|0456700007|16              |
|0456700010|16              |
|0456700013|16              |
|0456700016|16              |
|0456700019|16              |
|0456700022|16              |
|0456700025|16              |
|0456700028|16              |
|0456700031|16              |
|0456700034|16              |
|0456700037|16              |
|0456700040|16              |
|0456700043|16              |
|0456700046|16              |
|0456700049|16              |
|0456700052|16              |
|0456700055|16              |
|0456700058|16              |
|0456700061|16              |
|0456700064|16              |
|0456700067|16              |
|0456700070|16              |
+----------+----------------+
The values of nonTrained_ds are:

+----------+----------------+
|unique_no |      running_id|
+----------+----------------+
|0456700002|null            |
|0456700003|null            |
|0456700005|null            |
|0456700006|null            |
|0456700008|null            |
|0456700009|null            |
|0456700011|null            |
|0456700012|null            |
|0456700014|null            |
|0456700015|null            |
|0456700017|null            |
|0456700018|null            |
|0456700020|null            |
|0456700021|null            |
|0456700023|null            |
|0456700024|null            |
|0456700026|null            |
|0456700027|null            |
|0456700029|null            |
|0456700030|null            |
|0456700032|null            |
|0456700033|null            |
|0456700035|null            |
|0456700036|null            |
|0456700038|null            |
|0456700039|null            |
|0456700041|null            |
|0456700042|null            |
|0456700044|null            |
|0456700045|null            |
|0456700047|null            |
|0456700048|null            |
|0456700050|null            |
|0456700051|null            |
|0456700053|null            |
|0456700054|null            |
|0456700056|null            |
|0456700057|null            |
|0456700059|null            |
|0456700060|null            |
|0456700062|null            |
|0456700063|null            |
|0456700065|null            |
|0456700066|null            |
|0456700068|null            |
|0456700069|null            |
|0456700071|null            |
|0456700072|null            |
+----------+----------------+
The values of combined_ds are:

+----------+----------------+
|unique_no |      running_id|
+----------+----------------+
|0456700002|null            |
|0456700003|null            |
|0456700005|null            |
|0456700006|null            |
|0456700008|null            |
|0456700009|null            |
|0456700011|null            |
|0456700012|null            |
|0456700014|null            |
|0456700015|null            |
|0456700017|null            |
|0456700018|null            |
|0456700020|null            |
|0456700021|null            |
|0456700023|null            |
|0456700024|null            |
|0456700026|null            |
|0456700027|null            |
|0456700029|null            |
|0456700030|null            |
|0456700032|null            |
|0456700033|null            |
|0456700035|null            |
|0456700036|null            |
|0456700038|null            |
|0456700039|null            |
|0456700041|null            |
|0456700042|null            |
|0456700044|null            |
|0456700045|null            |
|0456700047|null            |
|0456700048|null            |
|0456700050|null            |
|0456700051|null            |
|0456700053|null            |
|0456700054|null            |
|0456700056|null            |
|0456700057|null            |
|0456700059|null            |
|0456700060|null            |
|0456700062|null            |
|0456700063|null            |
|0456700065|null            |
|0456700066|null            |
|0456700068|null            |
|0456700069|null            |
|0456700071|null            |
|0456700072|null            |
|0456700002|16              |
|0456700005|16              |
|0456700008|16              |
|0456700011|16              |
|0456700014|16              |
|0456700017|16              |
|0456700020|16              |
|0456700023|16              |
|0456700026|16              |
|0456700029|16              |
|0456700032|16              |
|0456700035|16              |
|0456700038|16              |
|0456700041|16              |
|0456700044|16              |
|0456700047|16              |
|0456700050|16              |
|0456700053|16              |
|0456700056|16              |
|0456700059|16              |
|0456700062|16              |
|0456700065|16              |
|0456700068|16              |
|0456700071|16              |
+----------+----------------+
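This did the trick:

import org.apache.spark.sql.functions.col

// distinct() is applied to each filtered half before the union.
val nonTrained_ds = base_ds.filter(col("primary_offer_id").isNull).distinct()
val trained_ds = base_ds.filter(col("primary_offer_id").isNotNull).distinct()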
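One way to confirm that the union really reproduces the base dataset is to compare the two row sets instead of eyeballing the show() output. The following is only a sketch under stated assumptions: the object name UnionCheck, its helper methods, and the three sample rows are hypothetical and not from the original post. Only the filter on primary_offer_id, the distinct() calls, and the union come from the code above.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

// Hypothetical helper object, not part of the original post.
object UnionCheck {

  // Split `base` on whether `column` is null, deduplicate each half,
  // and union the halves back together (same shape as the code above).
  def splitAndUnion(base: DataFrame, column: String): DataFrame = {
    val nonTrained = base.filter(col(column).isNull).distinct()
    val trained = base.filter(col(column).isNotNull).distinct()
    nonTrained.union(trained)
  }

  // True when `combined` contains exactly the distinct rows of `base`:
  // nothing missing, nothing extra, and no row repeated.
  def sameRows(base: DataFrame, combined: DataFrame): Boolean = {
    val expected = base.distinct()
    expected.except(combined).count() == 0 &&
      combined.except(expected).count() == 0 &&
      expected.count() == combined.count()
  }

  // Runs the check on a tiny made-up dataset using a local session.
  def demo(): Boolean = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("union-check")
      .getOrCreate()
    try {
      import spark.implicits._
      // Assumed sample data: one non-null row and two null rows.
      val base = Seq[(String, Option[Int])](
        ("0456700001", Some(16)),
        ("0456700002", None),
        ("0456700003", None)
      ).toDF("unique_no", "primary_offer_id")
      sameRows(base, splitAndUnion(base, "primary_offer_id"))
    } finally {
      spark.stop()
    }
  }
}
```

Calling UnionCheck.sameRows(base_ds, combined_ds) on the real data would show whether rows are missing or repeated across the whole dataset, not just in the first rows that show() prints.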

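Can you show the duplicated rows and the rows that were not merged? My guess is that you are not looking at the whole dataset.

If you look at the combined dataset output, the value "0456700002" appears twice.

Not sure whether someone downvoted the question with a "-1". If it is unclear, please ask for clarification.

If you look at the combined dataset, all the rows with the value "16" are duplicates of the first column in the non-trained data.

Is this the only data in the input DataFrames, in both trained and non-trained? Or is there more?

There are many other columns, but nothing is modified in between. I just split dataset "A" into two datasets and then joined them again to get dataset "A" back. I am using Spark 2.0.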