
Scala Spark DataFrame join shows unexpected result: 0 rows


I am using Spark 1.6.0 and want to join two DataFrames, shown below as they appear in the YARN logs.

df_train_raw

df_user_clicks_info

+------------+-------------------------------+-------------------------------+--------------------------------+---------------------------------+---------------------------------+---------------------------------+--------------------------------+--------------------------------+--------------------------------+---------------------------------+----------------------------------+----------------------------------+----------------------------------+---------------------------------+---------------------------------+---------------------------------+----------------------------------+-----------------------------------+-----------------------------------+-----------------------------------+----------------------------------+
|subscriberid|user_clicks_avg_everyday_a_week|user_clicks_sum_time_1_9_a_week|user_clicks_sum_time_9_14_a_week|user_clicks_sum_time_14_17_a_week|user_clicks_sum_time_17_19_a_week|user_clicks_sum_time_19_23_a_week|user_clicks_sum_time_23_1_a_week|user_clicks_avg_everyday_weekday|user_clicks_sum_time_1_9_weekday|user_clicks_sum_time_9_14_weekday|user_clicks_sum_time_14_17_weekday|user_clicks_sum_time_17_19_weekday|user_clicks_sum_time_19_23_weekday|user_clicks_sum_time_23_1_weekday|user_clicks_avg_everyday_weekdend|user_clicks_sum_time_1_9_weekdend|user_clicks_sum_time_9_14_weekdend|user_clicks_sum_time_14_17_weekdend|user_clicks_sum_time_17_19_weekdend|user_clicks_sum_time_19_23_weekdend|user_clicks_sum_time_23_1_weekdend|
+------------+-------------------------------+-------------------------------+--------------------------------+---------------------------------+---------------------------------+---------------------------------+--------------------------------+--------------------------------+--------------------------------+---------------------------------+----------------------------------+----------------------------------+----------------------------------+---------------------------------+---------------------------------+---------------------------------+----------------------------------+-----------------------------------+-----------------------------------+-----------------------------------+----------------------------------+
|   104752237|                           1.71|                              0|                               0|                                0|                                4|                                4|                               4|                             0.8|                               0|                                0|                                 0|                                 0|                                 4|                                0|                              4.0|                                0|                                 0|                                  0|                                  4|                                  0|                                 4|
|   105517237|                          17.14|                             12|                              36|                               12|                                0|                               60|                               0|                             9.6|                               0|                                0|                                 0|                                 0|                                48|                                0|                             36.0|                               12|                                36|                                 12|                                  0|                                 12|                                 0|
|   109901037|                           2.14|                              0|                               3|                                3|                                6|                                3|                               0|                             2.4|                               0|                                0|                                 3|                                 6|                                 3|                                0|                              1.5|                                0|                                 3|                                  0|                                  0|                                  0|                                 0|
|   105246837|                            8.0|                              8|                               0|                                0|                               16|                               32|                               0|                             8.0|                               8|                                0|                                 0|                                 8|                                24|                                0|                              8.0|                                0|                                 0|                                  0|                                  8|                                  8|                                 0|
+------------+-------------------------------+-------------------------------+--------------------------------+---------------------------------+---------------------------------+---------------------------------+--------------------------------+--------------------------------+--------------------------------+---------------------------------+----------------------------------+----------------------------------+----------------------------------+---------------------------------+---------------------------------+---------------------------------+----------------------------------+-----------------------------------+-----------------------------------+-----------------------------------+----------------------------------+

————————————

root
 |-- subscriberid: string (nullable = true)
 |-- user_clicks_avg_everyday_a_week: double (nullable = false)
 |-- user_clicks_sum_time_1_9_a_week: long (nullable = false)
 |-- user_clicks_sum_time_9_14_a_week: long (nullable = false)
 |-- user_clicks_sum_time_14_17_a_week: long (nullable = false)
 |-- user_clicks_sum_time_17_19_a_week: long (nullable = false)
 |-- user_clicks_sum_time_19_23_a_week: long (nullable = false)
 |-- user_clicks_sum_time_23_1_a_week: long (nullable = false)
 |-- user_clicks_avg_everyday_weekday: double (nullable = false)
 |-- user_clicks_sum_time_1_9_weekday: long (nullable = false)
 |-- user_clicks_sum_time_9_14_weekday: long (nullable = false)
 |-- user_clicks_sum_time_14_17_weekday: long (nullable = false)
 |-- user_clicks_sum_time_17_19_weekday: long (nullable = false)
 |-- user_clicks_sum_time_19_23_weekday: long (nullable = false)
 |-- user_clicks_sum_time_23_1_weekday: long (nullable = false)
 |-- user_clicks_avg_everyday_weekdend: double (nullable = false)
 |-- user_clicks_sum_time_1_9_weekdend: long (nullable = false)
 |-- user_clicks_sum_time_9_14_weekdend: long (nullable = false)
 |-- user_clicks_sum_time_14_17_weekdend: long (nullable = false)
 |-- user_clicks_sum_time_17_19_weekdend: long (nullable = false)
 |-- user_clicks_sum_time_19_23_weekdend: long (nullable = false)
 |-- user_clicks_sum_time_23_1_weekdend: long (nullable = false)


df_user_clicks_info.select("subscriberid").take(20).foreach(println)


[104752237]
[105517237]
[109901037]
[105246837]
I tried to inner join them with this code:

val df_tmp_tmp_0 = df_train_raw.join(df_user_clicks_info, Seq("subscriberid"))
df_tmp_tmp_0.show()
But the result I got is completely empty! Oh my god

+------------+--------+-----+------------+-------------------------------+-------------------------------+--------------------------------+---------------------------------+---------------------------------+---------------------------------+--------------------------------+--------------------------------+--------------------------------+---------------------------------+----------------------------------+----------------------------------+----------------------------------+---------------------------------+---------------------------------+---------------------------------+----------------------------------+-----------------------------------+-----------------------------------+-----------------------------------+----------------------------------+
|subscriberid|objectid|label|subscriberid|user_clicks_avg_everyday_a_week|user_clicks_sum_time_1_9_a_week|user_clicks_sum_time_9_14_a_week|user_clicks_sum_time_14_17_a_week|user_clicks_sum_time_17_19_a_week|user_clicks_sum_time_19_23_a_week|user_clicks_sum_time_23_1_a_week|user_clicks_avg_everyday_weekday|user_clicks_sum_time_1_9_weekday|user_clicks_sum_time_9_14_weekday|user_clicks_sum_time_14_17_weekday|user_clicks_sum_time_17_19_weekday|user_clicks_sum_time_19_23_weekday|user_clicks_sum_time_23_1_weekday|user_clicks_avg_everyday_weekdend|user_clicks_sum_time_1_9_weekdend|user_clicks_sum_time_9_14_weekdend|user_clicks_sum_time_14_17_weekdend|user_clicks_sum_time_17_19_weekdend|user_clicks_sum_time_19_23_weekdend|user_clicks_sum_time_23_1_weekdend|
+------------+--------+-----+------------+-------------------------------+-------------------------------+--------------------------------+---------------------------------+---------------------------------+---------------------------------+--------------------------------+--------------------------------+--------------------------------+---------------------------------+----------------------------------+----------------------------------+----------------------------------+---------------------------------+---------------------------------+---------------------------------+----------------------------------+-----------------------------------+-----------------------------------+-----------------------------------+----------------------------------+
+------------+--------+-----+------------+-------------------------------+-------------------------------+--------------------------------+---------------------------------+---------------------------------+---------------------------------+--------------------------------+--------------------------------+--------------------------------+---------------------------------+----------------------------------+----------------------------------+----------------------------------+---------------------------------+---------------------------------+---------------------------------+----------------------------------+-----------------------------------+-----------------------------------+-----------------------------------+----------------------------------+
I don't know why. Nothing seems to be wrong here. I hope someone can help, thanks~


After two friends suggested checking for spaces, I tried again:

df_train_raw
————————————

+------------+-----------+-----+
|subscriberid|   objectid|label|
+------------+-----------+-----+
|   104752237|11029932485|    0|
|   105246837|11029932485|    0|
|   105517237|11029932485|    0|
|   108917037|11030797988|    0|
|   108917037|11029648595|    0|
|   109901037|11029648595|    0|
|   105517237|11030720502|    0|
|   105246837|11029986502|    0|
|   104752237|11029191717|    0|
|   105246837|11029191717|    0|
|   105517237|11029191717|    0|
|   109901037|11030138623|    0|
|   105517237|11014105538|    0|
|   105517237|11014105543|    0|
|   105517237|11016478156|    0|
|   105517237|11023285357|    0|
|   105246837|11026067980|    0|
|   105246837|11030797988|    0|
|   108917037|11029932485|    0|
|   109901037|11029932485|    0|
+------------+-----------+-----+
only showing top 20 rows

————————————

root
 |-- subscriberid: long (nullable = true)
 |-- objectid: long (nullable = true)
 |-- label: integer (nullable = true)
Printing the "subscriberid" column shows that there are no spaces:

df_train_raw.select("subscriberid").take(20).foreach(println)
Result:

[104752237]
[105246837]
[105517237]
[108917037]
[108917037]
[109901037]
[105517237]
[105246837]
[104752237]
[105246837]
[105517237]
[109901037]
[105517237]
[105517237]
[105517237]
[105517237]
[105246837]
[105246837]
[108917037]
[109901037]
Then, df_user_clicks_info:

+------------+-------------------------------+-------------------------------+--------------------------------+---------------------------------+---------------------------------+---------------------------------+--------------------------------+--------------------------------+--------------------------------+---------------------------------+----------------------------------+----------------------------------+----------------------------------+---------------------------------+---------------------------------+---------------------------------+----------------------------------+-----------------------------------+-----------------------------------+-----------------------------------+----------------------------------+
|subscriberid|user_clicks_avg_everyday_a_week|user_clicks_sum_time_1_9_a_week|user_clicks_sum_time_9_14_a_week|user_clicks_sum_time_14_17_a_week|user_clicks_sum_time_17_19_a_week|user_clicks_sum_time_19_23_a_week|user_clicks_sum_time_23_1_a_week|user_clicks_avg_everyday_weekday|user_clicks_sum_time_1_9_weekday|user_clicks_sum_time_9_14_weekday|user_clicks_sum_time_14_17_weekday|user_clicks_sum_time_17_19_weekday|user_clicks_sum_time_19_23_weekday|user_clicks_sum_time_23_1_weekday|user_clicks_avg_everyday_weekdend|user_clicks_sum_time_1_9_weekdend|user_clicks_sum_time_9_14_weekdend|user_clicks_sum_time_14_17_weekdend|user_clicks_sum_time_17_19_weekdend|user_clicks_sum_time_19_23_weekdend|user_clicks_sum_time_23_1_weekdend|
+------------+-------------------------------+-------------------------------+--------------------------------+---------------------------------+---------------------------------+---------------------------------+--------------------------------+--------------------------------+--------------------------------+---------------------------------+----------------------------------+----------------------------------+----------------------------------+---------------------------------+---------------------------------+---------------------------------+----------------------------------+-----------------------------------+-----------------------------------+-----------------------------------+----------------------------------+
|   104752237|                           1.71|                              0|                               0|                                0|                                4|                                4|                               4|                             0.8|                               0|                                0|                                 0|                                 0|                                 4|                                0|                              4.0|                                0|                                 0|                                  0|                                  4|                                  0|                                 4|
|   105517237|                          17.14|                             12|                              36|                               12|                                0|                               60|                               0|                             9.6|                               0|                                0|                                 0|                                 0|                                48|                                0|                             36.0|                               12|                                36|                                 12|                                  0|                                 12|                                 0|
|   109901037|                           2.14|                              0|                               3|                                3|                                6|                                3|                               0|                             2.4|                               0|                                0|                                 3|                                 6|                                 3|                                0|                              1.5|                                0|                                 3|                                  0|                                  0|                                  0|                                 0|
|   105246837|                            8.0|                              8|                               0|                                0|                               16|                               32|                               0|                             8.0|                               8|                                0|                                 0|                                 8|                                24|                                0|                              8.0|                                0|                                 0|                                  0|                                  8|                                  8|                                 0|
+------------+-------------------------------+-------------------------------+--------------------------------+---------------------------------+---------------------------------+---------------------------------+--------------------------------+--------------------------------+--------------------------------+---------------------------------+----------------------------------+----------------------------------+----------------------------------+---------------------------------+---------------------------------+---------------------------------+----------------------------------+-----------------------------------+-----------------------------------+-----------------------------------+----------------------------------+

————————————

root
 |-- subscriberid: string (nullable = true)
 |-- user_clicks_avg_everyday_a_week: double (nullable = false)
 |-- user_clicks_sum_time_1_9_a_week: long (nullable = false)
 |-- user_clicks_sum_time_9_14_a_week: long (nullable = false)
 |-- user_clicks_sum_time_14_17_a_week: long (nullable = false)
 |-- user_clicks_sum_time_17_19_a_week: long (nullable = false)
 |-- user_clicks_sum_time_19_23_a_week: long (nullable = false)
 |-- user_clicks_sum_time_23_1_a_week: long (nullable = false)
 |-- user_clicks_avg_everyday_weekday: double (nullable = false)
 |-- user_clicks_sum_time_1_9_weekday: long (nullable = false)
 |-- user_clicks_sum_time_9_14_weekday: long (nullable = false)
 |-- user_clicks_sum_time_14_17_weekday: long (nullable = false)
 |-- user_clicks_sum_time_17_19_weekday: long (nullable = false)
 |-- user_clicks_sum_time_19_23_weekday: long (nullable = false)
 |-- user_clicks_sum_time_23_1_weekday: long (nullable = false)
 |-- user_clicks_avg_everyday_weekdend: double (nullable = false)
 |-- user_clicks_sum_time_1_9_weekdend: long (nullable = false)
 |-- user_clicks_sum_time_9_14_weekdend: long (nullable = false)
 |-- user_clicks_sum_time_14_17_weekdend: long (nullable = false)
 |-- user_clicks_sum_time_17_19_weekdend: long (nullable = false)
 |-- user_clicks_sum_time_19_23_weekdend: long (nullable = false)
 |-- user_clicks_sum_time_23_1_weekdend: long (nullable = false)


df_user_clicks_info.select("subscriberid").take(20).foreach(println)


[104752237]
[105517237]
[109901037]
[105246837]

It doesn't work either :(
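Printing a `Row` with `println` can make stray characters hard to spot: a trailing space, a tab, or a non-breaking space looks almost the same as a clean value. Below is a minimal sketch in plain Scala of one way to make them visible. `KeyInspector.describe` is a hypothetical helper written for this illustration, not part of Spark; it spells out every character that is not an ASCII digit and reports the raw length.

```scala
object KeyInspector {
  /** Renders a key so that every character other than an ASCII digit is
    * spelled out as a Unicode code point, and appends the raw string length.
    * Hypothetical helper for illustration; not part of Spark. */
  def describe(key: String): String = {
    val rendered = key.map { c =>
      if (c >= '0' && c <= '9') c.toString
      else "<U+%04X>".format(c.toInt) // e.g. a space becomes <U+0020>
    }.mkString
    s"'$rendered' (length=${key.length})"
  }
}
```

Applied to the string-typed column, for example with `df_user_clicks_info.select("subscriberid").take(20).foreach(r => println(KeyInspector.describe(r.getString(0))))`, a clean key such as `104752237` would print as `'104752237' (length=9)`, while a key carrying hidden characters would show a `<U+...>` marker and a larger length.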

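Thanks to the friends who helped me. I believe the cause is a bug in Spark 1.6.0; I worked around it by changing the data flow rather than upgrading Spark. That is, at first I wanted to get df_3 from df_1 and df_2, but because of the bug described in the question I did not get the result I wanted. So I tried another route: I built df_tmp_1 and df_tmp_2, then joined them and got the result. I don't know why this works either, but if you use Spark 1.6.0 and run into a join error like mine, it seems worth trying.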

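Can you cast to bigint and then compare? I guess the data types in the two DataFrames may differ. Let me know the result.

@SadamHussain M, thanks for your suggestion~ I have tried casting "subscriberid" in both DataFrames to long, but it didn't work~ :(

They are string-typed columns. Make sure there are no spaces before or after the numbers in either DataFrame. "162323641" will not equal "162323641 ", so those rows won't join.

@Selnay thanks for your suggestion~ I checked the "subscriberid" column used for the join in both DataFrames. I printed it, there are no spaces, and it still doesn't work. :(

Try
val df_tmp_tmp_0 = df_train_raw.join(df_user_clicks_info, df_train_raw("subscriberid") === df_user_clicks_info("subscriberid"))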
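The two schemas in the question differ on the join key: `subscriberid` is `long` in df_train_raw and `string` in df_user_clicks_info. The commenters' suggestions (trim the key, cast it to a common type, join on an explicit condition) could be combined roughly as in the sketch below. `SubscriberJoin` and `joinOnSubscriberId` are hypothetical names introduced here; the sketch assumes Spark is on the classpath and uses only `withColumn`, `trim`, `cast`, and `===` from the DataFrame API. The asker reported that casting alone did not fix the problem on Spark 1.6.0, so this is a diagnostic step, not a guaranteed fix.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, trim}

object SubscriberJoin {
  /** Inner-joins `train` (long key) with `clicks` (string key) on subscriberid.
    * Hypothetical helper: the string key is trimmed and cast to long first,
    * so both sides of the join condition have the same type. */
  def joinOnSubscriberId(train: DataFrame, clicks: DataFrame): DataFrame = {
    // Replace the string key with a trimmed, long-typed version of itself.
    val clicksTyped = clicks.withColumn(
      "subscriberid",
      trim(col("subscriberid")).cast("long")
    )
    // Explicit equality condition; the result keeps both subscriberid columns.
    train.join(clicksTyped, train("subscriberid") === clicksTyped("subscriberid"))
  }
}
```

With the DataFrames from the question this would be called as `SubscriberJoin.joinOnSubscriberId(df_train_raw, df_user_clicks_info)`. Running `printSchema()` on the result is a quick way to confirm that both key columns are now `long`.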