Scala: Spark join, replacing the equalTo function


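I have two datasets, and I want to join them whenever an element of one column contains an element of the other column. How should I change this equality join to do that?

val df = df1.join(df2,
    df1.col("Complete Name").equalTo(df2.col("Name")))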

How about this:

Dataset<Row> d1 = datasetFromJsonStrings(listOf("{\n" +
    "  \"key\": \"name\",\n" +
    "  \"origin\": \"left\"\n" +
"}"));

Dataset<Row> d2 = datasetFromJsonStrings(listOf("{\n" +
    "  \"key\": \"complete name\",\n" +
    "  \"origin\": \"right\"\n" +
"}"));

// [name,left,complete name,right]
List<Row> rows = d1.join(d2, d2.col("key").contains(d1.col("key"))).collectAsList();
Note: for convenience I wrote this in Java rather than Scala, because my whole codebase is in Java.
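
To make the semantics of that contains-based join concrete without a Spark session, here is a minimal plain-Java sketch. It is only an illustration: the class `ContainsJoinSketch`, its `Row` type, and the `containsJoin` method are hypothetical names, not part of Spark, and the nested loop is simply the easiest way to show which row pairs a condition like `d2.col("key").contains(d1.col("key"))` keeps.

```java
import java.util.ArrayList;
import java.util.List;

final class ContainsJoinSketch {
    // Hypothetical stand-in for a two-column row (key, origin).
    static final class Row {
        final String key;
        final String origin;

        Row(String key, String origin) {
            this.key = key;
            this.origin = origin;
        }
    }

    // Inner join: keep every (left, right) pair where right.key contains left.key.
    static List<String> containsJoin(List<Row> left, List<Row> right) {
        List<String> out = new ArrayList<>();
        for (Row l : left) {
            for (Row r : right) {
                if (r.key.contains(l.key)) {
                    out.add("[" + l.key + "," + l.origin + "," + r.key + "," + r.origin + "]");
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Same two rows as in the answer above.
        List<Row> d1 = List.of(new Row("name", "left"));
        List<Row> d2 = List.of(new Row("complete name", "right"));
        System.out.println(containsJoin(d1, d2)); // prints [[name,left,complete name,right]]
    }
}
```

Because every left row is compared with every right row, a condition of this kind is likely to be much more expensive on large inputs than an equality join.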


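What if you did it like this?

{
// "left_anti" keeps only the left-side rows that have no match on the right side;
// union requires both sides to have the same number of columns.
df1.join(df2, df1.col("Complete Name").contains(df2.col("Name")), "left_anti")
.union(df2.join(df1, df1.col("Complete Name").contains(df2.col("Name")), "left_anti"))
}

I have not tested it, though.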
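
Since that suggestion is untested, it may help to see what a left anti join returns. The following plain-Java sketch uses made-up sample values and hypothetical names (`AntiJoinSketch`, `leftAnti`); it does not call Spark and only mimics the anti-join rule of keeping left values that match nothing on the right.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiPredicate;

final class AntiJoinSketch {
    // Left anti join: keep the left values for which no right value satisfies the condition.
    static List<String> leftAnti(List<String> left, List<String> right,
                                 BiPredicate<String, String> matches) {
        List<String> out = new ArrayList<>();
        for (String l : left) {
            boolean matched = false;
            for (String r : right) {
                if (matches.test(l, r)) {
                    matched = true;
                    break;
                }
            }
            if (!matched) {
                out.add(l);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> completeNames = List.of("John Smith", "Jane Doe"); // made-up data
        List<String> names = List.of("John");                           // made-up data

        // Unmatched complete names, followed by unmatched names.
        List<String> result = new ArrayList<>(
                leftAnti(completeNames, names, (c, n) -> c.contains(n)));
        result.addAll(leftAnti(names, completeNames, (n, c) -> c.contains(n)));
        System.out.println(result); // prints [Jane Doe]
    }
}
```

The matching pair ("John Smith", "John") does not appear in the result, so if the goal is to merge rows whose names contain each other, an inner join on the contains condition is probably the closer fit.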
