Apache Spark SQL: identical queries return different results
I've run into a strange problem. I want to take all the data from a DataFrame, insert it into a permanent Hive table, and also index it into Elasticsearch. The query is simple, i.e. `select * from result`, and I loop through each row and insert it into ES. It's a simple insert-from-select, but I get different results each time. To check, I created 3 different temp tables as follows:
spark.sql("select * from QtyContribution").join(getRevenueContribution(spark,table2), "item").join(finalUniqueItem(spark), "item").registerTempTable("hola");
spark.sql("select * from QtyContribution").join(getRevenueContribution(spark,table2), "item").join(finalUniqueItem(spark), "item").registerTempTable("hola1");
spark.sql("select * from QtyContribution").join(getRevenueContribution(spark,table2), "item").join(finalUniqueItem(spark), "item").registerTempTable("hola2");
Each query is identical; only the temp table name differs. And:
Dataset<Row> dframe1 = spark.sql("select * from hola");
Row[] row1 = (Row[]) dframe1.collect();
int q = 1;
for (Row s : row1) {
    System.out.println(s.get(0) + " =======df1======= " + q++);
}
Dataset<Row> dframe2 = spark.sql("select * from hola1");
Row[] row2 = (Row[]) dframe2.collect();
int w = 1;
for (Row s : row2) {
    System.out.println(s.get(0) + " =======df2======= " + w++);
}
Dataset<Row> dframe3 = spark.sql("select * from hola2");
Row[] row3 = (Row[]) dframe3.collect();
int e = 1;
for (Row s : row3) {
    System.out.println(s.get(0) + " =======df3======= " + e++);
}
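For what it's worth, `collect()` returns rows in whatever order the partitions happen to produce them; Spark SQL only guarantees a stable order when the query itself has an `ORDER BY`. So comparing the three printouts line by line is only meaningful on sorted output (e.g. `dframe1.orderBy("item")` before collecting). A minimal plain-Java analogy of the same pitfall, with made-up item codes and no Spark on the classpath:

```java
import java.util.*;
import java.util.stream.*;

public class OrderingDemo {
    public static void main(String[] args) {
        // Hypothetical row keys, standing in for the "item" column.
        List<String> rows = Arrays.asList("BM8942", "AK1001", "ZX7731");

        // A HashSet's iteration order depends on hash layout, not on
        // insertion order -- conceptually like collect() without ORDER BY.
        Set<String> unordered = new HashSet<>(rows);

        // Sorting first pins the order, like adding orderBy("item")
        // before collect().
        List<String> ordered = unordered.stream()
                .sorted()
                .collect(Collectors.toList());
        System.out.println(ordered); // [AK1001, BM8942, ZX7731]
    }
}
```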
What I did:
Dataset<Row> dframe = spark.sql("select * from hola1");
Row[] row = (Row[]) dframe.collect();
int i = 1;
for (Row r : row) {
    bulkRequest.add(client.prepareIndex("twitter1234", "use1", String.valueOf(i))
        .setSource(jsonBuilder()
            .startObject()
                .field("item", r.get(0))
                .field("qty_contrib", r.get(1))
                .field("division", r.get(2))
                .field("rev_contrib", r.get(3))
                .field("bp", r.get(4))
            .endObject()
        )
    );
    System.out.println(i++ + " ==== " + r.get(0));
}
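One side effect of an unstable row order: the loop above uses the running counter `i` as the Elasticsearch document id, so if two runs return the same rows in a different order, the same id ends up pointing at different rows and re-runs silently overwrite each other inconsistently. A sketch of why a stable id derived from the row's natural key behaves better (plain Java; the item codes are illustrative, and this assumes `item` is unique per row):

```java
import java.util.*;

public class StableIdDemo {
    public static void main(String[] args) {
        // Two runs returning the same rows in different orders (made-up data).
        List<String> run1 = Arrays.asList("BM8942", "AK1001");
        List<String> run2 = Arrays.asList("AK1001", "BM8942");

        // Counter-based ids: the id -> item mapping differs between runs.
        Map<Integer, String> byCounter1 = new HashMap<>();
        Map<Integer, String> byCounter2 = new HashMap<>();
        int i = 1, j = 1;
        for (String item : run1) byCounter1.put(i++, item);
        for (String item : run2) byCounter2.put(j++, item);
        System.out.println(byCounter1.equals(byCounter2)); // false

        // Key-based ids (the "item" value itself): identical either way.
        Map<String, String> byKey1 = new HashMap<>();
        Map<String, String> byKey2 = new HashMap<>();
        for (String item : run1) byKey1.put(item, item);
        for (String item : run2) byKey2.put(item, item);
        System.out.println(byKey1.equals(byKey2)); // true
    }
}
```

In the indexing loop this would mean something like `client.prepareIndex("twitter1234", "use1", r.get(0).toString())` instead of `String.valueOf(i)`, so each document is keyed by its item regardless of row order.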
What happened:
1534 ==== BM8942