Apache Spark: using a DataFrame column's attribute notation
apache-spark, pyspark, apache-spark-sql

I want to join two DataFrames. The schemas of the DFs look like this:
itemsDF.printSchema()
root
|-- asin: string (nullable = true)
|-- brand: string (nullable = true)
|-- title: string (nullable = true)
|-- url: string (nullable = true)
|-- image: string (nullable = true)
|-- rating: float (nullable = true)
|-- reviewUrl: string (nullable = true)
|-- totalReviews: integer (nullable = true)
reviewsDF.printSchema()
root
|-- asin: string (nullable = true)
|-- name: string (nullable = true)
|-- rating: float (nullable = true)
|-- date: date (nullable = true)
|-- verified: boolean (nullable = true)
|-- title: string (nullable = true)
|-- helpfulVotes: float (nullable = true)
I want to join these two DataFrames on the column asin. This expression seems to work fine:
reviewsDF.join(itemsDF, reviewsDF['asin'] == itemsDF['asin']).show()
However, the following expression raises an error:
reviewsDF.join(itemsDF, reviewsDF.asin == itemsDF.asin).show()
Why does the second expression fail?

Are your reviewsDF and itemsDF derived from the same DataFrame? @Jaydeep No, they are not derived from the same DataFrame.
AnalysisException: 'Detected implicit cartesian product for INNER join between logical plans
Project [asin#347, name#348, cast(rating#349 as float) AS rating#459, cast(cast(unix_timestamp(date#350, MMMM dd, yyyy, Some(America/Los_Angeles)) as timestamp) as date) AS date#458, cast(verified#351 as boolean) AS verified#460, title#352, cast(helpfulVotes#354 as float) AS helpfulVotes#461]
+- Relation[asin#347,name#348,rating#349,date#350,verified#351,title#352,body#353,helpfulVotes#354] csv
and
Project [asin#846, brand#847, title#848, url#849, image#850, cast(rating#851 as float) AS rating#864, reviewUrl#852, cast(totalReviews#853 as int) AS totalReviews#865]
+- Filter (isnotnull(asin#846) && (asin#846 = asin#846))
   +- Relation[asin#846,brand#847,title#848,url#849,image#850,rating#851,reviewUrl#852,totalReviews#853,prices#854] csv
Join condition is missing or trivial.
Either: use the CROSS JOIN syntax to allow cartesian products between these
relations, or: enable implicit cartesian products by setting the configuration
variable spark.sql.crossJoin.enabled=true;'