Apache Spark: using a DataFrame column's attribute notation
apache-spark, pyspark, apache-spark-sql

I want to join two DataFrames. The schemas of the DFs look like this:
itemsDF.printSchema()
root
|-- asin: string (nullable = true)
|-- brand: string (nullable = true)
|-- title: string (nullable = true)
|-- url: string (nullable = true)
|-- image: string (nullable = true)
|-- rating: float (nullable = true)
|-- reviewUrl: string (nullable = true)
|-- totalReviews: integer (nullable = true)
reviewsDF.printSchema()
root
|-- asin: string (nullable = true)
|-- name: string (nullable = true)
|-- rating: float (nullable = true)
|-- date: date (nullable = true)
|-- verified: boolean (nullable = true)
|-- title: string (nullable = true)
|-- helpfulVotes: float (nullable = true)
I want to join these two DataFrames on the column asin. This expression seems to work fine:
reviewsDF.join(itemsDF, reviewsDF['asin'] == itemsDF['asin']).show()
However, the following expression raises an error:
reviewsDF.join(itemsDF, reviewsDF.asin == itemsDF.asin).show()
Why does the second expression fail?

Are your reviewsDF and itemsDF derived from the same DataFrame? @Jaydeep No, they are not derived from the same DataFrame.
AnalysisException: 'Detected implicit cartesian product for INNER join between logical plans
Project [asin#347, name#348, cast(rating#349 as float) AS rating#459, cast(cast(unix_timestamp(date#350, MMMM dd, yyyy, Some(America/Los_Angeles)) as timestamp) as date) AS date#458, cast(verified#351 as boolean) AS verified#460, title#352, cast(helpfulVotes#354 as float) AS helpfulVotes#461]
+- Relation[asin#347,name#348,rating#349,date#350,verified#351,title#352,body#353,helpfulVotes#354] csv
and
Project [asin#846, brand#847, title#848, url#849, image#850, cast(rating#851 as float) AS rating#864, reviewUrl#852, cast(totalReviews#853 as int) AS totalReviews#865]
+- Filter (isnotnull(asin#846) && (asin#846 = asin#846))
   +- Relation[asin#846,brand#847,title#848,url#849,image#850,rating#851,reviewUrl#852,totalReviews#853,prices#854] csv
Join condition is missing or trivial.
Either: use the CROSS JOIN syntax to allow cartesian products between these
relations, or: enable implicit cartesian products by setting the configuration
variable spark.sql.crossJoin.enabled=true;'