Apache Spark: using a DataFrame column's attribute (dot) syntax

Tags: apache-spark, pyspark, apache-spark-sql

I want to join two DataFrames. Their schemas look like this:

itemsDF.printSchema()

root
 |-- asin: string (nullable = true)
 |-- brand: string (nullable = true)
 |-- title: string (nullable = true)
 |-- url: string (nullable = true)
 |-- image: string (nullable = true)
 |-- rating: float (nullable = true)
 |-- reviewUrl: string (nullable = true)
 |-- totalReviews: integer (nullable = true)

reviewsDF.printSchema()

root
 |-- asin: string (nullable = true)
 |-- name: string (nullable = true)
 |-- rating: float (nullable = true)
 |-- date: date (nullable = true)
 |-- verified: boolean (nullable = true)
 |-- title: string (nullable = true)
 |-- helpfulVotes: float (nullable = true)
I want to join the two DataFrames on the asin column. This expression seems to work fine:

reviewsDF.join(itemsDF, reviewsDF['asin'] == itemsDF['asin']).show()
However, the following expression raises an error:

reviewsDF.join(itemsDF, reviewsDF.asin == itemsDF.asin).show()

Why does the second expression fail?

Comment: Are your reviewsDF and itemsDF derived from the same DataFrame? @Jaydeep: No, they are not derived from the same DataFrame.
AnalysisException: 'Detected implicit cartesian product for INNER join between logical plans
Project [asin#347, name#348, cast(rating#349 as float) AS rating#459, cast(cast(unix_timestamp(date#350, MMMM dd, yyyy, Some(America/Los_Angeles)) as timestamp) as date) AS date#458, cast(verified#351 as boolean) AS verified#460, title#352, cast(helpfulVotes#354 as float) AS helpfulVotes#461]
+- Relation[asin#347,name#348,rating#349,date#350,verified#351,title#352,body#353,helpfulVotes#354] csv
and
Project [asin#846, brand#847, title#848, url#849, image#850, cast(rating#851 as float) AS rating#864, reviewUrl#852, cast(totalReviews#853 as int) AS totalReviews#865]
+- Filter (isnotnull(asin#846) && (asin#846 = asin#846))
   +- Relation[asin#846,brand#847,title#848,url#849,image#850,rating#851,reviewUrl#852,totalReviews#853,prices#854] csv
Join condition is missing or trivial.
Either: use the CROSS JOIN syntax to allow cartesian products between these
relations, or: enable implicit cartesian products by setting the configuration
variable spark.sql.crossJoin.enabled=true;'