Getting pyspark.sql.utils.ParseException when running a SQL query in pyspark
I am running a SQL query in pyspark and getting the error below. Can you help me?
query = """
select DENSE_RANK() OVER(ORDER BY PROD_NM, CNTRY) AS SYSTEM_ID, id AS SOURCE_ID, source_name, prod_nm, CNTRY, source_entity, entity_name
from(
  SELECT distinct id, 'AMPIL' as SOURCE_NAME, prod_nm, 'PROD2' AS Source_Entity, 'PRODUCT' AS ENTITY_NAME,
         CASE WHEN OPRTNG_CMPNYS = 'Janssen Canada' THEN 'Canada'
              WHEN OPRTNG_CMPNYS LIKE 'Janssen US%' THEN 'United States' END AS CNTRY
  FROM vw_prod2
  UNION
  SELECT mdm_id, 'MDM' AS SOURCE_NAME, product_name AS PROD_NM, 'MDM_PROD' AS Source_Entity, 'PRODUCT' AS ENTITY_NAME, COUNTRY_NAME
  FROM vm_mdm_product PROD, vm_mdm_countries
  WHERE PROD.COUNTRY_ID = vm_mdm_countries.COUNTRY_ID
  UNION
  SELECT distinct id, 'AMPIL' as SOURCE_NAME, nm AS PROD_NM, 'PROD' AS Source_Entity, 'PRODUCT' AS ENTITY_NAME, CNTRY
  FROM vw_prod
  union
  select DENSE_RANK() OVER(ORDER BY PROD_NM, CNTRY) AS SYSTEM_ID, id AS SOURCE_ID, source_name, prod_nm, CNTRY, source_entity, entity_name
  from(
    SELECT distinct id, 'AMPIL' as SOURCE_NAME, prod_nm, 'PROD2' AS Source_Entity, 'PRODUCT' AS ENTITY_NAME,
           CASE WHEN OPRTNG_CMPNYS = 'Janssen Canada' THEN 'Canada'
                WHEN OPRTNG_CMPNYS LIKE 'Janssen US%' THEN 'United States' END AS CNTRY
    FROM vw_prod2
    UNION
    SELECT mdm_id, 'MDM' AS SOURCE_NAME, product_name AS PROD_NM, 'MDM_PROD' AS Source_Entity, 'PRODUCT' AS ENTITY_NAME, COUNTRY_NAME
    FROM vm_mdm_product PROD, vm_mdm_countries
    WHERE PROD.COUNTRY_ID = vm_mdm_countries.COUNTRY_ID
    UNION
    SELECT distinct id, 'AMPIL' as SOURCE_NAME, nm AS PROD_NM, 'PROD' AS Source_Entity, 'PRODUCT' AS ENTITY_NAME, CNTRY
    FROM vw_prod
    union
    select DENSE_RANK() OVER(ORDER BY PROD_NM, CNTRY) AS SYSTEM_ID, id AS SOURCE_ID, source_name, prod_nm, CNTRY, source_entity, entity_name
    from(
      SELECT distinct id, 'AMPIL' as SOURCE_NAME, prod_nm, 'PROD2' AS Source_Entity, 'PRODUCT' AS ENTITY_NAME,
             CASE WHEN OPRTNG_CMPNYS = 'Janssen Canada' THEN 'Canada'
                  WHEN OPRTNG_CMPNYS LIKE 'Janssen US%' THEN 'United States' END AS CNTRY
      FROM vw_prod2
      UNION
      SELECT mdm_id, 'MDM' AS SOURCE_NAME, product_name AS PROD_NM, 'MDM_PROD' AS Source_Entity, 'PRODUCT' AS ENTITY_NAME, COUNTRY_NAME
      FROM vm_mdm_product PROD, vm_mdm_countries
      WHERE PROD.COUNTRY_ID = vm_mdm_countries.COUNTRY_ID
      UNION
      SELECT distinct id, 'AMPIL' as SOURCE_NAME, nm AS PROD_NM, 'PROD' AS Source_Entity, 'PRODUCT' AS ENTITY_NAME, CNTRY
      FROM vw_prod
      union
      select DENSE_RANK() OVER(ORDER BY PROD_NM, CNTRY) AS SYSTEM_ID, id AS SOURCE_ID, source_name, prod_nm, CNTRY, source_entity, entity_name
      from(
        SELECT distinct id, 'AMPIL' as SOURCE_NAME, prod_nm, 'PROD2' AS Source_Entity, 'PRODUCT' AS ENTITY_NAME,
               CASE WHEN OPRTNG_CMPNYS = 'Janssen Canada' THEN 'Canada'
                    WHEN OPRTNG_CMPNYS LIKE 'Janssen US%' THEN 'United States' END AS CNTRY
        FROM vw_prod2
        UNION
        SELECT mdm_id, 'MDM' AS SOURCE_NAME, product_name AS PROD_NM, 'MDM_PROD' AS Source_Entity, 'PRODUCT' AS ENTITY_NAME, COUNTRY_NAME
        FROM vm_mdm_product PROD, vm_mdm_countries
        WHERE PROD.COUNTRY_ID = vm_mdm_countries.COUNTRY_ID
        UNION
        SELECT distinct id, 'AMPIL' as SOURCE_NAME, nm AS PROD_NM, 'PROD' AS Source_Entity, 'PRODUCT' AS ENTITY_NAME, CNTRY
        FROM vw_prod
"""
df = sqlContext.sql(query)
Error:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib/spark/python/pyspark/sql/context.py", line 353, in sql
return self.sparkSession.sql(sqlQuery)
File "/usr/lib/spark/python/pyspark/sql/session.py", line 710, in sql
return DataFrame(self._jsparkSession.sql(sqlQuery), self._wrapped)
File "/usr/lib/spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
File "/usr/lib/spark/python/pyspark/sql/utils.py", line 73, in deco
raise ParseException(s.split(': ', 1)[1], stackTrace)
pyspark.sql.utils.ParseException: u"\nmismatched input 'from' expecting <EOF>(line 1, pos 133)
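Before handing a generated string of this length to Spark, it can help to check the parentheses mechanically. The sketch below is a hypothetical helper (not part of pyspark) that scans a SQL string and reports either the position of the first stray `)` or the number of `(` left unclosed, under the assumption that the query contains no parentheses inside string literals:

```python
def check_parens(sql):
    """Report the first stray ')' or the count of '(' left unclosed.

    Returns ("unmatched )", position) if a ')' appears with no open '(',
    otherwise ("unclosed (", n) where n is the number of unclosed '('.
    Assumes no parentheses occur inside quoted string literals.
    """
    depth = 0
    for pos, ch in enumerate(sql):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                # More ')' than '(' so far: report where balance first broke.
                return ("unmatched )", pos)
    # Non-zero depth here means this many '(' were never closed.
    return ("unclosed (", depth)


print(check_parens("select * from (select 1 from (select 2"))
# → ('unclosed (', 2)
```

Running this on the query in the question would show four `(` that are never closed, one per nested `from(`, which is consistent with the parser giving up with `mismatched input 'from' expecting <EOF>`.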
Your query is missing several closing parentheses ')'. Please have a look, and please reformat the query you posted here; there are plenty of online tools that will do this for you. After reformatting it myself, I can see the following problems:
1. You have almost no closing parentheses ')'.
2. You keep querying the same tables and views over and over inside nested subqueries. Spark is not great with subqueries, so consider flattening your code.
3. You keep nesting, DISTINCT-ing, and windowing the same (or similar) snippets of code. Tungsten is very good at optimizing your Spark code, but ask yourself why you need DISTINCT and a window function more than once.
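Following point 2, here is a hedged sketch of what the flattened version could look like: the three-way UNION over `vw_prod2`, `vm_mdm_product`/`vm_mdm_countries`, and `vw_prod` is written once, and `DENSE_RANK()` is applied a single time over it. The table and column names are taken from the question; that the four nested levels were meant to produce the same rows (UNION deduplicates anyway), and the rewrite of the implicit comma join as an explicit JOIN, are assumptions:

```python
# Base UNION written once. Column names of a UNION come from its first
# branch, so the later branches can stay unaliased. (Assumption: the
# repeated nested levels in the original were meant to yield these rows.)
base = """
SELECT DISTINCT id, 'AMPIL' AS source_name, prod_nm,
       'PROD2' AS source_entity, 'PRODUCT' AS entity_name,
       CASE WHEN OPRTNG_CMPNYS = 'Janssen Canada' THEN 'Canada'
            WHEN OPRTNG_CMPNYS LIKE 'Janssen US%' THEN 'United States'
       END AS CNTRY
FROM vw_prod2
UNION
SELECT mdm_id, 'MDM', product_name, 'MDM_PROD', 'PRODUCT', COUNTRY_NAME
FROM vm_mdm_product PROD
JOIN vm_mdm_countries ON PROD.COUNTRY_ID = vm_mdm_countries.COUNTRY_ID
UNION
SELECT DISTINCT id, 'AMPIL', nm, 'PROD', 'PRODUCT', CNTRY
FROM vw_prod
"""

# One window function over the flattened result, one subquery, one
# matching closing parenthesis.
query = f"""
SELECT DENSE_RANK() OVER (ORDER BY prod_nm, CNTRY) AS system_id,
       id AS source_id, source_name, prod_nm, CNTRY,
       source_entity, entity_name
FROM ({base}) src
"""

# Sanity check before the string ever reaches Spark:
assert query.count("(") == query.count(")")

# df = sqlContext.sql(query)   # then run it as in the question
```

The `spark`/`sqlContext` call is left commented out because the views only exist in the asker's session; the string itself, though, is balanced and parseable.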