
Getting pyspark.sql.utils.ParseException when running a SQL query in pyspark


I am running a SQL query in pyspark and getting the error below.

Could you help me?

query = "select DENSE_RANK() OVER(ORDER BY PROD_NM, CNTRY) AS SYSTEM_ID,  id AS SOURCE_ID,source_name,prod_nm,CNTRY,source_entity,entity_name from(SELECT distinct id, 'AMPIL' as SOURCE_NAME,prod_nm, 'PROD2' AS Source_Entity,'PRODUCT' AS ENTITY_NAME,CASE WHEN OPRTNG_CMPNYS = 'Janssen Canada' THEN 'Canada' WHEN OPRTNG_CMPNYS LIKE 'Janssen US%' THEN 'United States' END AS CNTRY FROM vw_prod2  UNION SELECT mdm_id , 'MDM' AS SOURCE_NAME, product_name AS PROD_NM, 'MDM_PROD' AS Source_Entity,'PRODUCT' AS ENTITY_NAME, COUNTRY_NAME FROM vm_mdm_product PROD, vm_mdm_countries  WHERE PROD.COUNTRY_ID = vm_mdm_countries.COUNTRY_ID UNION SELECT distinct id, 'AMPIL' as SOURCE_NAME, nm AS PROD_NM, 'PROD' AS Source_Entity,'PRODUCT' AS ENTITY_NAME, CNTRY FROM vw_prod union select DENSE_RANK() OVER(ORDER BY PROD_NM, CNTRY) AS SYSTEM_ID,  id AS SOURCE_ID,source_name,prod_nm,CNTRY,source_entity,entity_name from(SELECT distinct id, 'AMPIL' as SOURCE_NAME,prod_nm, 'PROD2' AS Source_Entity,'PRODUCT' AS ENTITY_NAME,CASE WHEN OPRTNG_CMPNYS = 'Janssen Canada' THEN 'Canada' WHEN OPRTNG_CMPNYS LIKE 'Janssen US%' THEN 'United States' END AS CNTRY FROM vw_prod2  UNION SELECT mdm_id , 'MDM' AS SOURCE_NAME, product_name AS PROD_NM, 'MDM_PROD' AS Source_Entity,'PRODUCT' AS ENTITY_NAME, COUNTRY_NAME FROM vm_mdm_product PROD, vm_mdm_countries  WHERE PROD.COUNTRY_ID = vm_mdm_countries.COUNTRY_ID UNION SELECT distinct id, 'AMPIL' as SOURCE_NAME, nm AS PROD_NM, 'PROD' AS Source_Entity,'PRODUCT' AS ENTITY_NAME, CNTRY FROM vw_prod union select DENSE_RANK() OVER(ORDER BY PROD_NM, CNTRY) AS SYSTEM_ID,  id AS SOURCE_ID,source_name,prod_nm,CNTRY,source_entity,entity_name from(SELECT distinct id, 'AMPIL' as SOURCE_NAME,prod_nm, 'PROD2' AS Source_Entity,'PRODUCT' AS ENTITY_NAME,CASE WHEN OPRTNG_CMPNYS = 'Janssen Canada' THEN 'Canada' WHEN OPRTNG_CMPNYS LIKE 'Janssen US%' THEN 'United States' END AS CNTRY FROM vw_prod2  UNION SELECT mdm_id , 'MDM' AS SOURCE_NAME, product_name AS PROD_NM, 'MDM_PROD' AS 
Source_Entity,'PRODUCT' AS ENTITY_NAME, COUNTRY_NAME FROM vm_mdm_product PROD, vm_mdm_countries  WHERE PROD.COUNTRY_ID = vm_mdm_countries.COUNTRY_ID UNION SELECT distinct id, 'AMPIL' as SOURCE_NAME, nm AS PROD_NM, 'PROD' AS Source_Entity,'PRODUCT' AS ENTITY_NAME, CNTRY FROM vw_prod union select DENSE_RANK() OVER(ORDER BY PROD_NM, CNTRY) AS SYSTEM_ID,  id AS SOURCE_ID,source_name,prod_nm,CNTRY,source_entity,entity_name from(SELECT distinct id, 'AMPIL' as SOURCE_NAME,prod_nm, 'PROD2' AS Source_Entity,'PRODUCT' AS ENTITY_NAME,CASE WHEN OPRTNG_CMPNYS = 'Janssen Canada' THEN 'Canada' WHEN OPRTNG_CMPNYS LIKE 'Janssen US%' THEN 'United States' END AS CNTRY FROM vw_prod2  UNION SELECT mdm_id , 'MDM' AS SOURCE_NAME, product_name AS PROD_NM, 'MDM_PROD' AS Source_Entity,'PRODUCT' AS ENTITY_NAME, COUNTRY_NAME FROM vm_mdm_product PROD, vm_mdm_countries  WHERE PROD.COUNTRY_ID = vm_mdm_countries.COUNTRY_ID UNION SELECT distinct id, 'AMPIL' as SOURCE_NAME, nm AS PROD_NM, 'PROD' AS Source_Entity,'PRODUCT' AS ENTITY_NAME, CNTRY FROM vw_prod"

df = sqlContext.sql(query)
Error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/spark/python/pyspark/sql/context.py", line 353, in sql
    return self.sparkSession.sql(sqlQuery)
  File "/usr/lib/spark/python/pyspark/sql/session.py", line 710, in sql
    return DataFrame(self._jsparkSession.sql(sqlQuery), self._wrapped)
  File "/usr/lib/spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
  File "/usr/lib/spark/python/pyspark/sql/utils.py", line 73, in deco
    raise ParseException(s.split(': ', 1)[1], stackTrace)
pyspark.sql.utils.ParseException: u"\nmismatched input 'from' expecting <EOF>(line 1, pos 133)

Your query is missing several closing parentheses ")". Please take a look.
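One quick way to confirm this before handing the string to Spark is to count the parentheses yourself. The helper below is a minimal sketch; `paren_balance` is a hypothetical name, not part of pyspark, and it only skips parentheses inside single-quoted SQL literals (backslash-escaped quotes are not handled).

```python
def paren_balance(sql):
    """Return (unclosed_open, unmatched_close) parenthesis counts.

    Hypothetical helper for illustration only. Parentheses inside
    single-quoted SQL literals are ignored.
    """
    depth = 0            # "(" seen so far that are still waiting for a ")"
    unmatched_close = 0  # ")" seen with no "(" left to match
    in_literal = False   # True while inside a '...' literal
    for ch in sql:
        if ch == "'":
            in_literal = not in_literal
        elif in_literal:
            continue
        elif ch == "(":
            depth += 1
        elif ch == ")":
            if depth == 0:
                unmatched_close += 1
            else:
                depth -= 1
    return depth, unmatched_close


# A shortened stand-in for the shape of the posted query: two subqueries opened,
# only one closed.
sample = "select a from (select b from (select c from t)"
unclosed, extra = paren_balance(sample)
print("unclosed '(': %d, unmatched ')': %d" % (unclosed, extra))
```

Running `paren_balance(query)` on the full query string from the question should show how many ")" still need to be added.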

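Please reformat the query you posted here. There are plenty of (online) tools that can do this for you. After reformatting, I can see the following problems:
1. You have hardly any closing parentheses.
2. You keep calling the same tables and views over and over in subqueries. Spark does not handle subqueries very well, so consider flattening your code.
3. You keep nesting, applying DISTINCT to, and windowing the same (or similar) pieces of code. Tungsten is very good at optimizing your Spark code, but try to think about why you would need multiple DISTINCTs and windows.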
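As a rough illustration of the flattening suggested above, here is a sketch that writes the three-way UNION once and ranks over it once. It assumes the intent of the question was a single DENSE_RANK over the union of the three sources, which the posted query does not make certain; the names `sources` and `src` are introduced here for illustration only.

```python
# Assumption: one ranking pass over the union of the three sources is what
# the original query was aiming for. Table and column names are taken from
# the question; "sources" and the alias "src" are illustrative.
# UNION (without ALL) already removes duplicate rows, so the per-branch
# DISTINCT from the original is left out.
sources = """
SELECT id, 'AMPIL' AS SOURCE_NAME, prod_nm, 'PROD2' AS Source_Entity,
       'PRODUCT' AS ENTITY_NAME,
       CASE WHEN OPRTNG_CMPNYS = 'Janssen Canada' THEN 'Canada'
            WHEN OPRTNG_CMPNYS LIKE 'Janssen US%' THEN 'United States'
       END AS CNTRY
FROM vw_prod2
UNION
SELECT mdm_id, 'MDM' AS SOURCE_NAME, product_name AS PROD_NM,
       'MDM_PROD' AS Source_Entity, 'PRODUCT' AS ENTITY_NAME, COUNTRY_NAME
FROM vm_mdm_product PROD, vm_mdm_countries
WHERE PROD.COUNTRY_ID = vm_mdm_countries.COUNTRY_ID
UNION
SELECT id, 'AMPIL' AS SOURCE_NAME, nm AS PROD_NM,
       'PROD' AS Source_Entity, 'PRODUCT' AS ENTITY_NAME, CNTRY
FROM vw_prod
"""

# The outer query references the union exactly once, and every "(" is closed.
query = """
SELECT DENSE_RANK() OVER (ORDER BY PROD_NM, CNTRY) AS SYSTEM_ID,
       id AS SOURCE_ID, source_name, prod_nm, CNTRY, source_entity, entity_name
FROM ({sources}) src
""".format(sources=sources)

print(query)
```

The resulting string can then be passed to `sqlContext.sql(query)` as in the question. Whether the ranking it produces matches what you need depends on the assumption stated above.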