Python sparksql: passing a variable into a query
I have searched everywhere for an answer to this and tried everything, but nothing seems to work. I am trying to reference a variable assignment inside a spark.sql query in Python. I am running Python 3 and Spark version 2.3.1.
bkt = 1
prime = spark.sql(s"SELECT ((year(fdr_date))*100)+month(fdr_date) as fdr_year, count(*) as counts\
FROM pwrcrv_tmp\
where EXTR_CURR_NUM_CYC_DLQ=$bkt\
and EXTR_ACCOUNT_TYPE in('PS','PT','PD','PC','HV','PA')\
group by ((year(fdr_date))*100)+month(fdr_date)\
order by ((year(fdr_date))*100)+month(fdr_date)")
prime.show(50)
Error:
prime = spark.sql(s"SELECT ((year(fdr_date))*100)+month(fdr_date) as fdr_year, count(*) as counts FROM pwrcrv_tmp where EXTR_CURR_NUM_CYC_DLQ=$bkt and EXTR_ACCOUNT_TYPE in('PS','PT','PD','PC','HV','PA') group by ((year(fdr_date))*100)+month(fdr_date) order by ((year(fdr_date))*100)+month(fdr_date)")
^
SyntaxError: invalid syntax
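For context: s"..." with $bkt inside is Scala's s string interpolator (many Spark examples are written in Scala). Python has no s string prefix, so the parser reads s as a bare name followed by a string literal and stops with SyntaxError. If $name placeholders are what you want, Python's standard library provides them through string.Template. A minimal sketch, with the query shortened for illustration:

```python
from string import Template

bkt = 1

# Python has no s"..." prefix; string.Template is the standard-library
# class that understands $name placeholders.
query_template = Template(
    "SELECT count(*) AS counts "
    "FROM pwrcrv_tmp "
    "WHERE EXTR_CURR_NUM_CYC_DLQ=$bkt"
)

# substitute() replaces $bkt with str(bkt) and raises KeyError
# if a placeholder has no value.
query = query_template.substitute(bkt=bkt)
print(query)
# SELECT count(*) AS counts FROM pwrcrv_tmp WHERE EXTR_CURR_NUM_CYC_DLQ=1
```

The resulting plain string could then be passed to spark.sql(query).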
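I found the correct syntax in this Databricks article: add a lowercase f in front of the query string and put curly braces around the variable name inside the query.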
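bkt = 1
# The f prefix makes this an f-string: {bkt} is replaced by the value of bkt.
# The leading spaces on each continued line keep the SQL keywords separated,
# because a backslash-newline inside a string joins the lines directly.
prime = spark.sql(f"SELECT ((year(fdr_date))*100)+month(fdr_date) as fdr_year, count(*) as counts\
    FROM pwrcrv_tmp\
    where EXTR_CURR_NUM_CYC_DLQ={bkt}\
    and EXTR_ACCOUNT_TYPE in('PS','PT','PD','PC','HV','PA')\
    group by ((year(fdr_date))*100)+month(fdr_date)\
    order by ((year(fdr_date))*100)+month(fdr_date)")
prime.show(50)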
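spark.sql only ever receives the finished string, so any of Python's string-formatting mechanisms works as long as it produces the same text. A minimal sketch (variable names are illustrative) comparing the f-string with str.format and %-formatting, followed by the full query written as a triple-quoted f-string, which keeps line breaks and avoids the trailing backslashes altogether:

```python
bkt = 1

# 1) f-string (Python 3.6+): {bkt} is replaced when the literal is evaluated.
where_f = f"EXTR_CURR_NUM_CYC_DLQ={bkt}"

# 2) str.format: same {name} placeholder, value supplied as a keyword.
where_format = "EXTR_CURR_NUM_CYC_DLQ={bkt}".format(bkt=bkt)

# 3) printf-style: %d formats an integer.
where_percent = "EXTR_CURR_NUM_CYC_DLQ=%d" % bkt

# All three produce identical text.
assert where_f == where_format == where_percent

# Triple quotes keep the newlines, so words on adjacent lines
# cannot run together and no backslashes are needed.
query = f"""
SELECT ((year(fdr_date))*100)+month(fdr_date) AS fdr_year, count(*) AS counts
FROM pwrcrv_tmp
WHERE EXTR_CURR_NUM_CYC_DLQ={bkt}
  AND EXTR_ACCOUNT_TYPE IN ('PS','PT','PD','PC','HV','PA')
GROUP BY ((year(fdr_date))*100)+month(fdr_date)
ORDER BY ((year(fdr_date))*100)+month(fdr_date)
"""
print(query)
```

In PySpark this would be run as prime = spark.sql(query). Keep in mind that formatting splices the value in as raw SQL text: a string value needs its own quotes inside the SQL (for example '{name}'), and untrusted input should not be formatted into a query this way.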
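bkt=1
prime=spark.sql("SELECT ((year(fdr_date))*100)+month(fdr_date) as fdr_year, count(*) as counts FROM pwrcrv_tmp where EXTR_CURR_NUM_CYC_DLQ="%bkt%" and EXTR_ACCOUNT_TYPE in('PS','PT','PD','PC','HV','PA')\ group by ((year(fdr_date))*100)+month(fdr_date)\ order by ((year(fdr_date))*100)+month(fdr_date)")
prime.show(50)
Is this a question? I'm also not sure why you posted more code in a comment. Please read this as well. First, s"..." is a syntax error: what is it supposed to mean? Second, trying to format a string with $bkt is not valid Python syntax.
Look at my post; the title is my question. I got s"..." from this answer, which is marked as correct on Stack Overflow.
@email83 I don't know what language that is, but the answer you're looking for is: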