Passing dynamic values to Spark SQL in PySpark


Basically, I am passing dynamic values into PySpark SQL. My code is as follows:

set_sql = "".join(["set app_list_0 = 'app_3'"])

    sqlContext.sql(set_sql)

    click_app_join_sql = sqlContext.sql("select click_id, (case when app_new in ${app_list_0} then 1 else 0 END) as  ${app_list_0}, device, os, channel from clickDF ")

    click_app = sqlContext.sql(click_app_join_sql)

    click_app.show(3)
When I run the code, I get the error below. Can you tell me what is wrong with the code above?

File "/home/saureddi/spark/data_process/hive_data_process.py", line 103, in <module>
    click_app_join_sql = sqlContext.sql("select click_id, (case when app_new in ${app_list_0} then 1 else 0 END) as  ${app_list_0}, device, os, channel from clickDF ")
  File "/usr/hdp/2.6.4.0-91/spark2/python/lib/pyspark.zip/pyspark/sql/context.py", line 384, in sql
  File "/usr/hdp/2.6.4.0-91/spark2/python/lib/pyspark.zip/pyspark/sql/session.py", line 603, in sql
  File "/usr/hdp/2.6.4.0-91/spark2/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py", line 1133, in __call__
  File "/usr/hdp/2.6.4.0-91/spark2/python/lib/pyspark.zip/pyspark/sql/utils.py", line 73, in deco
pyspark.sql.utils.ParseException: u"\nno viable alternative at input '(case when app_new in 'app_3''(line 1, pos 39)\n\n== SQL ==\nselect click_id, (case when app_new in 'app_3' then 1 else 0 END) as  'app_3', device, os, channel from clickDF \n---------------------------------------^^^\n"
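Note that the traceback shows the ${app_list_0} substitution itself worked: the parsed SQL already contains 'app_3'. The parser rejects the result because IN requires a parenthesized value list, and a single-quoted string is not a valid column alias. A minimal sketch of a query string that would parse, using the same view and columns as above:

fixed_sql = ("select click_id, "
             "(case when app_new in ('app_3') then 1 else 0 END) as app_3, "
             "device, os, channel from clickDF")
click_app = sqlContext.sql(fixed_sql)  # sql() already returns a DataFrame
click_app.show(3)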

In the last snippet below, table_name, dateid_start, dateid_end, event_timestamp_start, and event_timestamp_end are all Python values.

I think the way you are assigning the dynamic variable is wrong.

Thank you for the reply. I have made the changes as suggested. I can see the values being populated, but the end result is still "no viable alternative at input". Error details below: pyspark.sql.utils.ParseException: u"\nno viable alternative at input '(case when app_new in app_3'(line 1, pos 39)\n\n== SQL ==\nselect click_id, (case when app_new in app_3 then 1 else 0 END) as app_3, device, os, channel from clickDF\n---------------------------------------^^^\n". Please advise.

Hi Vijay, I have added the full code to the question under a subheading. Please check and help me resolve it.

Hi Vijay, I am able to pass the value dynamically, but the query throws an error, details below: pyspark.sql.utils.ParseException: u"\nno viable alternative at input '(case when app_new in 'app_3''(line 1, pos 39)\n\n== SQL ==\nselect click_id, (case when app_new in 'app_3' then 1 else 0 END) as app_3, device, os, channel from clickDF\n---------------------------------------^^^\n". Do you know why? Thanks.

The error was due to missing parentheses (). It is resolved now.
app_sql = "".join(["select a.app, CONCAT('app', '_',a.app) as new_app, a.app_count from(select app, count(app) as app_count from dblclk_text.click_data group by app )a order by a.app_count desc limit 5"])
df1 = hiveContext.sql(app_sql)
df1.createOrReplaceTempView('app')
df1_new_app = sqlContext.sql("select new_app from app ")
df1_new_app.printSchema()

app_list = []
app_result = df1_new_app.collect()  # collect() returns a list of Row objects

print(type(app_result))

for app in app_result:
    app_list.append(app)  # appends whole Row objects, not the new_app strings
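Since collect() returns Row objects, the loop above fills app_list with Rows, and str(app_list[0]) below yields something like "Row(new_app=u'app_3')" rather than the bare string. A minimal sketch of extracting the values themselves (the field name new_app comes from the query above):

app_list = [row['new_app'] for row in app_result]  # e.g. [u'app_3', ...]
print(app_list[0])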

click_sql = "select click_id,CONCAT ('app', '_', app) as app_new, device, os, channel from dblclk_text.click_pank_data"
clickDF = hiveContext.sql(click_sql)

clickDF.createOrReplaceTempView('clickDF')

app_list_0 = str(app_list[0])
#app_list_0 = 'app_3'

print (app_list_0)

sample_sql = '''select click_id, (case when app_new in {0} then 1 else 0 END) as  {0}, device, os, channel from clickDF '''.format(app_list_0)
click_app_join_sql = sqlContext.sql(sample_sql)
click_app = sqlContext.sql(click_app_join_sql)  # note: this passes a DataFrame, not a SQL string, back into sql()
click_app.show(3)
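Per the final comment above, the parse failure comes down to the missing parentheses around the IN list (plus the alias must stay unquoted), and sql() should only be called once, since it already returns a DataFrame. A minimal sketch of the corrected construction under those assumptions:

sample_sql = ("select click_id, "
              "(case when app_new in ('{0}') then 1 else 0 END) as {0}, "
              "device, os, channel from clickDF").format(app_list_0)
click_app = sqlContext.sql(sample_sql)
click_app.show(3)

The separate snippet below applies the same .format() pattern with several Python values: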
sample_sql = '''
select tenant, user_id, event_type,bounce_class from {0}
where {1} and  {2} and  {3} and  {4}
and (event_type in ('list_unsubscribe', 'link_unsubscribe', 'spam_complaint') or bounce_class in ('10','30','90'))
and tenant is not null
and tenant != '' 
'''.format(table_name, dateid_start, dateid_end,event_timestamp_start,event_timestamp_end)

hiveContext.sql(sample_sql)  # use the HiveContext instance created earlier, not the class
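For the where clause above to parse, each formatted value after the table name must be a complete boolean predicate string, since the template joins them with AND. The names and values below are purely illustrative assumptions, not taken from the question:

table_name = 'events_db.message_events'  # hypothetical table name
dateid_start = "dateid >= 20180101"      # hypothetical predicate strings
dateid_end = "dateid <= 20180131"
event_timestamp_start = "event_timestamp >= '2018-01-01 00:00:00'"
event_timestamp_end = "event_timestamp <= '2018-01-31 23:59:59'"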