Python: Passing an array into a SQL query using format in PySpark
I want to run the following query by passing the value of concepts as the argument to the UDF has_any_concept.
The following variable exists in the environment:
concepts
This is the query without passing the parameter:
(spark.sql("""
select
resultCode.standard.primaryDisplay as display
from results
WHERE has_any_concept(resultCode, array("CREATININE_QUANTITATIVE_24_HOUR_DIALYSIS_FLUID_OBSTYPE","CREATININE_QUANTITATIVE_24_HOUR_URINE_OBSTYPE","CREATININE_QUANTITATIVE_SERUM_OBSTYPE"))
LIMIT 3
""".format(concepts = concepts))\
.toPandas()
)
This also works:
(spark.sql("""
select
resultCode.standard.primaryDisplay as display,
ontologicalCategoryAliases as category
from results
WHERE has_any_concept(resultCode, array("{concepts[0]}","{concepts[1]}","{concepts[2]}"))
LIMIT 3
""".format(concepts = concepts))\
.toPandas()
)
This does not work:
(spark.sql("""
select
resultCode.standard.primaryDisplay as display,
ontologicalCategoryAliases as category
from results
WHERE has_any_concept(resultCode, array({concepts}))
LIMIT 3
""".format(concepts = [''' "{concept}" '''.format(concept = concept) for concept in concepts]))\
.toPandas()
)
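ParseException: '\nmismatched input \'from\' expecting <EOF>(line 7, pos 3)\n\n== SQL ==\n\nselect \n \n resultCode.standard.primaryDisplay as display,\n ontologicalCategoryAliases as category\n \n from results \n---^^^\n WHERE has_any_concept(resultCode, array([\' "CREATININE_QUANTITATIVE_24_HOUR_DIALYSIS_FLUID_OBSTYPE" \', \' "CREATININE_QUANTITATIVE_24_HOUR_URINE_OBSTYPE" \', \' "CREATININE_QUANTITATIVE_SERUM_OBSTYPE" \']))\n AND normalizedValue.typedValue.type = "NUMERIC" \n AND interpretation.standard.primaryDisplay NOT IN (\'Not applicable\', \'Normal\')\n \n LIMIT 10\n'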
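The error text shows a Python list, brackets and all, inside array(...). The sketch below reproduces that without Spark: str.format() inserts str() of the list it is given, so the brackets and the list's own quoting end up in the SQL text. The concepts values are an assumption, copied from the literal query above, and the one-line template is a shortened stand-in for the real query.

```python
# Assumed values, taken from the hard-coded query shown earlier.
concepts = [
    "CREATININE_QUANTITATIVE_24_HOUR_DIALYSIS_FLUID_OBSTYPE",
    "CREATININE_QUANTITATIVE_24_HOUR_URINE_OBSTYPE",
    "CREATININE_QUANTITATIVE_SERUM_OBSTYPE",
]

# Shortened stand-in for the WHERE clause of the real query.
template = "array({concepts})"

# Same quoting step as in the failing query.
quoted = [''' "{concept}" '''.format(concept=c) for c in concepts]

# Passing the list itself: format() inserts str(list), brackets included.
broken = template.format(concepts=quoted)

# Joining first: format() receives one comma-separated string.
fixed = template.format(concepts=", ".join(quoted))

print(broken)
print(fixed)
```

The first printed line contains the square brackets that the SQL parser rejects; the second contains only quoted strings separated by commas.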
I did not write the UDF has_any_concept.
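In SQL syntax you cannot pass a Python list directly to the array function; the elements have to be joined into a comma-separated string of quoted literals first.
If you are using Python 3.6+, the code looks a bit cleaner with an f-string: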
spark.sql(
    f"""
    select
        resultCode.standard.primaryDisplay as display,
        ontologicalCategoryAliases as category
    from results
    WHERE has_any_concept(resultCode, array({",".join([f"'{x}'" for x in concepts])}))
    LIMIT 3
    """
).toPandas()
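If the same pattern is needed in several queries, the join can be moved into a small helper. This is only a sketch: sql_array_literal is a hypothetical name, not part of PySpark, and the two concept values are placeholders. It refuses values containing a single quote or backslash instead of trying to escape them, since the text is spliced directly into the SQL string.

```python
def sql_array_literal(values):
    """Render Python strings as a SQL array(...) expression (hypothetical helper)."""
    quoted = []
    for value in values:
        # Reject characters that could break out of the quoted literal.
        if "'" in value or "\\" in value:
            raise ValueError(f"unsupported character in {value!r}")
        quoted.append(f"'{value}'")
    return "array(" + ", ".join(quoted) + ")"


# Placeholder values for illustration only.
concepts = ["A_OBSTYPE", "B_OBSTYPE"]

query = f"""
select resultCode.standard.primaryDisplay as display
from results
WHERE has_any_concept(resultCode, {sql_array_literal(concepts)})
LIMIT 3
"""
print(query)
```

The resulting string can then be passed to spark.sql(query) as in the answer above.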
Thanks, this works.
Output of the working query:
                                       display        category
0  Creatinine [Mass/volume] in Serum or Plasma  [LABS_OBSTYPE]
1  Creatinine [Mass/volume] in Serum or Plasma  [LABS_OBSTYPE]
2  Creatinine [Mass/volume] in Serum or Plasma  [LABS_OBSTYPE]