I want to run the following query by passing the values in concepts as arguments to the UDF has_any_concept.

Here is what is in the environment:

concepts = ['CREATININE_QUANTITATIVE_24_HOUR_DIALYSIS_FLUID_OBSTYPE',
            'CREATININE_QUANTITATIVE_24_HOUR_URINE_OBSTYPE',
            'CREATININE_QUANTITATIVE_SERUM_OBSTYPE']

This is the query without passing parameters:
(spark.sql("""
select
resultCode.standard.primaryDisplay as display
from results
WHERE has_any_concept(resultCode, array("CREATININE_QUANTITATIVE_24_HOUR_DIALYSIS_FLUID_OBSTYPE","CREATININE_QUANTITATIVE_24_HOUR_URINE_OBSTYPE","CREATININE_QUANTITATIVE_SERUM_OBSTYPE"))
LIMIT 3
""".format(concepts = concepts))\
.toPandas()
)

                                       display
0  Creatinine [Mass/volume] in Serum or Plasma
1  Creatinine [Mass/volume] in Serum or Plasma
2  Creatinine [Mass/volume] in Serum or Plasma

This also works:
(spark.sql("""
select
resultCode.standard.primaryDisplay as display,
ontologicalCategoryAliases as category
from results
WHERE has_any_concept(resultCode, array("{concepts[0]}","{concepts[1]}","{concepts[2]}"))
LIMIT 3
""".format(concepts = concepts))\
.toPandas()
)

                                       display        category
0  Creatinine [Mass/volume] in Serum or Plasma  [LABS_OBSTYPE]
1  Creatinine [Mass/volume] in Serum or Plasma  [LABS_OBSTYPE]
2  Creatinine [Mass/volume] in Serum or Plasma  [LABS_OBSTYPE]

This does not work:
(spark.sql("""
select
resultCode.standard.primaryDisplay as display,
ontologicalCategoryAliases as category
from results
WHERE has_any_concept(resultCode, array({concepts}))
LIMIT 3
""".format(concepts = [''' "{concept}" '''.format(concept = concept) for concept in concepts]))\
.toPandas()
)

ParseException: mismatched input 'from' expecting <EOF> (line 7, pos 3)

== SQL ==

select
    resultCode.standard.primaryDisplay as display,
    ontologicalCategoryAliases as category

 from results
---^^^
WHERE has_any_concept(resultCode, array([' "CREATININE_QUANTITATIVE_24_HOUR_DIALYSIS_FLUID_OBSTYPE" ', ' "CREATININE_QUANTITATIVE_24_HOUR_URINE_OBSTYPE" ', ' "CREATININE_QUANTITATIVE_SERUM_OBSTYPE" ']))
AND normalizedValue.typedValue.type = "NUMERIC"
AND interpretation.standard.primaryDisplay NOT IN ('Not applicable', 'Normal')

LIMIT 10

I did not write the UDF has_any_concept.
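The root cause can be reproduced without Spark: when a Python list is handed to str.format, it is rendered via its repr(), brackets and all, so the interpolated SQL is not valid. A minimal sketch of the failing substitution, using shortened stand-in concept names rather than the real ones:

```python
# Reproduce the failing substitution outside Spark.
# NOTE: these concept names are shortened stand-ins for illustration.
concepts = ["A_OBSTYPE", "B_OBSTYPE"]

# Same pattern as the failing query: a list of pre-quoted strings is
# passed to str.format, which renders it with the list's repr().
fragment = "array({concepts})".format(
    concepts=[' "{concept}" '.format(concept=c) for c in concepts]
)

print(fragment)
# -> array([' "A_OBSTYPE" ', ' "B_OBSTYPE" '])
# The brackets and Python-style quoting survive into the SQL text,
# which is what the parser chokes on at the next clause.
```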
Posted on 2021-03-09 13:06:53
If you are using Python 3.6+, the code looks a little cleaner with f-strings.

In SQL syntax you cannot pass a Python list directly to the array function; you have to render the list as a comma-separated string of quoted literals first.
spark.sql(
f"""
select
resultCode.standard.primaryDisplay as display,
ontologicalCategoryAliases as category
from results
WHERE has_any_concept(resultCode, array({", ".join([f"'{x}'" for x in concepts])}))
LIMIT 3
"""
).toPandas()

https://stackoverflow.com/questions/66536908
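To see exactly what the f-string in the answer interpolates, here is a Spark-free sketch of just the join expression, using the concept names from the question:

```python
concepts = ['CREATININE_QUANTITATIVE_24_HOUR_DIALYSIS_FLUID_OBSTYPE',
            'CREATININE_QUANTITATIVE_24_HOUR_URINE_OBSTYPE',
            'CREATININE_QUANTITATIVE_SERUM_OBSTYPE']

# Quote each value and join with commas *before* interpolating, so the
# SQL sees plain string literals rather than a Python list repr.
in_list = ", ".join(f"'{x}'" for x in concepts)
array_expr = f"array({in_list})"

print(array_expr)
```

The resulting string is valid SQL and can be dropped straight into the WHERE clause, which is why the answer's query parses while the list-based attempt does not.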