Pyspark-SQL: using a case-when statement
I have a data frame like this
>>> df_w_cluster.select('high_income', 'aml_cluster_id').show(10)
+-----------+--------------+
|high_income|aml_cluster_id|
+-----------+--------------+
|          0|             0|
|          0|             0|
|          0|             1|
|          0|             1|
|          0|             0|
|          0|             0|
|          0|             1|
|          1|             1|
|          1|             0|
|          1|             0|
+-----------+--------------+
only showing top 10 rows
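The high_income column is a binary column that holds either 0 or 1. aml_cluster_id holds values from 0 to 3. I want to create a new column whose value depends on the values of high_income and aml_cluster_id in that particular row. I am trying to do this with SQL.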
df_w_cluster.createTempView('event_rate_holder')
To achieve this, I wrote a query like this:
q = """select * , case
when "aml_cluster_id" = 0 and "high_income" = 1 then "high_income_encoded" = 0.162 else
when "aml_cluster_id" = 0 and "high_income" = 0 then "high_income_encoded" = 0.337 else
when "aml_cluster_id" = 1 and "high_income" = 1 then "high_income_encoded" = 0.049 else
when "aml_cluster_id" = 1 and "high_income" = 0 then "high_income_encoded" = 0.402 else
when "aml_cluster_id" = 2 and "high_income" = 1 then "high_income_encoded" = 0.005 else
when "aml_cluster_id" = 2 and "high_income" = 0 then "high_income_encoded" = 0.0 else
when "aml_cluster_id" = 3 and "high_income" = 1 then "high_income_encoded" = 0.023 else
when "aml_cluster_id" = 3 and "high_income" = 0 then "high_income_encoded" = 0.022 else
from event_rate_holder"""
When I run it with Spark
spark.sql(q)
I get the following error
mismatched input 'aml_cluster_id' expecting <EOF>(line 1, pos 22)
But I still get an error
== SQL ==
select * , case
when aml_cluster_id = 0 and high_income = 1 then high_income_encoded = 0.162 else
-----^^^
followed by
pyspark.sql.utils.ParseException: "\nmismatched input 'aml_cluster_id' expecting <EOF>(line 2, pos 5)\n\n== SQL ==\nselect * ,
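The correct syntax for the CASE variant you are using is
CASE
    WHEN e1 THEN e2 [ ...n ]
    [ ELSE else_result_expression ]
END
So:
- THEN should be followed by an expression. There is no place here for name = something.
- ELSE is allowed once per CASE, not after every WHEN.
- Your original code is missing the closing END.
- Finally, columns should not be quoted.
You probably meant:
CASE
    WHEN aml_cluster_id = 0 AND high_income = 1 THEN 0.162
    WHEN aml_cluster_id = 0 AND high_income = 0 THEN 0.337
    ...
END AS high_income_encoded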
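The eight branches can also be generated instead of typed by hand. The following is only a sketch: the RATES table copies the values from the question's query, while the build_query helper and its table parameter are hypothetical names introduced here for illustration. It only builds the query string; running it still requires a live SparkSession and the registered temp view.

```python
# (aml_cluster_id, high_income) -> encoded value, copied from the question's query.
RATES = {
    (0, 1): 0.162, (0, 0): 0.337,
    (1, 1): 0.049, (1, 0): 0.402,
    (2, 1): 0.005, (2, 0): 0.0,
    (3, 1): 0.023, (3, 0): 0.022,
}


def build_query(rates, table="event_rate_holder"):
    """Build one flat CASE expression: a WHEN per pair, a single END, one alias."""
    branches = "\n".join(
        f"    WHEN aml_cluster_id = {cluster} AND high_income = {income} THEN {value}"
        for (cluster, income), value in rates.items()
    )
    return (
        "SELECT *,\n"
        "  CASE\n"
        f"{branches}\n"
        "  END AS high_income_encoded\n"
        f"FROM {table}"
    )


q = build_query(RATES)
print(q)

# Assumption: `spark` is an existing SparkSession and the temp view from the
# question has been registered. Then the query would be run as:
# df_encoded = spark.sql(q)
```

No ELSE is emitted, so any row that matches none of the pairs gets NULL in high_income_encoded.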
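Every when condition in this query gets its own case ... end, so each nested case has to be closed. Column names should be quoted with backticks (`) rather than double quotes, and the high_income_encoded column name belongs at the end as an alias. So the correct query looks like this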
q = """select * ,
case when `aml_cluster_id` = 0 and `high_income` = 1 then 0.162 else
case when `aml_cluster_id` = 0 and `high_income` = 0 then 0.337 else
case when `aml_cluster_id` = 1 and `high_income` = 1 then 0.049 else
case when `aml_cluster_id` = 1 and `high_income` = 0 then 0.402 else
case when `aml_cluster_id` = 2 and `high_income` = 1 then 0.005 else
case when `aml_cluster_id` = 2 and `high_income` = 0 then 0.0 else
case when `aml_cluster_id` = 3 and `high_income` = 1 then 0.023 else
case when `aml_cluster_id` = 3 and `high_income` = 0 then 0.022
end
end
end
end
end
end
end
end as `high_income_encoded`
from event_rate_holder"""
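The nested query above and a single flat CASE with several WHEN branches should return the same values. The sketch below checks this on a few rows, with one loud assumption: it uses Python's built-in sqlite3 module as a stand-in engine, not Spark, only because it runs without a cluster and accepts the same CASE syntax. It is shortened to clusters 0 and 1, and the sample rows are pairs that appear in the question's output.

```python
import sqlite3

# Stand-in engine: SQLite, NOT Spark. This only illustrates CASE semantics.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE event_rate_holder (high_income INTEGER, aml_cluster_id INTEGER)"
)
conn.executemany(
    "INSERT INTO event_rate_holder VALUES (?, ?)",
    [(0, 0), (0, 1), (1, 1), (1, 0)],  # (high_income, aml_cluster_id)
)

# Nested form: every CASE opened after an ELSE needs its own END.
nested = """
SELECT high_income, aml_cluster_id,
  CASE WHEN aml_cluster_id = 0 AND high_income = 1 THEN 0.162 ELSE
  CASE WHEN aml_cluster_id = 0 AND high_income = 0 THEN 0.337 ELSE
  CASE WHEN aml_cluster_id = 1 AND high_income = 1 THEN 0.049 ELSE
  CASE WHEN aml_cluster_id = 1 AND high_income = 0 THEN 0.402
  END END END END AS high_income_encoded
FROM event_rate_holder"""

# Flat form: one CASE, several WHEN branches, one END.
flat = """
SELECT high_income, aml_cluster_id,
  CASE
    WHEN aml_cluster_id = 0 AND high_income = 1 THEN 0.162
    WHEN aml_cluster_id = 0 AND high_income = 0 THEN 0.337
    WHEN aml_cluster_id = 1 AND high_income = 1 THEN 0.049
    WHEN aml_cluster_id = 1 AND high_income = 0 THEN 0.402
  END AS high_income_encoded
FROM event_rate_holder"""

nested_rows = conn.execute(nested).fetchall()
flat_rows = conn.execute(flat).fetchall()
print(nested_rows == flat_rows)
print(flat_rows)
```

Both forms are valid; the flat one is shorter and avoids having to count the closing END keywords.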
Unrelated question: how did you import the data?
Are there newlines in the query string? Try running q = q.replace("\n", " ") before executing the query.
Did the answer help?
I tried the answer given by Ramesh and it worked perfectly! Thanks.