Python: add or derive two columns from one column
Given the following DataFrame, how can I add two more columns to it? For example, col_1 holds the count of "YES" answers and col_2 holds the count of "NO" answers.
+-----------+----------+------------+------+-----+-----+
|Customer_ID|Project_ID|QUESTION_TYP|ANSWER|col_1|col_2|
+-----------+----------+------------+------+-----+-----+
|1          |1         |2nd QUES    |YES   |2    |0    |
|1          |2         |2nd QUES    |NO    |0    |1    |
|1          |2         |2nd Ques    |Yes   |1    |0    |
+-----------+----------+------------+------+-----+-----+
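Please help. I have tried most of the solutions I could find, but the result I get is just YES or NO, whereas I want the counts row by row.

This looks like a job for window functions. Assuming you want to count per customer_id: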
select t.*,
sum(case when answer = 'YES' then 1 else 0 end) over (partition by customer_id) as col_1,
sum(case when answer = 'NO' then 1 else 0 end) over (partition by customer_id) as col_2
from t;
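To see what this query returns, here is a minimal sketch that runs it against an in-memory SQLite table from Python. The sample rows are hypothetical, modeled on the question, and the answers are assumed to be normalized to upper case (the comparison `answer = 'YES'` would not match `'Yes'`). It also assumes the bundled SQLite is version 3.25 or newer, which is when window functions were added.

```python
import sqlite3

# Hypothetical sample data modeled on the question; answers are assumed
# to be upper case so that = 'YES' and = 'NO' match.
sample = [
    (1, 1, "2nd QUES", "YES"),
    (1, 2, "2nd QUES", "NO"),
    (1, 2, "2nd QUES", "YES"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table t (customer_id int, project_id int, question_typ text, answer text)"
)
conn.executemany("insert into t values (?, ?, ?, ?)", sample)

query = """
select t.*,
       sum(case when answer = 'YES' then 1 else 0 end) over (partition by customer_id) as col_1,
       sum(case when answer = 'NO' then 1 else 0 end) over (partition by customer_id) as col_2
from t
"""
# Sort in Python so the printed order does not depend on the engine.
rows = sorted(conn.execute(query).fetchall())
for row in rows:
    print(row)
conn.close()
```

Every row for the customer carries both totals, so each of the three rows ends with col_1 = 2 and col_2 = 1.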
Or, if you want to count over all of the data, just replace (partition by customer_id) with ().
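As a rough illustration of the empty window, the sketch below adds a hypothetical second customer and counts over the whole table with `over ()`. The table name `t`, the sample rows, and the use of SQLite (3.25 or newer) are assumptions made only for this example.

```python
import sqlite3

# Hypothetical rows: customer 2 is invented to show that over () ignores
# the customer boundary. Answers are assumed to be upper case.
sample = [
    (1, 1, "2nd QUES", "YES"),
    (1, 2, "2nd QUES", "NO"),
    (1, 2, "2nd QUES", "YES"),
    (2, 1, "2nd QUES", "NO"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table t (customer_id int, project_id int, question_typ text, answer text)"
)
conn.executemany("insert into t values (?, ?, ?, ?)", sample)

query = """
select t.*,
       sum(case when answer = 'YES' then 1 else 0 end) over () as col_1,
       sum(case when answer = 'NO' then 1 else 0 end) over () as col_2
from t
"""
# An empty over () treats the whole result set as one partition.
rows = sorted(conn.execute(query).fetchall())
for row in rows:
    print(row)
conn.close()
```

With these rows, all four lines end with col_1 = 2 and col_2 = 2, regardless of customer.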
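Note: this puts the counts in both columns on every row. If you only want the count in the column that matches the answer on that row: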
select t.*,
(case when answer = 'YES'
then sum(case when answer = 'YES' then 1 else 0 end) over (partition by customer_id)
else 0
end) as col_1,
(case when answer = 'NO'
then sum(case when answer = 'NO' then 1 else 0 end) over (partition by customer_id)
else 0
end) as col_2
from t;
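The conditional version can be checked the same way. This is a sketch under the same assumptions as before: hypothetical sample rows with upper-case answers, a table named `t`, and SQLite 3.25 or newer behind Python's `sqlite3` module.

```python
import sqlite3

# Hypothetical sample data modeled on the question (answers upper case).
sample = [
    (1, 1, "2nd QUES", "YES"),
    (1, 2, "2nd QUES", "NO"),
    (1, 2, "2nd QUES", "YES"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table t (customer_id int, project_id int, question_typ text, answer text)"
)
conn.executemany("insert into t values (?, ?, ?, ?)", sample)

query = """
select t.*,
       (case when answer = 'YES'
             then sum(case when answer = 'YES' then 1 else 0 end) over (partition by customer_id)
             else 0
        end) as col_1,
       (case when answer = 'NO'
             then sum(case when answer = 'NO' then 1 else 0 end) over (partition by customer_id)
             else 0
        end) as col_2
from t
"""
# The outer case keeps the count only in the column matching the row's answer.
rows = sorted(conn.execute(query).fetchall())
for row in rows:
    print(row)
conn.close()
```

Here the YES rows end with (2, 0) and the NO row ends with (0, 1), which is the row-wise layout the question asks for.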
However, I think both make sense.

What have you tried?