PySpark: assigning values to a column with withColumn

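I am new to PySpark, but I have some experience with R.

Problem: I want to assign a label to each height (a number) listed in a column. I started writing the code as follows: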

w = Window.partitionBy("student_id")
df_enc_hw = df_enc_hw.withColumn("stuname", \
                       when(lower(col("height")) <= 4, "under_ht") 
                      .when(lower(col("height")) > 4 < 5, "ok_ht")  
                      .when(lower(col("height")) >=5 < 6, "normal_ht")  
                      .when(lower(col("height")) >=6, "abnor_ht")) 
Thanks for your help,
K

You should split each condition into separate comparison expressions, like this:

from pyspark.sql.functions import col, lower, when

# Parenthesize each comparison and combine them with &, not a chained comparison.
df_enc_hw = df_enc_hw.withColumn(
    "stuname",
    when(lower(col("height")) <= 4, "under_ht")
    .when((lower(col("height")) > 4) & (lower(col("height")) < 5), "ok_ht")
    .when((lower(col("height")) >= 5) & (lower(col("height")) < 6), "normal_ht")
    .when(lower(col("height")) >= 6, "abnor_ht"),
)

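Change
lower(col("height")) > 4 < 5
to
(lower(col("height")) > 4) & (lower(col("height")) < 5)
(and likewise for the other conditions). Python reads "> 4 < 5" as a chained comparison, and & binds more tightly than the comparison operators, so each comparison needs its own parentheses.

Thank you very much for your time and help. It is working :)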
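
To make the reasoning in the comment above concrete, here is a minimal plain-Python sketch of why the chained form fails. The Expr class is a hypothetical stand-in written only for this illustration; it is not part of PySpark. It imitates just one property of a Column, namely that an unevaluated expression refuses to be converted to a single True/False value.

```python
class Expr:
    """Hypothetical stand-in for a lazily evaluated column expression."""

    def __init__(self, text):
        self.text = text

    def __gt__(self, other):
        return Expr(f"({self.text} > {other})")

    def __lt__(self, other):
        return Expr(f"({self.text} < {other})")

    def __and__(self, other):
        return Expr(f"({self.text} AND {other.text})")

    def __bool__(self):
        # An unevaluated expression has no single truth value.
        raise ValueError("cannot convert an expression to bool; use '&' instead of 'and'")


height = Expr("height")

# Chained form: Python rewrites this as (height > 4) and (4 < 5),
# so it calls bool() on the first expression and fails.
try:
    height > 4 < 5
    chained_error = None
except ValueError as exc:
    chained_error = str(exc)

# Explicit form: each comparison is parenthesized and joined with &.
combined = (height > 4) & (height < 5)

print(chained_error)
print(combined.text)
```

The chained comparison never builds a combined condition: it stops at the implicit "and". The explicit form builds one expression covering both bounds, which is what each when(...) call needs as its condition.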