PySpark - Create a new column using startswith from a list



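What is the best way to add a new column based on a string check condition?

I have to create a new column from an existing column's value, depending on whether that value starts with one of some defined values: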

+------------+----------+-----+------+
|deliveryname|department|state|salary|
+------------+----------+-----+------+
|          LA|     Sales|   NY| 90000|
|      Austin|     Sales|   NY| 86000|
|      Robert|     Sales|   CA| 81000|
|     Snooze |   Finance|   CA| 90000|
|     MidWest|   Finance|   NY| 83000|
|        Jeff| Marketing|   CA| 80000|
+------------+----------+-----+------+

df= df.withColumn("DeliveryPossible",when(df.deliveryname.startswith(s) for s in (('LO - ','Austin','MidWest','San Antonios', 'Snooze ea')),'True').otherwise('False'))

Desired output:

+------------+----------+-----+------+----------------+
|deliveryname|department|state|salary|DeliveryPossible|
+------------+----------+-----+------+----------------+
|          LA|     Sales|   NY| 90000|           False|
|      Austin|     Sales|   NY| 86000|            True|
|      Robert|     Sales|   CA| 81000|           False|
|     Snooze |   Finance|   CA| 90000|            True|
|     MidWest|   Finance|   NY| 83000|            True|
|        Jeff| Marketing|   CA| 80000|           False|
+------------+----------+-----+------+----------------+

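I get the same error either way. I think I am missing parentheses, but I cannot figure out where to put them. Is this even the right way to do it?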

Generator expression must be parenthesized if not sole argument
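The error comes from Python's parser, not from Spark. The following is a minimal sketch of the rule, using only the built-in `compile()`; the helper name `compiles` and the placeholder names `c` and `values` are made up for illustration. Note that adding the parentheses only satisfies the parser: `when()` would then receive a generator object rather than a Column condition, so it would presumably still fail at runtime.

```python
def compiles(source: str) -> bool:
    """Return True if `source` parses as a Python expression."""
    try:
        compile(source, "<example>", "eval")
        return True
    except SyntaxError:
        return False


# A generator expression followed by a second argument is rejected by the parser.
bad = "when(c.startswith(s) for s in values, 'True')"

# Parenthesized, the call parses. Names are not resolved at compile time,
# so this only checks syntax, not whether when() accepts a generator.
parenthesized = "when((c.startswith(s) for s in values), 'True')"

print(compiles(bad))            # False
print(compiles(parenthesized))  # True
```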

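Thanks.

Column.startswith() accepts only a single string as its argument, so it cannot take the whole list at once. You need to build the conditions individually and combine them with "or":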

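from functools import reduce
from operator import or_

# Prefixes that mark a delivery name as deliverable
values = ['LO - ','Austin','MidWest','San Antonios', 'Snooze ea']

# One startswith() condition per prefix, folded together with | (OR on Columns)
df.withColumn("DeliveryPossible",
              reduce(or_, [df.deliveryname.startswith(s) for s in values])
             ).show()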
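As a possible alternative to folding many conditions together, the prefixes can be joined into one anchored regular expression and passed to `Column.rlike()`. This is only a sketch: the helper names `prefix_pattern` and `with_delivery_flag` are hypothetical, and it assumes that the output of Python's `re.escape` is also valid for the Java regex engine Spark uses, which should hold for plain prefixes like these but is not checked here.

```python
import re


def prefix_pattern(prefixes):
    """Build a regex matching any string that starts with one of `prefixes`."""
    # ^ anchors the match at the start; re.escape keeps characters such as '-' literal.
    return "^(?:" + "|".join(re.escape(p) for p in prefixes) + ")"


def with_delivery_flag(df, prefixes, column="deliveryname"):
    """Return df with a boolean DeliveryPossible column (needs pyspark installed)."""
    # Imported lazily so that prefix_pattern can be tried without Spark.
    from pyspark.sql import functions as F
    return df.withColumn("DeliveryPossible",
                         F.col(column).rlike(prefix_pattern(prefixes)))


values = ['LO - ', 'Austin', 'MidWest', 'San Antonios', 'Snooze ea']
pattern = prefix_pattern(values)

# Quick check of the pattern in plain Python, without a Spark session.
print(bool(re.match(pattern, "Austin")))  # True
print(bool(re.match(pattern, "Robert")))  # False
```

With a Spark session available, `with_delivery_flag(df, values).show()` would be the counterpart of the `reduce` version. Both produce a boolean column rather than the strings 'True' and 'False'.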