Apache Spark: DataFrame withColumn function

apache-spark dataframe pyspark

I know the general structure for using the withColumn function with a DataFrame, for example:

df = df.withColumn("new_column_name", ((df.someColumn > someValue) & (df.someColumn < someOtherValue)))


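Now suppose the operator information (> and < in the example above) is stored as strings supplied by the user. How can I perform the operation above? The naive approach I can think of is to write an if-else block for each operator, which means adding another if-else block every time we want to support a new operator.
Is there an obvious trick I am missing here?
Thanks in advance.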
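One possible way to avoid the chain of if-else blocks is a lookup table from operator strings to functions in Python's standard operator module. PySpark Column objects overload the comparison operators, so the same functions return a Column when given one. The following is a minimal sketch, not a definitive solution: the names OPERATORS, compare, and add_range_flag are hypothetical, and the set of supported operators is an assumption.

```python
import operator

# Hypothetical lookup table: operator strings a user may enter -> functions.
# Supporting a new operator means adding one entry here, not a new if-else branch.
OPERATORS = {
    ">": operator.gt,
    "<": operator.lt,
    ">=": operator.ge,
    "<=": operator.le,
    "=": operator.eq,
    "!=": operator.ne,
}


def compare(left, op, right):
    """Apply the comparison named by the string `op` to `left` and `right`.

    With a PySpark Column as `left`, the result is a boolean Column,
    because Column overloads the comparison operators.
    """
    if op not in OPERATORS:
        raise ValueError("unsupported operator: {!r}".format(op))
    return OPERATORS[op](left, right)


def add_range_flag(df, new_name, column, op1, value1, op2, value2):
    """Add a boolean column that is true where both comparisons hold.

    `df` is assumed to be a PySpark DataFrame.
    """
    condition = compare(df[column], op1, value1) & compare(df[column], op2, value2)
    return df.withColumn(new_name, condition)
```

With this in place, the example from the question would read df = add_range_flag(df, "new_column_name", "someColumn", ">", someValue, "<", someOtherValue), with the two operator strings taken from user input.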
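I'm not an expert, but I don't think this is possible. Without the operators, the Scala DSL cannot compile or make sense of the expression. One approach, as you said, is to define a match function with multiple cases. Related post:

Assert that op1 and op2 are each one of the permitted operator strings, then: df.select(pyspark.sql.functions.expr("someColumn {} someValue and someColumn {} someOtherValue".format(op1, op2)))?

I went ahead and tested it. This works very well for the select part, but how do I use it in withColumn? I tried something like new_df = df.select(expr("metric.duration {} 0.5, is_activity {} False as query".format(">", "="))); df = df.withColumn("query_two", new_df["query"]). However, it fails in:

    withColumn(self, colName, col)
       1312         """
       1313         assert isinstance(col, Column), "col should be Column"
    -> 1314         return DataFrame(self._jdf.withColumn(colName, col._jc), self.sql_ctx)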
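Since pyspark.sql.functions.expr returns a Column, one option is to pass its result straight to withColumn on the same DataFrame, instead of selecting into a second DataFrame and borrowing a column from it. The sketch below assumes the two conditions are meant to be combined with AND (the attempt above separates them with a comma, so this is a guess). The names ALLOWED_OPS, build_predicate, and with_query_column are hypothetical, and only numeric and boolean literals are handled.

```python
import re

# Assumed whitelist of operator strings accepted from the user.
ALLOWED_OPS = {">", "<", ">=", "<=", "=", "!="}

# Plain (optionally dotted) column names only, to keep user input out of the SQL text.
_COLUMN_NAME = re.compile(r"^[A-Za-z_][A-Za-z0-9_.]*$")


def build_predicate(column, op, value):
    """Return a SQL predicate string such as 'metric.duration > 0.5'."""
    if op not in ALLOWED_OPS:
        raise ValueError("unsupported operator: {!r}".format(op))
    if not _COLUMN_NAME.match(column):
        raise ValueError("unexpected column name: {!r}".format(column))
    # String literals would need quoting and escaping, which this sketch omits.
    if not isinstance(value, (bool, int, float)):
        raise TypeError("only bool, int and float literals are handled here")
    return "{} {} {}".format(column, op, value)


def with_query_column(df, new_name, predicates):
    """Add a boolean column that is true where every predicate holds.

    `df` is assumed to be a PySpark DataFrame; `predicates` is a list of
    (column, op, value) tuples.
    """
    from pyspark.sql.functions import expr  # imported lazily; requires pyspark

    condition = " AND ".join(
        "({})".format(build_predicate(*p)) for p in predicates
    )
    # expr() yields a Column resolved against df itself, so withColumn accepts it.
    return df.withColumn(new_name, expr(condition))
```

For the case above, this would be called as df = with_query_column(df, "query_two", [("metric.duration", ">", 0.5), ("is_activity", "=", False)]).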