
Python: How to combine Where and WithColumn in PySpark


I have a dataframe in pyspark that looks like this:

+--------------------+--------+----------+--------------------+--------------------+
|               title| journal|      date|              author|             content|
+--------------------+--------+----------+--------------------+--------------------+
|Kudlow Breaks Wit...|NYT     |2019-05-01|    By Mark Landler |WASHINGTON — Pres...|
|Scrutiny of Russi...|NYT     |2019-05-01|By Charlie Savage...|WASHINGTON — The ...|
|Greek Anarchists ...|NYP     |2019-05-01|By Niki Kitsantonis |ATHENS — Greek an...|
I am looking to replace the rows where journal equals 'NYT'. I know how to do it with the SQL context:

df.createOrReplaceTempView("tbl_journal")
df = sqlContext.sql("SELECT journal, date FROM tbl_journal where journal like '%NYT%'")
df = df.withColumn('journal', lit('The New York Times'))
But the problem is that this overwrites the original dataframe (I only want to replace the values where journal='NYT' and keep all the other rows).

Also, I searched other topics but did not find a solution that combines Where and WithColumn statements. I mean, if I do this in PySpark (instead of using SQL):

It replaces all the values, without the condition.


Do you know how to replace the values only under this condition, in the original dataframe? Using spark or sqlContext. Thanks in advance.

Use when with otherwise to conditionally fill the values -

from pyspark.sql.functions import when
df = df.withColumn('journal', when(df.journal.like('%NYT%'), 'The New York Times').otherwise(df.journal))

Thank you very much for your help. I get an invalid syntax error on the 'like' statement. I import all the SQL functions from pyspark; I don't know if something else needs to be imported from pyspark to use this statement?
Try the edited answer.
OK, I have another error now, it is "TypeError: condition should be a Column". I guess it is about the journal like '%NYT%' part, no?
Try the updated one, it works for me.
Thanks a lot, it works perfectly! I didn't know about the when-otherwise statement, but it works fine. Have a nice day :)