Python PySpark: replacing strings with numeric values

Tags: python, sql, apache-spark, pyspark

I have the following table:

 +----------+----------+--------------------+--------------------+--------------------+--------------------+--------------------+
    |  _created|  _updated|                name|         description|          indication|                name|      patents_patent|
    +----------+----------+--------------------+--------------------+--------------------+--------------------+--------------------+
    |2005-06-13|2016-08-17|           Lepirudin|Lepirudin is iden...|For the treatment...|           Lepirudin|{"data" : [{"coun...|
    |2005-06-13|2017-04-27|           Cetuximab|Cetuximab is an e...|Cetuximab, used i...|           Cetuximab|{"data" : [{"coun...|
    |2005-06-13|2017-06-14|        Dornase alfa|Dornase alfa is a...|Used as adjunct t...|        Dornase alfa|{"data" : [{"coun...|
    |2005-06-13|2016-08-17| Denileukin diftitox|A recombinant DNA...|For treatment of ...| Denileukin diftitox|                NULL|
    |2005-06-13|2017-03-10|          Etanercept|Dimeric fusion pr...|Etanercept is ind...|          Etanercept|{"data" : [{"coun...|
    |2005-06-13|2017-07-06|         Bivalirudin|Bivalirudin is a ...|For treatment of ...|         Bivalirudin|{"data" : [{"coun...|
    |2005-06-13|2017-07-05|          Leuprolide|Leuprolide belong...|For treatment of ...|          Leuprolide|{"data" : [{"coun...|
    |2005-06-13|2017-06-16|Peginterferon alf...|Peginterferon alf...|Peginterferon alf...|Peginterferon alf...|{"data" : [{"coun...|
    |2005-06-13|2017-06-08|           Alteplase|Human tissue plas...|For management of...|           Alteplase|                NULL|
    |2005-06-13|2016-12-08|          Sermorelin|Sermorelin acetat...|For the treatment...|          Sermorelin|                NULL|
    |2005-06-13|2016-08-17|  Interferon alfa-n1|Purified, natural...|For treatment of ...|  Interferon alfa-n1|                NULL|
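Ideally, I need to derive two tables:

Table 1: filter to the rows where patents_patent is not null and replace the string in patents_patent with 1: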

    +----------+----------+--------------------+--------------------+--------------------+--------------------+--------------------+
    |  _created|  _updated|                name|         description|          indication|                name|      patents_patent|
    +----------+----------+--------------------+--------------------+--------------------+--------------------+--------------------+
    |2005-06-13|2016-08-17|           Lepirudin|Lepirudin is iden...|For the treatment...|           Lepirudin|                   1|
    |2005-06-13|2017-04-27|           Cetuximab|Cetuximab is an e...|Cetuximab, used i...|           Cetuximab|                   1|
    |2005-06-13|2017-06-14|        Dornase alfa|Dornase alfa is a...|Used as adjunct t...|        Dornase alfa|                   1|
    |2005-06-13|2017-03-10|          Etanercept|Dimeric fusion pr...|Etanercept is ind...|          Etanercept|                   1|
    |2005-06-13|2017-07-06|         Bivalirudin|Bivalirudin is a ...|For treatment of ...|         Bivalirudin|                   1|
    |2005-06-13|2017-07-05|          Leuprolide|Leuprolide belong...|For treatment of ...|          Leuprolide|                   1|
    |2005-06-13|2017-06-16|Peginterferon alf...|Peginterferon alf...|Peginterferon alf...|Peginterferon alf...|                   1|

table_two: filter to the rows where patents_patent is null and replace the null with 0:

    +----------+----------+--------------------+--------------------+--------------------+--------------------+--------------------+
    |  _created|  _updated|                name|         description|          indication|                name|      patents_patent|
    +----------+----------+--------------------+--------------------+--------------------+--------------------+--------------------+
    |2005-06-13|2016-08-17| Denileukin diftitox|A recombinant DNA...|For treatment of ...| Denileukin diftitox|                   0|
    |2005-06-13|2017-06-08|           Alteplase|Human tissue plas...|For management of...|           Alteplase|                   0|
    |2005-06-13|2016-12-08|          Sermorelin|Sermorelin acetat...|For the treatment...|          Sermorelin|                   0|
    |2005-06-13|2016-08-17|  Interferon alfa-n1|Purified, natural...|For treatment of ...|  Interferon alfa-n1|                   0|

This is what I tried:

from pyspark.sql.functions import col, expr, when

data = table.where(col("patents_patent").isNull())

data = table.filter("patents_patent is not NULL")

The result is either an error or empty. This is the schema:

root
 |-- _created: string (nullable = true)
 |-- _updated: string (nullable = true)
 |-- name: string (nullable = true)
 |-- description: string (nullable = true)
 |-- indication: string (nullable = true)
 |-- patents_patent: string (nullable = true)

Thanks for your help.

For the first case, you should do the following:

data_not_null = table.filter((table['patents_patent'] != "NULL")).withColumn('patents_patent', f.lit("1"))

For the second case, you should do the following:

data_null = table.where(f.col("patents_patent").isNull() | (table['patents_patent'] == "NULL")).withColumn('patents_patent', f.lit("0"))

For these, I imported the functions module as:

from pyspark.sql import functions as f

Of course, f.col("patents_patent") and table['patents_patent'] mean the same thing.

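Can you show what you tried and where the problem is?
I updated the question.
What is the type of patents_patent?
Can you share the schema? Is the value actually null, or the string "NULL" as shown in the example?
As shown in the example, the string is NULL.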