Apache Spark: get all non-null columns of a Spark DataFrame into a single column


I need to select all the non-null columns from a Hive table and insert them into HBase. For example, consider the table below:

Name     | Place       | Department | Experience
================================================
Ram      | Ramgarh     | Sales      | 14
Lakshman | Lakshmanpur | Operations |
Sita     | Sitapur     |            | 14
Ravan    |             |            | 25
I have to write all the non-null columns from the table above to HBase. So I wrote logic to collect the names of the non-null columns into one column of the DataFrame, as shown below; the Name column is mandatory. (A sketch of one way to build this column follows the table.)

Name        Place       Department  Experience      Not_null_columns
================================================================================
Ram         Ramgarh     Sales        14            Name, Place, Department, Experience
Lakshman    Lakshmanpur Operations                 Name, Place, Department
Sita        Sitapur                  14            Name, Place, Experience
Ravan                                25            Name, Experience
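
For reference, one way to build such a Not_null_columns column (a minimal sketch, not necessarily the logic the asker used; it assumes the DataFrame is named df) is to emit each column's name only when its value is non-null, and let concat_ws drop the nulls:

import org.apache.spark.sql.functions.{col, concat_ws, lit, when}

// For each column, produce its NAME when the value is non-null, else null
val nameExprs = df.columns.map(c => when(col(c).isNotNull, lit(c)))
// concat_ws silently skips null arguments, so only the non-null column names survive
val withNames = df.withColumn("Not_null_columns", concat_ws(", ", nameExprs: _*))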
Now my requirement is to create a column in the DataFrame that holds the names and values of all the non-null columns together, as shown below (a sketch of one way to produce this format follows the table):

Name      Place        Department   Experience    Not_null_columns_values
==========================================================================
Ram       Ramgarh      Sales        14            Name: Ram, Place: Ramgarh, Department: Sales, Experience: 14
Lakshman  Lakshmanpur  Operations                 Name: Lakshman, Place: Lakshmanpur, Department: Operations
Sita      Sitapur                   14            Name: Sita, Place: Sitapur, Experience: 14
Ravan                               25            Name: Ravan, Experience: 25
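
The same trick can produce the exact "column: value" layout shown above; a sketch assuming the same df, with values cast to string so mixed types concatenate cleanly:

import org.apache.spark.sql.functions.{col, concat, concat_ws, lit, when}

// For each column, build "name: value" only when the value is non-null
val pairExprs = df.columns.map { c =>
  when(col(c).isNotNull, concat(lit(c + ": "), col(c).cast("string")))
}
// concat_ws drops the null entries, keeping only the populated pairs
val result = df.withColumn("Not_null_columns_values", concat_ws(", ", pairExprs: _*))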
Once I have this DataFrame, I will write it to HBase with Name as the key and the last column as the value.
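
The HBase write itself depends on which connector is used, so only the key/value extraction is sketched here (result and Not_null_columns_values are the hypothetical names from the sketch above):

// Each row becomes one (row key, value) pair to feed into the HBase put
val kvPairs = result.select("Name", "Not_null_columns_values")
  .rdd
  .map(row => (row.getString(0), row.getString(1)))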

Please let me know if there is a better way to do this.

Try this -

Load the test data provided:

// Assumes a running SparkSession named `spark`
import spark.implicits._
import org.apache.spark.sql.functions.{col, struct, to_json}

val data =
  """
    |Name     | Place       | Department | Experience
    |Ram      | Ramgarh     | Sales      | 14
    |Lakshman | Lakshmanpur | Operations |
    |Sita     | Sitapur     |            | 14
    |Ravan    |             |            | 25
  """.stripMargin

val stringDS = data.split(System.lineSeparator())
  .map(_.split('|').map(_.replaceAll("""^[ \t]+|[ \t]+$""", "")).mkString(","))
  .toSeq.toDS()

val df = spark.read
  .option("sep", ",")
  .option("inferSchema", "true")
  .option("header", "true")
  //.option("nullValue", "null")
  .csv(stringDS)

df.show(false)
df.printSchema()

/**
  * +--------+-----------+----------+----------+
  * |Name    |Place      |Department|Experience|
  * +--------+-----------+----------+----------+
  * |Ram     |Ramgarh    |Sales     |14        |
  * |Lakshman|Lakshmanpur|Operations|null      |
  * |Sita    |Sitapur    |null      |14        |
  * |Ravan   |null       |null      |25        |
  * +--------+-----------+----------+----------+
  *
  * root
  *  |-- Name: string (nullable = true)
  *  |-- Place: string (nullable = true)
  *  |-- Department: string (nullable = true)
  *  |-- Experience: integer (nullable = true)
  */

Convert to a struct first, then to JSON:

val x = df.withColumn("Not_null_columns_values",
  to_json(struct(df.columns.map(col): _*)))
x.show(false)
x.printSchema()

/**
  * +--------+-----------+----------+----------+----------------------------------------------------------------------+
  * |Name    |Place      |Department|Experience|Not_null_columns_values                                               |
  * +--------+-----------+----------+----------+----------------------------------------------------------------------+
  * |Ram     |Ramgarh    |Sales     |14        |{"Name":"Ram","Place":"Ramgarh","Department":"Sales","Experience":14} |
  * |Lakshman|Lakshmanpur|Operations|null      |{"Name":"Lakshman","Place":"Lakshmanpur","Department":"Operations"}   |
  * |Sita    |Sitapur    |null      |14        |{"Name":"Sita","Place":"Sitapur","Experience":14}                     |
  * |Ravan   |null       |null      |25        |{"Name":"Ravan","Experience":25}                                      |
  * +--------+-----------+----------+----------+----------------------------------------------------------------------+
  */
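
This works because to_json omits struct fields whose value is null. If the nulls should be kept instead, Spark 3.x exposes an ignoreNullFields option on the JSON generator (a sketch; verify against your Spark version):

// Keep null fields in the generated JSON instead of dropping them (Spark 3.x)
val y = df.withColumn("all_columns_json",
  to_json(struct(df.columns.map(col): _*), Map("ignoreNullFields" -> "false")))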
Thank you, this works like a charm. Still trying to understand the logic, as I'm new to Spark.