PySpark string extraction

I have a column named target in a Spark DataFrame. The values look like this:

ab=px_d_1200;ab=9;ab=t_d_o_1000;artid=delish.recipe.46338;artid=delish_recipe_46338;avb=85;cat=recipes;role=3;sect=cooking
ab=px_d_1200;ab=8;ab=t_d_o_1000;apn=640x480_370;artid=delish.recipe.25860457;artid=delish_recipe_25860457;avb=90;cat=recipes;clc=chicken-breast-recipes;clc=insanely-easy-chicken-dinners;clc=weeknight-dinners;embedid=a6311e94-3b66-4712-8fca-eaa423e4e69a;gs_cat=response_check;gs_cat=gl_english;role=3;sect=cooking;sub=recipe-ideas;tool=recipe;urlhash=5425cac3a9c2959917d0634f5bd6d842
I need to extract role=X, and the value after the equals sign should be stored in a separate column. The desired output is:

role
3
3

This might be a workable solution.

Create the DataFrame:

df = spark.createDataFrame([(1,"ab=px_d_1200;ab=9;ab=t_d_o_1000;artid=delish.recipe.46338;artid=delish_recipe_46338;avb=85;cat=recipes;role=3;sect=cooking")],[ "col1","col2"])
df.show(truncate=False)
+----+--------------------------------------------------------------------------------------------------------------------------+
|col1|col2                                                                                                                      |
+----+--------------------------------------------------------------------------------------------------------------------------+
|1   |ab=px_d_1200;ab=9;ab=t_d_o_1000;artid=delish.recipe.46338;artid=delish_recipe_46338;avb=85;cat=recipes;role=3;sect=cooking|
+----+--------------------------------------------------------------------------------------------------------------------------+

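from pyspark.sql import functions as F

# Keep only rows that contain a role entry
df_new = df.filter(F.col("col2").contains("role="))
# Split on ";" and explode so each key=value pair gets its own row
df_new = df_new.withColumn("split_col", F.explode(F.split(F.col("col2"), ";")))
# Keep only the pair whose key is exactly "role"
df_new = df_new.filter(F.col("split_col").startswith("role="))
# Split the pair on "=" and take the last element as the value
df_new = df_new.withColumn("final_col", F.split(F.col("split_col"), "="))
df_new = df_new.withColumn("role", F.element_at(F.col("final_col"), -1))
df_new.show()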

+----+--------------------+---------+---------+----+
|col1|                col2|split_col|final_col|role|
+----+--------------------+---------+---------+----+
|   1|ab=px_d_1200;ab=9...|   role=3|[role, 3]|   3|
+----+--------------------+---------+---------+----+
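If the intermediate columns are not needed, a regular expression is one possible alternative to the split-and-explode approach. The sketch below is not part of the original answer: `ROLE_PATTERN`, `add_role_column`, and `extract_role` are hypothetical names chosen for illustration, and the column name `col2` is taken from the example DataFrame above. It relies on `regexp_extract`, which returns an empty string when the pattern does not match.

```python
import re

# Matches "role=<value>" only at the start of a key=value pair, so a key that
# merely contains "role" (a hypothetical "myrole=...") is not picked up.
ROLE_PATTERN = r"(?:^|;)role=([^;]*)"


def add_role_column(df, source_col="col2"):
    """Return df with a string column "role" ("" when no role= pair exists)."""
    # Imported here so the pattern can be tried out without a Spark install.
    from pyspark.sql import functions as F

    return df.withColumn(
        "role", F.regexp_extract(F.col(source_col), ROLE_PATTERN, 1)
    )


def extract_role(value):
    """Plain-Python check of the same pattern, useful for testing it locally."""
    match = re.search(ROLE_PATTERN, value)
    return match.group(1) if match else ""
```

Calling `add_role_column(df).select("role").show()` on the DataFrame above should give one `role` value per input row. Unlike the filter-based version, rows without a `role=` pair are kept with an empty string, and only the first `role=` pair in a string is returned.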

Perfect, this is exactly what I needed. Thank you very much.

Glad it helped :) I would appreciate an upvote as well.