PySpark: How to replace each row's value with a value from an array

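I want to replace the numbers in the date column with the corresponding values from the monthList array.

monthList array

monthList = ["None","Jan","Feb","Mar","Apr","May","Jun","Jul","Aug","Sep","Oct","Nov","Dec"]

PySpark code

Result

I tried this, but it doesn't work. Error: 'DataFrame' object has no attribute 'apply'


Thanks for your help.

One approach is to build a lookup DataFrame, date_lookup, from monthList ahead of time. This DataFrame can be broadcast to improve performance. You can then left-join it with the actual df.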

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Create (or reuse) a Spark session
spark = SparkSession.builder \
    .appName('practice') \
    .getOrCreate()

sc = spark.sparkContext

monthList = ["None","Jan","Feb","Mar","Apr","May","Jun","Jul","Aug","Sep","Oct","Nov","Dec"]

# Pair each month label with its index: (0, "None"), (1, "Jan"), ...
lookup_list = []

for i in range(len(monthList)):
    lookup_list.append((i, monthList[i]))

# Lookup table with one row per month
date_lookup = sc.parallelize(lookup_list).toDF(["date_num", "date_label"])

date_lookup.show()

 +--------+----------+
 |date_num|date_label|
 +--------+----------+
 |       0|      None|
 |       1|       Jan|
 |       2|       Feb|
 |       3|       Mar|
 |       4|       Apr|
 |       5|       May|
 |       6|       Jun|
 |       7|       Jul|
 |       8|       Aug|
 |       9|       Sep|
 |      10|       Oct|
 |      11|       Nov|
 |      12|       Dec|
 +--------+----------+

# Sample data: month number and value
df = sc.parallelize([
    (1, 19.75), (2, 15.51)]).toDF(["date", "value"])

df.show()

 +----+-----+
 |date|value|
 +----+-----+
 |   1|19.75|
 |   2|15.51|
 +----+-----+

# Left-join against the broadcast lookup table and keep the label instead of the number
df1 = df.join(F.broadcast(date_lookup), df.date == date_lookup.date_num, how='left').select('date_label', 'value')

df1.show()

 +----------+-----+
 |date_label|value|
 +----------+-----+
 |       Jan|19.75|
 |       Feb|15.51|
 +----------+-----+
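
The same broadcast left join can also be wrapped in small helpers. This is only a sketch under assumptions: the names build_lookup_rows and label_dates are hypothetical, and it uses spark.createDataFrame instead of sc.parallelize(...).toDF(...), which should give an equivalent lookup table. The PySpark import sits inside the function so the pure-Python helper can be used on its own.

```python
def build_lookup_rows(month_list):
    """Pair each label with its position: [(0, 'None'), (1, 'Jan'), ...]."""
    return list(enumerate(month_list))


def label_dates(spark, df, month_list):
    """Left-join df against a broadcast month lookup table.

    Assumes df has an integer 'date' column and a 'value' column,
    as in the answer above.
    """
    # Imported lazily so build_lookup_rows works without Spark installed.
    from pyspark.sql import functions as F

    date_lookup = spark.createDataFrame(
        build_lookup_rows(month_list), ["date_num", "date_label"]
    )
    joined = df.join(
        F.broadcast(date_lookup),
        df["date"] == date_lookup["date_num"],
        how="left",
    )
    return joined.select("date_label", "value")
```

With the session and DataFrame from the answer, label_dates(spark, df, monthList).show() should produce the same date_label/value table. Because the join is a left join, rows whose date has no match in the lookup table are kept with a null date_label.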

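You could try this alternative: call df1.selectExpr with an expression that picks the entry from 'None', 'Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec' by date, aliased as date, together with value, and then call .show(false):

/**
  * +----+-----+
  * |date|value|
  * +----+-----+
  * |None|19.75|
  * |Jan |15.51|
  * |Feb |20.66|
  * +----+-----+
  */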
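
One way to write such a lookup expression in PySpark is Spark SQL's element_at over an array literal. The sketch below is an assumption, not necessarily the exact expression this answer used, and the helper name month_label_expr is hypothetical. element_at uses 1-based indexes, so the sketch adds 1 to date so that 1 maps to 'Jan' when the list starts with 'None'. The cast to int is there because date may be stored as a long.

```python
def month_label_expr(month_list, column="date"):
    """Build a Spark SQL expression that maps a month number to its label.

    element_at is 1-based, and month_list[0] is the 'None' placeholder,
    so the index is column + 1.
    """
    labels = ", ".join("'{}'".format(m) for m in month_list)
    return "element_at(array({}), cast({} as int) + 1) as {}".format(
        labels, column, column
    )


def with_month_labels(df, month_list):
    """Replace the numeric 'date' column with its label, keeping 'value'."""
    return df.selectExpr(month_label_expr(month_list), "value")
```

Usage would look like with_month_labels(df, monthList).show(truncate=False). This avoids the join entirely, at the cost of embedding the list in the SQL string, so it only suits short, trusted label lists.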
The attempt from the question that raised the error:

d.date = d.select('date').apply(lambda x: monthList[x])
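
PySpark DataFrames have no pandas-style apply method, which is why this line fails. The closest row-wise equivalent is a user-defined function, sketched here with the hypothetical helper names month_label and with_month_labels_udf. A Python UDF is usually slower than a join or a built-in SQL function, so treat this as an illustration rather than a recommendation.

```python
def month_label(index, month_list):
    """Return month_list[index], or None for a null or out-of-range index."""
    if index is None or not 0 <= index < len(month_list):
        return None
    return month_list[index]


def with_month_labels_udf(df, month_list):
    """Replace the numeric 'date' column with its label using a Python UDF."""
    # Imported lazily so month_label can be tested without Spark installed.
    from pyspark.sql import functions as F
    from pyspark.sql.types import StringType

    to_label = F.udf(lambda i: month_label(i, month_list), StringType())
    return df.withColumn("date", to_label(F.col("date")))
```

The bounds check matters because an IndexError raised inside a UDF would fail the whole Spark job; returning None yields a null in the result instead.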