PySpark: how to replace each row's value with a value from an array
Tags: pyspark, apache-spark-sql
I want to replace the numbers in the date column with the corresponding values from the monthList array.
monthList array:
monthList = [None, Jan, Feb, Mar, Apr, May, Jun, Jul, Aug, Sep, Oct, Nov, Dec]
I tried to do this with apply, but it does not work. Error: 'DataFrame' object has no attribute 'apply'
Thanks for your help.
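One approach is to build a lookup DataFrame, date_lookup, from monthList up front. Because this DataFrame is small, it can be broadcast to improve performance. You can then left join your actual df against it.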
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder \
    .appName('practice') \
    .getOrCreate()
sc = spark.sparkContext

monthList = ["None","Jan","Feb","Mar","Apr","May","Jun","Jul","Aug","Sep","Oct","Nov","Dec"]
# Pair each label with its index: (0, "None"), (1, "Jan"), ...
lookup_list = []
for i in range(len(monthList)):
    lookup_list.append((i, monthList[i]))
date_lookup = sc.parallelize(lookup_list).toDF(["date_num", "date_label"])
date_lookup.show()
+--------+----------+
|date_num|date_label|
+--------+----------+
|       0|      None|
|       1|       Jan|
|       2|       Feb|
|       3|       Mar|
|       4|       Apr|
|       5|       May|
|       6|       Jun|
|       7|       Jul|
|       8|       Aug|
|       9|       Sep|
|      10|       Oct|
|      11|       Nov|
|      12|       Dec|
+--------+----------+
df = sc.parallelize([
    (1, 19.75), (2, 15.51)]).toDF(["date", "value"])
df.show()
+----+-----+
|date|value|
+----+-----+
|   1|19.75|
|   2|15.51|
+----+-----+
# Broadcast the small lookup table and left join on the month number.
df1 = df.join(F.broadcast(date_lookup), df.date == date_lookup.date_num, how='left') \
    .select('date_label', 'value')
df1.show()
+----------+-----+
|date_label|value|
+----------+-----+
|       Jan|19.75|
|       Feb|15.51|
+----------+-----+
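You can try this alternative: call df1.selectExpr with an expression that picks one of 'None', 'Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec' according to date and aliases the result as date, together with value, followed by .show(false).
/**
  * +----+-----+
  * |date|value|
  * +----+-----+
  * |None|19.75|
  * | Jan|15.51|
  * | Feb|20.66|
  * +----+-----+
  */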
For reference, the attempt from the question that raises the error:
d.date = d.select('date').apply(lambda x: monthList[x])