
Finding the row value from which the minimum was extracted over Window.partitionBy in PySpark


I have a PySpark DataFrame like the one below:

+--------+-------------+--------------+-----------------------+
|material|purchase_date|mkt_prc_usd_lb|min_mkt_prc_over_1month|
+--------+-------------+--------------+-----------------------+
|  Copper|   2019-01-09|        2.6945|                 2.6838|
|  Copper|   2019-01-23|        2.6838|                 2.6838|
|    Zinc|   2019-01-23|        1.1829|                 1.1829|
|    Zinc|   2019-06-26|        1.1918|                 1.1918|
|Aluminum|   2019-01-02|        0.8363|                 0.8342|
|Aluminum|   2019-01-09|        0.8342|                 0.8342|
|Aluminum|   2019-01-23|        0.8555|                 0.8342|
|Aluminum|   2019-04-03|        0.8461|                 0.8461|
+--------+-------------+--------------+-----------------------+
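
For reproducibility, a minimal sketch that builds this sample DataFrame (values copied from the table above; spark is assumed to be an active SparkSession):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Sample data copied from the table above.
df = spark.createDataFrame(
    [("Copper",   "2019-01-09", 2.6945, 2.6838),
     ("Copper",   "2019-01-23", 2.6838, 2.6838),
     ("Zinc",     "2019-01-23", 1.1829, 1.1829),
     ("Zinc",     "2019-06-26", 1.1918, 1.1918),
     ("Aluminum", "2019-01-02", 0.8363, 0.8342),
     ("Aluminum", "2019-01-09", 0.8342, 0.8342),
     ("Aluminum", "2019-01-23", 0.8555, 0.8342),
     ("Aluminum", "2019-04-03", 0.8461, 0.8461)],
    ["material", "purchase_date", "mkt_prc_usd_lb", "min_mkt_prc_over_1month"])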
The last column, 'min_mkt_prc_over_1month', is computed as the minimum 'mkt_prc_usd_lb' (the third column) over one month for each material, i.e. over a window of -15 to +15 days around each purchase_date, partitioned by material:

The code is:


from pyspark.sql.functions import col
from pyspark.sql.window import Window

days = lambda i: i * 86400  # days -> seconds; the window is ordered by epoch seconds

w2 = (Window()
           .partitionBy("material")
           .orderBy(col("purchase_date").cast("timestamp").cast("long"))
           .rangeBetween(-days(15), days(15)))
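
For context, the minimum column itself can presumably be derived from this same window; the question does not show that step, but a minimal sketch would be:

from pyspark.sql import functions as F

# Rolling minimum of the price over the +/-15-day window per material.
df = df.withColumn("min_mkt_prc_over_1month", F.min("mkt_prc_usd_lb").over(w2))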
Now, I want to know the 'purchase_date' on which that minimum price occurred.

Expected output, from the first two rows:

+--------+-------------+--------------+-----------------------+------------------+
|material|purchase_date|mkt_prc_usd_lb|min_mkt_prc_over_1month|date_of_min_price |
+--------+-------------+--------------+-----------------------+------------------+
|  Copper|   2019-01-09|        2.6945|                 2.6838|        2019-01-23|
|  Copper|   2019-01-23|        2.6838|                 2.6838|        2019-01-23|
+--------+-------------+--------------+-----------------------+------------------+

Try this. We can create a column that is filled with purchase_date wherever the two prices are equal and null otherwise, and then use first with ignoreNulls=True over window w2 on the newly created column.



(From the comments) Try this: df.withColumn('d', F.expr("min(struct(mkt_prc_usd_lb as min_mkt_prc_over_1month, purchase_date as date_of_min_price))").over(w2)).select("*", "d.*").drop('d'); a runnable sketch of this one-pass variant follows below.
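
A minimal runnable sketch of that comment's approach, assuming df holds only the first three columns (material, purchase_date, mkt_prc_usd_lb) and w2 is the window defined above; min over a struct compares fields left to right, so the winning struct carries the date of the minimum price along with it:

from pyspark.sql import functions as F

# min(struct(price, date)) picks the struct with the smallest price in each
# window; its second field is the purchase_date of that minimum.
result = (df
    .withColumn("d", F.expr("min(struct(mkt_prc_usd_lb as min_mkt_prc_over_1month, "
                            "purchase_date as date_of_min_price))").over(w2))
    .select("*", "d.*")
    .drop("d"))
result.show()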
The full solution from the answer:

from pyspark.sql.functions import *
from pyspark.sql.window import Window

# Convert days to seconds, since the window orders by epoch seconds.
days = lambda i: i * 86400

w2 = (Window()
           .partitionBy("material")
           .orderBy(col("purchase_date").cast("timestamp").cast("long"))
           .rangeBetween(-days(15), days(15)))

# Flag rows where the price equals the window minimum with their own
# purchase_date (null elsewhere), then spread that date over each window
# with first(..., ignoreNulls=True).
df.withColumn("first",\
              expr("""IF(mkt_prc_usd_lb=min_mkt_prc_over_1month,purchase_date,null)"""))\
  .withColumn("date_of_min_price", first("first", True).over(w2)).drop("first")\
  .show()

#+--------+-------------+--------------+-----------------------+-----------------+
#|material|purchase_date|mkt_prc_usd_lb|min_mkt_prc_over_1month|date_of_min_price|
#+--------+-------------+--------------+-----------------------+-----------------+
#|  Copper|   2019-01-09|        2.6945|                 2.6838|       2019-01-23|
#|  Copper|   2019-01-23|        2.6838|                 2.6838|       2019-01-23|
#|    Zinc|   2019-01-23|        1.1829|                 1.1829|       2019-01-23|
#|    Zinc|   2019-06-26|        1.1918|                 1.1918|       2019-06-26|
#|Aluminum|   2019-01-02|        0.8363|                 0.8342|       2019-01-09|
#|Aluminum|   2019-01-09|        0.8342|                 0.8342|       2019-01-09|
#|Aluminum|   2019-01-23|        0.8555|                 0.8342|       2019-01-09|
#|Aluminum|   2019-04-03|        0.8461|                 0.8461|       2019-04-03|
#+--------+-------------+--------------+-----------------------+-----------------+
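
A note on why this works: the flag column is non-null only on rows whose own price equals their own window minimum, so first("first", True) over w2 returns the earliest such date visible within each row's +/-15-day window, which matches the expected output above.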