Python 3.x: Adding a column to a DataFrame from the output of numpy.npv
I am trying to calculate the Net Present Value from two columns of a PySpark DataFrame using the numpy.npv() function. The call

return (values / (1 + rate) ** np.arange(0, len(values))).sum(axis=0)

fails with TypeError: len() of unsized object. I also tried wrapping numpy.npv in a UDF, without success. Any help resolving this would be appreciated.
# Creating the DataFrame
df = sc.parallelize([('a',1,100),('a',2,200),('a',3,300),('a',4,400),
('a',5,500),('a',6,600),('b',1,23),('b',2,32),('b',3,34),('b',4,55),
('b',5,43)]).toDF(['Name','yr','cash'])
df.show()
# Loading the requisite packages
from pyspark.sql import Window
from pyspark.sql.functions import col, collect_list, lit
import numpy as np
w = (Window.partitionBy('Name').orderBy(col('yr').desc()).rangeBetween(Window.unboundedPreceding, 0))
df = df.withColumn('cash_list', collect_list('cash').over(w))
df.show(truncate=False)
df = df.withColumn('discount_rate', lit(0.3))
# Attempting to calculate NPV directly -- this fails, because np.npv
# expects array-like values, not Spark Column objects
df = df.withColumn('npv_value', np.npv(df.discount_rate, df.cash_list))
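To see why the direct call cannot work, it helps to look at what np.npv computes on an ordinary sequence: it needs to take len() of its values and index them, which a Spark Column does not support. The sketch below uses a pure-Python equivalent of the formula (np.npv itself was deprecated in NumPy 1.18 and removed in 1.20, where numpy_financial.npv replaces it):

```python
def npv(rate, values):
    """Pure-Python equivalent of the np.npv formula.

    values[0] is the cash flow at time 0, so it is undiscounted;
    values[i] is divided by (1 + rate) ** i.
    """
    return sum(v / (1 + rate) ** i for i, v in enumerate(values))

# Works on a plain list with the earliest cash flow first
print(round(npv(0.3, [100, 200, 300, 400, 500, 600]), 2))  # 950.09
```

Passing Spark Columns instead of a list is what triggers the "len() of unsized object" error: the driver-side NumPy code tries to size the Column as if it were an array.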
Using the code provided by the OP, we get the following DataFrame -
df.show(truncate=False)
+----+---+----+------------------------------+-------------+
|Name|yr |cash|cash_list |discount_rate|
+----+---+----+------------------------------+-------------+
|b |5 |43 |[43] |0.3 |
|b |4 |55 |[43, 55] |0.3 |
|b |3 |34 |[43, 55, 34] |0.3 |
|b |2 |32 |[43, 55, 34, 32] |0.3 |
|b |1 |23 |[43, 55, 34, 32, 23] |0.3 |
|a |6 |600 |[600] |0.3 |
|a |5 |500 |[600, 500] |0.3 |
|a |4 |400 |[600, 500, 400] |0.3 |
|a |3 |300 |[600, 500, 400, 300] |0.3 |
|a |2 |200 |[600, 500, 400, 300, 200] |0.3 |
|a |1 |100 |[600, 500, 400, 300, 200, 100]|0.3 |
+----+---+----+------------------------------+-------------+
The OP wants to compute the NPV, and wants to use a UDF for it. For Name=a, yr=1, the NPV is -
600/(1.3)^5 + 500/(1.3)^4 + 400/(1.3)^3 + 300/(1.3)^2 + 200/(1.3)^1 + 100/(1.3)^0
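The sum above can be checked term by term with plain Python arithmetic:

```python
# Cash flows for Name=a, discounted back to yr=1 at a 30% rate
terms = [
    600 / 1.3 ** 5,  # yr=6
    500 / 1.3 ** 4,  # yr=5
    400 / 1.3 ** 3,  # yr=4
    300 / 1.3 ** 2,  # yr=3
    200 / 1.3 ** 1,  # yr=2
    100 / 1.3 ** 0,  # yr=1, undiscounted
]
print(round(sum(terms), 2))  # 950.09
```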
Comments:

"Say, for Name=a, yr=1, do you want 600/(1.3)**5 + 500/(1.3)**4 + ... + 200/(1.3)**1 + 100/(1.3)**0, or something close to that? By the way, you can't use np.npv like that. You have to do this with a UDF."

"Correct! That is exactly what I am trying to achieve. I thought the NPV function would do it for me, but clearly I was off track. I did try the UDF approach, rewriting the function as @udf(returnType=DoubleType()) def calc_npv_value(rate, values): values = np.asarray(values); return (values/(1+rate)**np.arange(1, len(values)+1)).sum(axis=0), but it still errors out. Assistance with the UDF approach would be very helpful. Thank you."

"This works, thank you. I'll try to figure out what I did wrong. Apart from reversing the list, I did nearly the same thing; one difference I see is that my UDF's inputs were (discount_rate, cash_list). This has been some very useful learning. Many thanks, and all the best!"
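One subtle difference between the commenter's attempted UDF and np.npv is the exponent range: np.arange(1, len(values)+1) discounts every cash flow, including the one at time zero, while np.npv uses exponents starting at 0. A small sketch of the two conventions (the values here are the Name=a list, earliest year first):

```python
import numpy as np

rate = 0.3
values = np.asarray([100, 200, 300, 400, 500, 600], dtype=float)

# np.npv convention: first cash flow is at time 0 (exponents 0..n-1)
npv_from_zero = (values / (1 + rate) ** np.arange(len(values))).sum()

# The commenter's attempt: exponents 1..n, so every flow is discounted
npv_from_one = (values / (1 + rate) ** np.arange(1, len(values) + 1)).sum()

# The two results differ by exactly one discounting period
assert abs(npv_from_zero / (1 + rate) - npv_from_one) < 1e-9
```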
# Creating a function and its corresponding UDF
from pyspark.sql.functions import udf
from pyspark.sql.types import FloatType

def calculate_npv(cash_list, rate):
    # Reverse the list so the earliest cash flow comes first,
    # as np.npv expects (np.npv requires NumPy < 1.20)
    cash_list = cash_list[::-1]
    return float(np.npv(rate, cash_list))

calculate_npv = udf(calculate_npv, FloatType())
# Applying the UDF to the DataFrame below
df = df.withColumn('NPV',calculate_npv('cash_list','discount_rate'))
df.show(truncate=False)
+----+---+----+------------------------------+-------------+----------+
|Name|yr |cash|cash_list |discount_rate|NPV |
+----+---+----+------------------------------+-------------+----------+
|b |5 |43 |[43] |0.3 |43.0 |
|b |4 |55 |[43, 55] |0.3 |88.07692 |
|b |3 |34 |[43, 55, 34] |0.3 |101.75148 |
|b |2 |32 |[43, 55, 34, 32] |0.3 |110.27037 |
|b |1 |23 |[43, 55, 34, 32, 23] |0.3 |107.823364|
|a |6 |600 |[600] |0.3 |600.0 |
|a |5 |500 |[600, 500] |0.3 |961.53845 |
|a |4 |400 |[600, 500, 400] |0.3 |1139.645 |
|a |3 |300 |[600, 500, 400, 300] |0.3 |1176.65 |
|a |2 |200 |[600, 500, 400, 300, 200] |0.3 |1105.1154 |
|a |1 |100 |[600, 500, 400, 300, 200, 100]|0.3 |950.08875 |
+----+---+----+------------------------------+-------------+----------+
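A final note: np.npv was deprecated in NumPy 1.18 and removed in 1.20, where numpy_financial.npv is the drop-in replacement. If neither is available, the discounting inside the UDF can be written out directly. The sketch below mirrors the UDF body as a plain local function, which is also handy for checking the numbers in the table above without a Spark session:

```python
import numpy as np

def calculate_npv_local(cash_list, rate):
    # Mirror of the UDF: reverse so the earliest year comes first,
    # then discount each flow by (1 + rate) ** index
    values = np.asarray(cash_list[::-1], dtype=float)
    return float((values / (1 + rate) ** np.arange(len(values))).sum())

print(round(calculate_npv_local([43, 55], 0.3), 5))  # 88.07692, as in the table
print(calculate_npv_local([600, 500, 400, 300, 200, 100], 0.3))  # ~950.09
```

The table's values differ in the last digits (e.g. 950.08875) only because the UDF returns FloatType, i.e. 32-bit floats.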