Python 3.x: Adding a column to a DataFrame from the output of numpy.npv
I am trying to calculate the Net Present Value from two columns of a PySpark DataFrame using the numpy.npv() function. The call

return (values / (1 + rate) ** np.arange(0, len(values))).sum(axis=0)

fails with TypeError: len() of unsized object. I also tried wrapping numpy.npv in a UDF, without success. Any help resolving this would be appreciated.
# Creating the DataFrame
df = sc.parallelize([('a',1,100),('a',2,200),('a',3,300),('a',4,400),
('a',5,500),('a',6,600),('b',1,23),('b',2,32),('b',3,34),('b',4,55),
('b',5,43)]).toDF(['Name','yr','cash'])
df.show()
# Loading the requisite packages
from pyspark.sql import Window
from pyspark.sql.functions import col, collect_list, lit
import numpy as np
w = (Window.partitionBy('Name').orderBy(col('yr').desc()).rangeBetween(Window.unboundedPreceding, 0))
df = df.withColumn('cash_list', collect_list('cash').over(w))
df.show(truncate=False)
df = df.withColumn('discount_rate', lit(0.3))
# Attempting to calculate NPV directly -- this fails, because np.npv
# expects array-like values, not Spark Column objects
df = df.withColumn('npv_value', np.npv(df.discount_rate, df.cash_list))
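To see why the direct call cannot work, it helps to look at what np.npv computes on an ordinary sequence: it needs to take len() of its values and index them, which a Spark Column does not support. The sketch below uses a pure-Python equivalent of the formula (np.npv itself was deprecated in NumPy 1.18 and removed in 1.20, where numpy_financial.npv replaces it):

```python
def npv(rate, values):
    """Pure-Python equivalent of the np.npv formula.

    values[0] is the cash flow at time 0, so it is undiscounted;
    values[i] is divided by (1 + rate) ** i.
    """
    return sum(v / (1 + rate) ** i for i, v in enumerate(values))

# Works on a plain list with the earliest cash flow first
print(round(npv(0.3, [100, 200, 300, 400, 500, 600]), 2))  # 950.09
```

Passing Spark Columns instead of a list is what triggers the "len() of unsized object" error: the driver-side NumPy code tries to size the Column as if it were an array.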
Using the code provided by the OP, we get the following DataFrame -
df.show(truncate=False)
+----+---+----+------------------------------+-------------+
|Name|yr |cash|cash_list |discount_rate|
+----+---+----+------------------------------+-------------+
|b |5 |43 |[43] |0.3 |
|b |4 |55 |[43, 55] |0.3 |
|b |3 |34 |[43, 55, 34] |0.3 |
|b |2 |32 |[43, 55, 34, 32] |0.3 |
|b |1 |23 |[43, 55, 34, 32, 23] |0.3 |
|a |6 |600 |[600] |0.3 |
|a |5 |500 |[600, 500] |0.3 |
|a |4 |400 |[600, 500, 400] |0.3 |
|a |3 |300 |[600, 500, 400, 300] |0.3 |
|a |2 |200 |[600, 500, 400, 300, 200] |0.3 |
|a |1 |100 |[600, 500, 400, 300, 200, 100]|0.3 |
+----+---+----+------------------------------+-------------+
The OP wants to compute the NPV, and wants to use a UDF for it. For Name=a, yr=1, the NPV is -
600/(1.3)^5 + 500/(1.3)^4 + 400/(1.3)^3 + 300/(1.3)^2 + 200/(1.3)^1 + 100/(1.3)^0
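The sum above can be checked term by term with plain Python arithmetic:

```python
# Cash flows for Name=a, discounted back to yr=1 at a 30% rate
terms = [
    600 / 1.3 ** 5,  # yr=6
    500 / 1.3 ** 4,  # yr=5
    400 / 1.3 ** 3,  # yr=4
    300 / 1.3 ** 2,  # yr=3
    200 / 1.3 ** 1,  # yr=2
    100 / 1.3 ** 0,  # yr=1, undiscounted
]
print(round(sum(terms), 2))  # 950.09
```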
Comments:

"Say, for Name=a, yr=1, do you want 600/(1.3)**5 + 500/(1.3)**4 + ... + 200/(1.3)**1 + 100/(1.3)**0, or something close to that? By the way, you can't use np.npv like that. You have to do this with a UDF."

"Correct! That is exactly what I am trying to achieve. I thought the NPV function would do it for me, but clearly I was off track. I did try the UDF approach, rewriting the function as @udf(returnType=DoubleType()) def calc_npv_value(rate, values): values = np.asarray(values); return (values/(1+rate)**np.arange(1, len(values)+1)).sum(axis=0), but it still errors out. Assistance with the UDF approach would be very helpful. Thank you."

"This works, thank you. I'll try to figure out what I did wrong. Apart from reversing the list, I did nearly the same thing; one difference I see is that my UDF's inputs were (discount_rate, cash_list). This has been some very useful learning. Many thanks, and all the best!"
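One subtle difference between the commenter's attempted UDF and np.npv is the exponent range: np.arange(1, len(values)+1) discounts every cash flow, including the one at time zero, while np.npv uses exponents starting at 0. A small sketch of the two conventions (the values here are the Name=a list, earliest year first):

```python
import numpy as np

rate = 0.3
values = np.asarray([100, 200, 300, 400, 500, 600], dtype=float)

# np.npv convention: first cash flow is at time 0 (exponents 0..n-1)
npv_from_zero = (values / (1 + rate) ** np.arange(len(values))).sum()

# The commenter's attempt: exponents 1..n, so every flow is discounted
npv_from_one = (values / (1 + rate) ** np.arange(1, len(values) + 1)).sum()

# The two results differ by exactly one discounting period
assert abs(npv_from_zero / (1 + rate) - npv_from_one) < 1e-9
```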
# Creating a function and its corresponding UDF
from pyspark.sql.functions import udf
from pyspark.sql.types import FloatType

def calculate_npv(cash_list, rate):
    # Reverse the list so the earliest cash flow comes first,
    # as np.npv expects (np.npv requires NumPy < 1.20)
    cash_list = cash_list[::-1]
    return float(np.npv(rate, cash_list))

calculate_npv = udf(calculate_npv, FloatType())
# Applying the UDF to the DataFrame below
df = df.withColumn('NPV',calculate_npv('cash_list','discount_rate'))
df.show(truncate=False)
+----+---+----+------------------------------+-------------+----------+
|Name|yr |cash|cash_list |discount_rate|NPV |
+----+---+----+------------------------------+-------------+----------+
|b |5 |43 |[43] |0.3 |43.0 |
|b |4 |55 |[43, 55] |0.3 |88.07692 |
|b |3 |34 |[43, 55, 34] |0.3 |101.75148 |
|b |2 |32 |[43, 55, 34, 32] |0.3 |110.27037 |
|b |1 |23 |[43, 55, 34, 32, 23] |0.3 |107.823364|
|a |6 |600 |[600] |0.3 |600.0 |
|a |5 |500 |[600, 500] |0.3 |961.53845 |
|a |4 |400 |[600, 500, 400] |0.3 |1139.645 |
|a |3 |300 |[600, 500, 400, 300] |0.3 |1176.65 |
|a |2 |200 |[600, 500, 400, 300, 200] |0.3 |1105.1154 |
|a |1 |100 |[600, 500, 400, 300, 200, 100]|0.3 |950.08875 |
+----+---+----+------------------------------+-------------+----------+
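A final note: np.npv was deprecated in NumPy 1.18 and removed in 1.20, where numpy_financial.npv is the drop-in replacement. If neither is available, the discounting inside the UDF can be written out directly. The sketch below mirrors the UDF body as a plain local function, which is also handy for checking the numbers in the table above without a Spark session:

```python
import numpy as np

def calculate_npv_local(cash_list, rate):
    # Mirror of the UDF: reverse so the earliest year comes first,
    # then discount each flow by (1 + rate) ** index
    values = np.asarray(cash_list[::-1], dtype=float)
    return float((values / (1 + rate) ** np.arange(len(values))).sum())

print(round(calculate_npv_local([43, 55], 0.3), 5))  # 88.07692, as in the table
print(calculate_npv_local([600, 500, 400, 300, 200, 100], 0.3))  # ~950.09
```

The table's values differ in the last digits (e.g. 950.08875) only because the UDF returns FloatType, i.e. 32-bit floats.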