
Python: Passing a variable to the Spark DataFrame lit function


I am trying to add a new column to an existing Spark df. If I specify a df column name directly as the new column's value, it works; but since I want the value column to be dynamic based on configuration, I want to pass the value from a variable.

e.g.:

If I use
df2 = df1.withColumn("COL_D", lit(df1.COL_A))
then it works as expected. However, if I have a variable and try to pass it, it does not work:

val_col = "COL_B"
df2 = df1.withColumn("COL_D", lit(df1.val_col))


I am not sure whether this is possible, but wanted to ask. If anyone has done something similar before, please let me know.
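The root of the problem is that Python attribute lookup uses the literal name `val_col`, not the string stored in the variable. A minimal pure-Python sketch of the mechanism (using a hypothetical `FakeDF` stand-in that mimics PySpark's `DataFrame.__getattr__`, not PySpark itself), and of `getattr` as the dynamic alternative:

```python
class FakeDF:
    """Hypothetical stand-in mimicking pyspark DataFrame attribute access."""

    def __init__(self, columns):
        self.columns = columns

    def __getattr__(self, name):
        # Mirrors pyspark's DataFrame.__getattr__: only real column
        # names resolve; anything else raises AttributeError.
        if name not in self.columns:
            raise AttributeError(
                "'%s' object has no attribute '%s'"
                % (type(self).__name__, name))
        return "Column<%s>" % name


df = FakeDF(["COL_A", "COL_B"])
val_col = "COL_B"

# df.val_col looks up the literal attribute name "val_col", not the
# string stored in the variable, so it fails:
try:
    df.val_col
except AttributeError as e:
    print(e)  # 'FakeDF' object has no attribute 'val_col'

# getattr resolves the name at runtime from the variable's value:
print(getattr(df, val_col))  # Column<COL_B>
```

The same distinction applies to the real PySpark DataFrame, which is why the answer below swaps the dotted access for a call that takes the column name as a string.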

Use the col function to avoid this issue:

df = sqlContext.createDataFrame([(1,'Björn'),(2,'Oliver'),(3,'Müller')],['ID','Name']) 
df.show() 
+---+------+
| ID|  Name|
+---+------+
|  1| Björn|
|  2|Oliver|
|  3|Müller|
+---+------+                                  
df1 = df.withColumn('New_ID',lit(df.ID))
df1.show()
+---+------+------+
| ID|  Name|New_ID|
+---+------+------+
|  1| Björn|     1|
|  2|Oliver|     2|
|  3|Müller|     3|
+---+------+------+
So far so good. However, when we put the column name in a variable, we get an error, as shown below:

val_col = "ID"
df1 = df.withColumn('New_ID',lit(df.val_col))

AttributeErrorTraceback (most recent call last)
<ipython-input-48-1bb287cfa9f2> in <module>
      5 
      6 val_col = "ID"
----> 7 df1 = df.withColumn('New_ID',lit(df.val_col))
      8 
      9 from pyspark.sql.functions import col

/opt/mapr/spark/spark-2.2.1/python/pyspark/sql/dataframe.py in __getattr__(self, name)
   1018         if name not in self.columns:
   1019             raise AttributeError(
-> 1020                 "'%s' object has no attribute '%s'" % (self.__class__.__name__, name))
   1021         jc = self._jdf.apply(name)
   1022         return Column(jc)

AttributeError: 'DataFrame' object has no attribute 'val_col'

I get the following error:
AttributeError: 'function' object has no attribute '_get_object_id'
What does this mean?
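That message typically appears when the col function itself is passed to lit without being called, i.e. lit(col) instead of lit(col(val_col)): lit then receives a function object rather than a Column. A minimal pure-Python sketch of that function-versus-call confusion (make_col and describe are hypothetical stand-ins for illustration, not the PySpark API):

```python
def make_col(name):
    # Hypothetical stand-in for pyspark.sql.functions.col
    return "Column<%s>" % name


def describe(value):
    # Hypothetical stand-in for lit(): expects a value, not a function
    if callable(value):
        raise AttributeError(
            "'function' object has no attribute '_get_object_id'")
    return "lit(%s)" % value


val_col = "ID"

# Forgetting the parentheses passes the function object itself:
try:
    describe(make_col)
except AttributeError as e:
    print(e)  # 'function' object has no attribute '_get_object_id'

# Calling the function first passes the resulting column reference:
print(describe(make_col(val_col)))  # lit(Column<ID>)
```

So if you see this error with the real API, check that you wrote lit(col(val_col)) with the inner call included.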
from pyspark.sql.functions import col, lit
val_col = "ID"
df1 = df.withColumn('New_ID',lit(col(val_col)))
df1.show()
+---+------+------+
| ID|  Name|New_ID|
+---+------+------+
|  1| Björn|     1|
|  2|Oliver|     2|
|  3|Müller|     3|
+---+------+------+