Python: Convert a Spark DataFrame to a list per row


python pandas apache-spark


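I currently receive a Spark DataFrame and convert it to a pandas DataFrame in order to build a list of rows. I would like to create that list without going through pandas. function2 applies a function to the string representation of each row. The column names will not be constant.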


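def func1(df):
    # Pull the whole Spark DataFrame onto the driver as a pandas DataFrame
    df = df.select("*").toPandas()
    # Build one comma-separated string per row, skipping None values
    job_args = [(",".join(str(i) for i in list(filter(None.__ne__, df.iloc[c].tolist())))) for c in range(0, len(df))]
    # Redistribute the strings and apply function2 to each one
    results = spark.sparkContext.parallelize(job_args).map(lambda n: function2(n)).collect()
    return results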

For example:

+-----+-----+
|index|count|
+-----+-----+
|  1  |  5  |
|  2  |  9  |
|  3  |  3  |
|  4  |  1  |
+-----+-----+

becomes

rows[0] = [1,5]
rows[1] = [2,9]
rows[2] = [3,3]
rows[3] = [4,1]

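If the goal is to take all the columns of a Spark DataFrame and concatenate them into a string, you can do it in two steps:

Use the array function to create a new column that holds all the columns

Use the array_join function to join the elements into a single string

Here is a working example of how to do this: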

import pyspark.sql.functions as f

l = [(1, 5), (2, 9), (3, 3), (4, 1)]
df = spark.createDataFrame(l, ['index', 'count'])

(
  df
  .withColumn('arr', f.array(df.columns))
  .withColumn('str', f.array_join('arr', ', '))
  .select('str')
).show()

+----+
| str|
+----+
|1, 5|
|2, 9|
|3, 3|
|4, 1|
+----+
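
If the per-row strings still have to be passed through function2, as in the question, one possible alternative is to map over the DataFrame's underlying RDD and skip pandas entirely. The following is only a minimal sketch under a few assumptions: row_to_string is a hypothetical helper name introduced here, function2 is passed in as a parameter instead of being read from the enclosing scope, and None values are dropped to mirror the filter in the question's code.

```python
def row_to_string(values, sep=","):
    """Join the non-null values of one row into a single string."""
    # Skip None so that null cells do not show up as the text "None".
    return sep.join(str(v) for v in values if v is not None)


def func1(df, function2):
    """Apply function2 to the string form of every row, without pandas.

    Assumptions: df is a pyspark.sql.DataFrame, and function2 is the
    question's per-row function, passed in explicitly here.
    """
    # A pyspark Row is a tuple subclass, so it can be iterated directly
    # and the column names never need to be known in advance.
    return df.rdd.map(lambda row: function2(row_to_string(row))).collect()


# The per-row logic can be checked without a Spark session:
sample_rows = [(1, 5), (2, None), (3, 3)]
row_strings = [row_to_string(r) for r in sample_rows]
print(row_strings)  # ['1,5', '2', '3,3']
```

Because the helper only iterates over the row's values, it does not depend on the column names, and unlike array it does not require the columns to share a type. If plain lists such as [1, 5] are wanted instead of strings, list(row) inside the same map should work in the same way.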