Python: convert an RDD of Rows embedded in a DataFrame to a list

I have the DataFrame user_recommended, as shown. Its recommendations column is a PySpark RDD, as follows:

In[10]: user_recommended.recommendations[0]
Out[10]: [Row(item=0, rating=0.005226806737482548),
         Row(item=23, rating=0.0044402251951396465),
         Row(item=4, rating=0.004139747936278582)]
I want to convert the recommendations RDD into a Python list.


Is there a script that can help me convert the recommendations column of the user_recommended DataFrame (note that it is of type pandas.core.frame.DataFrame) into a list?

I think you want to do something like this:

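from pyspark import SparkContext
from pyspark.sql import Row

# Reuse the active SparkContext, or start a local one if none exists.
sc = SparkContext.getOrCreate()

# Build an RDD from the three Rows shown in the question.
my_rdd = sc.parallelize([Row(item=0, rating=0.005226806737482548),
                         Row(item=23, rating=0.0044402251951396465),
                         Row(item=4, rating=0.004139747936278582)])
my_rdd.collect()

# Map each Row to a plain (item, rating) tuple and collect the results as a list.
new_rdd = my_rdd.map(lambda x: (x[0], x[1]))
new_rdd.collect()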

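Here is a slightly different approach. In my view, its advantage is that it generalizes more easily to Rows with more than two fields. It is also worth noting that the data structure you previewed in the question is a column whose cells are lists of PySpark Row objects; it is not actually an RDD.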

import pandas as pd
from pyspark.sql import Row

# recreate the individual entries of the recommendation column
# these are lists of pyspark Row data structures
df_recommend = pd.DataFrame({'recommendations': (
[Row(item=0, rating=0.005226806737482548),
         Row(item=23, rating=0.0044402251951396465),
         Row(item=4, rating=0.004139747936278582)],)})

# now extract the values using the asDict method of the Row 
df_recommend['extracted_values'] = (
    df_recommend['recommendations']
    .apply(lambda recs: [list(x.asDict().values()) for x in recs])
)

Try user_recommended.recommendations[0].tolist()

Could you share a sample input and output for the process? The question is not clear.