Python: Creating a PySpark DataFrame from dict_values

python python-3.x pandas dictionary pyspark


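I am trying to build a single PySpark DataFrame from a dict_values object. I can do the same thing in pandas with the concat function. The dictionary has year-month keys and PySpark DataFrames as values.

Here is the code I am using. I have a workaround that unions all the DataFrames, but I don't think it is the best way to do this.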

from functools import reduce
from pyspark.sql import DataFrame

dict_ym = {}
for yearmonth in keys:
    key_name = 'df_' + str(yearmonth)
    dict_ym[key_name] = df
    # Add a new column to the dataframe
    # Perform some more transformations

dict_ym

# The dict above is now keyed by yearmonth (e.g. 201501), and each value is a dataframe consisting of 10 columns

def union_all_dataframes(*dfs):
    # Fold the dataframes pairwise into one
    return reduce(DataFrame.unionAll, dfs)

df2 = union_all_dataframes(dict_ym['df_201501'], dict_ym['df_201502'] ... so on till dict_ym['df_201709'])

With pandas DataFrames, however, I can append all of the DataFrames into one with the following code:

df2 = pd.concat(dict_ym.values())  # here dict_ym holds pandas dataframes instead of Spark dataframes

I think there should be a more elegant way to build the PySpark DataFrame, something similar to pandas.concat.

Try this:

df2 = union_all_dataframes(*dict_ym.values())
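
The one-liner works because `*dict_ym.values()` unpacks every DataFrame in the dict into the `*dfs` parameter, so there is no need to list the keys by hand. As a minimal sketch of the same idea, the helper below (the name `union_all` is hypothetical, not part of PySpark) accepts any iterable directly and calls `.union()`, which is the non-deprecated spelling of `unionAll` from Spark 2.0 onward. It assumes every value exposes a `.union(other)` method and that all DataFrames share the same column order, since `union` matches columns by position rather than by name.

```python
from functools import reduce


def union_all(dfs):
    """Fold an iterable of DataFrames into one by calling .union() pairwise.

    Assumption: each item has a .union(other) method, as Spark DataFrames do,
    and all items have their columns in the same order.
    """
    dfs = list(dfs)
    if not dfs:
        raise ValueError("union_all needs at least one DataFrame")
    return reduce(lambda left, right: left.union(right), dfs)


# Usage with the dict from the question (Spark DataFrames as values):
# df2 = union_all(dict_ym.values())
```

If the DataFrames might have their columns in different orders, `DataFrame.unionByName` (available since Spark 2.3) is probably the closer match to how `pd.concat` aligns columns.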