Python: split a column whose elements are dictionaries into multiple columns

python pandas

I have a DataFrame with a single column whose elements are dictionaries. It is the result of the following code:

# dg is a pandas DataFrame with columns ID and VALUE. Many rows contain the same ID

def seriesFeatures(series):
    """This function receives a series of VALUE for the same ID and extracts
    tens of complex features from the series, storing them into a dictionary"""
    dico = dict()
    dico['feature1'] = calculateFeature1
    dico['feature2'] = calculateFeature2
    # Many more features
    dico['feature50'] = calculateFeature50
    return dico

grouped = dg.groupby(['ID'])
dh = grouped['VALUE'].agg( { 'all_features' : lambda s: seriesFeatures(s) } )
dh.reset_index()
# Here I get a dh DataFrame of a single column 'all_features' and
# dictionaries stored on its values. The keys are the feature's names

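I need to split this 'all_features' column into as many columns as there are features (I have too many rows and columns, and I cannot change the seriesFeatures function), so that the output is a DataFrame with columns ID, FEATURE1, FEATURE2, FEATURE3, ..., FEATURE50. What is the best way to do this?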

EDIT: a concrete and simple example:

import pandas as pd

dg = pd.DataFrame( [ [1,10] , [1,15] , [1,13] , [2,14] , [2,16] ] , columns=['ID','VALUE'] )

def seriesFeatures(series):
    dico = dict()
    dico['feature1'] = len(series)
    dico['feature2'] = series.sum()
    return dico

grouped = dg.groupby(['ID'])
dh = grouped['VALUE'].agg( { 'all_features' : lambda s: seriesFeatures(s) } )
dh.reset_index()

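But when I try to wrap it in a pd.Series or pd.DataFrame, it says that an index must be provided if the data are scalar values. If I provide index=['feature1','feature2'], I get strange results, for example with: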

dh = grouped['VALUE'].agg( { 'all_features' : lambda s: pd.DataFrame( seriesFeatures(s) , index=['feature1','feature2'] ) } )
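For what it's worth, both the error message and the odd-looking result can be reproduced outside of groupby. This is only a minimal sketch, assuming a reasonably recent pandas; the `features` dict is a hypothetical stand-in for what seriesFeatures returns for one ID:

```python
import pandas as pd

# Hypothetical stand-in for the dict returned by seriesFeatures for one ID.
features = {"feature1": 3, "feature2": 38}

# A dict of plain scalars gives pd.DataFrame no row labels to work with,
# so the constructor raises a ValueError asking for an index.
try:
    pd.DataFrame(features)
    error_message = None
except ValueError as exc:
    error_message = str(exc)
print(error_message)

# Passing index=[...] silences the error, but every scalar is repeated
# once per index label, which gives a 2x2 frame rather than one row.
broadcast = pd.DataFrame(features, index=["feature1", "feature2"])
print(broadcast)

# pd.Series takes the dict directly: the keys become the index.
as_series = pd.Series(features)
print(as_series)

# Wrapping the dict in a list gives a single row with one column per key.
as_row = pd.DataFrame([features])
print(as_row)
```

The repeated rows in `broadcast` are probably the "strange results": the index argument labels rows, it does not tell pandas which keys to use as columns.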

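I think you should wrap the dict in a Series, which will then be expanded by the groupby call (but using apply instead of agg, since the result is no longer an aggregated (scalar) value). After that, you can reshape the result into the desired format.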

With your simple example, this seems to work:

In [22]: dh = grouped['VALUE'].apply(lambda x: pd.Series(seriesFeatures(x)))
In [23]: dh

Out[23]:
ID
1   feature1     3
    feature2    38
2   feature1     2
    feature2    30
dtype: int64

In [26]: dh.unstack().reset_index()
Out[26]:
   ID  feature1  feature2
0   1         3        38
1   2         2        30
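The session above can also be written as a standalone script. This is a sketch under the same assumptions as the simple example (the data and seriesFeatures body are copied from the question; the names `stacked` and `wide` are mine). As far as I know, the dict-renaming form of agg used in the question is no longer accepted from pandas 1.0 onwards, which is one more reason to go through apply:

```python
import pandas as pd

dg = pd.DataFrame(
    [[1, 10], [1, 15], [1, 13], [2, 14], [2, 16]],
    columns=["ID", "VALUE"],
)


def seriesFeatures(series):
    """Return a dict of features computed from one ID's VALUE series."""
    return {"feature1": len(series), "feature2": series.sum()}


# Returning a Series from apply gives a result indexed by (ID, feature name).
stacked = dg.groupby("ID")["VALUE"].apply(lambda s: pd.Series(seriesFeatures(s)))

# unstack moves the inner index level (the feature names) into columns;
# reset_index turns ID back into an ordinary column.
wide = stacked.unstack().reset_index()
print(wide)
```

Since seriesFeatures is only wrapped, not modified, this respects the constraint that the function itself cannot be changed.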

Thanks for the example case! Updated my answer.

Thanks. I didn't know about this unstack thing; it seems like a good solution.
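If the column of dictionaries already exists and the groupby step cannot be rerun, a different technique from the apply/unstack one may help: build a new DataFrame from the list of dicts and join it back. A minimal sketch, assuming a `dh` shaped like the one described in the question (the literal values below are made up to match the simple example):

```python
import pandas as pd

# Assumed shape of the problem: one ID column and one column of dicts.
dh = pd.DataFrame({
    "ID": [1, 2],
    "all_features": [
        {"feature1": 3, "feature2": 38},
        {"feature1": 2, "feature2": 30},
    ],
})

# A list of dicts becomes one row per dict and one column per key.
# Reusing dh.index keeps the new rows aligned with the original ones.
expanded = pd.DataFrame(dh["all_features"].tolist(), index=dh.index)

# Put the ID column back next to the expanded feature columns.
result = pd.concat([dh[["ID"]], expanded], axis=1)
print(result)
```

Keys missing from some of the dicts would simply show up as NaN in the corresponding rows.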