Python: pivot a DataFrame with two columns of lists

python pandas


I have a DataFrame like this:

import pandas as pd

matrix = [(222, ['A','B','C'], [1,2,3]),
         (333, ['A','B','D'], [1,3,5])]

df = pd.DataFrame(matrix, columns=['timestamp', 'variable', 'value'])

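I would like to pivot it so that the timestamp values are kept, the unique values in the variable column become additional columns, and the entries of the value column are placed in the corresponding columns.

The output should look like this: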

timestamp   A    B    C    D 

222         1    2    3    nan
333         1    3    nan  5

Any help would be much appreciated! :)

Use zip to create one dictionary per row and pass the list of dictionaries to the DataFrame constructor:

a = [dict(zip(*x)) for x in zip(df['variable'], df['value'])]
print (a)
[{'A': 1, 'B': 2, 'C': 3}, {'A': 1, 'B': 3, 'D': 5}]

df = df[['timestamp']].join(pd.DataFrame(a, index=df.index))
print (df)
   timestamp  A  B    C    D
0        222  1  2  3.0  NaN
1        333  1  3  NaN  5.0
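To make the dict(zip(*x)) step concrete, here is a minimal sketch that applies it to the two rows of the question by hand. Each x produced by zip(df['variable'], df['value']) is a (names, values) tuple of lists, and zip(*x) unpacks it into zip(names, values):

```python
# One element of zip(df['variable'], df['value']): a (names, values) tuple of lists
x = (['A', 'B', 'C'], [1, 2, 3])

# zip(*x) is equivalent to zip(['A', 'B', 'C'], [1, 2, 3]): it pairs each name with its value
pairs = list(zip(*x))
print(pairs)  # [('A', 1), ('B', 2), ('C', 3)]

# dict() turns those pairs into a mapping whose keys later become column labels
row = dict(zip(*x))
print(row)  # {'A': 1, 'B': 2, 'C': 3}

# The same idea written with explicit tuple unpacking instead of zip(*x)
both_rows = [x, (['A', 'B', 'D'], [1, 3, 5])]
rows = [dict(zip(names, values)) for names, values in both_rows]
print(rows)  # [{'A': 1, 'B': 2, 'C': 3}, {'A': 1, 'B': 3, 'D': 5}]
```

When this list of dicts is passed to pd.DataFrame, a key missing from a row (C in the second dict, D in the first) becomes NaN, which is why C and D come out as float columns.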

If there are many other columns, use pop to extract the two list columns:

a = [dict(zip(*x)) for x in zip(df.pop('variable'), df.pop('value'))]

df = df.join(pd.DataFrame(a, index=df.index))
print (df)
   timestamp  A  B    C    D
0        222  1  2  3.0  NaN
1        333  1  3  NaN  5.0
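The pop variant pays off when the frame carries extra scalar columns, because every column left in df after the two pops is kept by df.join. A minimal runnable sketch, assuming a hypothetical extra column named sensor that exists only to show it survives:

```python
import pandas as pd

# 'sensor' is a hypothetical extra column, added only for illustration
matrix = [(222, 's1', ['A', 'B', 'C'], [1, 2, 3]),
          (333, 's2', ['A', 'B', 'D'], [1, 3, 5])]
df = pd.DataFrame(matrix, columns=['timestamp', 'sensor', 'variable', 'value'])

# pop() returns each list column and removes it from df in place,
# so df keeps only the scalar columns (timestamp, sensor) for the join
records = [dict(zip(*x)) for x in zip(df.pop('variable'), df.pop('value'))]
df = df.join(pd.DataFrame(records, index=df.index))
print(df)
```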

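You can pass the values and the column names to the pd.Series constructor. apply then aligns the resulting Series, which automatically expands the values into the desired shape: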

df.set_index('timestamp').apply(lambda row: pd.Series(row.value, index=row.variable), axis=1)

# outputs:
             A    B    C    D
timestamp
222        1.0  2.0  3.0  NaN
333        1.0  3.0  NaN  5.0
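The output above is float because any column containing NaN is upcast to float64. If the values should stay integers, one option (an addition, not part of the original answer) is to convert the result to pandas' nullable Int64 dtype. A sketch under the assumption that every value is integral:

```python
import pandas as pd

matrix = [(222, ['A', 'B', 'C'], [1, 2, 3]),
          (333, ['A', 'B', 'D'], [1, 3, 5])]
df = pd.DataFrame(matrix, columns=['timestamp', 'variable', 'value'])

# One Series per row, indexed by that row's variable names;
# apply() aligns them into columns and fills the gaps with NaN (float64)
wide = df.set_index('timestamp').apply(
    lambda row: pd.Series(row['value'], index=row['variable']), axis=1)

# Convert back to integers; missing cells become <NA> instead of NaN
wide = wide.astype('Int64')
print(wide)
```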

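Use unnesting first, then just pivot:

# pivot() takes keyword arguments only in pandas 2.0+, so pivot(*df.columns) no longer works
unnesting(df, ['variable', 'value']).pivot(index='timestamp', columns='variable', values='value')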
Out[79]: 
variable     A    B    C    D
timestamp                    
222        1.0  2.0  3.0  NaN
333        1.0  3.0  NaN  5.0

Bang, that's it! Thanks


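The unnesting helper used above:

import numpy as np
import pandas as pd

def unnesting(df, explode):
    # Repeat each original index label once per list element
    idx = df.index.repeat(df[explode[0]].str.len())
    # Flatten each list column and place the flattened columns side by side
    df1 = pd.concat([
        pd.DataFrame({x: np.concatenate(df[x].values)}) for x in explode], axis=1)
    df1.index = idx
    # Re-attach the remaining scalar columns by index (drop(explode, 1) fails in pandas 2.0+)
    return df1.join(df.drop(columns=explode), how='left')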
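On newer pandas, the unnesting step can likely be replaced by the built-in DataFrame.explode, which accepts a list of columns from pandas 1.3 onward. This is a swapped-in technique rather than the answer's helper, and it assumes both lists in each row have the same length:

```python
import pandas as pd

matrix = [(222, ['A', 'B', 'C'], [1, 2, 3]),
          (333, ['A', 'B', 'D'], [1, 3, 5])]
df = pd.DataFrame(matrix, columns=['timestamp', 'variable', 'value'])

# Explode both list columns together: each (variable, value) pair becomes its own row
# (multi-column explode requires pandas 1.3+ and equal list lengths per row)
long = df.explode(['variable', 'value'])

# explode leaves 'value' as object dtype, so cast it to a numeric type before pivoting
long = long.astype({'value': 'int64'})

# Keyword arguments, as required by pivot() in pandas 2.0+
wide = long.pivot(index='timestamp', columns='variable', values='value')
print(wide)
```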