Python: splitting one row into multiple rows in a DataFrame

python pandas

I have a DataFrame with the following headers:

id, type1, ..., type10, location1, ..., location10

I want to convert it to the following:

id, type, location

I managed to do this with nested for loops, but it is very slow:

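import pandas as pd

# `data` is the wide DataFrame described above
new_format_columns = ['ID', 'type', 'location']
new_format_dataframe = pd.DataFrame(columns=new_format_columns)

print(data.head())
new_index = 0
for index, row in data.iterrows():
    ID = row["ID"]

    for i in range(1, 11):
        # NaN never compares equal to anything (not even itself),
        # so `== np.nan` is always False; test with pd.isna() instead
        if pd.isna(row["type" + str(i)]):
            continue
        # Append one (ID, type, location) row per non-missing slot
        new_format_dataframe.loc[new_index] = [ID, row["type" + str(i)], row["location" + str(i)]]
        new_index += 1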

Any suggestions for improving this with native pandas functionality?

You can use lreshape:

types = [col for col in df.columns if col.startswith('type')]
location = [col for col in df.columns if col.startswith('location')]

print(pd.lreshape(df, {'Type':types, 'Location':location}, dropna=False))

Sample:

import pandas as pd

df = pd.DataFrame({
'type1': {0: 1, 1: 4}, 
'id': {0: 'a', 1: 'a'}, 
'type10': {0: 1, 1: 8},
'location1': {0: 2, 1: 9},
'location10': {0: 5, 1: 7}})

print (df)
  id  location1  location10  type1  type10
0  a          2           5      1       1
1  a          9           7      4       8

types = [col for col in df.columns if col.startswith('type')]
location = [col for col in df.columns if col.startswith('location')]

print(pd.lreshape(df, {'Type':types, 'Location':location}, dropna=False))
  id  Location  Type
0  a         2     1
1  a         9     4
2  a         5     1
3  a         7     8
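
As a documented alternative to lreshape, the same reshape can be sketched with pd.wide_to_long. This is only an illustration on the sample data above, not part of the original answer. The helper column `row` and the suffix column name `slot` are hypothetical names introduced here; `row` is needed because wide_to_long requires the `i` columns to identify each row uniquely, and the sample `id` values repeat.

```python
import pandas as pd

df = pd.DataFrame({
    'id': ['a', 'a'],
    'type1': [1, 4],
    'type10': [1, 8],
    'location1': [2, 9],
    'location10': [5, 7],
})

# Add a unique row key (hypothetical name `row`) because `id` repeats
wide = df.assign(row=range(len(df)))

long_df = (
    pd.wide_to_long(wide, stubnames=['type', 'location'],
                    i=['row', 'id'], j='slot')
      .reset_index()                  # turn the (row, id, slot) index into columns
      .sort_values(['slot', 'row'])   # make the row order explicit
      .reset_index(drop=True)
)[['id', 'type', 'location']]

print(long_df)
```

The explicit sort is there so the result does not depend on the order in which wide_to_long happens to return rows.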

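Another solution uses a double melt:

print(pd.concat([pd.melt(df, id_vars='id', value_vars=types, value_name='type'),
                 pd.melt(df, value_vars=location, value_name='Location')], axis=1)
        .drop('variable', axis=1))

  id  type  Location
0  a     1         2
1  a     4         9
2  a     1         5
3  a     8         7

Edit:

lreshape is currently undocumented and may be removed in a future version.

A possible solution is to merge all 3 functions into one, perhaps melt, but that is not implemented yet. Maybe it will come in a new version of pandas; I will update my answer then.

Comments:

How big is your dataset?

@MMF Only a few GB at the moment.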
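
Returning to the loop in the question, which skips slots whose type is missing: a minimal sketch of the double melt with that filter added might look like the following. The sample data (two slots, one of them empty) is made up for illustration, and the sketch assumes the type and location columns appear in the same slot order in the DataFrame so the two melted halves line up row by row.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: row 'b' has no second type/location pair
df = pd.DataFrame({
    'id': ['a', 'b'],
    'type1': [1.0, 4.0],
    'type2': [1.0, np.nan],
    'location1': [2.0, 9.0],
    'location2': [5.0, np.nan],
})

types = [col for col in df.columns if col.startswith('type')]
locations = [col for col in df.columns if col.startswith('location')]

# Melt each column family separately, then glue the results side by side
long_df = pd.concat(
    [pd.melt(df, id_vars='id', value_vars=types, value_name='type'),
     pd.melt(df, value_vars=locations, value_name='location')],
    axis=1,
).drop('variable', axis=1)  # removes both helper 'variable' columns

# Equivalent of the loop's `continue`: drop slots with a missing type
long_df = long_df.dropna(subset=['type']).reset_index(drop=True)

print(long_df)
```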