Python: merging multiple DataFrames

python pandas dataframe

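This may be considered a duplicate, but because of the large number of DataFrames I can't seem to find a solution to my problem.

I have multiple DataFrames (more than 10), each differing from the others in one column. This is just a simple example: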

import pandas as pd

df1 = pd.DataFrame({'depth': [0.500000, 0.600000, 1.300000],
       'VAR1': [38.196202, 38.198002, 38.200001],
       'profile': ['profile_1', 'profile_1','profile_1']})

df2 = pd.DataFrame({'depth': [0.600000, 1.100000, 1.200000],
       'VAR2': [0.20440, 0.20442, 0.20446],
       'profile': ['profile_1', 'profile_1','profile_1']})

df3 = pd.DataFrame({'depth': [1.200000, 1.300000, 1.400000],
       'VAR3': [15.1880, 15.1820, 15.1820],
       'profile': ['profile_1', 'profile_1','profile_1']})

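Each df can have the same or different depths for the same profile, so I need to create a new DataFrame that merges all the separate ones, where the key columns for the operation are depth and profile, and every depth value that occurs for a profile appears in the result.

The VARX value should therefore be NaN wherever that profile has no measurement of that variable at that depth.

The result should be a new, compact DataFrame with all VARX as additional columns alongside the depth and profile columns, like this: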

name_profile    depth   VAR1        VAR2        VAR3
profile_1   0.500000    38.196202   NaN         NaN
profile_1   0.600000    38.198002   0.20440     NaN
profile_1   1.100000    NaN         0.20442     NaN
profile_1   1.200000    NaN         0.20446     15.1880
profile_1   1.300000    38.200001   NaN         15.1820
profile_1   1.400000    NaN         NaN         15.1820

Note that the actual number of profiles is much larger.

Any ideas?

Consider setting the index on each DataFrame and then running a horizontal merge with pd.concat:

dfs = [df.set_index(['profile', 'depth']) for df in [df1, df2, df3]]

print(pd.concat(dfs, axis=1).reset_index())
#      profile  depth       VAR1     VAR2    VAR3
# 0  profile_1    0.5  38.196202      NaN     NaN
# 1  profile_1    0.6  38.198002  0.20440     NaN
# 2  profile_1    1.1        NaN  0.20442     NaN
# 3  profile_1    1.2        NaN  0.20446  15.188
# 4  profile_1    1.3  38.200001      NaN  15.182
# 5  profile_1    1.4        NaN      NaN  15.182
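
For more than a handful of frames, the same idea can be wrapped in a small function. This is only a sketch: `combine_on_keys` is a hypothetical helper name, and the duplicate check and explicit `sort_index()` are my own additions. Aligning on the index needs each (profile, depth) pair to be unique within a frame, and sorting explicitly avoids relying on whatever row order concat happens to produce.

```python
import pandas as pd


def combine_on_keys(frames, keys):
    """Outer-align frames side by side on the key columns (hypothetical helper)."""
    indexed = []
    for frame in frames:
        # concat(axis=1) aligns on index labels, so key pairs must be unique per frame
        if frame.duplicated(subset=keys).any():
            raise ValueError("duplicate key rows found; aggregate them first")
        indexed.append(frame.set_index(keys))
    # Sort explicitly so the row order does not depend on concat's behaviour
    return pd.concat(indexed, axis=1).sort_index().reset_index()


df1 = pd.DataFrame({'depth': [0.5, 0.6, 1.3],
                    'VAR1': [38.196202, 38.198002, 38.200001],
                    'profile': ['profile_1'] * 3})
df2 = pd.DataFrame({'depth': [0.6, 1.1, 1.2],
                    'VAR2': [0.20440, 0.20442, 0.20446],
                    'profile': ['profile_1'] * 3})
df3 = pd.DataFrame({'depth': [1.2, 1.3, 1.4],
                    'VAR3': [15.1880, 15.1820, 15.1820],
                    'profile': ['profile_1'] * 3})

merged = combine_on_keys([df1, df2, df3], ['profile', 'depth'])
print(merged)
```

With more than ten frames you would only need to extend the list passed to the function.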

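A simple approach is to use functools.partial and reduce together.
First, partial lets you "freeze" some of a function's positional and/or keyword arguments, producing a new object with a simplified signature. Then, with reduce, we can apply that new partial object cumulatively to the items of an iterable (here, the list of DataFrames).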
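
A minimal sketch of that partial/reduce idea, assuming an outer merge on profile and depth is what is wanted. The name `outer_merge` and the final sort are my own choices for illustration, not something taken from the answer.

```python
from functools import partial, reduce

import pandas as pd

df1 = pd.DataFrame({'depth': [0.5, 0.6, 1.3],
                    'VAR1': [38.196202, 38.198002, 38.200001],
                    'profile': ['profile_1'] * 3})
df2 = pd.DataFrame({'depth': [0.6, 1.1, 1.2],
                    'VAR2': [0.20440, 0.20442, 0.20446],
                    'profile': ['profile_1'] * 3})
df3 = pd.DataFrame({'depth': [1.2, 1.3, 1.4],
                    'VAR3': [15.1880, 15.1820, 15.1820],
                    'profile': ['profile_1'] * 3})

# Freeze the merge options so the resulting callable only needs (left, right)
outer_merge = partial(pd.merge, on=['profile', 'depth'], how='outer')

# reduce evaluates outer_merge(outer_merge(df1, df2), df3)
merged = reduce(outer_merge, [df1, df2, df3])

# Sort explicitly; the row order of an outer merge is not something to rely on
merged = merged.sort_values(['profile', 'depth']).reset_index(drop=True)
print(merged)
```

Because reduce just walks the list, adding more frames means adding more list items and nothing else.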

I would use append (note that DataFrame.append was removed in pandas 2.0; there, pd.concat([df1, df2, df3]) does the same job):
>>> df1.append(df2).append(df3).sort_values('depth')

        VAR1     VAR2    VAR3  depth    profile
0  38.196202      NaN     NaN    0.5  profile_1
1  38.198002      NaN     NaN    0.6  profile_1
0        NaN  0.20440     NaN    0.6  profile_1
1        NaN  0.20442     NaN    1.1  profile_1
2        NaN  0.20446     NaN    1.2  profile_1
0        NaN      NaN  15.188    1.2  profile_1
2  38.200001      NaN     NaN    1.3  profile_1
1        NaN      NaN  15.182    1.3  profile_1
2        NaN      NaN  15.182    1.4  profile_1

Obviously, if you have many DataFrames, just make a list and loop over them.
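
With a list, the stacking can be done in one pd.concat call rather than a loop. The stacked result still has one row per original measurement, so the sketch below also collapses rows that share a (profile, depth) pair. That groupby step is my own addition, not part of the answer above, and it assumes each variable has at most one value per key.

```python
import pandas as pd

df1 = pd.DataFrame({'depth': [0.5, 0.6, 1.3],
                    'VAR1': [38.196202, 38.198002, 38.200001],
                    'profile': ['profile_1'] * 3})
df2 = pd.DataFrame({'depth': [0.6, 1.1, 1.2],
                    'VAR2': [0.20440, 0.20442, 0.20446],
                    'profile': ['profile_1'] * 3})
df3 = pd.DataFrame({'depth': [1.2, 1.3, 1.4],
                    'VAR3': [15.1880, 15.1820, 15.1820],
                    'profile': ['profile_1'] * 3})

dfs = [df1, df2, df3]

# Stack all frames vertically; columns missing from a frame are filled with NaN
stacked = pd.concat(dfs, ignore_index=True).sort_values(['profile', 'depth'])

# GroupBy.first() keeps the first non-null value per column within each group,
# which folds the stacked rows into one row per (profile, depth)
combined = stacked.groupby(['profile', 'depth'], as_index=False).first()
print(combined)
```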

Why not concatenate all the DataFrames, melt them, and then reshape them using your IDs? There may be a more efficient way to do this, but it works.
df = pd.melt(pd.concat([df1, df2, df3]), id_vars=['profile', 'depth'])
df_pivot = df.pivot_table(index=['profile', 'depth'], columns='variable', values='value')

where df_pivot is

variable              VAR1     VAR2    VAR3
profile   depth                            
profile_1 0.5    38.196202      NaN     NaN
          0.6    38.198002  0.20440     NaN
          1.1          NaN  0.20442     NaN
          1.2          NaN  0.20446  15.188
          1.3    38.200001      NaN  15.182
          1.4          NaN      NaN  15.182
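
If a flat frame like the one in the question is wanted, the pivoted result can be flattened afterwards. A small sketch follows; the `long_df` and `flat` names are mine. Keep in mind that pivot_table aggregates with the mean by default, so repeated measurements for the same profile, depth and variable would be averaged.

```python
import pandas as pd

df1 = pd.DataFrame({'depth': [0.5, 0.6, 1.3],
                    'VAR1': [38.196202, 38.198002, 38.200001],
                    'profile': ['profile_1'] * 3})
df2 = pd.DataFrame({'depth': [0.6, 1.1, 1.2],
                    'VAR2': [0.20440, 0.20442, 0.20446],
                    'profile': ['profile_1'] * 3})
df3 = pd.DataFrame({'depth': [1.2, 1.3, 1.4],
                    'VAR3': [15.1880, 15.1820, 15.1820],
                    'profile': ['profile_1'] * 3})

# Long format: one row per (profile, depth, variable) with a single value column
long_df = pd.concat([df1, df2, df3]).melt(id_vars=['profile', 'depth'])

# Wide format again, indexed by (profile, depth)
df_pivot = long_df.pivot_table(index=['profile', 'depth'],
                               columns='variable', values='value')

# Move the index levels back into columns and drop the 'variable' axis label
flat = df_pivot.reset_index().rename_axis(None, axis=1)
print(flat)
```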

You can also use:
dfs = [df1, df2, df3]
# Start from the first two frames, then fold in the rest with outer merges
df = pd.merge(dfs[0], dfs[1], on=['depth', 'profile'], how='outer')
for d in dfs[2:]:
    df = pd.merge(df, d, on=['depth', 'profile'], how='outer')

   depth       VAR1    profile     VAR2    VAR3
0    0.5  38.196202  profile_1      NaN     NaN
1    0.6  38.198002  profile_1  0.20440     NaN
2    1.3  38.200001  profile_1      NaN  15.182
3    1.1        NaN  profile_1  0.20442     NaN
4    1.2        NaN  profile_1  0.20446  15.188
5    1.4        NaN  profile_1      NaN  15.182

Thanks everyone!

@BlivetWidget, how would you sort it by depth and profile? Each profile has a set of depths, and each DataFrame has a set of profiles?

@PEBKAC You can sort by as many columns as you need, in whatever order you like: .sort_values(['depth', 'profile']) or .sort_values(['profile', 'depth']). Take a look at the help for df1.sort_values to see how to change the sort direction, sort in place, and use the various other optional parameters.

Take a look at the "Generalizing: merging multiple DataFrames" section, which explains a partial solution. If that doesn't help, please let me know how I can improve the post to make it clearer. Thanks.

Is it possible to keep the column names? When I use this, the column names disappear.
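
To make the sorting options mentioned in the comments concrete, here is a short sketch using the question's frames stacked with pd.concat. The variable names are only for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({'depth': [0.5, 0.6, 1.3],
                    'VAR1': [38.196202, 38.198002, 38.200001],
                    'profile': ['profile_1'] * 3})
df2 = pd.DataFrame({'depth': [0.6, 1.1, 1.2],
                    'VAR2': [0.20440, 0.20442, 0.20446],
                    'profile': ['profile_1'] * 3})
df3 = pd.DataFrame({'depth': [1.2, 1.3, 1.4],
                    'VAR3': [15.1880, 15.1820, 15.1820],
                    'profile': ['profile_1'] * 3})

stacked = pd.concat([df1, df2, df3], ignore_index=True)

# Sort by profile first, then by depth within each profile
by_profile_then_depth = stacked.sort_values(['profile', 'depth'])

# One direction per column: profiles ascending, depths descending
mixed = stacked.sort_values(['profile', 'depth'], ascending=[True, False])

# inplace=True modifies the frame itself and returns None
stacked.sort_values(['depth', 'profile'], inplace=True)
print(stacked)
```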