How to meld two DataFrames in Pandas


I have two DataFrames:

In [14]: rep1
Out[14]: 
   x  y  z
A  1  2  3
B  4  5  6
C  1  1  2

In [15]: rep2
Out[15]: 
   x  y  z
A  7  3  4
B  3  3  3

They were created with this code:

import pandas as pd
# DataFrame.from_items was removed in pandas 1.0; from_dict with orient='index' builds the same frames
rep1 = pd.DataFrame.from_dict({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [1, 1, 2]}, orient='index', columns=['x', 'y', 'z'])
rep2 = pd.DataFrame.from_dict({'A': [7, 3, 4], 'B': [3, 3, 3]}, orient='index', columns=['x', 'y', 'z'])

What I want to do is meld rep1 and rep2 so that the result looks like this:

gene rep1 rep2 type
A     1    7    x
B     4    3    x
A     2    3    y
B     5    3    y
A     3    4    z
B     6    3    z

Row C is skipped because it is not shared by rep1 and rep2.

How can I achieve this?

>>> import numpy as np
>>> c1 = rep1.values.T.flatten()
>>> c2 = rep2.values.T.flatten()
>>> c3 = np.vstack((rep1.columns.values, rep2.columns.values)).T.flatten()
>>> pd.DataFrame(np.vstack((c1,c2,c3)).T)
   0  1  2
0  1  7  x
1  4  3  x
2  2  3  y
3  5  3  y
4  3  4  z
5  6  3  z

Edit: when I answered this, the question had no row C at all. Things are more complicated now, but I am leaving this here anyway.
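
One way the NumPy approach above might be adapted to the updated question is to restrict both frames to their shared rows first. The following is only a sketch: the names shared, a, b and result are illustrative, and it assumes both frames have the same columns in the same order.

```python
import numpy as np
import pandas as pd

rep1 = pd.DataFrame.from_dict(
    {'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [1, 1, 2]},
    orient='index', columns=['x', 'y', 'z'])
rep2 = pd.DataFrame.from_dict(
    {'A': [7, 3, 4], 'B': [3, 3, 3]},
    orient='index', columns=['x', 'y', 'z'])

# Keep only the genes present in both frames (this drops row C).
shared = rep1.index.intersection(rep2.index)
a = rep1.loc[shared]
b = rep2.loc[shared]

# Transposing before flattening walks the values column by column,
# so gene labels are tiled and type labels are repeated to match.
result = pd.DataFrame({
    'gene': np.tile(shared.values, len(a.columns)),
    'rep1': a.values.T.flatten(),
    'rep2': b.values.T.flatten(),
    'type': np.repeat(a.columns.values, len(shared)),
})
print(result)
```

Building the frame from a dict of arrays keeps rep1 and rep2 numeric, whereas stacking them together with the string labels in one array would turn every value into a string.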

This should do it:

df = pd.concat([rep1.stack(), rep2.stack()], axis=1).reset_index().dropna()
df.columns = ['GENE', 'TYPE', 'REP1', 'REP2']
# DataFrame.sort was removed from pandas; sort_values is its replacement
df = df.sort_values(by=['TYPE', 'GENE'])

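The stacked DataFrames are concatenated along axis=1. Resetting the index brings back the gene and type columns, and dropna takes care of the NaN values produced by gene C. After that, set the proper column names and sort.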

   GENE TYPE REP1 REP2
0   A   x   1   7
3   B   x   4   3
1   A   y   2   3
4   B   y   5   3
2   A   z   3   4
5   B   z   6   3
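
A possible variation on the same idea, sketched here as a self-contained script: passing join='inner' to pd.concat keeps only the (gene, type) pairs present in both stacked Series, so no NaN rows are created and dropna is not needed. The final reset_index(drop=True) is an extra step not in the answer above, added only to renumber the rows.

```python
import pandas as pd

rep1 = pd.DataFrame.from_dict(
    {'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [1, 1, 2]},
    orient='index', columns=['x', 'y', 'z'])
rep2 = pd.DataFrame.from_dict(
    {'A': [7, 3, 4], 'B': [3, 3, 3]},
    orient='index', columns=['x', 'y', 'z'])

# Stack each frame into a Series indexed by (gene, type), then align them.
# join='inner' drops gene C because it only exists in rep1;
# keys names the two resulting columns.
df = pd.concat([rep1.stack(), rep2.stack()], axis=1, join='inner',
               keys=['REP1', 'REP2'])

# Name the two index levels so reset_index produces GENE and TYPE columns.
df.index.names = ['GENE', 'TYPE']
df = df.reset_index()

# Order by type first, then gene, and renumber the rows from 0.
df = df.sort_values(by=['TYPE', 'GENE']).reset_index(drop=True)
print(df)
```

With the outer join used in the answer, the missing gene C entries force REP2 to hold NaN, which makes that column floating point even after dropna; the inner join avoids introducing NaN in the first place.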
