Python: conditionally dividing DataFrame columns based on a value

python pandas dataframe


So I have a dataframe that looks like this... let's call it df1:

  Disease  Gene1  Gene2  Gene3  Gene4
0      D1      1      1     26      1
1      D2      1      1      1      1
2      D3      1     18      1     17
3      D4     25      1      1      1
4      D5      1      1      1      1
5      D6      1     33      1     12
6      D7      1      1      1      1
7      D8      5      1      1      1

And another one that looks like this... df2:

  Disease  Counts
0      D1     117
1      D2     224
2      D3     411
3      D4     180
4      D5      96
5      D6      24
6      D7     331
7      D8     512

I need to divide the rows in df1 by the Counts in df2, matching the rows on the Disease column.

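If you set the index of both dfs to 'Disease', you can call div to divide the gene columns by the Counts column.

With 'Disease' as the index, the dfs align on the index values, so each row is divided by the count for the same disease.

You can then call reset_index to restore the 'Disease' column: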

In [132]:
df1.set_index('Disease').div(df2.set_index('Disease')['Counts'], axis=0).reset_index()

Out[132]:
  Disease     Gene1     Gene2     Gene3     Gene4
0      D1  0.008547  0.008547  0.222222  0.008547
1      D2  0.004464  0.004464  0.004464  0.004464
2      D3  0.002433  0.043796  0.002433  0.041363
3      D4  0.138889  0.005556  0.005556  0.005556
4      D5  0.010417  0.010417  0.010417  0.010417
5      D6  0.041667  1.375000  0.041667  0.500000
6      D7  0.003021  0.003021  0.003021  0.003021
7      D8  0.009766  0.001953  0.001953  0.001953
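
For anyone who wants to reproduce this end to end, here is a minimal self-contained sketch. It rebuilds df1 and df2 from the tables in the question; the intermediate names `counts` and `result` are only illustrative and do not come from the original answer.

```python
import pandas as pd

# Rebuild the two frames from the tables in the question.
diseases = ["D1", "D2", "D3", "D4", "D5", "D6", "D7", "D8"]
df1 = pd.DataFrame({
    "Disease": diseases,
    "Gene1": [1, 1, 1, 25, 1, 1, 1, 5],
    "Gene2": [1, 1, 18, 1, 1, 33, 1, 1],
    "Gene3": [26, 1, 1, 1, 1, 1, 1, 1],
    "Gene4": [1, 1, 17, 1, 1, 12, 1, 1],
})
df2 = pd.DataFrame({
    "Disease": diseases,
    "Counts": [117, 224, 411, 180, 96, 24, 331, 512],
})

# A Series of counts labelled by disease (illustrative name).
counts = df2.set_index("Disease")["Counts"]

# axis=0 matches the Series labels against the row index of df1,
# so every gene column in a row is divided by that disease's count.
result = df1.set_index("Disease").div(counts, axis=0).reset_index()
print(result)
```

Passing `axis=0` matters here: without it, div would try to match the labels of `counts` against the column names (Gene1, Gene2, ...) rather than the row index.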

I'm not sure what I did, but I got a KeyError: 'Disease'... I had previously set the column as the index using df.set_index('Disease', inplace=True). I don't know whether that would cause any problems.

If the 'Disease' column is already the index, you don't need to set it again.
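
The situation in these comments can be sketched as follows, using the first two rows of the question's data. Once 'Disease' has been moved into the index in place, it is no longer a column, so a second set_index('Disease') raises a KeyError. The helper `indexed_by` is a hypothetical convenience function written for this illustration, not part of pandas.

```python
import pandas as pd

df1 = pd.DataFrame({"Disease": ["D1", "D2"], "Gene1": [1, 1], "Gene3": [26, 1]})
df2 = pd.DataFrame({"Disease": ["D1", "D2"], "Counts": [117, 224]})

# As in the comment: 'Disease' was already made the index earlier.
df1.set_index("Disease", inplace=True)

# Setting it again fails, because 'Disease' is no longer a column.
try:
    df1.set_index("Disease")
except KeyError as exc:
    print("KeyError:", exc)

def indexed_by(frame, key):
    """Hypothetical helper: return frame indexed by key, whether or not it already is."""
    return frame if frame.index.name == key else frame.set_index(key)

# Works regardless of which frame has already had its index set.
result = indexed_by(df1, "Disease").div(indexed_by(df2, "Disease")["Counts"], axis=0)
print(result)
```

If the index is already set on df1, the simpler fix is to drop the first set_index call and write df1.div(df2.set_index('Disease')['Counts'], axis=0) directly.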