Python: code to reshape a DataFrame with multiple samples per cell type for a box plot


I'm writing a script to draw box plots from some RNA-seq data.

Pseudocode:

1. Select a row based on gene name
2. Make a column for each type of cell
3. Make a box plot
I have steps 1 and 3:

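import plotly.express as px

# 1. select the row for one gene
df2 = df[df[0].str.match("TCAP")]

# 2. ???? (reshape df2 so that each sample becomes its own row)

# 3. draw the box plot from the reshaped data
fig = px.box(df2, x="CellType", y="Expression", title="GENE")
fig.show()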
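Step 1 selects rows with str.match. In pandas, Series.str.match checks whether the regular expression matches at the start of each string, so it can also pick up longer gene names that begin with the same letters. A small sketch with made-up gene names (only TCAP comes from the question) contrasts it with an exact comparison; str.fullmatch assumes pandas 1.1 or later:

```python
import pandas as pd

# Hypothetical gene names: only "TCAP" comes from the question
genes = pd.Series(["TCAP", "TCAPX", "XTCAP"])

# str.match anchors the regex at the start only, so "TCAPX" also matches
prefix_hits = genes.str.match("TCAP")

# An exact comparison (or str.fullmatch) selects only "TCAP" itself
exact_hits = genes == "TCAP"
full_hits = genes.str.fullmatch("TCAP")

print(prefix_hits.tolist())  # [True, True, False]
print(exact_hits.tolist())   # [True, False, False]
```

If the gene column may contain near-duplicates like these, df[df[0] == "TCAP"] is the safer selection.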
The code needs to convert this table:

Gene    Celltype-1_#1  Celltype-1_#2  Celltype-1_#3  Celltype-2_#1  Celltype-2_#2  Celltype-2_#3
A          1                1              1              3              3            3
B          5                5              5              4              4            4

which I select with: df2 = df[df[0].str.match("TCAP")]

and then I need code that turns it into this:

Gene  CellType   Expression
 A       1           1
 A       1           1
 A       1           1
 A       2           3
 A       2           3
 A       2           3
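For comparison with the stack-based answer, the same wide-to-long reshape can be sketched with DataFrame.melt, a different pandas technique from the one the answer uses. The sketch recreates the sample table with a column named Gene (an assumption: the question's df[0] suggests integer column labels, e.g. from reading the file with header=None) and derives the cell-type number with a regular expression that assumes sample names of the form Celltype-<n>_#<replicate>:

```python
import pandas as pd

# Wide sample table from the question: one row per gene, one column per sample
wide = pd.DataFrame({
    "Gene": ["A", "B"],
    "Celltype-1_#1": [1, 5],
    "Celltype-1_#2": [1, 5],
    "Celltype-1_#3": [1, 5],
    "Celltype-2_#1": [3, 4],
    "Celltype-2_#2": [3, 4],
    "Celltype-2_#3": [3, 4],
})

# melt turns every sample column into rows of (Gene, Sample, Expression)
long = wide.melt(id_vars="Gene", var_name="Sample", value_name="Expression")

# Cell-type label from the sample name, e.g. "Celltype-2_#3" -> "2"
long["CellType"] = long["Sample"].str.extract(r"Celltype-(\d+)_", expand=False)

# Select one gene and keep the columns of the target table
result = (
    long.loc[long["Gene"] == "A", ["Gene", "CellType", "Expression"]]
    .reset_index(drop=True)
)
print(result)
```

The extracted CellType values are strings ("1", "2"), which Plotly treats as categories on the x axis.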
You can do this transformation with the following approach:

# need to have an index to make stack work
df = df.set_index('Gene')

# stack returns a series here
df = df.stack().to_frame().reset_index()

# At this point we have:
#     Gene        level_1  0
#  0     A  Celltype-1_#1  1
#  1     A  Celltype-1_#2  1
#  2     A  Celltype-1_#3  1
#  3     A  Celltype-2_#1  3
#  4     A  Celltype-2_#2  3
#  5     A  Celltype-2_#3  3
#  6     B  Celltype-1_#1  5
#  7     B  Celltype-1_#2  5
#  8     B  Celltype-1_#3  5
#  9     B  Celltype-2_#1  4
#  10    B  Celltype-2_#2  4
#  11    B  Celltype-2_#3  4

df.columns = ['Gene', 'Celltype', 'Expression']

# optionally rename values in celltype column
df['Celltype'] = df['Celltype'].apply(lambda t: t[9:10])

# now you can select by Gene or any other columns and pass to Plotly:
print(df[df['Gene'] == 'A'])

#     Gene Celltype  Expression
#  0     A        1           1
#  1     A        1           1
#  2     A        1           1
#  3     A        2           3
#  4     A        2           3
#  5     A        2           3
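The slice t[9:10] keeps exactly one character after "Celltype-", so it works for these sample names but would mislabel a two-digit cell type. A sketch using Series.str.extract with a regular expression instead; "Celltype-10_#1" is a hypothetical name added only to show the difference:

```python
import pandas as pd

# Sample names as produced by the stack step, plus one hypothetical two-digit case
names = pd.Series(["Celltype-1_#1", "Celltype-2_#3", "Celltype-10_#1"])

# Fixed positional slice, as in the answer: always exactly one character
sliced = names.str[9:10]

# Regex: capture all digits between "Celltype-" and "_"
extracted = names.str.extract(r"Celltype-(\d+)_", expand=False)

print(sliced.tolist())     # ['1', '2', '1']
print(extracted.tolist())  # ['1', '2', '10']
```

In the answer's code this would read df['Celltype'] = df['Celltype'].str.extract(r"Celltype-(\d+)_", expand=False).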
Note that because the whole DataFrame is stacked up front, it is now trivial to select several genes at once and pass them to Plotly together:

df_many = df[df['Gene'].isin(['A', 'B'])]
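Because df_many keeps the Gene column, each box in a multi-gene plot corresponds to one (Gene, Celltype) group. A sketch that rebuilds the long table shown in the answer's output (values copied from the question's sample data) and computes the per-group medians, which is where the centre line of each box should fall:

```python
import pandas as pd

# Long-format data in the shape the stack-based answer produces
df = pd.DataFrame({
    "Gene": ["A"] * 6 + ["B"] * 6,
    "Celltype": ["1", "1", "1", "2", "2", "2"] * 2,
    "Expression": [1, 1, 1, 3, 3, 3, 5, 5, 5, 4, 4, 4],
})

# Select several genes at once, as in the answer
df_many = df[df["Gene"].isin(["A", "B"])]

# One box per (Gene, Celltype) pair; its median is the line drawn inside the box
medians = df_many.groupby(["Gene", "Celltype"])["Expression"].median()
print(medians)
```

Passing df_many to Plotly with a colour per gene, for example px.box(df_many, x="Celltype", y="Expression", color="Gene"), should then draw the two genes as side-by-side boxes for each cell type.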

Works beautifully.