Python: code to reshape a DataFrame with multiple samples for a box plot
Tags: python, pandas, biopython
I am writing a script that draws box plots from some RNA-seq data. Pseudocode:
1. Select a row based on gene name
2. make a column for each type of cell
3. make box plot
I already have steps 1 and 3:
df2 = df[df[0].str.match("TCAP")]
????
import plotly.express as px
fig = px.box(df,x="CellType",y = "Expression",title = "GENE")
fig.show()
The code needs to transform the following table:
Gene Celltype-1_#1 Celltype-1_#2 Celltype-1_#3 Celltype-2_#1 Celltype-2_#2 Celltype-2_#3
A 1 1 1 3 3 3
B 5 5 5 4 4 4
Using: df2 = df[df[0].str.match("TCAP")]
Then I need code that turns the selected row into this:
Gene CellType Expression
A 1 1
A 1 1
A 1 1
A 2 3
A 2 3
A 2 3
You can do this transformation as follows:
# need to have an index to make stack work
df = df.set_index('Gene')
# stack returns a series here
df = df.stack().to_frame().reset_index()
# At this point we have:
# Gene level_1 0
# 0 A Celltype-1_#1 1
# 1 A Celltype-1_#2 1
# 2 A Celltype-1_#3 1
# 3 A Celltype-2_#1 3
# 4 A Celltype-2_#2 3
# 5 A Celltype-2_#3 3
# 6 B Celltype-1_#1 5
# 7 B Celltype-1_#2 5
# 8 B Celltype-1_#3 5
# 9 B Celltype-2_#1 4
# 10 B Celltype-2_#2 4
# 11 B Celltype-2_#3 4
df.columns = ['Gene', 'Celltype', 'Expression']
# optionally rename values in celltype column
df['Celltype'] = df['Celltype'].apply(lambda t: t[9:10])
# now you can select by Gene or any other columns and pass to Plotly:
print(df[df['Gene'] == 'A'])
# Gene Celltype Expression
# 0 A 1 1
# 1 A 1 1
# 2 A 1 1
# 3 A 2 3
# 4 A 2 3
# 5 A 2 3
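The same reshaping can also be sketched with `melt` instead of `stack`. This is only an illustration on the toy table from the question. It assumes every sample column is named like `Celltype-<type>_#<replicate>`, and it uses a regular expression rather than the fixed slice `t[9:10]`, so cell-type labels longer than one character would still be captured. The names `wide`, `long` and `Sample` are made up for this example.

```python
import pandas as pd

# Toy data copied from the example table in the question.
wide = pd.DataFrame({
    "Gene": ["A", "B"],
    "Celltype-1_#1": [1, 5],
    "Celltype-1_#2": [1, 5],
    "Celltype-1_#3": [1, 5],
    "Celltype-2_#1": [3, 4],
    "Celltype-2_#2": [3, 4],
    "Celltype-2_#3": [3, 4],
})

# melt turns each sample column into one (Sample, Expression) row per gene.
long = wide.melt(id_vars="Gene", var_name="Sample", value_name="Expression")

# Assumption: column names follow "Celltype-<type>_#<replicate>".
# The capture group keeps only the <type> part.
long["Celltype"] = long["Sample"].str.extract(r"^Celltype-(.+)_#\d+$", expand=False)

# Order rows by gene, then by sample, to match the layout shown above.
long = long.sort_values(["Gene", "Sample"]).reset_index(drop=True)

print(long[long["Gene"] == "A"][["Gene", "Celltype", "Expression"]])
```

If a column name does not match the assumed pattern, `str.extract` leaves `NaN` in `Celltype` for that row, which makes naming problems easy to spot.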
Note that because the whole DataFrame is stacked up front, it is now easy to select several genes at once and pass them to Plotly together:
df_many = df[df['Gene'].isin(['A', 'B'])]
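As a rough sketch of how such a multi-gene selection might be handed to Plotly: the helpers `select_genes` and `box_plot` below are hypothetical names, not part of pandas or Plotly, and `long_df` is a hand-built copy of the stacked table. Colouring by `Gene` is one possible way to tell the genes apart, not the only one.

```python
import pandas as pd

# Long-format toy data, equivalent to the stacked table above.
long_df = pd.DataFrame({
    "Gene": ["A"] * 6 + ["B"] * 6,
    "Celltype": ["1"] * 3 + ["2"] * 3 + ["1"] * 3 + ["2"] * 3,
    "Expression": [1, 1, 1, 3, 3, 3, 5, 5, 5, 4, 4, 4],
})


def select_genes(df, genes):
    """Return only the rows whose Gene value is in `genes` (hypothetical helper)."""
    return df[df["Gene"].isin(genes)]


def box_plot(df, genes):
    """Build one box per cell type, coloured by gene (hypothetical helper)."""
    # Imported here so the selection helper can be used without Plotly installed.
    import plotly.express as px

    subset = select_genes(df, genes)
    return px.box(
        subset,
        x="Celltype",
        y="Expression",
        color="Gene",
        title=", ".join(genes),
    )


df_many = select_genes(long_df, ["A", "B"])
# fig = box_plot(long_df, ["A", "B"])
# fig.show()
```

The plotting call is left commented out so the selection step can be checked on its own; uncomment it in an environment where Plotly is available.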
Works beautifully.