Fast way to split rows on a specific column in a pandas DataFrame
I have the following DataFrame:
import pandas as pd
df = pd.DataFrame({'Probes':["1415693_at","1415693_at"],
'Genes':["Canx","LOC101056688 /// Wars "],
'cv_filter':[ 0.134,0.290],
'Organ' :["LN","LV"]} )
df = df[["Probes","Genes","cv_filter","Organ"]]
It looks like this:
In [16]: df
Out[16]:
Probes Genes cv_filter Organ
0 1415693_at Canx 0.134 LN
1 1415693_at LOC101056688 /// Wars 0.290 LV
What I want to do is split the rows on the Genes column wherever its entries are separated by "///".
The result I want is:
Probes Genes cv_filter Organ
0 1415693_at Canx 0.134 LN
1 1415693_at LOC101056688 0.290 LV
2 1415693_at Wars 0.290 LV
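In total I have about 150K rows to process. Is there a fast way to do this?
You can first split the Genes column, create a new Series from the pieces, and then join it back to the original df: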
import pandas as pd
df = pd.DataFrame({'Probes':["1415693_at","1415693_at"],
'Genes':["Canx","LOC101056688 /// Wars "],
'cv_filter':[ 0.134,0.290],
'Organ' :["LN","LV"]} )
df = df[["Probes","Genes","cv_filter","Organ"]]
print(df)
Probes Genes cv_filter Organ
0 1415693_at Canx 0.134 LN
1 1415693_at LOC101056688 /// Wars 0.290 LV
s = pd.DataFrame([ x.split('///') for x in df['Genes'].tolist() ], index=df.index).stack()
#or you can use approach from comment
#s = df['Genes'].str.split('///', expand=True).stack()
s.index = s.index.droplevel(-1)
s.name = 'Genes'
print(s)
0 Canx
1 LOC101056688
1 Wars
Name: Genes, dtype: object
# drop the original Genes column first; otherwise join raises:
# ValueError: columns overlap but no suffix specified: Index([u'Genes'], dtype='object')
df = df.drop('Genes', axis=1)
df = df.join(s).reset_index(drop=True)
print(df[["Probes","Genes","cv_filter","Organ"]])
Probes Genes cv_filter Organ
0 1415693_at Canx 0.134 LN
1 1415693_at LOC101056688 0.290 LV
2 1415693_at Wars 0.290 LV
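For readers on a newer pandas, here is a minimal sketch of the same split using DataFrame.explode. This is a different technique from the stack/join approach in the answer above, and it assumes pandas 0.25 or later, where explode is available. Splitting on '///' leaves spaces around the gene names, so the sketch also strips them, on the assumption that clean names are wanted.

```python
import pandas as pd

df = pd.DataFrame({
    "Probes": ["1415693_at", "1415693_at"],
    "Genes": ["Canx", "LOC101056688 /// Wars "],
    "cv_filter": [0.134, 0.290],
    "Organ": ["LN", "LV"],
})

# Turn each Genes entry into a list of pieces, then give every piece its own row.
# The other columns are repeated for each piece.
out = (
    df.assign(Genes=df["Genes"].str.split("///"))
      .explode("Genes")
      .reset_index(drop=True)
)

# Splitting on '///' leaves spaces around the names, so strip them.
out["Genes"] = out["Genes"].str.strip()

print(out)
```

Because assign replaces Genes in place, the column order stays the same, and there is no need to drop and re-join the column.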
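Why not
df['Genes'].str.split('///',expand=True).stack()
instead of df['Genes'].str.split('//').apply(pd.Series,1).stack()? It is about 2x faster.
@AntonProtopopov - Thanks. I added it to my answer as an alternative solution (it is only a little slower than the DataFrame constructor). You are right, so index was added to the DataFrame constructor.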
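To check the speed claims in these comments on your own machine, a rough timing sketch might look like this. The synthetic data, the function names via_constructor and via_str_split, and the repeat count are all assumptions for illustration. Timings depend on the pandas version and the data, so no results are quoted here.

```python
import timeit
import pandas as pd

# Synthetic data (an assumption for illustration): the two sample entries repeated.
genes = pd.Series(["Canx", "LOC101056688 /// Wars "] * 5000, name="Genes")

def via_constructor(s):
    # Split in plain Python and build the frame directly, keeping the original index.
    return pd.DataFrame([x.split("///") for x in s.tolist()], index=s.index).stack()

def via_str_split(s):
    # Let pandas split the strings and expand the pieces into columns.
    return s.str.split("///", expand=True).stack()

# Sanity check: both approaches yield the same non-missing pieces in the same order.
same = via_constructor(genes).dropna().tolist() == via_str_split(genes).dropna().tolist()
print("same pieces:", same)

for fn in (via_constructor, via_str_split):
    runs = 5
    seconds = timeit.timeit(lambda: fn(genes), number=runs)
    print(f"{fn.__name__}: {seconds / runs:.4f} s per run")
```

With around 150K real rows, it is worth timing on a sample of the actual data, since the number of genes per row affects how wide the intermediate frame becomes.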