Python: how to split a DataFrame and reassemble it into a new DataFrame
I have a DataFrame that looks like this:
A  YEAR2000  B  YEAR2001  C  YEAR2002
a  1         b  3         a  7
b  3         c  5         e  6
c  6         d  2         f  3
             e  1         g  0
I want to slice it every two columns and then reorganize the pieces into a new DataFrame, like this:
type  YEAR2000  YEAR2001  YEAR2002
a     1                   7
b     3         3
c     6         5
d               2
e               1         6
f                         3
g                         0
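For anyone who wants to reproduce the question, here is a minimal sketch of the input frame. It assumes the empty cells of the first column pair are NaN (the flattened table and the three-row df1 in the merge answer suggest that A/YEAR2000 is the short pair); the variable name df is the one the answers use.

```python
import numpy as np
import pandas as pd

# Assumption: the A/YEAR2000 pair has only three rows, so its fourth row is NaN padding.
df = pd.DataFrame({
    "A": ["a", "b", "c", np.nan],
    "YEAR2000": [1, 3, 6, np.nan],
    "B": ["b", "c", "d", "e"],
    "YEAR2001": [3, 5, 2, 1],
    "C": ["a", "e", "f", "g"],
    "YEAR2002": [7, 6, 3, 0],
})
print(df)
```

Because of the NaN padding, YEAR2000 is stored as float, which is why the answers show values such as 1.0 instead of 1.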
My code is as follows:
dataframe_list = []
for i in range(0, origin_df.columns.size):
    if i % 2 == 0:
        dataframe_list.append(origin_df.iloc[:, [i, i + 1]])
new_dataframe = pd.DataFrame()
new_dataframe = pd.concat(dataframe_list, axis=0)
new_dataframe
I tried pd.concat(), but some errors occurred! Thanks.

I think you can use groupby with axis=1 and then concat:
# assumes numpy as np and pandas as pd; note that groupby(..., axis=1) is deprecated in recent pandas releases
l = [y.set_index(y.columns[0]).dropna() for x, y in df.groupby(np.arange(df.shape[1]) // 2, axis=1)]
pd.concat(l, axis=1, sort=True)
Out[858]:
   YEAR2000  YEAR2001  YEAR2002
a       1.0       NaN       7.0
b       3.0       3.0       NaN
c       6.0       5.0       NaN
d       NaN       2.0       NaN
e       NaN       1.0       6.0
f       NaN       NaN       3.0
g       NaN       NaN       0.0
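Since column-wise groupby is on its way out in newer pandas, the same pairing can probably be done with a plain boolean mask over the column positions. This is only a sketch, not the answerer's code; the input frame is rebuilt under the assumption that the short column pair is padded with NaN, and the names pair_id and pieces are made up for illustration.

```python
import numpy as np
import pandas as pd

# Assumed input: the A/YEAR2000 pair is one row shorter and padded with NaN.
df = pd.DataFrame({
    "A": ["a", "b", "c", np.nan],
    "YEAR2000": [1, 3, 6, np.nan],
    "B": ["b", "c", "d", "e"],
    "YEAR2001": [3, 5, 2, 1],
    "C": ["a", "e", "f", "g"],
    "YEAR2002": [7, 6, 3, 0],
})

# Label column positions 0,0,1,1,2,2 so each label marks one (key, value) pair.
pair_id = np.arange(df.shape[1]) // 2

pieces = []
for g in np.unique(pair_id):
    pair = df.loc[:, pair_id == g].dropna()         # select one pair, drop padding rows
    pieces.append(pair.set_index(pair.columns[0]))  # the key column becomes the index

# Align the pieces on their index labels; sort=True orders the union of labels.
result = pd.concat(pieces, axis=1, sort=True)
print(result)
```

The grouping key is the same np.arange(df.shape[1]) // 2 expression as above; only the way the columns are selected changes.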
Using merge twice will do it:
df1 = pd.DataFrame([['a', 1], ['b', 3], ['c', 6]], columns=['letter', 'number'])
df2 = pd.DataFrame([['b', 3], ['c', 5], ['d', 2], ['e', 1]], columns=['letter', 'number'])
df3 = pd.DataFrame([['a', 7], ['e', 6], ['f', 3], ['g', 0]], columns=['letter', 'number'])
pd.merge(pd.merge(df1, df2, how='outer', on='letter'), df3, how='outer', on='letter')
For a cleaner look:
df1.merge(df2, how='outer', on='letter').merge(df3, how='outer', on='letter')
If you have many dataframes, put them in a list and use reduce:
from functools import reduce
dfs = [df1, df2, df3]
reduce(lambda left, right: left.merge(right, how='outer', on='letter'), dfs)
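One thing to watch with the reduce approach: when every frame calls its value column number, merge has to disambiguate them with suffixes (number_x, number_y), and with longer lists those generated names can collide. A possible workaround, sketched here as an assumption and not as part of the answer above, is to give each value column its own name (here the year labels from the question) before merging. The names frames and merged are illustrative.

```python
from functools import reduce

import pandas as pd

# Same data as df1, df2, df3 above, but each value column carries its own name.
df1 = pd.DataFrame([["a", 1], ["b", 3], ["c", 6]], columns=["letter", "YEAR2000"])
df2 = pd.DataFrame([["b", 3], ["c", 5], ["d", 2], ["e", 1]], columns=["letter", "YEAR2001"])
df3 = pd.DataFrame([["a", 7], ["e", 6], ["f", 3], ["g", 0]], columns=["letter", "YEAR2002"])

frames = [df1, df2, df3]

# Fold the list into one frame with successive outer merges on the key column.
merged = reduce(lambda left, right: left.merge(right, how="outer", on="letter"), frames)

# Sort by key and use it as the index to match the layout asked for in the question.
merged = merged.sort_values("letter").set_index("letter")
print(merged)
```

With distinct value names no suffixes are needed, so the list can grow without the result columns becoming ambiguous.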
In case you have more than 6 columns:
num_cols = len(df.columns)
pd.concat([df.iloc[:, i:i+2].dropna()
             .set_index(df.columns[i])
           for i in range(0, num_cols, 2)],
          axis=1,
          sort=True
         )
Output:
   YEAR2000  YEAR2001  YEAR2002
a       1.0       NaN       7.0
b       3.0       3.0       NaN
c       6.0       5.0       NaN
d       NaN       2.0       NaN
e       NaN       1.0       6.0
f       NaN       NaN       3.0
g       NaN       NaN       0.0
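If this has to run on frames of different widths, the iloc loop could be wrapped in a small helper. The sketch below is one way to do that; the function name reassemble_pairs, the even-column check, the index name type (taken from the header of the desired output) and the fourth column pair D/YEAR2003 are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd


def reassemble_pairs(df: pd.DataFrame) -> pd.DataFrame:
    """Turn (key, value) column pairs into one frame indexed by key (hypothetical helper)."""
    if df.shape[1] % 2:
        raise ValueError("expected an even number of columns (key/value pairs)")
    pieces = [
        # take columns i and i+1, drop padding rows, index by the key column
        df.iloc[:, i:i + 2].dropna().set_index(df.columns[i])
        for i in range(0, df.shape[1], 2)
    ]
    out = pd.concat(pieces, axis=1, sort=True)
    out.index.name = "type"
    return out


# Eight columns this time; the D/YEAR2003 pair is made-up data for the example.
df = pd.DataFrame({
    "A": ["a", "b", "c", np.nan], "YEAR2000": [1, 3, 6, np.nan],
    "B": ["b", "c", "d", "e"], "YEAR2001": [3, 5, 2, 1],
    "C": ["a", "e", "f", "g"], "YEAR2002": [7, 6, 3, 0],
    "D": ["b", "g", np.nan, np.nan], "YEAR2003": [4, 9, np.nan, np.nan],
})
out = reassemble_pairs(df)
print(out)
```

Raising on an odd column count is a design choice: it surfaces a malformed input early instead of silently producing a piece with no value column.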
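I think the simple solution is to use pd.concat. Just set the index to column A, B, C, ... in each of these sub-dataframes before calling pd.concat. For a df with many columns whose names are not known in advance, this is easy to do with iter and zip, as follows: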
it = iter(df)
cols_list = list(map(list, zip(it, it)))
Out[1854]: [['A', 'YEAR2000'], ['B', 'YEAR2001'], ['C', 'YEAR2002']]
Next, use a list comprehension over cols_list together with pd.concat:
dfs = [df[cols].set_index(cols[0]) for cols in cols_list]
pd.concat(dfs, axis=1).dropna(axis=0, how='all')
Out[1868]:
   YEAR2000  YEAR2001  YEAR2002
a       1.0       NaN       7.0
b       3.0       3.0       NaN
c       6.0       5.0       NaN
d       NaN       2.0       NaN
e       NaN       1.0       6.0
f       NaN       NaN       3.0
g       NaN       NaN       0.0
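Put together, the iter/zip approach might look like the sketch below. One detail differs from the answer and is my own assumption: rows whose key is NaN are dropped before set_index, because padding rows would otherwise end up as NaN labels in the index, and a non-unique index can make the column-wise concat fail. The input frame is rebuilt under the assumption that the short pair is padded with NaN.

```python
import numpy as np
import pandas as pd

# Assumed input: the A/YEAR2000 pair is one row shorter and padded with NaN.
df = pd.DataFrame({
    "A": ["a", "b", "c", np.nan],
    "YEAR2000": [1, 3, 6, np.nan],
    "B": ["b", "c", "d", "e"],
    "YEAR2001": [3, 5, 2, 1],
    "C": ["a", "e", "f", "g"],
    "YEAR2002": [7, 6, 3, 0],
})

# Iterating a DataFrame yields its column labels; zipping the same iterator
# with itself consumes them two at a time, giving consecutive (key, value) pairs.
it = iter(df)
cols_list = list(map(list, zip(it, it)))

# Drop rows with a missing key first, so no NaN label reaches the index.
dfs = [df[cols].dropna(subset=[cols[0]]).set_index(cols[0]) for cols in cols_list]
result = pd.concat(dfs, axis=1, sort=True)
print(result)
```

Note that zip(it, it) silently discards a trailing unpaired column, so it is worth checking that the column count is even if that matters for your data.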
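You should provide your code to show what you have tried so far.
Set the index before concat; it will then align with axis=1.
Thanks for your help! But what should I do if the number of dataframes is 10, 100 or more?
@butting You're welcome :) If you accept the answer, you can click the accept button (the check mark ✓) on the left.
Thanks for your help! I tried it and it works well.
If you want to use the pd.concat approach, please see my answer.
Thanks for your help! I'm new to pandas, so I'm still learning!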