Python 使用列拆分数据帧_Python_Pandas_Dataframe

Python 使用列拆分数据帧

python pandas dataframe

Python 使用列拆分数据帧,python,pandas,dataframe,Python,Pandas,Dataframe,我有一个包含大量列的数据框架。我真正想要的是创建/拆分数据帧。例如：生成玩具数据： df = pd.DataFrame(np.arange(10),columns = ['x']) df['y'] = np.arange(30,40,1) df['1'] = np.random.rand(10) df['2'] = np.random.rand(10) df['3'] = np.random.rand(10) df['4'] = np.random.rand(10) df['5'] = np

我有一个包含大量列的数据框架。我真正想要的是创建/拆分数据帧。例如：

生成玩具数据：

df = pd.DataFrame(np.arange(10),columns = ['x'])
df['y'] = np.arange(30,40,1)
df['1'] = np.random.rand(10)
df['2'] = np.random.rand(10)
df['3'] = np.random.rand(10)
df['4'] = np.random.rand(10)
df['5'] = np.random.rand(10)

df=

我真正想要的是以如下所示的方式拆分数据帧：

df1=

    x   y   1
0   0   30  0.047787
1   1   31  0.083536
2   2   32  0.201041
3   3   33  0.105833
4   4   34  0.146551
5   5   35  0.098615
6   6   36  0.387889
7   7   37  0.321591
8   8   38  0.983665
9   9   39  0.649185

df2=

等等。因为我有大量的专栏，所以很难一个一个地做。有没有更简单的方法

先谢谢你

您可以将始终保持静态的x和y列设置为索引轴然后跨列执行

groupby

通过使用字典理解，循环遍历每个这样的组。此外，末尾的

reset_索引将确保生成平坦的DF

df.set_index(['x','y'], inplace=True)
dfs = {i:grp.reset_index() for i, grp in df.groupby(np.arange(len(df.columns)), axis=1)}

生成的结果字典的键将构成列名，可以查询这些列名，如下所示：
dfs[0]



依此类推。您可以将始终保持静态的x和y列设置为索引轴
然后跨列执行groupby

通过使用字典理解，循环遍历每个这样的组。此外，末尾的reset_索引将确保生成平坦的DF

df.set_index(['x','y'], inplace=True)
dfs = {i:grp.reset_index() for i, grp in df.groupby(np.arange(len(df.columns)), axis=1)}

生成的结果字典的键将构成列名，可以查询这些列名，如下所示：
dfs[0]



依此类推。
对我来说，似乎可以将索引设置为['x'，'y']
，然后只需按列名获取列：
>>> df2 = df.set_index(['x','y'])
>>> df2
             1         2         3         4
x y                                         
0 30  0.161017  0.280965  0.058429  0.750003
1 31  0.643460  0.258441  0.951750  0.774355
2 32  0.948439  0.573363  0.126369  0.577629
3 33  0.896542  0.722825  0.927644  0.470369
4 34  0.298559  0.009676  0.841103  0.899220
5 35  0.855292  0.849880  0.529132  0.929805
6 36  0.428680  0.486381  0.271048  0.219826
7 37  0.752370  0.698653  0.980554  0.894405
8 38  0.027857  0.085865  0.086936  0.403528
9 39  0.522483  0.646266  0.825819  0.574025

>>> df2['1']
x  y 
0  30    0.161017
1  31    0.643460
2  32    0.948439
3  33    0.896542
4  34    0.298559
5  35    0.855292
6  36    0.428680
7  37    0.752370
8  38    0.027857
9  39    0.522483

如果只需要循环遍历列，可以执行以下操作：
>>> for i in range(1,5):
...     print df[['x','y',str(i)]]
... 
   x   y         1
0  0  30  0.161017
1  1  31  0.643460
2  2  32  0.948439
3  3  33  0.896542
4  4  34  0.298559
5  5  35  0.855292
6  6  36  0.428680
7  7  37  0.752370
8  8  38  0.027857
9  9  39  0.522483
   x   y         2
0  0  30  0.280965
1  1  31  0.258441
2  2  32  0.573363
3  3  33  0.722825
4  4  34  0.009676
5  5  35  0.849880
6  6  36  0.486381
7  7  37  0.698653
8  8  38  0.085865
9  9  39  0.646266
   x   y         3
0  0  30  0.058429
1  1  31  0.951750
2  2  32  0.126369
3  3  33  0.927644
4  4  34  0.841103
5  5  35  0.529132
6  6  36  0.271048
7  7  37  0.980554
8  8  38  0.086936
9  9  39  0.825819
   x   y         4
0  0  30  0.750003
1  1  31  0.774355
2  2  32  0.577629
3  3  33  0.470369
4  4  34  0.899220
5  5  35  0.929805
6  6  36  0.219826
7  7  37  0.894405
8  8  38  0.403528
9  9  39  0.574025

对我来说，您可以将索引设置为['x'，'y']
，然后按列名获取列：
>>> df2 = df.set_index(['x','y'])
>>> df2
             1         2         3         4
x y                                         
0 30  0.161017  0.280965  0.058429  0.750003
1 31  0.643460  0.258441  0.951750  0.774355
2 32  0.948439  0.573363  0.126369  0.577629
3 33  0.896542  0.722825  0.927644  0.470369
4 34  0.298559  0.009676  0.841103  0.899220
5 35  0.855292  0.849880  0.529132  0.929805
6 36  0.428680  0.486381  0.271048  0.219826
7 37  0.752370  0.698653  0.980554  0.894405
8 38  0.027857  0.085865  0.086936  0.403528
9 39  0.522483  0.646266  0.825819  0.574025

>>> df2['1']
x  y 
0  30    0.161017
1  31    0.643460
2  32    0.948439
3  33    0.896542
4  34    0.298559
5  35    0.855292
6  36    0.428680
7  37    0.752370
8  38    0.027857
9  39    0.522483

如果只需要循环遍历列，可以执行以下操作：
>>> for i in range(1,5):
...     print df[['x','y',str(i)]]
... 
   x   y         1
0  0  30  0.161017
1  1  31  0.643460
2  2  32  0.948439
3  3  33  0.896542
4  4  34  0.298559
5  5  35  0.855292
6  6  36  0.428680
7  7  37  0.752370
8  8  38  0.027857
9  9  39  0.522483
   x   y         2
0  0  30  0.280965
1  1  31  0.258441
2  2  32  0.573363
3  3  33  0.722825
4  4  34  0.009676
5  5  35  0.849880
6  6  36  0.486381
7  7  37  0.698653
8  8  38  0.085865
9  9  39  0.646266
   x   y         3
0  0  30  0.058429
1  1  31  0.951750
2  2  32  0.126369
3  3  33  0.927644
4  4  34  0.841103
5  5  35  0.529132
6  6  36  0.271048
7  7  37  0.980554
8  8  38  0.086936
9  9  39  0.825819
   x   y         4
0  0  30  0.750003
1  1  31  0.774355
2  2  32  0.577629
3  3  33  0.470369
4  4  34  0.899220
5  5  35  0.929805
6  6  36  0.219826
7  7  37  0.894405
8  8  38  0.403528
9  9  39  0.574025

您可以使用列表理解自动生成数据帧：
df_cuts = [df[['x', 'y', col]] for col in df.columns if col not in ['x', 'y']]

我在命令行中验证了输出：
for i in range(len(df_cuts)):
    print 'df %r:' % i
    print df_cuts[i]
    print '\n'

结果是：
df 0:
   x   y         1
0  0  30  0.695465
1  1  31  0.425572
2  2  32  0.018986
3  3  33  0.165947
4  4  34  0.103120
5  5  35  0.069060
6  6  36  0.676640
7  7  37  0.492231
8  8  38  0.950436
9  9  39  0.156195


df 1:
   x   y         2
0  0  30  0.928538
1  1  31  0.019624
2  2  32  0.862811
3  3  33  0.289581
4  4  34  0.150975
5  5  35  0.835313
6  6  36  0.768760
7  7  37  0.396042
8  8  38  0.423745
9  9  39  0.268596


df 2:
   x   y         3
0  0  30  0.999175
1  1  31  0.004125
2  2  32  0.137457
3  3  33  0.042903
4  4  34  0.580698
5  5  35  0.663723
6  6  36  0.996608
7  7  37  0.960361
8  8  38  0.932486
9  9  39  0.758873


df 3:
   x   y         4
0  0  30  0.708976
1  1  31  0.547635
2  2  32  0.722322
3  3  33  0.912707
4  4  34  0.380471
5  5  35  0.607742
6  6  36  0.803980
7  7  37  0.569364
8  8  38  0.882297
9  9  39  0.954743


df 4:
   x   y         5
0  0  30  0.900532
1  1  31  0.247818
2  2  32  0.629371
3  3  33  0.502218
4  4  34  0.183292
5  5  35  0.875611
6  6  36  0.940828
7  7  37  0.200641
8  8  38  0.874052
9  9  39  0.525997

您可以使用列表理解自动生成数据帧：
df_cuts = [df[['x', 'y', col]] for col in df.columns if col not in ['x', 'y']]

我在命令行中验证了输出：
for i in range(len(df_cuts)):
    print 'df %r:' % i
    print df_cuts[i]
    print '\n'

结果是：
df 0:
   x   y         1
0  0  30  0.695465
1  1  31  0.425572
2  2  32  0.018986
3  3  33  0.165947
4  4  34  0.103120
5  5  35  0.069060
6  6  36  0.676640
7  7  37  0.492231
8  8  38  0.950436
9  9  39  0.156195


df 1:
   x   y         2
0  0  30  0.928538
1  1  31  0.019624
2  2  32  0.862811
3  3  33  0.289581
4  4  34  0.150975
5  5  35  0.835313
6  6  36  0.768760
7  7  37  0.396042
8  8  38  0.423745
9  9  39  0.268596


df 2:
   x   y         3
0  0  30  0.999175
1  1  31  0.004125
2  2  32  0.137457
3  3  33  0.042903
4  4  34  0.580698
5  5  35  0.663723
6  6  36  0.996608
7  7  37  0.960361
8  8  38  0.932486
9  9  39  0.758873


df 3:
   x   y         4
0  0  30  0.708976
1  1  31  0.547635
2  2  32  0.722322
3  3  33  0.912707
4  4  34  0.380471
5  5  35  0.607742
6  6  36  0.803980
7  7  37  0.569364
8  8  38  0.882297
9  9  39  0.954743


df 4:
   x   y         5
0  0  30  0.900532
1  1  31  0.247818
2  2  32  0.629371
3  3  33  0.502218
4  4  34  0.183292
5  5  35  0.875611
6  6  36  0.940828
7  7  37  0.200641
8  8  38  0.874052
9  9  39  0.525997

非常感谢Nickil Maveli。这就是我想要的：）非常感谢尼基尔·马韦利。这就是我想要的：）