Python: Converting a list of 2D pandas DataFrames to a 3D DataFrame
I am trying to create a pandas DataFrame that maps label values to 2D DataFrames. This is what I have done so far: I read CSV files with pd.read_csv and append them to a list. For the purposes of this question, consider the following code:
import numpy as np
import pandas as pd
raw_sample = []
labels = [1,1,1,2,2,2]
samples = np.random.randn(6, 5, 4)
for contents in range(samples.shape[0]):
    raw_sample.append(pd.DataFrame(samples[contents]))
Then I put raw_sample into a DataFrame with df = pd.DataFrame(raw_sample), and add the labels to df as follows:
df = df.set_index([df.index, labels])
df.index = df.index.set_names('index', level=0)
df.index = df.index.set_names('labels', level=1)
When I print this, the result is:
0
index labels
0 1 0 1 2 3
0 0...
1 1 0 1 2 3
0 0...
2 1 0 1 2 3
0 1...
3 2 0 1 2 3
0 -0...
4 2 0 1 2 3
0 0...
5 2 0 1 2 3
0 -0...
I also tried printing df[0] and still got the same result.
I would like to know whether it can be displayed as:
index labels 0
0 1 1 2 3 4 5 6 7
3 5 6 7 9 5 4
3 4 5 6 7 8 9
1 1 4 3 2 4 5 6 7
3 5 6 7 4 5 6
2 3 4 3 4 5 3
...
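I know that a DataFrame cannot hold 2D arrays as values. The other thing I tried was pd.Panel: I converted every element of raw_sample to a numpy array, then converted raw_sample itself to a numpy array, and did the following: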
p1 = pd.Panel(samples, items=map(str, labels))
But when I print this, I get:
<class 'pandas.core.panel.Panel'>
Dimensions: 6 (items) x 5 (major_axis) x 4 (minor_axis)
Items axis: 1 to 2
Major_axis axis: 0 to 4
Minor_axis axis: 0 to 3
Expected output:
index labels samples
0 1 1 2 3 4 5 6 7
3 5 6 7 9 5 4
3 4 5 6 7 8 9
1 1 4 3 2 4 5 6 7
3 5 6 7 4 5 6
2 3 4 3 4 5 3
...
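Selecting an item whose label is not unique returns another Panel, whereas selecting a unique label returns a DataFrame. The same thing happens in a DataFrame with non-unique columns: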
samples = np.random.randn(6, 5)
df = pd.DataFrame(samples, columns=list('11122'))
print (df)
1 1 1 2 2
0 0.346338 -0.855797 -0.932463 -2.289259 0.634696
1 0.272794 -0.924357 -1.898270 -0.743083 -1.587480
2 -0.519975 -0.136836 0.530178 -0.730629 2.520821
3 0.137530 -1.232763 0.508548 -0.480384 -1.213064
4 -0.157787 -1.600004 -1.287620 0.384642 -0.568072
5 -0.649427 -0.659585 -0.813359 -1.487412 -0.044206
print (df['1'])
1 1 1
0 0.346338 -0.855797 -0.932463
1 0.272794 -0.924357 -1.898270
2 -0.519975 -0.136836 0.530178
3 0.137530 -1.232763 0.508548
4 -0.157787 -1.600004 -1.287620
5 -0.649427 -0.659585 -0.813359
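The behaviour described above can be checked with a small sketch. The column labels here are hypothetical (the label '3' is added only so that one label is unique), and the random generator is just a stand-in for real data:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Hypothetical frame: label '1' appears three times, '2' and '3' once each.
df = pd.DataFrame(rng.standard_normal((6, 5)), columns=list('11123'))

dup = df['1']    # non-unique label: every matching column is returned
uniq = df['3']   # unique label: a single column is returned

print(type(dup).__name__, dup.shape)
print(type(uniq).__name__, uniq.shape)
```

The non-unique selection keeps its dimensionality (a DataFrame with three columns), while the unique one drops a dimension (a Series), which mirrors the Panel versus DataFrame difference.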
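EDIT:
Also, to create df from the list, use concat with the keys parameter and then call to_panel to get the Panel. This requires unique labels; non-unique labels raise an error.
With non-unique labels, the Panel cannot be created: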
p1 = df.to_panel()
print (p1)
>ValueError: Can't convert non-uniquely indexed DataFrame to Panel
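Comments:
Not sure what exactly you need. Can you give us your input and desired output?
@Allen updated. Thanks.
I am not sure, but it seems you need unique labels, so change labels=[1,1,1,2,2] to labels=list('abcdef'); then you can select with print(p1['a']).
@jezrael but the labels cannot be unique.
@akshay - yes, that is possible. But if you compare print(p1) with print(p1['1']), you get a Panel both ways; only the second is filtered - Dimensions: 6 (items) x 5 (major_axis) x 4 (minor_axis) vs Dimensions: 3 (items) x 5 (major_axis) x 4 (minor_axis).
Please check the edit for creating the DataFrame from the list.
The problem is that the labels cannot be unique; each label maps to a sample. They are like samples for machine learning.
Duplicates are supported in pandas, but some functions do not work with them, such as reindex and concat.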
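As a small illustration of the last comment, the sketch below uses a hypothetical Series with a duplicated index label. It only demonstrates the reindex case; label-based selection still works and returns every match:

```python
import pandas as pd

# Hypothetical data: index label 1 occurs twice.
s = pd.Series([10, 20, 30], index=[1, 1, 2])

# Selection by label works with duplicates and returns all matching rows.
both = s.loc[1]

# reindex refuses an axis that contains duplicate labels.
try:
    s.reindex([1, 2])
    reindex_failed = False
except ValueError as exc:
    reindex_failed = True
    print("reindex raised:", exc)

print("index is unique:", s.index.is_unique)
```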
print (p1.to_frame())
a b c d e f
major minor
0 0 1.331587 -1.977728 0.660232 -0.232182 1.985085 0.117476
1 0.715279 -1.743372 -0.350872 -0.501729 1.744814 -1.907457
2 -1.545400 0.266070 -0.939433 1.128785 -1.856185 -0.922909
3 -0.008384 2.384967 -0.489337 -0.697810 -0.222774 0.469751
1 0 0.621336 1.123691 -0.804591 -0.081122 -0.065848 -0.144367
1 -0.720086 1.672622 -0.212698 -0.529296 -2.131712 -0.400138
2 0.265512 0.099149 -0.339140 1.046183 -0.048831 -0.295984
3 0.108549 1.397996 0.312170 -1.418556 0.393341 0.848209
2 0 0.004291 -0.271248 0.565153 -0.362499 0.217265 0.706830
1 -0.174600 0.613204 -0.147420 -0.121906 -1.994394 -0.787269
2 0.433026 -0.267317 -0.025905 0.319356 1.107708 0.292941
3 1.203037 -0.549309 0.289094 0.460903 0.244544 -0.470807
3 0 -0.965066 0.132708 -0.539879 -0.215790 -0.061912 2.404326
1 1.028274 -0.476142 0.708160 0.989072 -0.753893 -0.739357
2 0.228630 1.308473 0.842225 0.314754 0.711959 -0.312829
3 0.445138 0.195013 0.203581 2.467651 0.918269 -0.348882
4 0 -1.136602 0.400210 2.394704 -1.508321 -0.482093 -0.439026
1 0.135137 -0.337632 0.917459 0.620601 0.089588 0.141104
2 1.484537 1.256472 -0.112272 -1.045133 0.826999 0.273049
3 -1.079805 -0.731970 -0.362180 -0.798009 -1.954512 -1.618571
With unique labels:
np.random.seed(100)
raw_sample = []
labels = list('abcdef')
samples = np.random.randn(6, 5, 4)
for contents in range(samples.shape[0]):
    raw_sample.append(pd.DataFrame(samples[contents]))
df = pd.concat(raw_sample, keys=labels)
print (df)
0 1 2 3
a 0 -1.749765 0.342680 1.153036 -0.252436
1 0.981321 0.514219 0.221180 -1.070043
2 -0.189496 0.255001 -0.458027 0.435163
3 -0.583595 0.816847 0.672721 -0.104411
4 -0.531280 1.029733 -0.438136 -1.118318
b 0 1.618982 1.541605 -0.251879 -0.842436
1 0.184519 0.937082 0.731000 1.361556
2 -0.326238 0.055676 0.222400 -1.443217
3 -0.756352 0.816454 0.750445 -0.455947
4 1.189622 -1.690617 -1.356399 -1.232435
c 0 -0.544439 -0.668172 0.007315 -0.612939
1 1.299748 -1.733096 -0.983310 0.357508
2 -1.613579 1.470714 -1.188018 -0.549746
3 -0.940046 -0.827932 0.108863 0.507810
4 -0.862227 1.249470 -0.079611 -0.889731
d 0 -0.881798 0.018639 0.237845 0.013549
1 -1.635529 -1.044210 0.613039 0.736205
2 1.026921 -1.432191 -1.841188 0.366093
3 -0.331777 -0.689218 2.034608 -0.550714
4 0.750453 -1.306992 0.580573 -1.104523
e 0 0.690121 0.686890 -1.566688 0.904974
1 0.778822 0.428233 0.108872 0.028284
2 -0.578826 -1.199451 -1.705952 0.369164
3 1.876573 -0.376903 1.831936 0.003017
4 -0.076023 0.003958 -0.185014 -2.487152
f 0 -1.704651 -1.136261 -2.973315 0.033317
1 -0.248889 -0.450176 0.132428 0.022214
2 0.317368 -0.752414 -1.296392 0.095139
3 -0.423715 -1.185984 -0.365462 -1.271023
4 1.586171 0.693391 -1.958081 -0.134801
p1 = df.to_panel()
print (p1)
<class 'pandas.core.panel.Panel'>
Dimensions: 4 (items) x 6 (major_axis) x 5 (minor_axis)
Items axis: 0 to 3
Major_axis axis: a to f
Minor_axis axis: 0 to 4
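pd.Panel is no longer available in current pandas releases, so here is a minimal Panel-free sketch of the same idea, assuming unique labels. The level names 'label' and 'row' and the variable names are illustrative choices, not part of the original answer:

```python
import numpy as np
import pandas as pd

np.random.seed(100)
labels = list('abcdef')
samples = np.random.randn(6, 5, 4)

# One 5x4 DataFrame per sample, stacked under its label as the outer index level.
frames = [pd.DataFrame(block) for block in samples]
df = pd.concat(frames, keys=labels)
df.index.names = ['label', 'row']   # illustrative level names

# One 5x4 sample, selected by its outer-level label.
sample_a = df.loc['a']

# Back to a 3D NumPy array with shape (samples, rows, columns).
cube = df.to_numpy().reshape(len(labels), 5, 4)

print(sample_a.shape, cube.shape)
```

The MultiIndex DataFrame plays the role of the 3D container, and reshaping the underlying array recovers the original 3D block when a true array is needed.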
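With non-unique labels, build a MultiIndex for the keys and then drop the helper level; to_panel then fails:
np.random.seed(100)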
raw_sample = []
labels = [1,1,1,2,2,2]
mux = pd.MultiIndex.from_arrays([labels, range(len(labels))])
samples = np.random.randn(6, 5, 4)
for contents in range(samples.shape[0]):
    raw_sample.append(pd.DataFrame(samples[contents]))
df = pd.concat(raw_sample, keys=mux)
df = df.reset_index(level=1, drop=True)
print (df)
0 1 2 3
1 0 -1.749765 0.342680 1.153036 -0.252436
1 0.981321 0.514219 0.221180 -1.070043
2 -0.189496 0.255001 -0.458027 0.435163
3 -0.583595 0.816847 0.672721 -0.104411
4 -0.531280 1.029733 -0.438136 -1.118318
0 1.618982 1.541605 -0.251879 -0.842436
1 0.184519 0.937082 0.731000 1.361556
2 -0.326238 0.055676 0.222400 -1.443217
3 -0.756352 0.816454 0.750445 -0.455947
4 1.189622 -1.690617 -1.356399 -1.232435
0 -0.544439 -0.668172 0.007315 -0.612939
1 1.299748 -1.733096 -0.983310 0.357508
2 -1.613579 1.470714 -1.188018 -0.549746
3 -0.940046 -0.827932 0.108863 0.507810
4 -0.862227 1.249470 -0.079611 -0.889731
2 0 -0.881798 0.018639 0.237845 0.013549
1 -1.635529 -1.044210 0.613039 0.736205
2 1.026921 -1.432191 -1.841188 0.366093
3 -0.331777 -0.689218 2.034608 -0.550714
4 0.750453 -1.306992 0.580573 -1.104523
0 0.690121 0.686890 -1.566688 0.904974
1 0.778822 0.428233 0.108872 0.028284
2 -0.578826 -1.199451 -1.705952 0.369164
3 1.876573 -0.376903 1.831936 0.003017
4 -0.076023 0.003958 -0.185014 -2.487152
0 -1.704651 -1.136261 -2.973315 0.033317
1 -0.248889 -0.450176 0.132428 0.022214
2 0.317368 -0.752414 -1.296392 0.095139
3 -0.423715 -1.185984 -0.365462 -1.271023
4 1.586171 0.693391 -1.958081 -0.134801
p1 = df.to_panel()
print (p1)
>ValueError: Can't convert non-uniquely indexed DataFrame to Panel
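One possible way around the non-unique labels, sketched here as an assumption rather than as the answer's own method, is to keep the sample position as an extra index level so that the full index stays unique. The level names 'index', 'labels' and 'row' follow the layout requested in the question but are otherwise arbitrary:

```python
import numpy as np
import pandas as pd

np.random.seed(100)
labels = [1, 1, 1, 2, 2, 2]
samples = np.random.randn(6, 5, 4)

frames = [pd.DataFrame(block) for block in samples]

# Tuple keys (sample position, label) become two outer index levels.
keys = list(zip(range(len(labels)), labels))
df = pd.concat(frames, keys=keys)
df.index.names = ['index', 'labels', 'row']

# All samples carrying label 1: 3 samples x 5 rows each.
label_1 = df.xs(1, level='labels')

# One specific sample, selected by its position.
sample_3 = df.loc[3]

print(label_1.shape, sample_3.shape)
```

Because the (index, labels, row) combination is unique, the labels themselves can repeat freely, and each label still maps to its own 5x4 sample.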