Python: How can I efficiently convert a DataFrame to a multidimensional numpy array?
I am trying to convert a large pandas DataFrame into a numpy array with a specific structure. The DataFrame's shape is (56930, 255); here is df:
trial channel digit Column7 Column8 Column9 Column10 Column11 Column12 Column13 ... Column249 Column250 Column251 Column252 Column253 Column254 Column255 Column256 Column257 Column258
0 0.0 AF3 0.0 4259.487179 4237.948717 4247.179487 4242.051282 4233.333333 4251.282051 4232.820512 ... 4275.897435 4288.717948 4287.692307 4273.333333 4287.179487 4260.000000 4271.282051 4286.666666 4255.384615 4257.948717
1 0.0 AF4 0.0 4103.076923 4100.512820 4102.564102 4087.692307 4074.358974 4095.897435 4093.846153 ... 4210.256410 4234.358974 4252.820512 4238.461538 4253.333333 4244.615384 4241.538461 4229.743589 4214.871794 4237.948717
2 0.0 T7 0.0 4245.128205 4218.461538 4242.051282 4245.128205 4233.333333 4257.435897 4241.025641 ... 4270.256410 4259.487179 4259.487179 4277.435897 4292.307692 4250.256410 4263.076923 4260.000000 4232.307692 4277.435897
3 0.0 T8 0.0 4208.717948 4188.717948 4204.102564 4198.461538 4179.487179 4203.589743 4194.871794 ... 4215.897435 4236.410256 4222.051282 4191.282051 4231.794871 4226.153846 4218.461538 4216.410256 4202.051282 4220.000000
4 0.0 PZ 0.0 4189.230769 4203.589743 4188.717948 4186.666666 4198.461538 4177.435897 4192.820512 ... 4220.512820 4224.102564 4217.435897 4237.948717 4205.641025 4214.358974 4212.307692 4185.128205 4199.487179 4196.923076
5 1.0 AF3 6.0 4273.846153 4265.641025 4270.256410 4284.615384 4265.128205 4279.487179 4268.717948 ... 4283.076923 4289.743589 4308.717948 4277.948717 4296.410256 4303.076923 4291.794871 4312.820512 4306.153846 4319.487179
6 1.0 AF4 6.0 4233.846153 4252.307692 4262.564102 4244.102564 4217.435897 4262.564102 4239.487179 ... 4315.897435 4318.461538 4320.000000 4302.564102 4338.461538 4339.487179 4330.256410 4358.461538 4331.794871 4331.794871
7 1.0 T7 6.0 4301.025641 4301.025641 4293.333333 4289.230769 4285.128205 4320.000000 4302.051282 ... 4287.692307 4294.871794 4291.794871 4255.384615 4278.461538 4276.923076 4262.564102 4290.256410 4267.179487 4262.564102
8 1.0 T8 6.0 4209.743589 4210.769230 4198.974358 4215.897435 4218.974358 4238.461538 4218.974358 ... 4214.358974 4215.384615 4215.897435 4194.871794 4231.794871 4210.256410 4174.358974 4221.025641 4223.076923 4217.948717
9 1.0 PZ 6.0 4208.717948 4216.410256 4233.846153 4215.384615 4238.461538 4235.897435 4236.410256 ... 4204.102564 4228.205128 4205.641025 4218.461538 4204.615384 4204.615384 4226.153846 4210.256410 4236.410256 4225.128205
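What I have been trying to do is convert it into a numpy array with dimensions [numberTrials+1, 5, 252]. That gives an array in which each trial has its own 5 arrays, one per "channel", and each channel contains 252 values.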
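To make the target layout concrete, here is a minimal sketch of how a DataFrame row maps onto the 3D array. It assumes, as in the sample above, that the rows come in consecutive groups of 5 channels per trial; the sizes and the names `n_trials`, `n_channels`, `n_values`, and `row` are illustrative only.

```python
import numpy as np

# Illustrative sizes: 2 trials, 5 channels per trial, 252 values per channel
n_trials, n_channels, n_values = 2, 5, 252
x_train = np.zeros((n_trials, n_channels, n_values))

# Assuming rows are grouped by trial, row r of the DataFrame belongs to
# trial r // 5 and channel position r % 5 within that trial.
row = 7
trial, channel = divmod(row, n_channels)

print(x_train.shape)   # (2, 5, 252)
print(trial, channel)  # 1 2
```

In the sample printed above, row 7 is trial 1.0 with channel T7, which is the third channel (position 2) of that trial.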
This is what I tried:
numberTrials = int(df.max(axis=0)[0])
x_train = np.zeros([numberTrials+1,5,252])
i = 0
j = 0
k = 0
l = 0
i_limit = x_train[0][0].size
while k <= numberTrials:
    while j < 5: #There are 5 channels per trial
        while i < i_limit: #There are 252 values per channel
            x_df = df.iloc[j+l,3:]
            x_array = x_df.values #convert pandas df to array
            x_train[k][j][i] = x_array[i]
            i += 1
        i = 0
        j += 1
    i = 0
    j = 0
    l += 5
    k += 1
Here is a partial rewrite that eliminates the i-level iteration:
x_train = np.zeros([numberTrials+1,5,252])
# i_limit = x_train.shape[2] # instead of x_train[0][0].size
j = 0
k = 0
l = 0
while k <= numberTrials:
    while j < 5: #There are 5 channels per trial
        x_df = df.iloc[j+l,3:]
        x_train[k,j,:] = x_df.values # copy all 252 values of this row at once
        j += 1
    j = 0
    l += 5
    k += 1
Replacing the while loops over j and k with for loops:
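l = 0 # row offset of the current trial's first row
for k in range(numberTrials + 1): # +1 so the last trial is filled too, as in the while loop
    for j in range(5):
        x_df = df.iloc[j+l,3:]
        x_train[k,j,:] = x_df.values
    l += 5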
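And streamlining further (untested), by copying the 5 rows of each trial in one slice:
l = 0
for k in range(numberTrials + 1): # +1 so the last trial is filled too
    x_df = df.iloc[l:l+5, 3:]
    x_train[k,:,:] = x_df.values
    l += 5
But since the rows of each trial form consecutive groups of 5, I think we can simply reshape the whole 2D values array:
x_df = df.iloc[:, 3:].values
# originally posted as x_df.shape[2]; x_df is 2D, so the column count is x_df.shape[1]
x_train = x_df.reshape(-1,5, x_df.shape[1])
You could do this: df.set_index(['trial','channel']).values
Amazing, I'm really surprised you reduced all of that to two lines of code. It works great; I only had to change x_df.shape[2] to x_df.shape[1]. Thanks for showing your process, because it helped me understand it better.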
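The reshape approach from the answer can be checked against the per-trial loop on a small synthetic frame. This is a minimal sketch, not the asker's data: the frame `demo`, the random `values`, and the sizes (3 trials, 4 value columns) are made up for illustration. It also assumes every trial has exactly 5 rows and that the rows are already grouped by trial, as in the sample.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df: 3 trials x 5 channels, 4 value columns
channels = ["AF3", "AF4", "T7", "T8", "PZ"]
n_trials, n_values = 3, 4
rng = np.random.default_rng(0)

demo = pd.DataFrame({
    "trial": np.repeat(np.arange(n_trials, dtype=float), len(channels)),
    "channel": channels * n_trials,
    "digit": 0.0,
})
values = rng.normal(size=(len(demo), n_values))
for i in range(n_values):
    demo[f"Column{i + 7}"] = values[:, i]

# Reshape: drop the 3 label columns, then split the rows into groups of 5
data = demo.iloc[:, 3:].to_numpy()
x_reshape = data.reshape(-1, len(channels), data.shape[1])

# Loop version for comparison: copy one trial (5 rows) at a time
x_loop = np.zeros((n_trials, len(channels), n_values))
for k in range(n_trials):
    x_loop[k] = demo.iloc[5 * k:5 * k + 5, 3:].to_numpy()

print(x_reshape.shape)                    # (3, 5, 4)
print(np.array_equal(x_reshape, x_loop))  # True
```

The reshape works because numpy fills the new array in row-major order, so each run of 5 consecutive rows becomes one trial. If the rows were not already grouped by trial, they would need to be sorted first.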