Python: Convert a pandas.DataFrame to a numpy tensor using factor levels for the shape

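I have data from a full factorial experiment. For example, for each of N samples I have J measurement types and K measurement sites. I receive this data in long format, for example: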

```python
import numpy as np
import pandas as pd
import itertools
from numpy.random import normal as rnorm

# [[N], [J], [K]]
levels = [[1,2,3,4], ['start', 'stop'], ['gene1', 'gene2', 'gene3']]

# fully crossed
exp_design = list(itertools.product(*levels))

df = pd.DataFrame(exp_design, columns=["sample", "mode", "gene"])

# some fake data
df['x'] = rnorm(size=len(exp_design))
```
This produces 24 observations (x), with one column for each of the three factors:

```
> df.head()
    sample  mode    gene    x
0   1       start   gene1   -1.229370
1   1       start   gene2   1.129773
2   1       start   gene3   -1.155202
3   1       stop    gene1   -0.757551
4   1       stop    gene2   -0.166129
```
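I want to convert these observations into the corresponding tensor (numpy array) of shape (N, J, K). I thought that pivoting to a wide format with a MultiIndex and then extracting the values would produce the correct tensor, but the result is only a column vector: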

```
> df.pivot_table(values='x', index=['sample', 'mode', 'gene']).values
array([[-1.22936989],
       [ 1.12977346],
       [-1.15520216],
       ...,
       [-0.1031641 ],
       [ 1.1296491 ],
       [ 1.31113584]])
```
Is there a quick way to get the data into tensor form from a long-format pandas.DataFrame?

Try:

```
df.agg('nunique')

Out[69]: 
sample     4
mode       2
gene       3
x         24
dtype: int64
```

```
s=df.agg('nunique')
df.x.values.reshape(s['sample'],s['mode'],s['gene'])
Out[71]: 
array([[[-2.78133759e-01, -1.42234420e+00,  5.42439121e-01],
        [ 2.15359867e+00,  6.55837886e-01, -1.01293568e+00]],
       [[ 7.92306679e-01, -1.62539763e-01, -6.13120335e-01],
        [-2.91567999e-01, -4.01257702e-01,  7.96422763e-01]],
       [[ 1.05088264e-01, -7.23400925e-02,  2.78515041e-01],
        [ 2.63088568e-01,  1.47477886e+00, -2.10735619e+00]],
       [[-1.71756374e+00,  6.12224005e-04, -3.11562798e-02],
        [ 5.26028807e-01, -1.18502045e+00,  1.88633760e+00]]])
```
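
Reshaping df.x.values directly relies on the rows already being in (sample, mode, gene) order. The pivot_table attempt in the question was close to an order-independent version: pivot_table groups by its index columns and sorts the groups, so its single value column comes out in lexicographic (sample, mode, gene) order and only needs a reshape. A minimal sketch, assuming every factor combination occurs exactly once (with the default aggfunc='mean', duplicates would be silently averaged, and a missing combination would leave too few values for the reshape); the seed and the row shuffle are only there for the demonstration.

```python
import itertools

import numpy as np
import pandas as pd

# Rebuild the example data from the question, then shuffle the rows so the
# result cannot depend on the original row order.
levels = [[1, 2, 3, 4], ['start', 'stop'], ['gene1', 'gene2', 'gene3']]
df = pd.DataFrame(list(itertools.product(*levels)), columns=["sample", "mode", "gene"])
df['x'] = np.random.default_rng(0).normal(size=len(df))
shuffled = df.sample(frac=1, random_state=0)

factors = ['sample', 'mode', 'gene']
shape = tuple(shuffled[f].nunique() for f in factors)  # (N, J, K)

# pivot_table groups by the index columns and sorts the groups, so the single
# value column is in lexicographic (sample, mode, gene) order, which matches
# numpy's default C-order reshape (last axis varies fastest).
tensor = shuffled.pivot_table(values='x', index=factors).to_numpy().reshape(shape)

print(tensor.shape)  # (4, 2, 3)
```

Positions along each axis follow the sorted level values, so on the mode axis index 0 is 'start' and index 1 is 'stop'.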

I think it is worth noting that this assumes the DataFrame has been sorted first, i.e.
df.sort_values(by=['sample','mode','gene'])
@merv Yes, you are right.
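
If observations can be missing as well as out of order, one option is to align the values against the full Cartesian product of the observed levels before reshaping. The sketch below does this with set_index, pd.MultiIndex.from_product and reindex; long_to_tensor is a hypothetical helper name, not part of pandas. It assumes each factor combination appears at most once (reindex raises a ValueError on duplicate labels); missing combinations become NaN.

```python
import itertools

import numpy as np
import pandas as pd


def long_to_tensor(df, factors, value):
    """Hypothetical helper: reshape long-format data into one axis per factor.

    Axis order follows `factors`; positions along each axis follow the sorted
    unique levels. Missing factor combinations become NaN.
    """
    levels = [sorted(df[f].unique()) for f in factors]
    full_index = pd.MultiIndex.from_product(levels, names=factors)
    # Aligning by label makes the input row order irrelevant.
    values = df.set_index(factors)[value].reindex(full_index)
    shape = tuple(len(lvl) for lvl in levels)
    return values.to_numpy().reshape(shape), levels


levels = [[1, 2, 3, 4], ['start', 'stop'], ['gene1', 'gene2', 'gene3']]
df = pd.DataFrame(list(itertools.product(*levels)), columns=['sample', 'mode', 'gene'])
df['x'] = np.random.default_rng(1).normal(size=len(df))

# Shuffle and drop one observation (sample 2, stop, gene3) to simulate messy input.
messy = df.sample(frac=1, random_state=1)
messy = messy[~((messy['sample'] == 2) & (messy['mode'] == 'stop') & (messy['gene'] == 'gene3'))]

tensor, axes = long_to_tensor(messy, ['sample', 'mode', 'gene'], 'x')
print(tensor.shape)               # (4, 2, 3)
print(np.isnan(tensor[1, 1, 2]))  # True
```

Returning the sorted levels alongside the array records which label each position on an axis stands for, which is easy to lose once the data is a bare ndarray.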