Python: Why does pandas.DataFrame.mean() work but pandas.DataFrame.std() does not on the same data?


python pandas numpy


I am trying to figure out why the pandas.DataFrame.mean() function works on a Series of array-valued data, but pandas.DataFrame.std() does not work on the same data. Here is a minimal example:

import numpy as np
import pandas as pd

x = np.array([1,2,3])
y = np.array([4,5,6])
df = pd.DataFrame({"numpy": [x,y]})

df["numpy"].mean() #works as expected
Out[231]: array([ 2.5,  3.5,  4.5])

df["numpy"].std() #does not work as expected
Out[231]: TypeError: setting an array element with a sequence.

However, if I go through .values instead:

df["numpy"].values.mean() #works as expected
Out[231]: array([ 2.5,  3.5,  4.5])

df["numpy"].values.std() #works as expected
Out[233]: array([ 1.5,  1.5,  1.5])

Debugging information:

df["numpy"].dtype
Out[235]: dtype('O')

df["numpy"][0].dtype
Out[236]: dtype('int32')

df["numpy"].describe()
Out[237]: 
count             2
unique            2
top       [1, 2, 3]
freq              1
Name: numpy, dtype: object

df["numpy"]
Out[238]: 
0    [1, 2, 3]
1    [4, 5, 6]
Name: numpy, dtype: object
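One plausible reading of this debug output, hedged because the exact code path depends on the pandas version: with dtype('O'), a reduction that stays in object dtype delegates to Python's + on the cells, and ndarray + ndarray is elementwise, so the "sum" of the column is itself an array. A code path that first converts the column to float64 instead hits the same "setting an array element with a sequence" error shown in the question. Whether Series.std actually takes such a path is an assumption here, not something checked against pandas source; the sketch below only demonstrates the NumPy-level behavior.

```python
import numpy as np
import pandas as pd

x = np.array([1, 2, 3])
y = np.array([4, 5, 6])
col = pd.DataFrame({"numpy": [x, y]})["numpy"]

# The column is object dtype: each of its 2 cells holds a whole ndarray.
obj = col.to_numpy()
print(col.dtype, obj.shape)

# An object-dtype reduction calls Python's "+" on the elements; since
# ndarray + ndarray is elementwise, the result is an array, not a scalar.
total = obj.sum()
print(total)  # [5 7 9]

# Casting the same cells to float64 fails: each cell is a length-3
# sequence, not a single number that fits in one float slot.
message = ""
try:
    obj.astype(np.float64)
except (TypeError, ValueError) as exc:
    message = str(exc)
print(message)
```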

Suppose you have the following original DF (with NumPy arrays of the same shape in its cells):
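The original DataFrame is not shown here. Judging from index=df['file'] and the output below, it presumably had a `file` column next to `numpy`; the following is a hypothetical reconstruction (the `file` column and its values "x"/"y" are inferred), not the answerer's actual data.

```python
import numpy as np
import pandas as pd

# Hypothetical reconstruction: a "file" label column plus a "numpy" column
# whose cells each hold a 1-D array of the same length.
df = pd.DataFrame({
    "file": ["x", "y"],
    "numpy": [np.array([1, 2, 3]), np.array([4, 5, 6])],
})
print(df)
```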

Convert it to the following format:

In [321]: d = pd.DataFrame(df['numpy'].values.tolist(), index=df['file'])

In [322]: d
Out[322]:
      0  1  2
file
x     1  2  3
y     4  5  6

Now you are free to use the full power of Pandas/NumPy/SciPy:

In [323]: d.sum(axis=1)
Out[323]:
file
x     6
y    15
dtype: int64

In [324]: d.sum(axis=0)
Out[324]:
0    5
1    7
2    9
dtype: int64

In [325]: d.mean(axis=0)
Out[325]:
0    2.5
1    3.5
2    4.5
dtype: float64

In [327]: d.std(axis=0)
Out[327]:
0    2.12132
1    2.12132
2    2.12132
dtype: float64
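The 2.12132 here and the 1.5 returned by .values.std() in the question are not contradictory: pandas' std defaults to the sample standard deviation (ddof=1), while NumPy's std defaults to the population one (ddof=0). A short check, rebuilding d directly from the values shown above:

```python
import numpy as np
import pandas as pd

# Same numbers as d above, with the "file" labels used as the index.
d = pd.DataFrame([[1, 2, 3], [4, 5, 6]], index=pd.Index(["x", "y"], name="file"))

sample_std = d.std(axis=0)              # pandas default: ddof=1
population_std = d.std(axis=0, ddof=0)  # matches NumPy's default
numpy_std = d.to_numpy().std(axis=0)    # NumPy default: ddof=0

print(sample_std.round(5).tolist())     # [2.12132, 2.12132, 2.12132]
print(population_std.tolist())          # [1.5, 1.5, 1.5]
print(numpy_std.tolist())               # [1.5, 1.5, 1.5]
```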

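I think you'll run into many similar problems with non-scalar values in cells. Why are you using a DataFrame as a dict? You should load these values into the DataFrame itself.

I agree with @MaxU. Trying to answer this question in the context of pandas built-in functions seems somewhat pointless, because this simply isn't how the library is meant to be used. You're creating a DataFrame only to pull the values back out into arrays/lists... Cut out the middleman and keep pandas out of it; if you choose to use it this way, it will only get in your way.

Why does it work with mean? Because of an implementation detail. It works by accident; this is not supported. By storing these arrays in DataFrame cells, you are making your job harder.

@KevinVasko, are all your sub-arrays the same shape?

Thanks, I'll try this instead of what I'm currently doing.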
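Following the advice to keep the arrays out of DataFrame cells, here is a minimal sketch that skips pandas entirely. It assumes, as @MaxU asks, that all sub-arrays have the same shape: stack them into one 2-D NumPy array and reduce along axis 0.

```python
import numpy as np

x = np.array([1, 2, 3])
y = np.array([4, 5, 6])

# Equal-length 1-D arrays stack into a (2, 3) array: one row per original cell.
arr = np.stack([x, y])

col_mean = arr.mean(axis=0)                # [2.5 3.5 4.5]
col_std = arr.std(axis=0)                  # [1.5 1.5 1.5] (population, ddof=0)
col_std_sample = arr.std(axis=0, ddof=1)   # matches pandas' default ddof=1

print(col_mean, col_std, col_std_sample)
```

If the arrays already live in a column, np.stack(df["numpy"].to_list()) produces the same 2-D array; np.stack raises a ValueError if the sub-arrays differ in shape, which is a useful early check.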
