Python grouped DataFrame: how do I apply scipy.stats.sem to it?
I know I can apply numpy methods by doing the following:
dataList is a list of DataFrames (same columns/rows)
and so on. But what if I want to compute the standard error of the mean (sem)?
I tried:
testDF.aggregate(scipy.stats.sem)
but it gives a confusing error. Does anyone know how to do this? What is different about the scipy.stats methods?
Here is some code that reproduces the error for me:
from scipy import stats as st
import pandas
import numpy as np
df_list = []
for ii in range(30):
    df_list.append(pandas.DataFrame(np.random.rand(600, 10),
                                    columns=['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J']))
testDF = (pandas.concat(df_list, axis=1, keys=range(len(df_list)))
          .swaplevel(0, 1, axis=1)
          .sortlevel(axis=1)
          .groupby(level=0, axis=1))

testDF.aggregate(st.sem)
Here is the error message:
---------------------------------------------------------------------------
AssertionError Traceback (most recent call last)
<ipython-input-1-184cee8fb2ce> in <module>()
12 .groupby(level=0, axis=1))
13
---> 14 testDF.aggregate(st.sem)
/Library/Frameworks/EPD64.framework/Versions/7.3/lib/python2.7/site-packages/pandas/core/groupby.py in aggregate(self, arg, *args, **kwargs)
1177 return self._python_agg_general(arg, *args, **kwargs)
1178 else:
-> 1179 result = self._aggregate_generic(arg, *args, **kwargs)
1180
1181 if not self.as_index:
/Library/Frameworks/EPD64.framework/Versions/7.3/lib/python2.7/site-packages/pandas/core/groupby.py in _aggregate_generic(self, func, *args, **kwargs)
1248 else:
1249 result = DataFrame(result, index=obj.index,
-> 1250 columns=result_index)
1251 else:
1252 result = DataFrame(result)
/Library/Frameworks/EPD64.framework/Versions/7.3/lib/python2.7/site-packages/pandas/core/frame.py in __init__(self, data, index, columns, dtype, copy)
300 mgr = self._init_mgr(data, index, columns, dtype=dtype, copy=copy)
301 elif isinstance(data, dict):
--> 302 mgr = self._init_dict(data, index, columns, dtype=dtype)
303 elif isinstance(data, ma.MaskedArray):
304 mask = ma.getmaskarray(data)
/Library/Frameworks/EPD64.framework/Versions/7.3/lib/python2.7/site-packages/pandas/core/frame.py in _init_dict(self, data, index, columns, dtype)
389
390 # consolidate for now
--> 391 mgr = BlockManager(blocks, axes)
392 return mgr.consolidate()
393
/Library/Frameworks/EPD64.framework/Versions/7.3/lib/python2.7/site-packages/pandas/core/internals.py in __init__(self, blocks, axes, do_integrity_check)
329
330 if do_integrity_check:
--> 331 self._verify_integrity()
332
333 def __nonzero__(self):
/Library/Frameworks/EPD64.framework/Versions/7.3/lib/python2.7/site-packages/pandas/core/internals.py in _verify_integrity(self)
404 mgr_shape = self.shape
405 for block in self.blocks:
--> 406 assert(block.values.shape[1:] == mgr_shape[1:])
407 tot_items = sum(len(x.items) for x in self.blocks)
408 assert(len(self.items) == tot_items)
AssertionError:
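Updated answer:
It looks like I can reproduce this with my work versions of the various libraries. I'll check my home versions later to see whether the documentation for these functions differs.
In the meantime, the following worked for me when using your exact edited version: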
In [35]: testDF.aggregate(lambda x: st.sem(x, axis=1))
Out[35]:
<class 'pandas.core.frame.DataFrame'>
Int64Index: 600 entries, 0 to 599
Data columns:
A 600 non-null values
B 600 non-null values
C 600 non-null values
D 600 non-null values
E 600 non-null values
F 600 non-null values
G 600 non-null values
H 600 non-null values
I 600 non-null values
J 600 non-null values
dtypes: float64(10)
But you should check that these are actually the SEM values you want, perhaps on some smaller sample data.
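One way to run that check is sketched below. It is only an illustration under stated assumptions: the small 4 x 5 array `group` is a hypothetical stand-in for the columns of a single group (for example, all the 'A' columns), and the manual formula is the usual sample standard deviation (ddof=1) divided by the square root of the number of replicates.

```python
import numpy as np
from scipy import stats as st

# Hypothetical small example: 4 rows, 5 replicate columns of one group.
rng = np.random.default_rng(0)
group = rng.random((4, 5))

# SEM across the replicates of each row, as in the lambda above.
sem_scipy = st.sem(group, axis=1)

# Manual SEM: sample standard deviation (ddof=1) over sqrt(n replicates).
sem_manual = group.std(axis=1, ddof=1) / np.sqrt(group.shape[1])

print(np.allclose(sem_scipy, sem_manual))

# With the default axis=0, st.sem returns one value per column instead,
# so its length no longer matches the number of rows.
sem_default = st.sem(group)
print(sem_scipy.shape, sem_default.shape)
```

If the two results agree on the small data, `axis=1` is computing the per-row SEM over the replicates, which is presumably what the grouped aggregation is meant to return.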
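Old answer:
Could this be related to the module issues with scipy.stats? When I use this module, I have to import it as from scipy import stats as st or something similar. import scipy.stats doesn't work for me, and calling import scipy; scipy.stats.sem gives an error saying that no module named 'stats' exists.
Pandas doesn't seem to find this function at all. I think the error message should be improved, because this is not obvious.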
>>> from scipy import stats as st
>>> import pandas
>>> import numpy as np
>>> df_list = []
>>> for ii in range(10):
...     df_list.append(pandas.DataFrame(np.random.rand(10,3),
...                                     columns = ['A', 'B', 'C']))
...
>>> df_list
# Suppressed the output cause it was big.
>>> testDF = (pandas.concat(df_list, axis=1, keys=range(len(df_list)))
...           .swaplevel(0, 1, axis=1)
...           .sortlevel(axis=1)
...           .groupby(level=0, axis=1))
>>> testDF
<pandas.core.groupby.DataFrameGroupBy object at 0x38524d0>
>>> testDF.aggregate(np.mean)
key_0 A B C
0 0.660324 0.408377 0.374681
1 0.459768 0.345093 0.432542
2 0.498985 0.443794 0.524327
3 0.605572 0.563768 0.558702
4 0.561849 0.488395 0.592399
5 0.466505 0.433560 0.408804
6 0.561591 0.630218 0.543970
7 0.423443 0.413819 0.486188
8 0.514279 0.479214 0.534309
9 0.479820 0.506666 0.449543
>>> testDF.aggregate(np.var)
key_0 A B C
0 0.093908 0.095746 0.055405
1 0.075834 0.077010 0.053406
2 0.094680 0.092272 0.095552
3 0.105740 0.126101 0.099316
4 0.087073 0.087461 0.111522
5 0.105696 0.110915 0.096959
6 0.082860 0.026521 0.075242
7 0.100512 0.051899 0.060778
8 0.105198 0.100027 0.097651
9 0.082184 0.060460 0.121344
>>> testDF.aggregate(st.sem)
A B C
0 0.089278 0.087590 0.095891
1 0.088552 0.081365 0.098071
2 0.087968 0.116361 0.076837
3 0.110369 0.087563 0.096460
4 0.101328 0.111676 0.046567
5 0.085044 0.099631 0.091284
6 0.113337 0.076880 0.097620
7 0.087243 0.087664 0.118925
8 0.080569 0.068447 0.106481
9 0.110658 0.071082 0.084928
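For readers on a recent pandas, where `sortlevel` is no longer available and grouping along `axis=1` is deprecated, the same per-row SEM can probably be obtained without scipy at all. The following is a minimal sketch, not the method used in this thread: it assumes the built-in `GroupBy.sem()` (which defaults to ddof=1, like `scipy.stats.sem`) and transposes the frame so that the grouping happens along the rows. The names `wide` and `sem_df` are made up for the illustration.

```python
import numpy as np
import pandas as pd
from scipy import stats as st

rng = np.random.default_rng(0)
cols = ['A', 'B', 'C']
df_list = [pd.DataFrame(rng.random((10, 3)), columns=cols) for _ in range(10)]

# Same layout as above: column MultiIndex of (letter, replicate key).
wide = pd.concat(df_list, axis=1, keys=list(range(len(df_list))))
wide = wide.swaplevel(0, 1, axis=1).sort_index(axis=1)

# Transpose so the letters become the row index, group on them,
# take the SEM over the replicates, then transpose back.
sem_df = wide.T.groupby(level=0).sem().T

# Cross-check one column against scipy, computed across the replicates.
expected_A = st.sem(wide['A'].to_numpy(), axis=1)
print(np.allclose(sem_df['A'].to_numpy(), expected_A))
```

The result has one row per original row and one column per letter, matching the shape of the `np.mean` and `np.var` aggregations shown above.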
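Seems to work for me. Can you copy and paste the actual error message, or better yet, a small code example that reproduces the error?
It worked for me when I tried it.
@DSM: I added the error message to the original question. Note that I was able to run the numpy methods on the exact same DataFrame without any problems.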