
Python: subtracting the group mean from column values


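I have a dataset from gene chips, in which 16 chips measure one tissue sample. For each gene on each chip, I want to subtract that gene's mean across all chips. So I grouped by gene and computed the means. Now I want to take the original PM intensity values and subtract the mean of the corresponding gene. That means I need to match the GENE column against the index of the table in which I stored the per-gene means, and then subtract that value from the PM column.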

totalgene  = genedata.groupby(genedata['GENE']).mean()[['PM','LOGPM']]

genedata['MEANNORM'] = genedata['PM'] - totalgene.ix[genedata['GENE']]['AVGPM']
genedata['MEANNORM'] = genedata['LOGPM'] - totalgene.ix[genedata['GENE']]['AVGLOGPM']
This results in the following error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-18-08c1bb979f9c> in <module>()
----> 1 genedata['MEANNORM'] = genedata['PM'] - totalgene.ix[genedata['GENE'],'AVGPM']
      2 genedata['MEANNORM'] = genedata['LOGPM'] - totalgene.ix[genedata['GENE'],'AVGLOGPM']

C:\Users\timothy\Anaconda3\lib\site-packages\pandas\core\frame.py in __setitem__(self, key, value)
   2417         else:
   2418             # set column
-> 2419             self._set_item(key, value)
   2420 
   2421     def _setitem_slice(self, key, value):

C:\Users\timothy\Anaconda3\lib\site-packages\pandas\core\frame.py in _set_item(self, key, value)
   2483 
   2484         self._ensure_valid_index(value)
-> 2485         value = self._sanitize_column(key, value)
   2486         NDFrame._set_item(self, key, value)
   2487 

C:\Users\timothy\Anaconda3\lib\site-packages\pandas\core\frame.py in _sanitize_column(self, key, value, broadcast)
   2633 
   2634         if isinstance(value, Series):
-> 2635             value = reindexer(value)
   2636 
   2637         elif isinstance(value, DataFrame):

C:\Users\timothy\Anaconda3\lib\site-packages\pandas\core\frame.py in reindexer(value)
   2625                     # duplicate axis
   2626                     if not value.index.is_unique:
-> 2627                         raise e
   2628 
   2629                     # other

C:\Users\timothy\Anaconda3\lib\site-packages\pandas\core\frame.py in reindexer(value)
   2620                 # GH 4107
   2621                 try:
-> 2622                     value = value.reindex(self.index)._values
   2623                 except Exception as e:
   2624 

C:\Users\timothy\Anaconda3\lib\site-packages\pandas\core\series.py in reindex(self, index, **kwargs)
   2360     @Appender(generic._shared_docs['reindex'] % _shared_doc_kwargs)
   2361     def reindex(self, index=None, **kwargs):
-> 2362         return super(Series, self).reindex(index=index, **kwargs)
   2363 
   2364     @Appender(generic._shared_docs['fillna'] % _shared_doc_kwargs)

C:\Users\timothy\Anaconda3\lib\site-packages\pandas\core\generic.py in reindex(self, *args, **kwargs)
   2257         # perform the reindex on the axes
   2258         return self._reindex_axes(axes, level, limit, tolerance, method,
-> 2259                                   fill_value, copy).__finalize__(self)
   2260 
   2261     def _reindex_axes(self, axes, level, limit, tolerance, method, fill_value,

C:\Users\timothy\Anaconda3\lib\site-packages\pandas\core\generic.py in _reindex_axes(self, axes, level, limit, tolerance, method, fill_value, copy)
   2275             obj = obj._reindex_with_indexers({axis: [new_index, indexer]},
   2276                                              fill_value=fill_value,
-> 2277                                              copy=copy, allow_dups=False)
   2278 
   2279         return obj

C:\Users\timothy\Anaconda3\lib\site-packages\pandas\core\generic.py in _reindex_with_indexers(self, reindexers, fill_value, copy, allow_dups)
   2369                                                 fill_value=fill_value,
   2370                                                 allow_dups=allow_dups,
-> 2371                                                 copy=copy)
   2372 
   2373         if copy and new_data is self._data:

C:\Users\timothy\Anaconda3\lib\site-packages\pandas\core\internals.py in reindex_indexer(self, new_axis, indexer, axis, fill_value, allow_dups, copy)
   3837         # some axes don't allow reindexing with dups
   3838         if not allow_dups:
-> 3839             self.axes[axis]._can_reindex(indexer)
   3840 
   3841         if axis >= self.ndim:

C:\Users\timothy\Anaconda3\lib\site-packages\pandas\indexes\base.py in _can_reindex(self, indexer)
   2492         # trying to reindex on an axis with duplicates
   2493         if not self.is_unique and len(indexer):
-> 2494             raise ValueError("cannot reindex from a duplicate axis")
   2495 
   2496     def reindex(self, target, method=None, level=None, limit=None,

ValueError: cannot reindex from a duplicate axis
I have no idea why this happens.
Can anyone help?
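
One plausible reading of the traceback: selecting rows of the means table by the GENE column gives a Series whose index is the gene labels, repeated once per chip, and pandas refuses to align a Series with a duplicated index to the frame's row index when it is assigned as a column. The sketch below uses made-up data and hypothetical gene names, and it uses `.loc` because `.ix` has been removed from current pandas. It shows the duplicated index, then a lookup with `Series.map` that keeps the original row index:

```python
import pandas as pd

# Made-up data: two genes, two chip measurements each (names are hypothetical)
genedata = pd.DataFrame({
    "GENE": ["g1", "g1", "g2", "g2"],
    "PM": [10.0, 14.0, 3.0, 5.0],
})

# Per-gene means, indexed by the unique gene labels
totalgene = genedata.groupby("GENE")[["PM"]].mean()

# Selecting one row per measurement repeats the gene labels in the index
lookup = totalgene.loc[genedata["GENE"], "PM"]
print(lookup.index.is_unique)  # False

# Series.map looks up each gene's mean but keeps genedata's own row index,
# so the subtraction and the column assignment line up row by row
gene_mean = genedata["GENE"].map(totalgene["PM"])
genedata["MEANNORM_PM"] = genedata["PM"] - gene_mean
```

Mapping through the GENE column is one way to do the "match the column against the index" step described in the question; it is not necessarily how the asker resolved it.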

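Consider using transform for an inline aggregation, which returns a Series that can be subtracted from the original columns, PM and LOGPM:

genedata['MEANNORM_PM'] = genedata['PM'] - \
                            genedata.groupby(['GENE'])['PM'].transform('mean')

genedata['MEANNORM_LOGPM'] = genedata['LOGPM'] - \
                               genedata.groupby(['GENE'])['LOGPM'].transform('mean')

Please include the full error message.

Done, and sorry :)

I was thinking of trying to decipher this and give you something useful... but I couldn't work it out.

If you'll take a look, I will add more explanation to the question.

Sorry, and thanks for your help! I really need to look into the transform function, which I have never used before :)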
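
As a quick check of the transform approach, here is a minimal sketch on made-up data (the gene names and intensities are invented for illustration). It assumes the frame has GENE and PM columns as in the question, and confirms that each gene's values average to zero after centering:

```python
import pandas as pd

# Made-up data: two genes, each measured on three chips
genedata = pd.DataFrame({
    "GENE": ["g1", "g2", "g1", "g2", "g1", "g2"],
    "PM": [100.0, 20.0, 130.0, 26.0, 160.0, 32.0],
})

# transform('mean') returns one value per original row,
# carrying genedata's own index rather than the group labels
gene_mean = genedata.groupby("GENE")["PM"].transform("mean")
genedata["MEANNORM_PM"] = genedata["PM"] - gene_mean

# After centering, the mean within each gene should be zero
group_means = genedata.groupby("GENE")["MEANNORM_PM"].mean()
print(genedata)
print(group_means)
```

Because the transformed Series shares the frame's index, no separate lookup table or index matching is needed.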