Python: groupby with a mask and transform
I have a dataframe like this:
POLY_KEY_I Class SP_Percent
FS01080100SM001 NA 5.0
MTGP 67.5
Meadow 25.0
Woodland 2.5
FS01080100SM002 PHP 85.0
SP 15.0
For each POLY_KEY_I unit, if Class = Meadow and SP_Percent >= 20, I want to change MTGP to WMTGP.
My desired output is:
POLY_KEY_I Class SP_Percent
FS01080100SM001 NA 5.0
WMTGP 67.5
Meadow 25.0
Woodland 2.5
FS01080100SM002 PHP 85.0
SP 15.0
The code I am trying is:
df ['mask'] = ((df['Class'] == 'Meadow') & df['SP_Percent'] >=20)
mask = df.groupby(['POLY_KEY_I'])['mask'].transform('MTGP')
df.loc[mask,'Class']='WMTGP'
print(df)
But this returns an error:
    mask = final.groupby(['POLY_KEY_I'])['mask'].transform('MTGP')
  File "C:\Users\Stefano\Anaconda2\lib\site-packages\pandas\core\groupby.py", line 2439, in transform
    return self._transform_fast(lambda : getattr(self, func)(*args, **kwargs))
  File "C:\Users\Stefano\Anaconda2\lib\site-packages\pandas\core\groupby.py", line 2484, in _transform_fast
    values = func().values
  File "C:\Users\Stefano\Anaconda2\lib\site-packages\pandas\core\groupby.py", line 2439, in <lambda>
    return self._transform_fast(lambda : getattr(self, func)(*args, **kwargs))
  File "C:\Users\Stefano\Anaconda2\lib\site-packages\pandas\core\groupby.py", line 520, in __getattr__
    (type(self).__name__, attr))
AttributeError: 'SeriesGroupBy' object has no attribute 'MTGP'
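Edit:
I don't know if this helps, but if I change this line:
mask = df.groupby(['POLY_KEY_I'])['mask'].transform('MTGP')
to this:
mask = df.groupby(['POLY_KEY_I'])['mask'].transform('any')
it changes every value of the corresponding POLY_KEY_I to WMTGP, but I only want the value changed when it is MTGP.

I changed your solution completely: it now uses groupby with apply and a custom function f. To check the string values, the function uses isin.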
Input (row 5 added for testing):
Edit 1:
Added timings:
%timeit df.groupby(['POLY_KEY_I']).apply(f)
100 loops, best of 3: 4.78 ms per loop
%timeit shahram(df)
10 loops, best of 3: 38.2 ms per loop
Code used for the timings:
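import io

import numpy as np
import pandas as pd

temp = u"""POLY_KEY_I;Class;SP_Percent
FS01080100SM001;NA;5.0
FS01080100SM001;MTGP;67.5
FS01080100SM001;Meadow;25.0
FS01080100SM001;Woodland;2.5
FS01080100SM002;PHP;85.0
FS01080100SM002;MTGP;85.0
FS01080100SM002;SP;15.0"""

# Note: read_csv treats the string "NA" as a missing value by default.
df = pd.read_csv(io.StringIO(temp), sep=";", index_col=None, parse_dates=False)
print(df)
print(df.dtypes)
print(df.index)


def shahram(df):
    # Flag Meadow rows with SP_Percent >= 20 (this adds a column to the caller's df).
    df['mask'] = (df['Class'] == 'Meadow') & (df['SP_Percent'] >= 20)
    # Collect the keys of the flagged rows and mark them.
    df2 = df.loc[df['mask'], ['POLY_KEY_I']].copy()
    df2['mask2'] = True
    # The left merge on POLY_KEY_I spreads the mark to every row of a flagged group.
    df = pd.merge(df, df2, how='left')
    # .loc is used here because .ix was removed from later pandas versions.
    df.loc[(df['mask2'] == True) & (df['Class'] == 'MTGP'), 'Class'] = 'WMTGP'
    return df


def f(g):
    # Work on a copy so the assignment does not touch a slice of the original frame.
    g = g.copy()
    if (g['Class'].isin(['Meadow']) & (g['SP_Percent'] >= 20)).any():
        g.loc[g['Class'].isin(['MTGP']), 'Class'] = 'WMTGP'
    return g


print(df.groupby(['POLY_KEY_I']).apply(f))
print(shahram(df))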
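For comparison, the transform('any') attempt from the question's edit can be kept and narrowed to the MTGP rows. The following is only a sketch, not part of either answer: the frame is rebuilt inline from the sample data, and the names is_big_meadow and group_has_meadow are made up for illustration. transform expects the name of a groupby method (or a callable), which is why 'MTGP' raised the AttributeError; 'any' broadcasts the per-group result to every row, so it still has to be combined with a row-level check on Class.

```python
import pandas as pd

# Sample data from the question, with the extra MTGP row used for testing.
df = pd.DataFrame({
    'POLY_KEY_I': ['FS01080100SM001'] * 4 + ['FS01080100SM002'] * 3,
    'Class': ['NA', 'MTGP', 'Meadow', 'Woodland', 'PHP', 'MTGP', 'SP'],
    'SP_Percent': [5.0, 67.5, 25.0, 2.5, 85.0, 85.0, 15.0],
})

# Row-level flag. The parentheses matter: & binds more tightly than >=.
is_big_meadow = (df['Class'] == 'Meadow') & (df['SP_Percent'] >= 20)

# True for every row whose group contains at least one flagged row.
group_has_meadow = is_big_meadow.groupby(df['POLY_KEY_I']).transform('any')

# Rename only the MTGP rows inside flagged groups.
df.loc[group_has_meadow & (df['Class'] == 'MTGP'), 'Class'] = 'WMTGP'
print(df)
```

With this data only the MTGP row of FS01080100SM001 is renamed; the MTGP row of FS01080100SM002 is left alone because that group has no qualifying Meadow row.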
This is how I did it:
df['mask'] = (df['Class'] == 'Meadow') & (df['SP_Percent'] >= 20)
df2 = df.loc[df['mask'], ['POLY_KEY_I']].copy()
df2['mask2'] = True
df = pd.merge(df, df2, how='left')
df.loc[(df['mask2'] == True) & (df['Class'] == 'MTGP'), 'Class'] = 'WMTGP'
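The same idea can probably be written without the helper columns and the merge. This is a hedged sketch rather than the answerer's method: it collects the keys of the qualifying groups and tests membership with isin. The names qualifies, keys and to_rename are invented for the example, and the frame is rebuilt inline from the sample data.

```python
import pandas as pd

# Sample data from the question, with the extra MTGP row used for testing.
df = pd.DataFrame({
    'POLY_KEY_I': ['FS01080100SM001'] * 4 + ['FS01080100SM002'] * 3,
    'Class': ['NA', 'MTGP', 'Meadow', 'Woodland', 'PHP', 'MTGP', 'SP'],
    'SP_Percent': [5.0, 67.5, 25.0, 2.5, 85.0, 85.0, 15.0],
})

# Rows that trigger the rename for their whole group.
qualifies = (df['Class'] == 'Meadow') & (df['SP_Percent'] >= 20)

# Keys of the groups that contain at least one qualifying row.
keys = df.loc[qualifies, 'POLY_KEY_I'].unique()

# MTGP rows that belong to one of those groups.
to_rename = df['POLY_KEY_I'].isin(keys) & (df['Class'] == 'MTGP')
df.loc[to_rename, 'Class'] = 'WMTGP'
print(df)
```

One reason to prefer this shape: a left merge repeats rows when a key has more than one qualifying Meadow row, while the isin test leaves the row count unchanged.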