Vectorizing a Python groupby function
I posted a previous question, which received a successful answer:
import io
import pandas as pd
data = """
id,atr1,atr2,orig_date,fix_date
1,bolt,l,2000-01-01,nan
1,screw,l,2000-01-01,nan
1,stem,l,2000-01-01,nan
2,stem,l,2000-01-01,nan
2,screw,l,2000-01-01,nan
2,stem,l,2001-01-01,2001-01-01
3,bolt,r,2000-01-01,nan
3,stem,r,2000-01-01,nan
3,bolt,r,2001-01-01,2001-01-01
3,stem,r,2001-01-01,2001-01-01
"""
data = io.StringIO(data)
df = pd.read_csv(data, parse_dates=['orig_date', 'fix_date'])
def f(g):
    min_fix_date = g['fix_date'].min()
    if pd.isnull(min_fix_date):
        g['failed_part_ind'] = 0
    else:
        g['failed_part_ind'] = g['orig_date'].apply(lambda d: 1 if d < min_fix_date else 0)
    return g
df.groupby(['id', 'atr1', 'atr2']).apply(lambda g: f(g))
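However, I am now trying to develop an optimized/vectorized version to improve the runtime and scale to larger datasets. Any tips or tricks are welcome! I am currently experimenting with pandas .idxmin() and numpy.argmin().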
Does this do what you need?
df.groupby(['id','atr1','atr2']).apply(lambda x: (x.orig_date < pd.to_datetime(x.fix_date.min()))
.astype(int)).reset_index()
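Thanks for your contribution! Unfortunately, after adapting it, this does not seem to work for me. I wonder whether there is a relevant .groupby transform method? I will have to rethink this.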
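On the question of a .groupby transform method: one possible approach, offered as a minimal sketch and not as the accepted solution from the thread, is to broadcast each group's earliest fix_date back onto its rows with transform('min') and then compare whole columns at once. It assumes that fix_date is parsed as a datetime column, so that groups with no fix date get NaT, and that a comparison against NaT should count as "not failed" (0), which matches the behaviour of f above. The names keys and min_fix_date are introduced here for illustration only.

```python
import io

import pandas as pd

data = """
id,atr1,atr2,orig_date,fix_date
1,bolt,l,2000-01-01,nan
1,screw,l,2000-01-01,nan
1,stem,l,2000-01-01,nan
2,stem,l,2000-01-01,nan
2,screw,l,2000-01-01,nan
2,stem,l,2001-01-01,2001-01-01
3,bolt,r,2000-01-01,nan
3,stem,r,2000-01-01,nan
3,bolt,r,2001-01-01,2001-01-01
3,stem,r,2001-01-01,2001-01-01
"""
df = pd.read_csv(io.StringIO(data), parse_dates=['orig_date', 'fix_date'])

keys = ['id', 'atr1', 'atr2']  # illustrative name for the grouping columns

# Earliest fix_date per group, aligned to the original row index.
# Groups whose fix_date values are all missing get NaT.
min_fix_date = df.groupby(keys)['fix_date'].transform('min')

# Column-wise comparison; any comparison against NaT is False,
# so rows in groups without a fix date end up as 0.
df['failed_part_ind'] = (df['orig_date'] < min_fix_date).astype(int)

print(df)
```

Because transform returns a Series aligned with the original index, the result can be assigned straight back to df without a reset_index() or a merge, and no Python-level function runs per group. Whether this is actually faster on a given dataset would need to be measured.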