Python: Is there a way to compute a function between two columns?

python pandas numpy dataframe

I am looking for a faster way to compute some kind of function across multiple columns.

My dataframe looks like this:

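import numpy as np
import pandas as pd

c = 12*1000      # rows per class
b = int(c/2)     # rows per section
d = int(b/2)     # repetitions of each 6-value Time pattern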

newdf = {'Class': ['c1']*c+['c2']*c+['c3']*c,
        'Section': ['A']*b+['B']*b+['C']*b+['D']*b+['E']*b+['F']*b,
        'Time': [1,2,3,4,5,6]*d+[3,1,3,4,5,7]*d}

test = pd.DataFrame(newdf)
test['f_x'] = test['Time']**2/5
test['f_x_2'] = test['Time']**2/5+test['f_x']
#working with 1 column
test['section_mean'] = test.groupby(['Class','Section'])['f_x'].transform(lambda x: x.mean())
test['two_col_sum'] = test[['Time','f_x']].apply(lambda x: x.Time+x.f_x,axis=1)
cols = ['f_x','f_x_2']

I know how to compute a per-group value for a single column:

test['section_mean'] = test.groupby(['Class','Section'])['f_x'].transform(lambda x: x.mean())

or perform a simple row-wise operation across several columns:

test['two_col_sum'] = test[['Time','f_x']].apply(lambda x: x.Time+x.f_x,axis=1)
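Both expressions above can also be written without a Python-level lambda. The sketch below uses a small stand-in dataframe (the data and variable names are illustrative, not the dataframe from the question) to show that the built-in `'mean'` transform and plain column addition give the same values as the lambda versions.

```python
import pandas as pd

# Small stand-in for the question's dataframe (illustrative data only).
df = pd.DataFrame({
    'Class': ['c1'] * 4 + ['c2'] * 4,
    'Section': ['A', 'A', 'B', 'B', 'C', 'C', 'D', 'D'],
    'Time': [1, 2, 3, 4, 5, 6, 3, 1],
})
df['f_x'] = df['Time'] ** 2 / 5

keys = ['Class', 'Section']

# Per-group mean: lambda version versus the built-in aggregation name.
mean_lambda = df.groupby(keys)['f_x'].transform(lambda s: s.mean())
mean_builtin = df.groupby(keys)['f_x'].transform('mean')

# Row-wise sum: apply over rows versus vectorized column arithmetic.
sum_apply = df[['Time', 'f_x']].apply(lambda r: r.Time + r.f_x, axis=1)
sum_vector = df['Time'] + df['f_x']
```

On large frames the vectorized forms usually avoid most of the per-row or per-group Python overhead, although the exact gain depends on the data.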

However, what I want to do is run some kind of computation on whole columns of each group:

%%time
slopes_df = pd.DataFrame()
grouped = test.groupby(['Class','Section'])

for name, group in grouped:
    nd=[]
    for col in cols:
        ntest = group[['Time',col]]
        x = ntest.Time
        y = ntest[col]
        f=np.polyfit(x,y, deg=1).round(2)
        data = [name[0],name[1],col,f[0],f[1]]
        nd.append(data)

    slopes_df=pd.concat([slopes_df,pd.DataFrame(nd)])

slopes_df.columns=['Class','Section','col','slope','intercept']
slopes_df_p = pd.pivot_table(data=slopes_df,index=['Class','Section'], columns=['col'], values=['slope','intercept']).reset_index()
slopes_df_p.columns = pd.Index(e[0] if e[0] in ['Class','Section'] else e[0]+'_'+e[1] for e in slopes_df_p.columns)
fdf = pd.merge(test, slopes_df_p, on=['Class','Section'])
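As a small check of what the loop above stores, the sketch below fits a straight line to one repetition of the first `Time` pattern with `f_x = Time**2/5`. `np.polyfit` with `deg=1` returns the coefficients highest power first, so index 0 is the slope and index 1 is the intercept. The variable names are illustrative only.

```python
import numpy as np

# One repetition of the first Time pattern and the matching f_x values.
x = np.array([1, 2, 3, 4, 5, 6], dtype=float)
y = x ** 2 / 5

# Degree-1 fit: coef[0] is the slope, coef[1] is the intercept.
coef = np.polyfit(x, y, deg=1)
slope, intercept = coef.round(2)
print(slope, intercept)  # 1.4 -1.87
```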

I tried the suggested solution as follows:

%%time
for col in cols:
    df1 = (test.groupby(['Class','Section'])
              .apply(lambda x: np.polyfit(x['Time'],x[col], deg=1).round(2)[0])
              .rename('slope_'+str(col)))
    df2 = (test.groupby(['Class','Section'])
              .apply(lambda x: np.polyfit(x['Time'],x[col], deg=1).round(2)[1])
              .rename('intercept_'+str(col)))
    # df1 and df2 are named Series, so no extra 'col' entry is needed before merging.

    test = pd.merge(test,df1, on=['Class','Section'])
    test = pd.merge(test,df2, on=['Class','Section'])

But it appears to be slower: on my machine the first loop takes 150 ms and the second snippet takes 300 ms.
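One plausible reason for the difference is that the second version calls `np.polyfit` twice per group and column (once for the slope, once for the intercept) and merges into the full dataframe on every pass. As a hedged sketch, `np.polyfit` also accepts a 2-D `y` with one dataset per column, so each group can be fitted once for all columns and the result merged back a single time. The helper name `fits_per_group` and the demo data are hypothetical, not part of the question.

```python
import numpy as np
import pandas as pd

def fits_per_group(df, keys, x_col, y_cols):
    """Return one row per group with slope/intercept for every column in y_cols."""
    rows = []
    for name, group in df.groupby(keys):
        # Group names are tuples for multi-key grouping; normalize just in case.
        name = name if isinstance(name, tuple) else (name,)
        x = group[x_col].to_numpy(dtype=float)
        y = group[y_cols].to_numpy(dtype=float)   # shape (n_rows, n_cols)
        coef = np.polyfit(x, y, deg=1)            # shape (2, n_cols)
        row = dict(zip(keys, name))
        for j, col in enumerate(y_cols):
            row[f'slope_{col}'] = coef[0, j]
            row[f'intercept_{col}'] = coef[1, j]
        rows.append(row)
    return pd.DataFrame(rows)

# Illustrative data: two groups with known straight-line and quadratic values.
demo = pd.DataFrame({
    'Class': ['c1'] * 4 + ['c2'] * 3,
    'Section': ['A'] * 4 + ['B'] * 3,
    'Time': [1, 2, 3, 4, 1, 2, 3],
    'u': [3, 5, 7, 9, 1, 4, 9],
    'v': [1, 1, 1, 1, 2, 4, 6],
})
fits = fits_per_group(demo, ['Class', 'Section'], 'Time', ['u', 'v'])

# A single merge attaches every slope and intercept column to the rows.
merged = demo.merge(fits, on=['Class', 'Section'])
```

Whether this is faster than the original loop depends on the number of groups and columns, so it is worth timing on the real data.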

Andrea

Your loop solution does not operate on the per-group data, so I think you need:

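Fixed, thanks for noticing.

You do not seem to use … and … in this loop. You also do not need lambda here; just use test.groupby(['Class','Section'])['f_x'].transform('mean') and test[['Time','f_x']].sum(axis=1).

I will try it on a bigger dataframe!

I updated the code: it should now loop over the groups and over different columns of the original dataframe.

@CassAndr - I hope it works well; please let me know after testing ;)

I tried your solution, but it seems to be slower. I will update the question to clarify, since code formatting is worse here.

This is the fastest! Thanks @Jetrael. Wall time: 192 ms. Wall time: 337 ms. Previous solution: wall time: 115 ms.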

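def f(x):
    # Work on a copy so the group handed in by pandas is not modified in place.
    x = x.copy()
    for col in cols:
        # np.polyfit with deg=1 returns [slope, intercept].
        x[f'slope_{col}'], x[f'intercept_{col}'] = np.polyfit(x['Time'], x[col], deg=1).round(2)
    return x
# group_keys=False keeps the original row index on the result.
df1 = test.groupby(['Class','Section'], group_keys=False).apply(f)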
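If a per-group `apply` is still too slow on larger data, a straight-line fit can also be written in closed form and computed with grouped `transform` calls only: slope = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)² and intercept = ȳ - slope·x̄, where the means and sums are taken within each group. This is a different technique from the `np.polyfit` answer above, offered only as a sketch. The helper name `add_line_fits` and the demo data are hypothetical, the values are left unrounded, and every group is assumed to contain at least two distinct x values (otherwise the denominator is zero).

```python
import numpy as np
import pandas as pd

def add_line_fits(df, keys, x_col, y_cols):
    """Append slope_<col> and intercept_<col> columns from a per-group linear fit."""
    out = df.copy()
    by = [df[k] for k in keys]          # grouping Series, aligned on the row index
    grouped = df.groupby(keys)

    x_mean = grouped[x_col].transform('mean')
    dx = df[x_col] - x_mean
    sxx = (dx * dx).groupby(by).transform('sum')   # sum of squared x deviations

    for col in y_cols:
        y_mean = grouped[col].transform('mean')
        sxy = (dx * (df[col] - y_mean)).groupby(by).transform('sum')
        slope = sxy / sxx
        out[f'slope_{col}'] = slope
        out[f'intercept_{col}'] = y_mean - slope * x_mean
    return out

# Illustrative data built from the two Time patterns used in the question.
demo = pd.DataFrame({
    'Class': ['c1'] * 6 + ['c2'] * 6,
    'Section': ['A'] * 6 + ['C'] * 6,
    'Time': [1, 2, 3, 4, 5, 6, 3, 1, 3, 4, 5, 7],
})
demo['f_x'] = demo['Time'] ** 2 / 5

result = add_line_fits(demo, ['Class', 'Section'], 'Time', ['f_x'])
```

Applying `.round(2)` to the new columns afterwards gives values comparable to the rounded `np.polyfit` output, up to differences at rounding boundaries.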