Pandas custom group aggregation


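I have a question about pandas and custom group aggregation: I am looking for the most efficient way to compute my values. Here is my code snippet: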

import pandas as pd

listA = list('abcdefghijklmnopqrstuvwxyz') * 2
listB = listA[::-1]
listC = listA[::2] * 2
listD = "Won"
data1 = range(52) 
data2 = range(52,104) 
data3 = range(104,156)

rawStructure = [('A', listA),
                ('B', listB),
                ('C', listC),
                ('D', listD),
                ('Data1', data1),
                ('Data2', data2),
                ('Data3', data3)]
# DataFrame.from_items was removed in pandas 1.0; a dict keeps the same column order
df = pd.DataFrame(dict(rawStructure))

df.loc[40:,"D"] = "Lost" 

def customfct(x,y,z):
    print('x',x)
    data = round(((x.sum() + y.sum())/z.sum()) * 100,2)
    return  data

def f(row): 
    val1 = row.loc[(row['D'] == "Won"), 'Data1'].sum()
    val2 = row.loc[(row['D'] == "Won"), 'Data2'].sum()
    val3 = row.loc[(row['D'] == "Won"), 'Data3'].sum()
    val4 = customfct(row.loc[(row['D'] == "Won"), 'Data1'], row.loc[(row['D'] == "Won"), 'Data2'], row.loc[(row['D'] == "Won"), 'Data3'])
    return val1, val2, val3, val4

groupByCriteria = "C"
agg = df[:].groupby(by=groupByCriteria).apply(f)
print(agg)
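I would like to know whether there is a more efficient way to group and apply a custom calculation (such as the function "customfct", which uses several different columns (Data1, Data2, Data3)). My first attempt was the approach you can see here, but creating a formula that is not bound to a single column (e.g. lambda x: max(x) - min(x)) did not seem feasible. In addition, how can I return a pandas DataFrame instead of a pandas Series (of tuples)? Thanks in advance.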

This is my current output (it is correct, but I suspect there is a more efficient way):

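Consider aggregating all the data columns in a single groupby() call, then creating a new column for val4. Then merge the aggregates back into the original DataFrame.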

import pandas as pd

# EQUIVALENT EXAMPLE DATA
listA = list('abcdefghijklmnopqrstuvwxyz') * 2
df = pd.DataFrame({'A': listA, 'B': listA[::-1], 'C': listA[::2] * 2,
                   'D': ["Won" for i in range(40)] + ["Lost" for i in range(40,52)],
                   'Data1': range(52), 'Data2': range(52,104), 'Data3': range(104,156)})

# ADJUSTED METHOD
groupByCriteria = "C"
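# Select the data columns explicitly: pandas 2.0+ would otherwise also "sum"
# (concatenate) the string columns A, B and D, which then collide in the merge
grp = df[df['D']=="Won"].groupby(by=groupByCriteria)[['Data1','Data2','Data3']].sum().reset_index()\
                              .rename(columns={'Data1':'val1','Data2':'val2','Data3':'val3'})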
grp['val4'] = round(((grp['val1'] + grp['val2'])/grp['val3']) * 100,2)

agg = df.merge(grp, on='C').sort_values('Data1').reset_index(drop=True)
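
As a variation on the groupby-then-merge approach above, the per-group sums can also be broadcast straight back onto the rows with `transform`, which avoids the merge and keeps the original row order. This is only a sketch under the same example data; the names `data_cols`, `won_only` and `sums` are made up for illustration.

```python
import pandas as pd

# Same example data as above.
listA = list('abcdefghijklmnopqrstuvwxyz') * 2
df = pd.DataFrame({'A': listA, 'B': listA[::-1], 'C': listA[::2] * 2,
                   'D': ['Won'] * 40 + ['Lost'] * 12,
                   'Data1': range(52), 'Data2': range(52, 104),
                   'Data3': range(104, 156)})

data_cols = ['Data1', 'Data2', 'Data3']  # hypothetical helper name

# Zero out the rows that are not "Won" so they do not contribute to the sums.
won_only = df[data_cols].mul(df['D'].eq('Won').astype(int), axis=0)

# transform('sum') returns one value per original row: the sum of its group.
sums = won_only.groupby(df['C']).transform('sum')
sums.columns = ['val1', 'val2', 'val3']
sums['val4'] = ((sums['val1'] + sums['val2']) / sums['val3'] * 100).round(2)

# Attach the broadcast aggregates next to the original columns.
agg = pd.concat([df, sums], axis=1)
print(agg.head())
```

Whether this beats the merge version depends on the data size, so it is worth timing both on real data.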

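In a timing comparison, the adjusted code is markedly faster. Note: your method has been adjusted to return a DataFrame instead of a Series.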

def origfct():
    def customfct(x,y,z):
        #print('x',x)
        data = round(((x.sum() + y.sum())/z.sum()) * 100,2)
        return data

    def f(row): 
        row['val1'] = row.loc[(row['D'] == "Won"), 'Data1'].sum()
        row['val2'] = row.loc[(row['D'] == "Won"), 'Data2'].sum()
        row['val3'] = row.loc[(row['D'] == "Won"), 'Data3'].sum()
        row['val4'] = customfct(row.loc[(row['D'] == "Won"), 'Data1'],
                                row.loc[(row['D'] == "Won"), 'Data2'],
                                row.loc[(row['D'] == "Won"), 'Data3'])
        return row

    groupByCriteria = "C"
    agg = df[:].groupby(by=groupByCriteria).apply(f)
    return agg

def newsetup():
    groupByCriteria = "C"
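    # Select the data columns explicitly so string columns are not summed (pandas 2.0+)
    grp = df[df['D']=="Won"].groupby(by=groupByCriteria)[['Data1','Data2','Data3']].sum().reset_index()\
                           .rename(columns={'Data1':'val1','Data2':'val2','Data3':'val3'})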
    grp['val4'] = round(((grp['val1'] + grp['val2'])/grp['val3']) * 100,2)

    agg = df.merge(grp, on='C').sort_values('Data1').reset_index(drop=True)
    return agg
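
On the question of getting a DataFrame rather than a Series of tuples out of `apply`: one common option, different from the row-wise adjustment in `origfct` above, is to return a labelled `pd.Series` from the applied function, which pandas assembles into one row per group. The sketch below assumes one summary row per value of C is wanted; `summarize` and `cols` are illustrative names.

```python
import pandas as pd

listA = list('abcdefghijklmnopqrstuvwxyz') * 2
df = pd.DataFrame({'A': listA, 'B': listA[::-1], 'C': listA[::2] * 2,
                   'D': ['Won'] * 40 + ['Lost'] * 12,
                   'Data1': range(52), 'Data2': range(52, 104),
                   'Data3': range(104, 156)})

def summarize(group):
    # `group` is the sub-DataFrame holding all rows for one value of C.
    won = group[group['D'] == 'Won']
    val1 = won['Data1'].sum()
    val2 = won['Data2'].sum()
    val3 = won['Data3'].sum()
    # The Series labels become the column names of the result.
    return pd.Series({'val1': val1, 'val2': val2, 'val3': val3,
                      'val4': round((val1 + val2) / val3 * 100, 2)})

# Pass only the columns the function needs; the group key stays in the index.
cols = ['D', 'Data1', 'Data2', 'Data3']
summary = df.groupby('C')[cols].apply(summarize)
print(summary.head())
```

This still calls Python code once per group, so it is convenient rather than fast; the vectorized `newsetup` remains the quicker route.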


python -mtimeit -n'100' -s'import pyscript as test' 'test.origfct()'
# 100 loops, best of 3: 198 msec per loop

python -mtimeit -n'100' -s'import pyscript as test' 'test.newsetup()'
# 100 loops, best of 3: 16 msec per loop
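
The command-line timings above rely on a module named `pyscript`. A comparable measurement can be sketched in-process with the standard `timeit` module. The two functions below are simplified stand-ins (per-group summaries only, no merge back), not the exact `origfct` and `newsetup`, and the absolute numbers will differ by machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

listA = list('abcdefghijklmnopqrstuvwxyz') * 2
df = pd.DataFrame({'A': listA, 'B': listA[::-1], 'C': listA[::2] * 2,
                   'D': ['Won'] * 40 + ['Lost'] * 12,
                   'Data1': range(52), 'Data2': range(52, 104),
                   'Data3': range(104, 156)})
DATA_COLS = ['Data1', 'Data2', 'Data3']

def per_group_apply():
    # Python-level function called once per group.
    def f(g):
        s = g.loc[g['D'] == 'Won', DATA_COLS].sum()
        return pd.Series({'val1': s['Data1'], 'val2': s['Data2'],
                          'val3': s['Data3'],
                          'val4': round((s['Data1'] + s['Data2'])
                                        / s['Data3'] * 100, 2)})
    return df.groupby('C')[['D'] + DATA_COLS].apply(f)

def vectorized():
    # Filter once, aggregate once, then compute val4 column-wise.
    grp = df[df['D'] == 'Won'].groupby('C')[DATA_COLS].sum()
    grp.columns = ['val1', 'val2', 'val3']
    grp['val4'] = ((grp['val1'] + grp['val2']) / grp['val3'] * 100).round(2)
    return grp

# Make sure both variants agree before comparing their speed.
assert np.allclose(per_group_apply().to_numpy(dtype=float),
                   vectorized().to_numpy(dtype=float))

timings = {}
for fn in (per_group_apply, vectorized):
    # Best of 3 repeats, 20 calls each, reported per call.
    timings[fn.__name__] = min(timeit.repeat(fn, number=20, repeat=3)) / 20
    print(f'{fn.__name__}: {timings[fn.__name__] * 1e3:.2f} ms per call')
```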

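What exactly is your first question? Perhaps the actual data, the current result, and the desired result would help.

I made some changes to my original post. Do you need anything else?

This is exactly what I was looking for. It is a very nice and fast way to do the calculation. I had not thought of merging the result back in. Thanks!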