Creating lists from a DataFrame

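I have a function that takes every (non-distinct) MatchId together with its (xG_Team1, xG_Team2) pair and outputs the result as an array, which is then summed into a single sse value.

The problem is that the function iterates over every row, so each MatchId is processed once per duplicate row. I want to avoid that.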

MatchId Event_Id   EventCode        Team1        Team2      Team1_Goals
0   842079  2053    Goal Away    Huachipato  Cobresal       0
1   842079  2053    Goal Away    Huachipato  Cobresal       0
2   842080  1029    Goal Home      Slovan    lava           3
3   842080  1029    Goal Home      Slovan    lava           3
4   842080  2053    Goal Away      Slovan    lava           3
5   842080  1029    Goal Home      Slovan    lava           3
6   842634  2053    Goal Away      Rosario   Boca Juniors   0
7   842634  2053    Goal Away      Rosario   Boca Juniors   0
8   842634  2053    Goal Away      Rosario   Boca Juniors   0
9   842634  2054  Cancel Goal Away Rosario   Boca Juniors   0

    Team2_Goals xG_Team1    xG_Team2    CurrentPlaytime  Home_Goal_Time Away_Goal_Time
0   2       1.79907     1.19893     2616183         0       87
1   2       1.79907     1.19893     3436780         0       115
2   1       1.70662     1.1995      3630545         121     0
3   1       1.70662     1.1995      4769519         159     0
4   1       1.70662     1.1995      5057143         0       169
5   1       1.70662     1.1995      5236213         175     0
6   2       0.82058     1.3465      2102264         0       70
7   2       0.82058     1.3465      4255871         0       142
8   2       0.82058     1.3465      5266652         0       176
9   2       0.82058     1.3465      5273611         0       0
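
For each distinct MatchId I need the corresponding lists of home and away goal times, i.e. the Home_Goal and Away_Goal lists used in each iteration. Building these lists from the DataFrame's Home_Goal_Time and Away_Goal_Time columns, as shown further down, does not seem to work.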

For example, for MatchId = 842079: Home_Goal = [] and Away_Goal = [87, 115].
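
One way to get those per-match lists, as a rough sketch rather than a definitive answer. It assumes the frame is called `df` (rebuilt here from the three sample matches, with only the columns needed) and that a value of 0 in `Home_Goal_Time` or `Away_Goal_Time` means "no goal on this row". `goal_times` is a hypothetical helper name.

```python
import pandas as pd

# Sample rows from the question (only the columns needed here).
df = pd.DataFrame({
    "MatchId": [842079] * 2 + [842080] * 4 + [842634] * 4,
    "Home_Goal_Time": [0, 0, 121, 159, 0, 175, 0, 0, 0, 0],
    "Away_Goal_Time": [87, 115, 0, 0, 169, 0, 70, 142, 176, 0],
})

def goal_times(series):
    """Return the non-zero goal times of one match as a plain list.

    Assumption: 0 marks "no goal on this row", so it is dropped.
    """
    return [int(t) for t in series if t != 0]

# One list per distinct MatchId, indexed by MatchId.
home_goals = df.groupby("MatchId")["Home_Goal_Time"].apply(goal_times)
away_goals = df.groupby("MatchId")["Away_Goal_Time"].apply(goal_times)

print(home_goals.loc[842079], away_goals.loc[842079])
```

Keeping the result indexed by MatchId (instead of calling `.tolist()` on it) makes it possible to look up the lists of one specific match later.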

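import numpy as np

# One-hot targets: x1 = no goal, x2 = home goal, x3 = away goal.
# They must be NumPy arrays so that (x - y) works element-wise.
x1 = np.array([1, 0, 0])
x2 = np.array([0, 1, 0])
x3 = np.array([0, 0, 1])
m = 1  # arbitrary constant used to optimise sse
k = 196
total_timeslot = 196
Home_Goal = []  # No Goal
Away_Goal = []  # No Goal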

def sum_squared_diff(x1, x2, x3, y):
    ssd = []
    for k in range(total_timeslot):  # k will take multiple values
        if k in Home_Goal:
            ssd.append(sum((x2 - y) ** 2))
        elif k in Away_Goal:
            ssd.append(sum((x3 - y) ** 2))
        else:
            ssd.append(sum((x1 - y) ** 2))
    return ssd

def my_function(row):
    xG_Team1 = row.xG_Team1
    xG_Team2 = row.xG_Team2
    return np.array([1-(xG_Team1*m + xG_Team2*m)/k, xG_Team1*m/k, xG_Team2*m/k])

results = df.apply(lambda row: sum_squared_diff(x1, x2, x3, my_function(row)), axis=1)

results
sum(results.sum())

For the three matches above, the desired results are as follows. If I wanted the sse for an individual match, sum(sum_squared_diff(x1, x2, x3, y)) would give:

MatchId = 842079, sse = 3.984053038520635
MatchId = 842080, sse = 7.882189570700502
MatchId = 842634, sse = 5.929085973050213
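
Given the size of the original data, what I am really after is the total sse. For the sample data above, simply adding the values gives total sse = 17.79532858227135. Once I have this working, I will try to optimise the sse by updating the arbitrary constant m, based on this plot.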
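
To make the per-match figure concrete, here is a minimal sketch of the calculation for a single match with the goal lists passed in as arguments instead of read from globals. `match_sse` and `TOTAL_TIMESLOTS` are hypothetical names, and the sketch assumes every goal time lies in 0..195. Because the xG values printed in the table are rounded, the result should agree with the figures above only to a few decimal places.

```python
import numpy as np

TOTAL_TIMESLOTS = 196  # same value as total_timeslot / k in the question

def match_sse(xg_home, xg_away, home_goals, away_goals, m=1.0):
    """Sum of squared differences over all time slots of ONE match."""
    # Predicted per-slot vector: [no goal, home goal, away goal].
    y = np.array([
        1 - (xg_home + xg_away) * m / TOTAL_TIMESLOTS,
        xg_home * m / TOTAL_TIMESLOTS,
        xg_away * m / TOTAL_TIMESLOTS,
    ])
    # Observed outcome per slot: 0 = no goal, 1 = home goal, 2 = away goal.
    outcome = np.zeros(TOTAL_TIMESLOTS, dtype=int)
    outcome[np.asarray(away_goals, dtype=int)] = 2
    # Home is written last so it wins on a clash, like the if/elif order.
    outcome[np.asarray(home_goals, dtype=int)] = 1
    observed = np.eye(3)[outcome]  # one-hot rows, shape (196, 3)
    return float(((observed - y) ** 2).sum())

sse_842079 = match_sse(1.79907, 1.19893, home_goals=[], away_goals=[87, 115])
sse_842080 = match_sse(1.70662, 1.1995, home_goals=[121, 159, 175], away_goals=[169])
print(sse_842079, sse_842080)
```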

These are the lists I would like the function to iterate over:

Home_scored = df.groupby('MatchId')['Home_Goal_Time'].apply(list)
Away_scored = df.groupby('MatchId')['Away_Goal_Time'].apply(list)
type(Home_scored)
pandas.core.series.Series

I then converted them to lists:

Home_Goal = Home_scored.tolist()
Away_Goal = Away_scored.tolist()
type(Home_Goal)
 list

 Home_Goal
Out[303]: [[0, 0], [121, 159, 0, 175], [0, 0, 0, 0]]


Away_Goal 
Out[304]: [[87, 115], [0, 0, 169, 0], [70, 142, 176, 0]]

But the function still treats Home_Goal and Away_Goal as empty lists.
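
A likely reason, offered as a guess from the code shown rather than a confirmed diagnosis: after `.tolist()`, `Home_Goal` is a list of lists (one inner list per match), so `k in Home_Goal` compares an integer with whole lists and is never true. The function then always falls through to the `else` branch, as if the lists were empty. A small check:

```python
Home_Goal = [[0, 0], [121, 159, 0, 175], [0, 0, 0, 0]]  # output shown above

# An int is never equal to a list, so membership in the outer list fails.
in_outer = 121 in Home_Goal       # False
# Membership has to be tested against ONE match's list.
in_match = 121 in Home_Goal[1]    # True
# The 0 placeholders would also count as a goal at slot 0 unless removed.
zero_hit = 0 in Home_Goal[0]      # True

print(in_outer, in_match, zero_hit)  # False True True
```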
df.groupby('MatchId').apply(...)
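
The `groupby(...).apply(...)` idea can be sketched as follows, so that each MatchId is processed once with its own goal lists. This is an illustration under stated assumptions, not the asker's final code: `df` is rebuilt from the sample rows, 0 is taken to mean "no goal on this row", and `match_sse` is a hypothetical name. Since the printed xG values are rounded, the totals should match the figures in the question only to a few decimal places.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "MatchId": [842079] * 2 + [842080] * 4 + [842634] * 4,
    "xG_Team1": [1.79907] * 2 + [1.70662] * 4 + [0.82058] * 4,
    "xG_Team2": [1.19893] * 2 + [1.1995] * 4 + [1.3465] * 4,
    "Home_Goal_Time": [0, 0, 121, 159, 0, 175, 0, 0, 0, 0],
    "Away_Goal_Time": [87, 115, 0, 0, 169, 0, 70, 142, 176, 0],
})

x1, x2, x3 = np.eye(3)  # one-hot rows: no goal, home goal, away goal
m = 1
total_timeslot = 196

def match_sse(group):
    """SSE for one match; `group` holds all rows of a single MatchId."""
    # xG is repeated on every row of a match, so take the first value.
    xg1, xg2 = group["xG_Team1"].iloc[0], group["xG_Team2"].iloc[0]
    y = np.array([1 - (xg1 + xg2) * m / total_timeslot,
                  xg1 * m / total_timeslot,
                  xg2 * m / total_timeslot])
    # Assumption: 0 means "no goal on this row", so it is dropped.
    home = set(group.loc[group["Home_Goal_Time"] != 0, "Home_Goal_Time"].tolist())
    away = set(group.loc[group["Away_Goal_Time"] != 0, "Away_Goal_Time"].tolist())
    sse = 0.0
    for t in range(total_timeslot):
        target = x2 if t in home else x3 if t in away else x1
        sse += float(((target - y) ** 2).sum())
    return sse

cols = ["xG_Team1", "xG_Team2", "Home_Goal_Time", "Away_Goal_Time"]
sse_per_match = df.groupby("MatchId")[cols].apply(match_sse)
total_sse = sse_per_match.sum()
print(sse_per_match)
print(total_sse)
```

Selecting the needed columns before `.apply` keeps the grouping column out of each group, which avoids depending on how a given pandas version treats it.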

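Could you provide a sample of the expected output?
Thanks for taking the time to look into this. Could you take a look at the edited post?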