Creating a list from a DataFrame
I have a function that takes all (non-distinct) MatchIds and the paired (xG_Team1, xG_Team2) values and outputs them as arrays, which are then summed into a single SSE constant. The problem is that the function iterates over every row, so each MatchId is repeated once per row instead of being handled once per match. I want to stop this.
MatchId Event_Id EventCode Team1 Team2 Team1_Goals
0 842079 2053 Goal Away Huachipato Cobresal 0
1 842079 2053 Goal Away Huachipato Cobresal 0
2 842080 1029 Goal Home Slovan lava 3
3 842080 1029 Goal Home Slovan lava 3
4 842080 2053 Goal Away Slovan lava 3
5 842080 1029 Goal Home Slovan lava 3
6 842634 2053 Goal Away Rosario Boca Juniors 0
7 842634 2053 Goal Away Rosario Boca Juniors 0
8 842634 2053 Goal Away Rosario Boca Juniors 0
9 842634 2054 Cancel Goal Away Rosario Boca Juniors 0
Team2_Goals xG_Team1 xG_Team2 CurrentPlaytime Home_Goal_Time Away_Goal_Time
0 2 1.79907 1.19893 2616183 0 87
1 2 1.79907 1.19893 3436780 0 115
2 1 1.70662 1.1995 3630545 121 0
3 1 1.70662 1.1995 4769519 159 0
4 1 1.70662 1.1995 5057143 0 169
5 1 1.70662 1.1995 5236213 175 0
6 2 0.82058 1.3465 2102264 0 70
7 2 0.82058 1.3465 4255871 0 142
8 2 0.82058 1.3465 5266652 0 176
9 2 0.82058 1.3465 5273611 0 0
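For anyone who wants to reproduce this, here is a minimal sketch that rebuilds the sample as a DataFrame named df (the name the code further down uses), keeping only the columns the SSE calculation needs. The xG values are the rounded figures pandas printed, so anything computed from this frame may differ from the full data in the last few decimals.

```python
import pandas as pd

# Sample rows from the question; only the columns the SSE calculation uses.
df = pd.DataFrame({
    "MatchId": [842079, 842079, 842080, 842080, 842080, 842080,
                842634, 842634, 842634, 842634],
    "EventCode": ["Goal Away", "Goal Away", "Goal Home", "Goal Home",
                  "Goal Away", "Goal Home", "Goal Away", "Goal Away",
                  "Goal Away", "Cancel Goal Away"],
    "xG_Team1": [1.79907] * 2 + [1.70662] * 4 + [0.82058] * 4,
    "xG_Team2": [1.19893] * 2 + [1.1995] * 4 + [1.3465] * 4,
    "Home_Goal_Time": [0, 0, 121, 159, 0, 175, 0, 0, 0, 0],
    "Away_Goal_Time": [87, 115, 0, 0, 169, 0, 70, 142, 176, 0],
})

print(df.shape)                 # (10, 6)
print(df["MatchId"].nunique())  # 3
```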
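For each distinct MatchId, I need the corresponding lists of home and away goals, i.e. the Home_Goal and Away_Goal used in each iteration, built from the Home_Goal_Time and Away_Goal_Time columns of the DataFrame. The lists below don't seem to work.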
For example, for MatchId = 842079: Home_Goal = [] and Away_Goal = [87, 115].
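A minimal sketch of building those per-match lists with groupby, assuming (as the sample suggests) that a 0 in Home_Goal_Time or Away_Goal_Time means "no goal on this row" rather than a goal in slot 0. The expected SSE values further down only work out if those zeros are ignored. nonzero_list is a hypothetical helper name.

```python
import pandas as pd

df = pd.DataFrame({
    "MatchId": [842079, 842079, 842080, 842080, 842080, 842080,
                842634, 842634, 842634, 842634],
    "Home_Goal_Time": [0, 0, 121, 159, 0, 175, 0, 0, 0, 0],
    "Away_Goal_Time": [87, 115, 0, 0, 169, 0, 70, 142, 176, 0],
})


def nonzero_list(times):
    """Collect one match's goal times, dropping the 0 'no goal' placeholders."""
    return [int(t) for t in times if t != 0]


grouped = df.groupby("MatchId")
home_goals = grouped["Home_Goal_Time"].apply(nonzero_list)  # Series: MatchId -> list
away_goals = grouped["Away_Goal_Time"].apply(nonzero_list)

print(home_goals[842079], away_goals[842079])  # [] [87, 115]
```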
import numpy as np

x1 = np.array([1, 0, 0])  # target when no goal is scored in the time slot
x2 = np.array([0, 1, 0])  # target for a home goal
x3 = np.array([0, 0, 1])  # target for an away goal
m = 1  # arbitrary constant, used later to optimise the SSE
k = 196
total_timeslot = 196
Home_Goal = []  # no goal
Away_Goal = []  # no goal

def sum_squared_diff(x1, x2, x3, y):
    ssd = []
    for k in range(total_timeslot):  # k takes every time-slot value (local, shadows the global k)
        if k in Home_Goal:
            ssd.append(sum((x2 - y) ** 2))
        elif k in Away_Goal:
            ssd.append(sum((x3 - y) ** 2))
        else:
            ssd.append(sum((x1 - y) ** 2))
    return ssd

def my_function(row):
    xG_Team1 = row.xG_Team1
    xG_Team2 = row.xG_Team2
    return np.array([1 - (xG_Team1*m + xG_Team2*m)/k, xG_Team1*m/k, xG_Team2*m/k])

# One list of per-slot errors per row
results = df.apply(lambda row: sum_squared_diff(x1, x2, x3, my_function(row)), axis=1)
results
# results.sum() concatenates the per-row lists; sum() then adds them up
sum(results.sum())
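In the code above, sum_squared_diff reads the module-level Home_Goal and Away_Goal, which are both [], so every slot takes the x1 branch. In addition, df.apply(..., axis=1) calls it once per row, so a match with four rows contributes its SSE four times. A sketch of one way around both issues, assuming the goal lists for a match are passed in as arguments (predicted is a hypothetical helper standing in for my_function):

```python
import numpy as np

x1 = np.array([1, 0, 0])  # target vector when no goal is scored in a slot
x2 = np.array([0, 1, 0])  # target vector for a home goal
x3 = np.array([0, 0, 1])  # target vector for an away goal


def predicted(xg1, xg2, m=1, k=196):
    """Hypothetical helper: the same vector my_function builds from one row."""
    return np.array([1 - (xg1 * m + xg2 * m) / k, xg1 * m / k, xg2 * m / k])


def sum_squared_diff(y, home_goals, away_goals, total_timeslot=196):
    """Per-slot squared errors for ONE match; goal lists are arguments, not globals."""
    home, away = set(home_goals), set(away_goals)
    ssd = []
    for t in range(total_timeslot):
        if t in home:
            ssd.append(float(np.sum((x2 - y) ** 2)))
        elif t in away:
            ssd.append(float(np.sum((x3 - y) ** 2)))
        else:
            ssd.append(float(np.sum((x1 - y) ** 2)))
    return ssd


# MatchId 842079: no home goals, away goals in slots 87 and 115
sse_842079 = sum(sum_squared_diff(predicted(1.79907, 1.19893), [], [87, 115]))
print(sse_842079)  # ≈ 3.98405
```

Calling this once per MatchId (rather than once per row) is what keeps each match from being counted several times.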
For the three matches above, the expected result should look like this. If I compute a separate SSE per match, sum(sum_squared_diff(x1, x2, x3, y)) gives:
MatchId = 842079: 3.984053038520635
MatchId = 842080: 7.882189570700502
MatchId = 842634: 5.929085973050213
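Because y is the same for every slot of a match, the 196-slot loop only depends on how many slots hold a home goal or an away goal. Writing T = total_timeslot, n_H for the number of distinct home-goal slots and n_A for the away-goal slots not already counted as home (matching the if/elif order), the per-match SSE can be rewritten as below. This is a reformulation of the loop for illustration, not something from the original code; the last line checks it against MatchId 842079 (n_H = 0, n_A = 2).

```latex
y = \left(1 - \frac{m(\mathrm{xG}_1 + \mathrm{xG}_2)}{k},\ \frac{m\,\mathrm{xG}_1}{k},\ \frac{m\,\mathrm{xG}_2}{k}\right)

\mathrm{SSE}_{\text{match}} = n_H \lVert x_2 - y \rVert^2 + n_A \lVert x_3 - y \rVert^2 + (T - n_H - n_A) \lVert x_1 - y \rVert^2

\text{MatchId } 842079:\quad 2(1.95753) + 194(0.000355635) \approx 3.98405
```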
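Given the size of the original data, what I'm really after is the total SSE. For the sample data above, simply adding the three values gives total SSE = 17.79532858227135. Once that works, I'll try to optimise the SSE by updating the arbitrary constant m.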
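Summing the per-match values gives the total, and the same calculation can be wrapped in a function of m for the later optimisation step. A rough sketch in plain Python, with the sample's goal slots hard-coded (0 placeholders already removed) and a simple grid search over a hypothetical range of m; a proper optimiser such as scipy.optimize.minimize_scalar could replace the grid.

```python
# Per-match inputs from the sample:
# (xG_Team1, xG_Team2, home-goal slots, away-goal slots), 0 placeholders removed.
MATCHES = [
    (1.79907, 1.19893, [], [87, 115]),          # MatchId 842079
    (1.70662, 1.1995, [121, 159, 175], [169]),  # MatchId 842080
    (0.82058, 1.3465, [], [70, 142, 176]),      # MatchId 842634
]
K = 196  # divisor k from my_function
T = 196  # total_timeslot


def total_sse(m):
    """Total SSE over all matches for a given m, counting each match once."""
    total = 0.0
    for xg1, xg2, home, away in MATCHES:
        y = (1 - (xg1 * m + xg2 * m) / K, xg1 * m / K, xg2 * m / K)
        home_set = set(home)
        away_set = set(away) - home_set  # a slot in both lists counts as home, as in the loop
        n_none = T - len(home_set) - len(away_set)  # assumes all goal slots lie in range(T)
        for target, count in (((0, 1, 0), len(home_set)),
                              ((0, 0, 1), len(away_set)),
                              ((1, 0, 0), n_none)):
            total += count * sum((a - b) ** 2 for a, b in zip(target, y))
    return total


grid = [0.25 * i for i in range(1, 21)]  # hypothetical search range: m = 0.25, 0.5, ..., 5.0
best_m = min(grid, key=total_sse)

print(total_sse(1.0))  # ≈ 17.7953
print(best_m, total_sse(best_m))
```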
Below are the lists I'd like the function to iterate over:
Home_scored = s.groupby('MatchId')['Home_Goal_Time'].apply(list)
Away_scored = s.groupby('MatchId')['Away_Goal_Time'].apply(list)
type(Home_scored)
pandas.core.series.Series
Then I convert them to lists:
Home_Goal = Home_scored.tolist()
Away_Goal = Away_scored.tolist()
type(Home_Goal)
list
Home_Goal
Out[303]: [[0, 0], [121, 159, 0, 175], [0, 0, 0, 0]]
Away_Goal
Out[304]: [[87, 115], [0, 0, 169, 0], [70, 142, 176, 0]]
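This may explain why the function behaves as if the lists were empty: if Home_Goal is set to this nested list, `k in Home_Goal` compares the integer k with each inner list, which is never equal, so every slot falls through to the x1 branch. A sketch of pairing each inner list with its MatchId instead (in practice the keys could come from Home_scored.index; they are hard-coded here as an assumption):

```python
home_goal = [[0, 0], [121, 159, 0, 175], [0, 0, 0, 0]]      # Home_Goal from the question
away_goal = [[87, 115], [0, 0, 169, 0], [70, 142, 176, 0]]  # Away_Goal from the question
match_ids = [842079, 842080, 842634]  # group keys, in groupby's default sorted order

# An int never equals a list, so this is False even though 121 is a home-goal time.
print(121 in home_goal)  # False

# Pair each inner list with its MatchId and drop the 0 placeholders.
per_match = {
    mid: ([t for t in h if t != 0], [t for t in a if t != 0])
    for mid, h, a in zip(match_ids, home_goal, away_goal)
}
print(per_match[842080])  # ([121, 159, 175], [169])
```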
But the function still treats Home_Goal and Away_Goal as empty lists.

If you only want to consider one match at a time, you should use df.groupby('MatchId').apply(...).
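A sketch of that suggestion, assuming the column names from the sample: group by MatchId, compute one SSE per match inside apply, then sum. Selecting the value columns before apply keeps MatchId out of the groups, which avoids the deprecation warning recent pandas versions emit when apply receives the grouping column. match_sse is a hypothetical helper name, and 0 is again treated as "no goal".

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "MatchId": [842079, 842079, 842080, 842080, 842080, 842080,
                842634, 842634, 842634, 842634],
    "xG_Team1": [1.79907] * 2 + [1.70662] * 4 + [0.82058] * 4,
    "xG_Team2": [1.19893] * 2 + [1.1995] * 4 + [1.3465] * 4,
    "Home_Goal_Time": [0, 0, 121, 159, 0, 175, 0, 0, 0, 0],
    "Away_Goal_Time": [87, 115, 0, 0, 169, 0, 70, 142, 176, 0],
})

m, k, total_timeslot = 1, 196, 196
x1, x2, x3 = np.array([1, 0, 0]), np.array([0, 1, 0]), np.array([0, 0, 1])


def match_sse(group):
    """SSE for one match; `group` holds all rows of a single MatchId."""
    xg1 = group["xG_Team1"].iloc[0]  # xG is repeated on every row of a match
    xg2 = group["xG_Team2"].iloc[0]
    y = np.array([1 - (xg1 * m + xg2 * m) / k, xg1 * m / k, xg2 * m / k])
    home = {t for t in group["Home_Goal_Time"] if t != 0}  # 0 = no goal on this row
    away = {t for t in group["Away_Goal_Time"] if t != 0}
    total = 0.0
    for t in range(total_timeslot):
        target = x2 if t in home else x3 if t in away else x1
        total += float(np.sum((target - y) ** 2))
    return total


cols = ["xG_Team1", "xG_Team2", "Home_Goal_Time", "Away_Goal_Time"]
sse_per_match = df.groupby("MatchId")[cols].apply(match_sse)  # Series indexed by MatchId
print(sse_per_match)
print(sse_per_match.sum())  # ≈ 17.7953
```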
Could you provide sample output?
Thanks for taking the time to look into this. Could you take a look at the edited post?