Python 'DataFrame constructor not properly called' with a dict comprehension over a for loop
Edited the question to present the problem better.
I'm learning data analysis and can't figure out what is wrong here.
I fetch data via an API and build a df from it, where each row represents a match and one of the columns holds, as a list of nested dicts, various information about all the players in a Dota match (the original dict is rather large, so I'm not sure how to include it here if needed).
What I want is a df with detailed stats for one specific player in every match. To do that, I'm trying to:
- loop through each row of the 'players' column in the original df (each row represents a game)
- create dfs for each player and store them in a dict (so now we have a dict of dfs, each with 10 rows ...
{i: pd.DataFrame(in_df.players[i]) for i in range(10)}
also works as expected.
But this:
names_for_dfs = [i for i in range(len(in_df))]
{name: pd.DataFrame(in_df.players[name]) for name in names_for_dfs}
does not work. The function in question:
def get_player_stats(in_df, cols_to_keep, player_id):
    # create a df from the 'players' column for each game (row) - it contains 10 rows for 10 players
    # find the row with player_id in each game (each df) and append it to out_df
    out_df = pd.DataFrame()
    names_for_dfs = [row for row in range(len(in_df))]
    dfs = {
        name: pd.DataFrame(in_df.loc[name, 'players'])
        for name in names_for_dfs
    }
    for name, df in dfs.items():
        # get a row by id and append it to the final df
        out_df = out_df.append(df[df.account_id.isin([player_id])], ignore_index=True)
    return out_df[cols_to_keep]
I get an error:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-27-1a40ba2737e6> in <module>
7 return dfs
8
----> 9 dfs = get_player_stats(matches_data, core_stats, 34505203)
10 dfs
<ipython-input-27-1a40ba2737e6> in get_player_stats(in_df, cols_to_keep, player_id)
3 dfs = {
4 name : pd.DataFrame(in_df.loc[name, 'players'])
----> 5 for name in names_for_dfs
6 }
7 return dfs
<ipython-input-27-1a40ba2737e6> in <dictcomp>(.0)
3 dfs = {
4 name : pd.DataFrame(in_df.loc[name, 'players'])
----> 5 for name in names_for_dfs
6 }
7 return dfs
~\miniconda3\lib\site-packages\pandas\core\frame.py in __init__(self, data, index, columns, dtype, copy)
507 )
508 else:
--> 509 raise ValueError("DataFrame constructor not properly called!")
510
511 NDFrame.__init__(self, mgr, fastpath=True)
ValueError: DataFrame constructor not properly called!
Then I created a df from it that looks somewhat like the original:
stats = pd.DataFrame(data = data)
Then I ran the same steps as above to check that everything behaves the same way, but it went through without any errors:
in_df = stats
names_for_dfs = [i for i in range(len(in_df))]
dfs = {name: pd.DataFrame(in_df.loc[name, 'players']) for name in names_for_dfs}
This prints:
{0: match_id stat1 stat2 stat3
0 5490791923 101 [1, 2, 3] {1: 1, 2: 2, 3: [1, 2, 3]},
1: match_id stat1 stat2 stat3
0 5490791923 101 [1, 2, 3] {1: 1, 2: 2, 3: [1, 2, 3]},
2: match_id stat1 stat2 stat3
0 5490791923 101 [1, 2, 3] {1: 1, 2: 2, 3: [1, 2, 3]}}
So now I'm wondering: what difference is preventing the original solution from working?
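One way to hunt for such a difference is to count the Python types stored in the 'players' column. The sketch below uses a made-up frame (`sample` and its values are hypothetical, not the real API data); the idea is that any cell that is not a list of dicts will behave differently when passed to pd.DataFrame.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for matches_data: the middle cell is missing (NaN).
sample = pd.DataFrame({
    "match_id": [1, 2, 3],
    "players": [
        [{"account_id": 10, "kills": 4}],
        np.nan,
        [{"account_id": 10, "kills": 6}],
    ],
})

# Count how many cells hold each Python type.
type_counts = sample["players"].map(lambda cell: type(cell).__name__).value_counts()
print(type_counts)

# Row labels whose 'players' cell is not a list.
is_list = sample["players"].map(lambda cell: isinstance(cell, list))
bad_rows = sample.index[~is_list].tolist()
print(bad_rows)
```

Running the same two checks on the real matches_data would show whether every row actually holds a list.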
Code that fetches the original data:
import requests
import pandas as pd

def get_player_ids(team_id: int):
    players = requests.get(f'https://api.opendota.com/api/teams/{team_id}/players').json()
    ids = []
    keys = ['account_id', 'name']
    for player in players:
        for k, v in player.items():
            if k in keys:
                ids.append({k: v})
    print(ids)
    return ids

def get_team_id(team_name: str):
    teams = pd.DataFrame(requests.get('https://api.opendota.com/api/teams').json())
    team_id = int(teams.team_id[teams.name.str.lower() == team_name.lower()])
    get_player_ids(team_id)
    return team_id

columns = ['match_id', 'duration', 'radiant_score', 'dire_score', 'radiant_gold_adv',
           'radiant_xp_adv', 'radiant_team', 'dire_team', 'players', 'league', 'patch', 'start_time']

def get_match_data_for_team(team_id: int):
    l = requests.get(f'https://api.opendota.com/api/teams/{team_id}/matches').json()
    match_ids = [d['match_id'] for d in l]
    matches_data = []
    for m_id in match_ids:
        matches_data.append(requests.get('http://api.opendota.com/api/matches/' + f'{m_id}').json())
    return pd.DataFrame(matches_data)[columns]

matches_data = get_match_data_for_team(get_team_id('nigma'))
Edit:
Fixed, the following code now works:
def get_player_stats(in_df, cols_to_keep, player_id):
    # create a df from the 'players' column for each game (row) - it contains 10 rows for 10 players
    # find the row with player_id for MC in each game (df) and append it to out_df
    out_df = pd.DataFrame()
    dfs = {}
    names_for_dfs = [row for row in range(len(in_df))]
    for name in names_for_dfs:
        for player_dict in in_df.players[name]:
            if isinstance(player_dict, dict) and player_dict['account_id'] == player_id:
                df = pd.DataFrame({key: [value] for key, value in player_dict.items()})
                dfs.update({name: df})
    for name, df in dfs.items():
        out_df = out_df.append(df)
    return out_df[cols_to_keep]
But I'm missing a few rows. It seems to happen at this condition:
if isinstance(player_dict, dict) and player_dict['account_id'] == player_id:
because matches_data has 193 rows, but out_df has only 143.
This way:
out_df = pd.DataFrame()
dfs = {}
for match_number in range(len(matches_data)):
    for player_dict in matches_data.players[match_number]:
        if isinstance(player_dict, dict):
            df = pd.DataFrame({key: [value] for key, value in player_dict.items()})
            dfs.update({match_number: df})
for name, df in dfs.items():
    out_df = out_df.append(df[df.account_id.isin([34505203])], ignore_index=True)
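I get even fewer: 138 rows. How do I correctly search for the required player in these nested structures?

I would try:
- Simplifying your function to the one below, which meets your goal of producing a DataFrame of a given player's compiled stats across all matches.
- When pulling data out of dicts nested in a DataFrame, dictionary logic helps ease some of the indexing complexity, so this function takes the DataFrame as in_df but converts it to a dict with the DataFrame.to_dict() method.
The code (get_player_stats plus its sample data) is listed after the comments further down.
Output: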
champ cs
match1 Zoe 700.0
match2 Syndra 800.0
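The question describes each 'players' cell as a list of dicts rather than a dict of dicts, so here is a minimal sketch of the same lookup for that shape. The helper name `find_player_rows` and the `sample` data are made up for illustration; only `account_id` and the 'players' column name come from the thread. Rows are collected in a plain list and turned into a DataFrame once at the end, which also avoids DataFrame.append.

```python
import pandas as pd

def find_player_rows(in_df, player_id):
    """Collect one row per match in which player_id appears (hypothetical helper)."""
    rows = []
    for match_label, players in in_df["players"].items():
        if not isinstance(players, list):
            continue  # skip missing or filler cells such as NaN or 'N/A'
        for info in players:
            if isinstance(info, dict) and info.get("account_id") == player_id:
                # Remember which match the row came from.
                rows.append({"match": match_label, **info})
    return pd.DataFrame(rows)

# Made-up data in the assumed shape: a list of player dicts per match.
sample = pd.DataFrame({
    "match_id": [101, 102, 103],
    "players": [
        [{"account_id": 1, "kills": 5}, {"account_id": 2, "kills": 3}],
        "N/A",
        [{"account_id": 3, "kills": 2}, {"account_id": 1, "kills": 7}],
    ],
})
result = find_player_rows(sample, 1)
print(result)
```

A match in which the player does not appear simply contributes no row, so the result can legitimately be shorter than the input frame.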
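I figured it out. There were two problems:
- Pandas raised an error when trying to build a DataFrame from each player's dict (the ten entries in the list stored in the 'players' column Series) because the values in a player's dict have different array lengths. For example, the 'match_id' value is just a number, while the 'ability_upgrades_arr' value is a list of many numbers. This is easily solved by wrapping each value in a list, which essentially makes every value's array length equal to 1.
- When you iterate through the matches looking for player dicts, sometimes they are not there (check the data). Pandas is then trying to make a DataFrame out of a str or a float (np.nan). Checking with the built-in Python function isinstance() fixes this.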
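Both problems can be reproduced on toy data. In this sketch the `player` dict is invented ('item_log' is a hypothetical key added only to get two lists of different lengths), and the exact wording of the first error may differ between pandas versions, so it is printed rather than assumed.

```python
import numpy as np
import pandas as pd

# Problem 1: list values of different lengths cannot be lined up as columns.
player = {"match_id": 5490791923, "item_log": [1, 2], "ability_upgrades_arr": [1, 2, 3]}
mismatch_error = None
try:
    pd.DataFrame(player)
except ValueError as err:
    mismatch_error = str(err)
print("problem 1:", mismatch_error)

# Wrapping every value in a list gives each column exactly one entry.
one_row = pd.DataFrame({key: [value] for key, value in player.items()})
print(one_row.shape)

# Problem 2: a cell that is a plain str or float is not valid constructor input.
scalar_errors = []
for cell in ("N/A", np.nan):
    try:
        pd.DataFrame(cell)
    except ValueError as err:
        scalar_errors.append(str(err))
print("problem 2:", scalar_errors)
```

The second loop produces the "DataFrame constructor not properly called!" message from the traceback in the question, which is why an isinstance() check before the constructor call helps.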
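Here is my version of the code, which just stores each player's dict as a DataFrame under an arbitrary key, as you were trying to do above. From here, just iterate through the dict and look for the data of a given player name.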
import requests
import pandas as pd

def get_team_matches_from_api(team: str) -> pd.DataFrame:
    "Get data from all matches played by :arg:`team`"
    # First, pull all teams from OpenDOTA so that we can...
    teams = pd.DataFrame(requests.get('https://api.opendota.com/api/teams').json())
    # ...get Team ID of :arg:`team`
    team_id = int(teams.team_id[teams.name.str.lower() == team.lower()])
    # Second, pull all games played by :arg:`team` so that we can...
    team_matches = requests.get(f'https://api.opendota.com/api/teams/{team_id}/matches').json()
    # ...get match IDs for each match played
    match_ids = [team_match['match_id'] for team_match in team_matches]
    # Third, go back to OpenDOTA and pull each match played by :arg:`team` using our match IDs above
    matches_data = []
    for match_id in match_ids:
        matches_data.append(requests.get(f'http://api.opendota.com/api/matches/{match_id}').json())
    # Fourth, put the data from the pulled matches into a `DataFrame`
    columns = ['match_id', 'duration', 'radiant_score', 'dire_score', 'radiant_gold_adv', 'radiant_xp_adv', 'radiant_team', 'dire_team', 'players', 'league', 'patch', 'start_time']
    df = pd.DataFrame(matches_data)[columns]
    return df.fillna('N/A')

def get_player_data_from_team_matches(team_matches: pd.DataFrame) -> dict:
    "Pull player data from all games in :arg:`team_matches`"
    players_from_team_matches = dict()
    for match_number in range(team_matches.players.shape[0]):
        for player_dict in team_matches.players[match_number]:
            # Skip anything that is not a player dict (e.g. characters of the 'N/A' filler string)
            if isinstance(player_dict, dict):
                # Wrap each value in a list so that every column has length 1
                df = pd.DataFrame({key: [value] for key, value in player_dict.items()})
                # Note: the same key is reused for every player, so each match keeps only its last player dict
                players_from_team_matches.update({match_number: df})
    return players_from_team_matches

# DataFrame of all Nigma matches
nigma = get_team_matches_from_api('nigma')
# Dictionary of all player data from every Nigma match
nigma_players = get_player_data_from_team_matches(nigma)
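Once there is a dict of per-match DataFrames, the final frame can be assembled with pd.concat instead of repeated DataFrame.append calls (append was deprecated in pandas 1.4 and removed in 2.0). This is only a sketch on invented data: `per_match` and its values are hypothetical, and it assumes each stored frame has an `account_id` column.

```python
import pandas as pd

# Hypothetical dict of per-match DataFrames, keyed by match number.
per_match = {
    0: pd.DataFrame({"account_id": [1, 2], "kills": [5, 3]}),
    1: pd.DataFrame({"account_id": [1, 3], "kills": [7, 2]}),
}

# pd.concat accepts a mapping; the keys become the outer index level.
all_players = pd.concat(per_match, names=["match_number", "row"])

# Keep only the rows that belong to the player of interest.
player_rows = all_players[all_players["account_id"] == 1]
print(player_rows)
```

Keeping the match number in the index makes it easy to see afterwards which matches contributed a row for the player and which did not.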
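Thank you very much for your input! I should have added what the data looks like. But do you have any idea why my approach doesn't work?
Hmm, my best guess is that pandas is unhappy with the .loc[] method you are using on the df. You are getting the ValueError: DataFrame constructor not properly called! exception inside your dict comprehension. I'll take a look and see whether you can rephrase the DataFrame construction. For reference, with the code I gave above I was able to use a DataFrame-constructing dict comp successfully: dfs = {i: pd.DataFrame(faker.loc['match1']) for i in range(5)}.
Actually, I just came back to this today and it works. I mean, without any changes. I should also mention that it had worked at some point before, in exactly the same form. So I have absolutely no idea what is going on.
In fact, things are not going well, so I edited the question and added more details and steps to reproduce.
Thank you! I added a last edit to the question: somehow we lose rows along the way.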
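One possible source of lost rows, judging only from the code shown in the thread: dfs.update({match_number: df}) reuses the same key for every player of a match, so whichever way that line is indented, only the last player dict of each match is still in the dict when the account_id filter runs. The sketch below reproduces this on invented data and shows one hedged alternative, keying by a (match_number, account_id) tuple; that key scheme is an assumption, not something from the original posts.

```python
import pandas as pd

# Invented players for a single match (match number 0).
players_in_match = [
    {"account_id": 1, "kills": 5},
    {"account_id": 2, "kills": 3},
]

# Pattern from the thread: the key is the match number, so each update overwrites the previous one.
overwritten = {}
for player in players_in_match:
    df = pd.DataFrame({key: [value] for key, value in player.items()})
    overwritten.update({0: df})

# Alternative sketch: one entry per (match, player), so nothing is overwritten.
kept = {}
for player in players_in_match:
    df = pd.DataFrame({key: [value] for key, value in player.items()})
    kept[(0, player["account_id"])] = df

print(overwritten[0]["account_id"].tolist())
print(sorted(kept))
```

With the first pattern, a later filter for account_id 1 finds nothing for this match, even though that player took part in it.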
Code for the first answer (get_player_stats using DataFrame.to_dict()):
import pandas as pd

def get_player_stats(in_df, player_id):
    df = pd.DataFrame()
    for match, players in in_df.to_dict()['players'].items():
        # {match1: {players}}
        for player, info in players.items():
            # {player1: {info}}
            if info['account_id'] == player_id:
                # {player1: {'account_id': player_id}}
                # DataFrame.append was removed in pandas 2.0; there, collect the rows and use pd.concat
                df = df.append(pd.Series(data=info, name=match))
    cols_to_keep = [col for col in df.columns if col != 'account_id']
    return df[cols_to_keep]

# I assume your data looks something like this:
matches_2020 = {
    'date': {
        'match1': '2020-06-01',
        'match2': '2020-06-02'
    },
    'players': {
        'match1': {
            'player1': {'account_id': 'FAKER', 'cs': 700, 'champ': 'Zoe'},
            'player2': {'account_id': 'BJERGSON', 'cs': 500, 'champ': 'Talon'}
        },
        'match2': {
            'player1': {'account_id': 'FAKER', 'cs': 800, 'champ': 'Syndra'},
            'player2': {'account_id': 'REDMERCY', 'cs': 500, 'champ': 'Zed'}
        }
    }
}
in_df = pd.DataFrame(matches_2020)

# Let's pull Faker's stats:
faker = get_player_stats(in_df, 'FAKER')
print(faker)