Python: "DataFrame constructor not properly called!" in a dict comprehension with a for loop

Tags: python, python-3.x, pandas

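Edited the question to present the problem better.

I'm learning data analysis and can't figure out what is wrong here.

I fetch data through an API and build a df from it, where each row represents one match and one column holds, as a list of nested dicts, assorted information about all the players in that Dota match (the original dict is rather large, so I'm not sure how to include it here if it's needed).

What I want to do is create a df with detailed stats for one specific player in every match. To do that, I'm trying to:

  • Loop over every row of the 'players' column in the original df (each row represents one game)
  • Create a df from each row's players and store them in a dict (so now we have a dict of dfs, each with 10 rows for the 10 players in the game and columns for their stats)
  • Loop over those stored dfs, find the required row in each one (by player_id) and append it to the final df

    Now here's the problem.

    Calling pd.DataFrame on a single row's value works on its own and creates a df. This: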

    {i: pd.DataFrame(in_df.players[i]) for i in range(10)}
    
    also works as expected. But this:

    names_for_dfs = [i for i in range(len(in_df))]
    {name: pd.DataFrame(in_df.players[name]) for name in names_for_dfs}
    
    does not work. The function in question:

    def get_player_stats(in_df, cols_to_keep, player_id):
        #create a df from 'players' column for each game (row) - it contains 10 rows for 10 players
        #find a row with player_id for player in each game (each df) and append it to out_df
        out_df = pd.DataFrame()

        names_for_dfs = [row for row in range(len(in_df))]

        dfs = {
            name: pd.DataFrame(in_df.loc[name, 'players'])
            for name in names_for_dfs
        }

        for name, df in dfs.items():
            out_df = out_df.append(df[df.account_id.isin([player_id])], ignore_index=True)  # get a row by id and append to final df
        return out_df[cols_to_keep]
    
    I get an error:

        ---------------------------------------------------------------------------
    ValueError                                Traceback (most recent call last)
    <ipython-input-27-1a40ba2737e6> in <module>
          7     return dfs
          8 
    ----> 9 dfs = get_player_stats(matches_data, core_stats, 34505203)
         10 dfs
    
    <ipython-input-27-1a40ba2737e6> in get_player_stats(in_df, cols_to_keep, player_id)
          3     dfs = {
          4     name : pd.DataFrame(in_df.loc[name, 'players'])
    ----> 5     for name in names_for_dfs
          6     }
          7     return dfs
    
    <ipython-input-27-1a40ba2737e6> in <dictcomp>(.0)
          3     dfs = {
          4     name : pd.DataFrame(in_df.loc[name, 'players'])
    ----> 5     for name in names_for_dfs
          6     }
          7     return dfs
    
    ~\miniconda3\lib\site-packages\pandas\core\frame.py in __init__(self, data, index, columns, dtype, copy)
        507                 )
        508             else:
    --> 509                 raise ValueError("DataFrame constructor not properly called!")
        510 
        511         NDFrame.__init__(self, mgr, fastpath=True)
    
    ValueError: DataFrame constructor not properly called!
    
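    Then I created a df from some sample data that looks a bit like the original:

    stats = pd.DataFrame(data = data)
    
    Then I went through the same steps as above to check them, and this time everything ran smoothly, with no errors:

    in_df = stats
    names_for_dfs = [i for i in range(len(in_df))]
    dfs = {name: pd.DataFrame(in_df.loc[name, 'players']) for name in names_for_dfs}
    
    It prints this: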

    {0:      match_id  stat1      stat2                       stat3
     0  5490791923    101  [1, 2, 3]  {1: 1, 2: 2, 3: [1, 2, 3]},
     1:      match_id  stat1      stat2                       stat3
     0  5490791923    101  [1, 2, 3]  {1: 1, 2: 2, 3: [1, 2, 3]},
     2:      match_id  stat1      stat2                       stat3
     0  5490791923    101  [1, 2, 3]  {1: 1, 2: 2, 3: [1, 2, 3]}}
    
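    So now I'm wondering: what is the difference that keeps the solution from working on the original data? The code that fetches the original data: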

    def get_player_ids(team_id: int):
        players = requests.get(f'https://api.opendota.com/api/teams/{team_id}/players').json()
        ids = []
        keys = ['account_id', 'name']
        for player in players:
            for k, v in player.items():
                if k in keys:
                    ids.append({k: v})
        print(ids)
        return ids
    
    def get_team_id(team_name: str):
        teams = pd.DataFrame(requests.get('https://api.opendota.com/api/teams').json())
        team_id = int(teams.team_id[teams.name.str.lower() == team_name.lower()])
        get_player_ids(team_id)
        return team_id
    
    columns = ['match_id', 'duration', 'radiant_score', 'dire_score', 'radiant_gold_adv',
               'radiant_xp_adv', 'radiant_team', 'dire_team', 'players', 'league', 'patch', 'start_time']
    def get_match_data_for_team(team_id: int):    
        l = requests.get(f'https://api.opendota.com/api/teams/{team_id}/matches').json()
        match_ids = [d['match_id'] for d in l]
        matches_data = []
        for m_id in match_ids:
            matches_data.append(requests.get('http://api.opendota.com/api/matches/' + f'{m_id}').json())
        
        return pd.DataFrame(matches_data)[columns]
    
    matches_data = get_match_data_for_team(get_team_id('nigma'))
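
One possible way to look for that difference is to count the Python types stored in the players column. The sketch below uses a small hypothetical frame named sample instead of the real API data, and the helper can_build is made up for illustration. As far as I know, the constructor accepts a list of dicts but raises this same ValueError for a bare scalar such as np.nan, so any row whose players value is not a list would break the comprehension.

```python
from collections import Counter

import numpy as np
import pandas as pd

# Hypothetical stand-in for matches_data: the second match has no player list.
sample = pd.DataFrame({
    "match_id": [1, 2, 3],
    "players": [
        [{"account_id": 10, "kills": 3}, {"account_id": 20, "kills": 5}],
        np.nan,
        [{"account_id": 10, "kills": 7}],
    ],
})

# Count which Python types actually occur in the column.
type_counts = Counter(type(value).__name__ for value in sample["players"])
print(type_counts)


def can_build(value):
    """Return True if pd.DataFrame(value) succeeds, False on ValueError."""
    try:
        pd.DataFrame(value)
        return True
    except ValueError:
        return False


# List the match ids whose 'players' value cannot be turned into a DataFrame.
bad_matches = [
    match_id
    for match_id, value in zip(sample["match_id"], sample["players"])
    if not can_build(value)
]
print(bad_matches)
```

Running the same two checks against matches_data should show whether some matches came back without a list of players.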
    
    Edit: fixed, the following code now works:

    def get_player_stats(in_df, cols_to_keep, player_id):
        #create a df from 'players' column for each game (row) - it contains 10 rows for 10 players
        #find a row with player_id for MC in each game (df) and append it to out_df
        out_df = pd.DataFrame()
        dfs = {}
    
        names_for_dfs = [row for row in range(len(in_df))]
        for name in names_for_dfs:
            for player_dict in in_df.players[name]:
                if isinstance(player_dict, dict) and player_dict['account_id'] == player_id:
                    df = pd.DataFrame({key: [value] for key, value in player_dict.items()})
                    dfs.update({name: df})
    
        for name, df in dfs.items():
            out_df = out_df.append(df)
            
        return out_df[cols_to_keep]
    
    But I'm missing some rows, apparently because of this condition:

    if isinstance(player_dict, dict) and player_dict['account_id'] == player_id:
    
    matches_data has 193 rows, but out_df has only 143. And this way:

    out_df = pd.DataFrame()
    dfs = {}
    for match_number in range(len(matches_data)):
        for player_dict in matches_data.players[match_number]:
            if isinstance(player_dict, dict):
                df = pd.DataFrame({key: [value] for key, value in player_dict.items()})
                dfs.update({match_number: df})
    for name, df in dfs.items():
        out_df = out_df.append(df[df.account_id.isin([34505203])], ignore_index=True)
    
    I get even fewer: 138 rows. How do I search these nested structures properly for the player I need?

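    I would try:

    • Simplifying your function to the one below, which meets your goal of producing a DataFrame of a given player's stats compiled across all matches
    • When extracting data from dicts nested in a DataFrame, dictionary logic helps relieve some of the indexing complexity, so this function takes a DataFrame as in_df but converts it to a dict with the DataFrame.to_dict() method

    Code: see the get_player_stats listing with the matches_2020 sample data, which follows the comments at the end of this thread.

    Output: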

             champ     cs
    match1     Zoe  700.0
    match2  Syndra  800.0
    

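    I figured it out. There were two problems:

  • Pandas raised an error when trying to build a DataFrame from each player's dict (the ten dicts in the list stored at each index of the 'players' column Series), because the values in every player's dict have different array lengths. For example, the 'match_id' value is just a number, whereas the 'ability_upgrades_arr' value is a list of many numbers. This is easily solved by wrapping each value in a list, which makes every value's array length equal to 1.

  • When you iterate through the matches looking for player dicts, sometimes they are not there (check the data). So pandas was trying to build a DataFrame from a str or a float (np.nan). Using the built-in Python function isinstance() fixes this.

    Here is my version of the code, which simply stores each player's dict as a DataFrame, as the value of an arbitrary key, the way you were attempting above. From here, just iterate through the dict to find the data for one player's name.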

    import requests
    import pandas as pd
    
    
    def get_team_matches_from_api(team: str) -> pd.DataFrame:
        "Get data from all matches played by :arg:`team`"
        
        # First, pull all teams from OpenDOTA so that we can...
        teams = pd.DataFrame(requests.get('https://api.opendota.com/api/teams').json())
    
        # ...get Team ID of :arg:`team`
        team_id = int(teams.team_id[teams.name.str.lower() == team.lower()])
    
        # Second, pull all games played by :arg:`team` so that we can...
        team_matches = requests.get(f'https://api.opendota.com/api/teams/{team_id}/matches').json()
    
        # ...get match IDs for each match played
        match_ids = [team_match['match_id'] for team_match in team_matches]
    
        # Third, go back to OpenDOTA and pull each match played by :arg:`team` using our match IDs above
        matches_data = []
        for match_id in match_ids:
            matches_data.append(requests.get(f'http://api.opendota.com/api/matches/{match_id}').json())
    
        # Fourth, put the data from the pulled matches into a `DataFrame`
        columns = ['match_id', 'duration', 'radiant_score', 'dire_score', 'radiant_gold_adv', 'radiant_xp_adv', 'radiant_team', 'dire_team', 'players', 'league', 'patch', 'start_time']
        df = pd.DataFrame(matches_data)[columns]
    
        return df.fillna('N/A')
    
    
    def get_player_data_from_team_matches(team_matches: pd.DataFrame) -> dict:
        "Pull player data from all games in :arg:`team_matches`"
        
        players_from_team_matches = dict()
    
        for match_number in range(team_matches.players.shape[0]):
    
            for player_dict in team_matches.players[match_number]:
                if isinstance(player_dict, dict):
                    df = pd.DataFrame({key: [value] for key, value in player_dict.items()})
                    players_from_team_matches.update({match_number: df})
    
        return players_from_team_matches
    
    
    # DataFrame of all Nigma matches
    nigma = get_team_matches_from_api('nigma')
    
    # Dictionary of all player data from every Nigma match
    nigma_players = get_player_data_from_team_matches(nigma)
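
Regarding the rows lost in the question's last edit: in the loop above, update({match_number: df}) writes to the same key for every player in a match, so each match appears to end up holding only its last player. The following is a minimal sketch of an alternative, assuming each players value is either a list of dicts or a placeholder such as 'N/A' or np.nan. It keeps the one matching dict per match and builds the frame once at the end. The name find_player_rows and the sample data are hypothetical.

```python
import numpy as np
import pandas as pd


def find_player_rows(team_matches, player_id, cols_to_keep=None):
    """Collect one row per match for player_id from the 'players' column."""
    rows = {}
    for match_number, players in team_matches["players"].items():
        if not isinstance(players, list):
            continue  # placeholder value, no player list for this match
        for player in players:
            if isinstance(player, dict) and player.get("account_id") == player_id:
                rows[match_number] = player
                break  # at most one row per match
    # Each dict becomes one row; list-valued stats stay as single cells.
    out = pd.DataFrame(list(rows.values()), index=list(rows.keys()))
    return out if cols_to_keep is None else out[cols_to_keep]


# Hypothetical sample: match 1 has no player list at all.
sample = pd.DataFrame({
    "match_id": [1, 2, 3],
    "players": [
        [{"account_id": 10, "kills": 3, "items": [1, 2]},
         {"account_id": 20, "kills": 5, "items": [3]}],
        np.nan,
        [{"account_id": 20, "kills": 1, "items": []},
         {"account_id": 10, "kills": 7, "items": [4, 5, 6]}],
    ],
})

stats = find_player_rows(sample, 10, ["account_id", "kills"])
print(stats)
```

With this approach, any match still missing from the result is one where the player id does not occur in the list or the list itself is absent, which can be checked directly in the data.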
    

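    Thank you very much for your input! I should have added what the data looks like. But do you know why my approach doesn't work?

    Hmm, my best guess is that pandas is unhappy about your use of the .loc[] method on the df. You got the "ValueError: DataFrame constructor not properly called!" exception inside your dict comprehension. I'll take a look and see whether you can rephrase the DataFrame construction. For reference, I was able to use a DataFrame-constructing dict comp successfully with the code I gave above: dfs = {i: pd.DataFrame(faker.loc['match1']) for i in range(5)}.

    Actually, I just came back to this today and it worked, with nothing changed. I should also mention that it had worked at some point before, in exactly the same form. So I have no idea what is going on.

    In fact it did not go smoothly after all, so I edited the question with more details and steps to reproduce what I mean. Thank you!

    Added a last edit to the question: somehow we are losing rows along the way.

    Code for the first answer (get_player_stats with sample data, followed by its output):
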
    import pandas as pd
    
    def get_player_stats(in_df, player_id):
        
        df = pd.DataFrame()
    
        for match, players in in_df.to_dict()['players'].items():
            # {match1: {players}}
    
            for player, info in players.items():
                # {player1: {info}}
    
                if info['account_id'] == player_id:
                    # {player1: {'account_id': player_id}}
    
                    df = df.append(pd.Series(data=info, name=match))
    
        cols_to_keep = [col for col in df.columns if col != 'account_id']
    
        return df[cols_to_keep]
    
    # I assume your data looks something like this:
    matches_2020 = {
    
        'date': {
            'match1': '2020-06-01',
            'match2': '2020-06-02'
        },
        'players': {
            'match1': {
                'player1': {'account_id': 'FAKER', 'cs': 700, 'champ': 'Zoe'},
                'player2': {'account_id': 'BJERGSON', 'cs': 500, 'champ': 'Talon'}
            },
            'match2': {
                'player1': {'account_id': 'FAKER', 'cs': 800, 'champ': 'Syndra'},
                'player2': {'account_id': 'REDMERCY', 'cs': 500, 'champ': 'Zed'}
            }
        }
    }
    
    in_df = pd.DataFrame(matches_2020)
    
    # Let's pull Faker's stats:
    faker = get_player_stats(in_df, 'FAKER')
    print(faker)
    
             champ     cs
    match1     Zoe  700.0
    match2  Syndra  800.0
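
Note that DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0, so the function above fails on current versions. Here is a sketch of the same idea without append, assuming the same matches_2020 layout (the name get_player_stats_no_append is made up here): gather the matching info dicts first, then build the frame in one call.

```python
import pandas as pd


def get_player_stats_no_append(in_df, player_id):
    """Return one row per match for player_id, without DataFrame.append."""
    rows = {}
    for match, players in in_df["players"].items():
        for info in players.values():
            if info["account_id"] == player_id:
                # Drop the id column, as the original function does.
                rows[match] = {k: v for k, v in info.items() if k != "account_id"}
    # Keys of `rows` become the index, inner keys become the columns.
    return pd.DataFrame.from_dict(rows, orient="index")


# Same assumed data layout as in the answer above.
matches_2020 = {
    "date": {"match1": "2020-06-01", "match2": "2020-06-02"},
    "players": {
        "match1": {
            "player1": {"account_id": "FAKER", "cs": 700, "champ": "Zoe"},
            "player2": {"account_id": "BJERGSON", "cs": 500, "champ": "Talon"},
        },
        "match2": {
            "player1": {"account_id": "FAKER", "cs": 800, "champ": "Syndra"},
            "player2": {"account_id": "REDMERCY", "cs": 500, "champ": "Zed"},
        },
    },
}

in_df = pd.DataFrame(matches_2020)
faker = get_player_stats_no_append(in_df, "FAKER")
print(faker)
```

Building the frame once from a dict of rows also avoids copying the whole frame on every iteration, which is what repeated appends did.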
    