Python: plotting certain data points from a pandas DataFrame


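I am trying to build a program that works with baseball statistics. I ask the user to enter a team, and the code then runs through the pandas DataFrame I created, searching for the "teamID" that matches the user's input.

I tried grouping by "teamID", but I am having trouble with the index and the indexing in the for loop.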

def AttendancePlot(teams,team_pick):

    fig, ax = plt.subplots()
    group_by_teamID = teams.groupby(by=['teamID'])
    print group_by_teamID

    for i in group_by_teamID.index:
        if i == team_pick:
            ax.scatter(teams['yearID'][i], teams['attendance'][i], color="#4DDB94", s=200)
            ax.annotate(i, (teams['yearID'][i], teams['attendance'][i]),
               bbox=dict(boxstyle="round", color="#4DDB94"),
               xytext=(-30, 30), textcoords='offset points',
               arrowprops=dict(arrowstyle="->", connectionstyle="angle,angleA=0,angleB=90,rad=10"))
How I created the DataFrame:

teams = pd.read_csv('Teams.csv')
salaries = pd.read_csv('Salaries.csv')
names = pd.read_csv('Names.csv')

teams = teams[teams['yearID'] >= 1985]
teams = teams[['yearID', 'teamID', 'Rank', 'R', 'RA', 'G', 'W', 'H', 'BB', 'HBP', 'AB', 'SF', 'HR', '2B', '3B', 'attendance']]
teams = teams.set_index(['yearID', 'teamID'])

salaries_by_yearID_teamID = salaries.groupby(['yearID', 'teamID'])['salary'].sum()
teams = teams.join(salaries_by_yearID_teamID)

print teams.head(15)
The resulting DataFrame:

          Rank    R   RA    G     ...       2B  3B  attendance      salary
yearID teamID                          ...                                     
1985   ATL        5  632  781  162     ...      213  28   1350137.0   14807000.0
       BAL        4  818  764  161     ...      234  22   2132387.0  11560712.0
       BOS        5  800  720  163     ...      292  31   1786633.0  10897560.0
       CAL        2  732  703  162     ...      215  31   2567427.0  14427894.0

I want a scatter plot showing the yearly attendance for the team the user entered. What I get is a blank figure with no errors.

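There is no need to use groupby() here. groupby() is typically used to apply some mathematical operation to groups of rows. What you need is to select the data correctly.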
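To make the distinction concrete, here is a minimal sketch contrasting aggregation with selection. The frame `demo` is a hypothetical stand-in for `teams`: the 1985 attendance figures come from the output in the question, while the 1986 figures are placeholders invented for illustration.

```python
import pandas as pd

# Toy stand-in for `teams`: the 1985 attendance values come from the question's
# output; the 1986 values are placeholders invented for this illustration.
demo = pd.DataFrame(
    {
        "yearID": [1985, 1985, 1986, 1986],
        "teamID": ["ATL", "BAL", "ATL", "BAL"],
        "attendance": [1350137.0, 2132387.0, 1000000.0, 2000000.0],
    }
).set_index(["yearID", "teamID"])

# groupby() collapses each team's rows into a single aggregated value.
mean_attendance = demo.groupby(level="teamID")["attendance"].mean()

# Selection keeps the individual yearly rows, which is what a scatter plot needs.
atl_rows = demo.loc[demo.index.get_level_values("teamID") == "ATL"]

print(mean_attendance)
print(atl_rows)
```

The aggregated result has one value per team, so it cannot supply one point per year; the selected rows can.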

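The function at the end of this page plots year (x-axis) against attendance (y-axis) for a given team, team_pick, assuming the DataFrame structure you described (the DataFrame is teams).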

I leave the annotations to you.

The key is this line:
teamdata = teams.loc[teams.index.get_level_values('teamID') == team_pick]

teams.index.get_level_values('teamID') == team_pick performs a selection on the MultiIndex, letting you pick every row whose team is team_pick.

As a result, teamdata is a DataFrame containing all the rows for the given team.
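The selection can be inspected step by step. The sketch below uses a hypothetical three-row frame (`demo`), where the 1985 attendance values are taken from the question and the 1986 value is a placeholder. It also shows one detail worth knowing: after filtering, `index.levels` still lists every year of the original frame, so the years of the selected rows are better read with `get_level_values`.

```python
import pandas as pd

# Hypothetical miniature of `teams`; BOS appears only in 1985.
# The 1986 attendance value is a placeholder, not real data.
demo = pd.DataFrame(
    {
        "yearID": [1985, 1985, 1986],
        "teamID": ["ATL", "BOS", "ATL"],
        "attendance": [1350137.0, 1786633.0, 1000000.0],
    }
).set_index(["yearID", "teamID"])

# One boolean per row: True where the 'teamID' level equals the chosen team.
mask = demo.index.get_level_values("teamID") == "BOS"

# .loc with a boolean mask keeps only the rows marked True.
teamdata = demo.loc[mask]

# Years that actually occur in the selected rows.
years = teamdata.index.get_level_values("yearID")

# The level list is not pruned by the selection and still holds all years.
all_years = teamdata.index.levels[0]

print(mask)
print(list(years), list(all_years))
```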


This technique is known as boolean indexing.

Comments:

Could you add an example of your DataFrame?

[The asker posted the DataFrame construction code and its output here; both are now part of the question.]

Please add it to the question: click the edit button to edit your question. It will be more readable.

@Valentino Just edited, sorry about that!

Thank you so much! I am trying to better understand the logic behind data visualization in Python, and it confuses me a bit. You're the best!

The function from the answer:

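import matplotlib.pyplot as plt

def AttendancePlot(teams, team_pick):
    # Keep only the rows whose 'teamID' index level matches team_pick.
    teamdata = teams.loc[teams.index.get_level_values('teamID') == team_pick]
    # Take the years from the selected rows; index.levels[0] would list every year.
    plt.scatter(teamdata.index.get_level_values('yearID'), teamdata['attendance'])
    plt.show()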
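
Since the annotation was left as an exercise, here is one possible sketch of it, reusing the styling from the question. The function name `attendance_plot`, the choice to label only the best-attended season, and the toy frame `demo` (with a placeholder 1986 value) are all assumptions made for illustration, not part of the original answer. The Agg backend is selected only so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd


def attendance_plot(teams, team_pick):
    """Scatter yearly attendance for one team and label its best-attended season."""
    teamdata = teams.loc[teams.index.get_level_values("teamID") == team_pick]
    if teamdata.empty:
        raise ValueError(f"no rows found for team {team_pick!r}")
    years = teamdata.index.get_level_values("yearID")

    fig, ax = plt.subplots()
    ax.scatter(years, teamdata["attendance"], color="#4DDB94", s=200)

    # idxmax returns the (yearID, teamID) index tuple of the largest attendance.
    best_year, _ = teamdata["attendance"].idxmax()
    best_value = teamdata["attendance"].max()
    ax.annotate(
        f"{team_pick} {best_year}",
        (best_year, best_value),
        xytext=(-30, 30),
        textcoords="offset points",
        bbox=dict(boxstyle="round", color="#4DDB94"),
        arrowprops=dict(arrowstyle="->"),
    )
    ax.set_xlabel("yearID")
    ax.set_ylabel("attendance")
    return fig, ax


# Hypothetical data: the 1985 values come from the question, 1986 is a placeholder.
demo = pd.DataFrame(
    {
        "yearID": [1985, 1985, 1986],
        "teamID": ["ATL", "BAL", "ATL"],
        "attendance": [1350137.0, 2132387.0, 1000000.0],
    }
).set_index(["yearID", "teamID"])

fig, ax = attendance_plot(demo, "ATL")
```

In an interactive session, calling `plt.show()` after the function (and dropping the `matplotlib.use("Agg")` line) displays the figure; returning `fig` and `ax` keeps the function easy to extend or save from.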