Python: Using a for loop to return unique values from a DataFrame

python pandas

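I know Pandas isn't really designed for for loops, but I have a specific task that I have to do many times, and it would save a lot of time if I could abstract part of it into a function I can call.

A generic version of my dataframe looks like this: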

df = pd.DataFrame({'Name': pd.Categorical(['John Doe', 'Jane Doe', 'Bob Smith']), 'Score1': np.arange(3), 'Score2': np.arange(3, 6, 1)})

        Name  Score1  Score2
0   John Doe       0       3
1   Jane Doe       1       4
2  Bob Smith       2       5

What I'd like to do is take the following:

df.loc[df.Name == 'Jane Doe', 'Score2']

which should return 4, but iterate over it with a for loop, like this:

def pull_score(people, score):    
    for i in people:
        print df.loc[df.Name == people[i], score]

so that, if I wanted, I could call:

the_names = ['John Doe', 'Jane Doe', 'Bob Smith']
pull_score(the_names, 'Score2')

and get:

3
4
5

The error message I currently get is:

TypeError: list indices must be integers, not str

I have looked at some other answers related to this error message and pandas, such as this one and this one.

But I don't see an answer to what I'm trying to do in either of them, and I don't believe iterrows() or itertuples() would apply, because I need pandas to find the value first.

You can set the names as the index and then use loc to look them up by index:

the_names = ['John Doe', 'Jane Doe', 'Bob Smith']
df.set_index('Name').loc[the_names, 'Score2']

# Name
# John Doe     3
# Jane Doe     4
# Bob Smith    5
# Name: Score2, dtype: int32
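One caveat with the label lookup above, shown as a minimal sketch rather than part of the original answer: in recent pandas versions, `.loc` with a list raises a `KeyError` if any requested name is missing from the index, while `reindex` returns `NaN` for it. The plain-string `Name` column and the name `'Nobody'` are assumptions made for illustration.

```python
import pandas as pd

# Illustrative frame; Name is kept as plain strings here (an assumption,
# the question uses a Categorical column).
df = pd.DataFrame({
    'Name': ['John Doe', 'Jane Doe', 'Bob Smith'],
    'Score2': [3, 4, 5],
})

scores = df.set_index('Name')['Score2']

# .loc raises KeyError when a requested name is not in the index
# (behaviour of recent pandas versions).
try:
    scores.loc[['Jane Doe', 'Nobody']]
except KeyError as exc:
    print('KeyError:', exc)

# reindex returns NaN for the missing name instead (dtype becomes float).
tolerant = scores.reindex(['Jane Doe', 'Nobody'])
print(tolerant)
```

Which behaviour is preferable depends on whether a missing name should be treated as an error or as missing data.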

Actually, you don't need a loop; you can do this:

print(df.loc[df.Name == the_names, 'Score2'])
0    3
1    4
2    5
Name: Score2, dtype: int32

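First things first. There is a mistake in your logic: when you set up the for loop, you use the items in people as if they were indices into the list people, when they are actually the elements of people. So you should do this instead: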

def pull_score(df, people, score):
    for i in people:
        print(df.loc[df.Name == i, score])

the_names = ['John Doe', 'Jane Doe', 'Bob Smith']
pull_score(df, the_names, 'Score2')

0    3
Name: Score2, dtype: int64
1    4
Name: Score2, dtype: int64
2    5
Name: Score2, dtype: int64
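The loop above prints a one-element Series for each name rather than the bare numbers 3, 4, 5 that the question asks for. Here is a minimal sketch of one way to get scalars, assuming each name appears exactly once in the frame. The function name `pull_score_values` and the plain-string `Name` column are hypothetical, chosen for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John Doe', 'Jane Doe', 'Bob Smith'],
    'Score1': [0, 1, 2],
    'Score2': [3, 4, 5],
})

def pull_score_values(frame, people, score):
    """Return a list with one scalar per name, assuming names are unique."""
    values = []
    for name in people:  # `name` is the element itself, not a position
        matches = frame.loc[frame['Name'] == name, score]
        values.append(matches.iloc[0])  # first (and only) match
    return values

the_names = ['John Doe', 'Jane Doe', 'Bob Smith']
for value in pull_score_values(df, the_names, 'Score2'):
    print(value)
```

A name that is not in the frame would make `matches` empty, so `.iloc[0]` would raise an `IndexError`. A real helper would probably want to handle that case explicitly.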

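Having said that, I'll jump on the same bandwagon as the other answerers and say that there are better ways to do this with built-in functionality. Below is my attempt to capture what each solution is trying to do, in functions named after the users who provided them. I think pir is the most efficient, because it uses functionality designed for exactly this task.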

def john(df, people, score):
    # Series.append was removed in pandas 2.0, so collect the pieces and concatenate once.
    pieces = [df.loc[df['Name'] == i, score] for i in people]
    return pd.concat(pieces)

def psidom(df, people, score):
    return df.set_index('Name').loc[people, score]

def pir(df, people, score):
    return df.loc[df['Name'].isin(people), score]
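One difference between these functions is worth noting, shown here as a small sketch that assumes a plain-string `Name` column: `isin` keeps the rows in the frame's original order, while the `set_index(...).loc[...]` approach follows the order of the requested names.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John Doe', 'Jane Doe', 'Bob Smith'],
    'Score2': [3, 4, 5],
})

requested = ['Bob Smith', 'John Doe']  # deliberately not in frame order

# Boolean mask: rows come back in the frame's order.
by_mask = df.loc[df['Name'].isin(requested), 'Score2']

# Label lookup: rows come back in the order of `requested`.
by_label = df.set_index('Name').loc[requested, 'Score2']

print(by_mask.tolist())   # [3, 5]
print(by_label.tolist())  # [5, 3]
```

If the output has to line up with the input list, the label lookup is the safer choice. If only the set of matching rows matters, the mask is enough.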

Timing

This is inaccurate. It only works for the given test case. Try df.loc[df.Name == the_names[:2], 'Score2'] and it fails!
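The reason can be sketched as follows, assuming a plain-string `Name` column and a recent pandas version: `==` against a list compares position by position, so it only gives the intended result when the list has the same length and order as the column. `isin` tests membership instead.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John Doe', 'Jane Doe', 'Bob Smith'],
    'Score2': [3, 4, 5],
})
the_names = ['John Doe', 'Jane Doe', 'Bob Smith']

# Same length, different order: only positions that happen to line up match.
shuffled = ['Jane Doe', 'John Doe', 'Bob Smith']
positional = df.loc[df['Name'] == shuffled, 'Score2']
print(positional.tolist())  # [5]

# Shorter list: the element-wise comparison cannot be made at all.
try:
    df.loc[df['Name'] == the_names[:2], 'Score2']
except ValueError as exc:
    print('ValueError:', exc)

# Membership test works for any length and order.
subset = df.loc[df['Name'].isin(the_names[:2]), 'Score2']
print(subset.tolist())  # [3, 4]
```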