Python: group by with conditional filtering

python pandas

I have a DataFrame:

import pandas as pd

df = pd.DataFrame({'First': ['Sam', 'Greg', 'Steve', 'Sam',
                             'Jill', 'Bill', 'Nod', 'Mallory', 'Ping', 'Lamar'],
                   'Last': ['Stevens', 'Hamcunning', 'Strange', 'Stevens',
                            'Vargas', 'Simon', 'Purple', 'Green', 'Simon', 'Simon'],
                   'Address': ['112 Fake St',
                               '13 Crest St',
                               '14 Main St',
                               '112 Fake St',
                               '2 Morningwood',
                               '7 Cotton Dr',
                               '14 Main St',
                               '20 Main St',
                               '7 Cotton Dr',
                               '7 Cotton Dr'],
                   'Status': ['Infected', '', 'Infected', '', '', '', '','', '', 'Infected'],
                   })

I apply the following groupby code:

df_index = df.groupby(['Address', 'Last']).filter(lambda x: (x['Status'] == 'Infected').any()).index
df.loc[df_index, 'Status'] = 'Infected'

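That code marks every row in a matching group as 'Infected'. Is there a way to choose the value used for the update instead, so that the newly marked rows get a different label? For example: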

df2 = df.copy(deep=True)
df2['Status'] = ['Infected', '', 'Infected', 'Infected2', '', 'Infected2', '', '', 'Infected2', 'Infected']

I think this gets the result you want, although it goes about it a little differently:

def infect_new_people(group):
    # Work on a copy; mutating the group that apply passes in is not supported
    group = group.copy()
    if (group['Status'] == 'Infected').any():
        # Only affect people not already infected
        group.loc[group['Status'] != 'Infected', 'Status'] = 'Infected2'
    return group['Status']

# Need group_keys=False so that each group has the same index
#   as the original dataframe
df['Status'] = df.groupby(['Address', 'Last'], group_keys=False).apply(infect_new_people)

df
Out[36]: 
         Address    First        Last     Status
0    112 Fake St      Sam     Stevens   Infected
1    13 Crest St     Greg  Hamcunning           
2     14 Main St    Steve     Strange   Infected
3    112 Fake St      Sam     Stevens  Infected2
4  2 Morningwood     Jill      Vargas           
5    7 Cotton Dr     Bill       Simon  Infected2
6     14 Main St      Nod      Purple           
7     20 Main St  Mallory       Green           
8    7 Cotton Dr     Ping       Simon  Infected2
9    7 Cotton Dr    Lamar       Simon   Infected

Sorry, what is your expected output? Is it df2['Status']?

@JohnGalt df2['Status'] = ['Infected', '', 'Infected', 'Infected2', '', 'Infected2', '', '', 'Infected2', 'Infected']

Is there a way to do this without the function?
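
On doing this without a custom function: one possible sketch is to build a boolean mask with groupby(...).transform('any') and assign through df.loc. The variable names infected and group_has_infected are made up for this illustration and are not from the original answer. It is an alternative under those assumptions, not necessarily what the answerer had in mind.

```python
import pandas as pd

df = pd.DataFrame({
    'First': ['Sam', 'Greg', 'Steve', 'Sam', 'Jill',
              'Bill', 'Nod', 'Mallory', 'Ping', 'Lamar'],
    'Last': ['Stevens', 'Hamcunning', 'Strange', 'Stevens', 'Vargas',
             'Simon', 'Purple', 'Green', 'Simon', 'Simon'],
    'Address': ['112 Fake St', '13 Crest St', '14 Main St', '112 Fake St',
                '2 Morningwood', '7 Cotton Dr', '14 Main St', '20 Main St',
                '7 Cotton Dr', '7 Cotton Dr'],
    'Status': ['Infected', '', 'Infected', '', '', '', '', '', '', 'Infected'],
})

# True for rows that are already marked as infected
infected = df['Status'].eq('Infected')

# True for every row whose (Address, Last) group contains an infected row
group_has_infected = infected.groupby([df['Address'], df['Last']]).transform('any')

# Relabel only the members of those groups that are not yet infected
df.loc[group_has_infected & ~infected, 'Status'] = 'Infected2'
```

Because transform returns a result aligned with the original index, the mask can be combined directly with ~infected, and rows that were already 'Infected' keep their label.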