Python: how can I select rows that have duplicates in one column but different values in another column?


python pandas dataframe


I have a DataFrame like this:

import pandas as pd
records = [{'Name':'John', 'Country':'Canada'}, {'Name':'John', 'Country':'Canada'}, 
       {'Name':'Mary', 'Country':'US'}, {'Name':'Mary', 'Country':'Canada'}, 
       {'Name':'Mary', 'Country':'US'}, {'Name':'Stan', 'Country':'UK'},
       {'Name':'Stan', 'Country':'UK'}]
df = pd.DataFrame(records)
df

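I want to find the names that have different Country values. In this case I only want to see Mary, because she has both US and Canada in the Country column. John and Stan can be excluded, since all of their records have the same country.

Is there a way to do this?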

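First, you can group by the Name column and aggregate each group's Country column into a list. Then check whether the values in that list are all the same.

After that, you can use boolean indexing to select the rows whose names have differing values.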

s = df.groupby('Name')['Country'].agg(list).apply(lambda l: all(map(lambda x: x == l[0], l)))
df_u = df[df['Name'].isin(s[~s].index)].drop_duplicates()
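
To make the intermediate steps visible, here is a minimal sketch that rebuilds the sample frame from the question and splits the one-liner into named pieces. The names lists, all_same and mixed_names are only illustrative, and the set-based check is a substitution for the map/all comparison above, assuming the Country values are hashable.

```python
import pandas as pd

records = [{'Name': 'John', 'Country': 'Canada'}, {'Name': 'John', 'Country': 'Canada'},
           {'Name': 'Mary', 'Country': 'US'}, {'Name': 'Mary', 'Country': 'Canada'},
           {'Name': 'Mary', 'Country': 'US'}, {'Name': 'Stan', 'Country': 'UK'},
           {'Name': 'Stan', 'Country': 'UK'}]
df = pd.DataFrame(records)

# One list of countries per name
lists = df.groupby('Name')['Country'].agg(list)

# True where every country in the list is the same (illustrative set-based check)
all_same = lists.apply(lambda l: len(set(l)) == 1)

# Names whose countries differ, then the matching rows without exact duplicates
mixed_names = all_same[~all_same].index
result = df[df['Name'].isin(mixed_names)].drop_duplicates()
print(result)
```

Because of the final drop_duplicates(), only one of Mary's two US rows is kept.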

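The first step is to find the names that have more than one unique Country. You can then use loc on the DataFrame to filter down to only those names.
Method 1: groupby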
# groupby name and return a boolean of whether each has more than 1 unique Country
multi_country = df.groupby(["Name"]).Country.nunique().gt(1)

# use loc to only see those values that have `True` in `multi_country`:
df.loc[df.Name.isin(multi_country[multi_country].index)]

   Name Country
2  Mary      US
3  Mary  Canada
4  Mary      US
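
As a possible variation on the same idea, the per-name count can be broadcast back onto the rows with transform, which avoids the isin lookup. This is a sketch rather than part of the original answer; it assumes the sample frame from the question.

```python
import pandas as pd

records = [{'Name': 'John', 'Country': 'Canada'}, {'Name': 'John', 'Country': 'Canada'},
           {'Name': 'Mary', 'Country': 'US'}, {'Name': 'Mary', 'Country': 'Canada'},
           {'Name': 'Mary', 'Country': 'US'}, {'Name': 'Stan', 'Country': 'UK'},
           {'Name': 'Stan', 'Country': 'UK'}]
df = pd.DataFrame(records)

# For every row, the number of distinct countries seen for that row's Name
n_countries = df.groupby('Name')['Country'].transform('nunique')

# Keep the rows whose Name has more than one distinct Country
result = df[n_countries > 1]
print(result)
```

The mask is aligned with the original index, so all of Mary's rows are kept, including the repeated US row.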

Method 2: drop_duplicates and value_counts
You can follow the same logic, but use drop_duplicates and value_counts instead of groupby:
multi_country = df.drop_duplicates().Name.value_counts().gt(1)

df.loc[df.Name.isin(multi_country[multi_country].index)]

   Name Country
2  Mary      US
3  Mary  Canada
4  Mary      US
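
One caveat worth sketching: drop_duplicates() with no arguments compares whole rows, so this approach relies on the frame having only the Name and Country columns. The example below assumes a hypothetical extra Year column (not in the question) and passes subset to restrict the comparison to the two relevant columns.

```python
import pandas as pd

df = pd.DataFrame({
    'Name':    ['John', 'John', 'Mary', 'Mary'],
    'Country': ['Canada', 'Canada', 'US', 'Canada'],
    'Year':    [2019, 2020, 2019, 2020],  # hypothetical extra column
})

# Whole-row de-duplication: John's rows differ by Year, so he is counted twice
naive = df.drop_duplicates().Name.value_counts().gt(1)

# De-duplicate on (Name, Country) only, then count the remaining rows per name
pairs = df.drop_duplicates(subset=['Name', 'Country'])
multi_country = pairs.Name.value_counts().gt(1)

result = df.loc[df.Name.isin(multi_country[multi_country].index)]
print(result)
```

With subset in place, only Mary is flagged, whereas the whole-row version would also flag John.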

Method 3: drop_duplicates and duplicated
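Note: this gives a slightly different result. You will only see Mary's unique values, which may or may not be what you want.
You can drop the duplicates from the original frame and return only the names that have more than one entry in the de-duplicated frame: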
no_dups = df.drop_duplicates()

no_dups[no_dups.duplicated(keep = False, subset="Name")]

   Name Country
2  Mary      US
3  Mary  Canada

df.drop_duplicates()?