Python: delete rows associated with only one value in another column


Suppose I have a DataFrame like this:

item     name     gender
banana   tom      male
banana   kate     female
apple    kate     female
kiwi     jim      male
apple    tom      male
banana   kimmy    female
kiwi     kate     female
banana   tom      male
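Is there a way to delete the rows for any name that is associated with (has bought) fewer than 2 items? Also, I don't want to drop duplicates. So my desired output looks like this: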

item     name     gender
banana   tom      male
banana   kate     female
apple    kate     female
apple    tom      male
kiwi     kate     female
banana   tom      male 
@sammywemmy's solution:
df.loc[df.groupby('name').item.transform('size').ge(2)]

  • Group rows that have the same name together:

    # Get each group
    print(df.groupby('name').apply(lambda s: s.reset_index()))
    
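  • Get, for each row, a value representing the size of its group (the number of rows):

    # Turn each item into the number of rows in its group
    df['group_size'] = df.groupby('name')['item'].transform('size')
    print(df)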
    
    In this case, the same can be done on any column:

    # Turn Each Item Into The Number of Rows in The Group
    df['group_size'] = df.groupby('name')['gender'].transform('size')
    print(df)
    
    Notice that every row now has its group size at the end: tom has 3 instances, so every row where name == tom has 3 in group_size.

  • Convert to a boolean index using a relational operator:

    # Add a condition that determines whether the row should be kept
    df['should_keep'] = df.groupby('name')['item'].transform('size').ge(2)
    print(df)
    
  • Use the boolean index to get the desired rows:

    print(df.groupby('name')['item'].transform('size').ge(2))
    
    loc includes every index whose value is True and excludes every index whose value is False. (Indexes 3 and 5 are False, so they are not included.)


    All together:

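    import pandas as pd

    df = pd.DataFrame({'item': {0: 'banana', 1: 'banana', 2: 'apple',
                                3: 'kiwi', 4: 'apple', 5: 'banana',
                                6: 'kiwi', 7: 'banana'},
                       'name': {0: 'tom', 1: 'kate', 2: 'kate',
                                3: 'jim', 4: 'tom', 5: 'kimmy',
                                6: 'kate', 7: 'tom'},
                       'gender': {0: 'male', 1: 'female',
                                  2: 'female', 3: 'male',
                                  4: 'male', 5: 'female',
                                  6: 'female', 7: 'male'}})

    # Keep only the rows whose name appears in at least 2 rows
    print(df.loc[df.groupby('name')['name'].transform('size').ge(2)])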
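The question talks about names tied to fewer than 2 items, while `transform('size')` counts rows, duplicates included. If distinct items were the intended criterion, `transform('nunique')` might be the closer fit. The sketch below contrasts the two on hypothetical data: the `demo` frame and the names `ann`, `bob`, and `cat` are made up for illustration and do not come from the question.

```python
import pandas as pd

# Hypothetical data (not from the question):
# ann bought the same item twice, bob bought two different items, cat bought one.
demo = pd.DataFrame({
    'item': ['banana', 'banana', 'apple', 'kiwi', 'kiwi'],
    'name': ['ann', 'ann', 'bob', 'bob', 'cat'],
})

# Number of rows per name, broadcast back to every row (duplicates count)
rows_per_name = demo.groupby('name')['item'].transform('size')

# Number of distinct items per name, broadcast back to every row
distinct_per_name = demo.groupby('name')['item'].transform('nunique')

# Keep names with at least 2 rows: ann and bob stay
kept_by_size = demo.loc[rows_per_name.ge(2)]

# Keep names with at least 2 distinct items: only bob stays
kept_by_nunique = demo.loc[distinct_per_name.ge(2)]

print(kept_by_size)
print(kept_by_nunique)
```

On the data in the question both criteria select the same rows, because every name with at least 2 rows also has at least 2 distinct items.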
    

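    df.loc[df.groupby('name').item.transform('size').ge(2)]?
    Thank you, it works! Could you post it as an answer? It would be great if you could explain it a little.
    Thanks a lot! That makes it much clearer to me.

    Output of the steps above, in order (the group_size column, the should_keep column, the boolean mask, and the final result):

         item   name  gender  group_size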
    0  banana    tom    male           3
    1  banana   kate  female           3
    2   apple   kate  female           3
    3    kiwi    jim    male           1
    4   apple    tom    male           3
    5  banana  kimmy  female           1
    6    kiwi   kate  female           3
    7  banana    tom    male           3
    
         item   name  gender  group_size  should_keep
    0  banana    tom    male           3         True
    1  banana   kate  female           3         True
    2   apple   kate  female           3         True
    3    kiwi    jim    male           1        False
    4   apple    tom    male           3         True
    5  banana  kimmy  female           1        False
    6    kiwi   kate  female           3         True
    7  banana    tom    male           3         True
    
    0     True
    1     True
    2     True
    3    False
    4     True
    5    False
    6     True
    7     True
    Name: item, dtype: bool
    
         item  name  gender
    0  banana   tom    male
    1  banana  kate  female
    2   apple  kate  female
    4   apple   tom    male
    6    kiwi  kate  female
    7  banana   tom    male
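
For comparison, the same rows can probably be selected in a couple of other ways. This is a rough sketch rather than part of the answer above: it rebuilds the question's data and checks that `groupby(...).filter(...)` and a `value_counts` lookup agree with the `transform('size')` approach. The variable names `via_transform`, `via_filter`, and `via_counts` are placeholders chosen here.

```python
import pandas as pd

df = pd.DataFrame({
    'item': ['banana', 'banana', 'apple', 'kiwi',
             'apple', 'banana', 'kiwi', 'banana'],
    'name': ['tom', 'kate', 'kate', 'jim',
             'tom', 'kimmy', 'kate', 'tom'],
    'gender': ['male', 'female', 'female', 'male',
               'male', 'female', 'female', 'male'],
})

# Approach from the answer: broadcast each group's size back onto its rows
via_transform = df.loc[df.groupby('name')['item'].transform('size').ge(2)]

# Alternative 1: keep whole groups that contain at least 2 rows
via_filter = df.groupby('name').filter(lambda g: len(g) >= 2)

# Alternative 2: map each name to its count in the column, then compare
via_counts = df.loc[df['name'].map(df['name'].value_counts()).ge(2)]

print(via_filter)
```

`filter` calls a Python function once per group, so on large frames the `transform('size')` or `value_counts` variants are likely to be faster.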