Python: How to create a subset in pandas based on multiple conditions


Example:

Cat      INVOICE_REF_NUMBER  OPEN_ITEM_AMOUNT(Netted Amt)  AMOUNT_ COLLECTED(Original Amt)  COMPANY_CODE  OPERATING_UNIT  count
invoice  0992541158          115606.38                     578031.91                        4380          6238            2
payment  0992541158          0                             -462425.53                       4380          6238            2
invoice  0090010917          1519                          87803.4                          2700          4315            2
payment  0090010917          0                             -86284.4                         2700          4315            2
invoice  0090007022          2039.55                       13517                            2700          4315            2

I need only the fifth row, because it has no payment.

First, group all rows that relate to the same invoice. The combined status differs depending on whether the invoice has been paid:

status = df.groupby("INVOICE_REF_NUMBER")['Cat'].sum()
#INVOICE_REF_NUMBER
#0090007022           invoice
#0090010917    invoicepayment
#0992541158    invoicepayment
#Name: Cat, dtype: object

Now extract the original rows of the unpaid invoices:

unpayed = df.join(status[status=='invoice'], rsuffix='_', how='right', 
                  on='INVOICE_REF_NUMBER')
#       Cat INVOICE_REF_NUMBER  OPEN_ITEM_AMOUNT(Netted Amt)     Cat_
#4  invoice         0090007022                       2039.55  invoice

If you like, you can delete the duplicate "Cat_" column:

del unpayed['Cat_']
#       Cat INVOICE_REF_NUMBER  OPEN_ITEM_AMOUNT(Netted Amt)
#4  invoice         0090007022                       2039.55

What have you tried so far?

I did this in Excel with COUNTIFS on "Cat", checking for each key whether it has only an invoice or an invoice and a payment. Now I need to implement it in Python.

You may need to explain what you are trying to do. But you can always use df2 = df1[df1['Column_Name'] == 'Condition']. For multiple conditions, use ~ for NOT, | for OR, and & for AND.

Here is my best effort:

# Assume nothing has a payment
df['payment_count'] = 0

# For each invoice, count the related payments by applying
# a lambda function to each invoice row (hence the axis=1)
is_invoice = df.Cat == 'invoice'
df.loc[is_invoice, 'payment_count'] = df.loc[is_invoice].apply(
    lambda x: df.loc[
        (df['INVOICE_REF_NUMBER'] == x['INVOICE_REF_NUMBER'])
        & (df.Cat == 'payment'),
        'Cat',
    ].count(),
    axis=1,
)

# Filter on the invoices without payments
print(df[(df.Cat == 'invoice') & (df.payment_count == 0)])
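
For comparison, here is a minimal sketch of the same selection written with boolean masks only, without a row-wise apply. It is not taken from the answers in this thread. The sample frame is transcribed from the example table and keeps just three of its columns, and the names paid_refs, unpaid, has_payment and unpaid_alt are made up for illustration. Each condition sits in its own parentheses because & and ~ bind more tightly than ==.

```python
import pandas as pd

# Sample data transcribed from the example table (only the columns needed here).
df = pd.DataFrame({
    "Cat": ["invoice", "payment", "invoice", "payment", "invoice"],
    "INVOICE_REF_NUMBER": ["0992541158", "0992541158", "0090010917",
                           "0090010917", "0090007022"],
    "OPEN_ITEM_AMOUNT(Netted Amt)": [115606.38, 0.0, 1519.0, 0.0, 2039.55],
})

# Invoice numbers that appear on at least one payment row.
paid_refs = df.loc[df["Cat"] == "payment", "INVOICE_REF_NUMBER"]

# Two conditions combined with & (AND) and ~ (NOT):
# the row is an invoice AND its number is NOT among the paid ones.
unpaid = df[(df["Cat"] == "invoice") & ~df["INVOICE_REF_NUMBER"].isin(paid_refs)]

# Variant: flag every row whose invoice group contains a payment,
# then keep the invoice rows from groups without one.
has_payment = (
    df["Cat"].eq("payment")
    .groupby(df["INVOICE_REF_NUMBER"])
    .transform("any")
)
unpaid_alt = df[(df["Cat"] == "invoice") & ~has_payment]

print(unpaid)  # only the row with index 4 (invoice 0090007022) remains
```

Both variants keep the original index, so the result still points back at the fifth row of the input. The isin version is probably the easier one to read. The transform version may be handier when the per-group flag is needed as a column for later steps.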