Python: pandas drop rows where two columns match values in a nested list

python pandas

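I have a DataFrame from which I need to drop a row whenever its values match any of the combinations in a nested list.
Here is the sample DataFrame: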

import pandas as pd

df = pd.DataFrame([['A','Green',10],['A','Red',20],['B','Blue',5],['B','Red',15],['C','Orange',25]],columns = ['Letter','Color','Value'])

print(df)

  Letter   Color  Value
0      A   Green     10
1      A     Red     20
2      B    Blue      5
3      B     Red     15
4      C  Orange     25

I have a list of Letter/Color combinations that need to be removed from the DataFrame:

dropList = [['A','Green'],['B','Red']]

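How can I drop the rows of the DataFrame whose Letter/Color combination appears in any of the nested lists?
Approaches I can fall back on if necessary, but would like to avoid:

Writing an .apply function

Any form of brute-force iteration

Converting dropList to a DataFrame and merging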

# df_out = code here to drop rows whose Letter/Color combo appears in dropList
print(df_out)

  Letter   Color  Value
0      A     Red     20
1      B    Blue      5
2      C  Orange     25

I figure there must be some simple one- or two-line solution that I just can't see... Thanks!

You can create a helper DataFrame:

In [36]: drp = pd.DataFrame(dropList, columns=['Letter','Color'])

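Left-merge the main DataFrame with the helper DataFrame and select only the rows that have no match in the right (helper) DataFrame: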

In [37]: df.merge(drp, how='left', indicator=True) \
           .query("_merge=='left_only'") \
           .drop(columns='_merge')
Out[37]:
  Letter   Color  Value
1      A     Red     20
2      B    Blue      5
4      C  Orange     25
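
For readers on a recent pandas release, the same anti-join can be sketched as a self-contained script. This is only an illustration under assumptions: the helper name drop_pairs and its cols parameter are made up here, and passing on= explicitly is a choice for clarity rather than something the answer above requires.

```python
import pandas as pd

df = pd.DataFrame(
    [['A', 'Green', 10], ['A', 'Red', 20], ['B', 'Blue', 5],
     ['B', 'Red', 15], ['C', 'Orange', 25]],
    columns=['Letter', 'Color', 'Value'],
)
dropList = [['A', 'Green'], ['B', 'Red']]


def drop_pairs(frame, pairs, cols=('Letter', 'Color')):
    """Hypothetical helper: anti-join `frame` against the listed pairs."""
    cols = list(cols)
    # De-duplicate so a repeated pair cannot duplicate rows in the merge.
    helper = pd.DataFrame(pairs, columns=cols).drop_duplicates()
    merged = frame.merge(helper, on=cols, how='left', indicator=True)
    # 'left_only' marks rows that found no partner in the helper frame.
    keep = (merged['_merge'] == 'left_only').to_numpy()
    # A left merge keeps the left row order, so the mask lines up with `frame`.
    return frame[keep]


df_out = drop_pairs(df, dropList)
print(df_out)
```

Filtering the original frame with the mask, instead of returning the merged frame, keeps the original index labels (1, 2, 4) intact.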

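Building a MultiIndex on the columns used in dropList should do what you want. Subtract the elements to drop from the full set of MultiIndex elements, then slice the DataFrame with the remainder.

Note that the elements of dropList need to be tuples for the lookup to work.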

# MultiIndex entries are tuples, so convert the inner lists to tuples.
dropSet = {tuple(elem) for elem in dropList}

# Create a MultiIndex on Letter/Color.
temp = df.set_index(['Letter', 'Color'])
# Keep all elements of the index except those in dropSet.
# Note: a set difference does not preserve the original row order.
temp = temp.loc[list(set(temp.index) - dropSet)]
# Reset the index to get the original column layout back.
df_dropped = temp.reset_index()

This returns:

In [4]: df_dropped
Out[4]: 
  Letter   Color  Value
0      B    Blue      5
1      A     Red     20
2      C  Orange     25
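
Because a Python set has no defined order, the rows above can come back shuffled. If the original order matters, one possible variant (a sketch, not part of the answer above) builds the same MultiIndex but uses it only for a membership test with isin. The names pairs, mask and df_kept are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    [['A', 'Green', 10], ['A', 'Red', 20], ['B', 'Blue', 5],
     ['B', 'Red', 15], ['C', 'Orange', 25]],
    columns=['Letter', 'Color', 'Value'],
)
dropList = [['A', 'Green'], ['B', 'Red']]

# MultiIndex lookups compare tuples, so convert the inner lists first.
pairs = [tuple(elem) for elem in dropList]

# Build a MultiIndex from the two key columns and test each row's pair.
mask = df.set_index(['Letter', 'Color']).index.isin(pairs)

# Keep the rows whose pair is not listed; row order and index are untouched.
df_kept = df[~mask]
print(df_kept)
```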

You can reindex the DataFrame using the difference between its Letter/Color combinations and dropList:

result = (
    df.set_index(['Letter','Color'])
    .pipe(lambda x: x.reindex(x.index.difference(dropList)))
    .reset_index()
    )

result
Out[45]: 
  Letter   Color  Value
0      A     Red     20
1      B    Blue      5
2      C  Orange     25

Here is a crazy use of isin(), although my first choice would be @MaxU's solution:

new_df = df[~df[['Letter', 'Color']].apply(','.join,axis = 1).isin([s[0]+','+s[1] for s in dropList])]

  Letter   Color  Value
1      A     Red     20
2      B    Blue      5
4      C  Orange     25
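
One caveat with joining on ',' is that values which themselves contain a comma can collide: ('A,B', 'C') and ('A', 'B,C') both become 'A,B,C'. A hedged alternative, assuming it is acceptable to build the mask in plain Python, compares tuples instead of strings. The frame df_tricky and the other names below are invented for this illustration.

```python
import pandas as pd

# Invented data where the comma-joined strings collide.
df_tricky = pd.DataFrame(
    [['A,B', 'C', 1], ['A', 'B,C', 2]],
    columns=['Letter', 'Color', 'Value'],
)
to_drop = {('A', 'B,C')}  # only the second row should be removed

# String-join approach: both rows become 'A,B,C', so both get dropped.
joined = df_tricky[['Letter', 'Color']].apply(','.join, axis=1)
by_string = df_tricky[~joined.isin([','.join(p) for p in to_drop])]

# Tuple approach: compare the pairs themselves, no separator involved.
mask = pd.Series(
    [pair in to_drop for pair in zip(df_tricky['Letter'], df_tricky['Color'])],
    index=df_tricky.index,
)
by_tuple = df_tricky[~mask]

print(len(by_string), len(by_tuple))
```

With the sample data from the question there are no commas in the values, so both approaches give the same result there.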

This answer was inspired by an answer to a later question; please upvote it there.

df2 = pd.DataFrame(dropList, columns=['Letter', 'Color'])
df.loc[~df.index.isin(df.merge(df2.assign(a='key'), how='left').dropna().index)]

Convert the list of lists to a dictionary:

mapper = dict(dropList)

Now filter by mapping the dictionary onto the DataFrame:

df[df.Letter.map(mapper) != df.Color]

This yields:

 Letter   Color  Value
1      A     Red     20
2      B    Blue      5
4      C  Orange     25
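
Note that dict(dropList) keeps a single Color per Letter, so this trick only works while each letter appears at most once in dropList. The sketch below uses an invented list, dropList2, to show the effect, together with one possible workaround that maps each letter to a set of colors. None of these names come from the answer above.

```python
import pandas as pd

df = pd.DataFrame(
    [['A', 'Green', 10], ['A', 'Red', 20], ['B', 'Blue', 5],
     ['B', 'Red', 15], ['C', 'Orange', 25]],
    columns=['Letter', 'Color', 'Value'],
)
# Invented list in which 'A' appears twice.
dropList2 = [['A', 'Green'], ['A', 'Red'], ['B', 'Red']]

# dict() keeps only the last pair per letter: {'A': 'Red', 'B': 'Red'}.
mapper = dict(dropList2)
naive = df[df.Letter.map(mapper) != df.Color]  # A/Green wrongly survives

# Possible workaround: collect every color to drop for each letter.
colors_by_letter = {}
for letter, color in dropList2:
    colors_by_letter.setdefault(letter, set()).add(color)

mask = pd.Series(
    [color in colors_by_letter.get(letter, set())
     for letter, color in zip(df.Letter, df.Color)],
    index=df.index,
)
fixed = df[~mask]

print(naive)
print(fixed)
```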

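Ha, I had that as item 3 on my no-no list above (converting dropList to a df and merging), but I guess if that's the best way, I'll go with it. I was thinking there might be some indexing/list-unpacking magic that wouldn't require a merge.

Yes. This is award-winning obfuscated code. Impressive.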