Python Pandas - Dropping columns based on values in another DataFrame's columns


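I have a pandas DataFrame named df_A that has more than 100 columns in the real data.

I also have another DataFrame, df_b, in which two columns give the names of the columns I need to keep from df_A.

A reproducible example is given below: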

import pandas as pd

# df_A: the wide DataFrame (more than 100 columns in the real data)
d = {'foo': [100, 111, 222], 'bar': [333, 444, 555],
     'foo2': [110, 101, 222], 'bar2': [333, 444, 555],
     'foo3': [100, 111, 222], 'bar3': [333, 444, 555]}

df_A = pd.DataFrame(d)

# df_b: ReqCol_A and ReqCol_B hold the names of the columns wanted from df_A
d = {'ReqCol_A': ['foo', 'foo2'],
     'bar': [333, 444], 'foo2': [100, 111],
     'bar2': [333, 444], 'ReqCol_B': ['bar3', ''],
     'bar3': [333, 444]}

df_b = pd.DataFrame(d)

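As shown in the example above, the values under ReqCol_A and ReqCol_B in df_b are the names of the columns I am trying to get from df_A.

So my expected output has three columns from df_A: foo, foo2 and bar3.

df_C is the expected output, and it would look like this: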

df_C
foo foo2 bar3
100 110  333
111 101  444
222 222  555

Please help me with this. I am struggling to get it working.

Solution:

# Collect all unique values from df_b's ReqCol_A and ReqCol_B columns
# (this may still include NaN, empty strings and other unwanted values)
features = set(df_b.ReqCol_A.unique()) | set(df_b.ReqCol_B.unique())

# Intersect with df_A's column names so that only names that exist in df_A remain
target_features = set(df_A.columns) & features

# Get the output. Recent pandas versions reject a set as an indexer,
# so convert it to a list first (the column order of a set is not guaranteed)
df_A.loc[:, list(target_features)]
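Because a set has no defined order, the columns returned by the intersection may not come back as foo, foo2, bar3. A minimal sketch of one way around that, assuming you want the columns in df_A's own order (the names `requested` and `keep` are illustrative and not part of the original answer; df_b is reduced to its two ReqCol columns for brevity):

```python
import pandas as pd

df_A = pd.DataFrame({
    'foo': [100, 111, 222], 'bar': [333, 444, 555],
    'foo2': [110, 101, 222], 'bar2': [333, 444, 555],
    'foo3': [100, 111, 222], 'bar3': [333, 444, 555],
})
df_b = pd.DataFrame({
    'ReqCol_A': ['foo', 'foo2'],
    'ReqCol_B': ['bar3', ''],
})

# Gather the requested names from both columns into one set.
requested = set(df_b['ReqCol_A']) | set(df_b['ReqCol_B'])

# Walk df_A's columns in order; values such as '' never match a column name.
keep = [col for col in df_A.columns if col in requested]

df_C = df_A[keep]
print(df_C)
```

The list comprehension replaces the set intersection, so no separate cleaning of empty strings or NaN is needed.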

Performance comparison

This approach:

%%timeit
features = set(df_b.ReqCol_A.unique()) | set(df_b.ReqCol_B.unique())
target_features = set(df_A.columns) & features
df_A.loc[:, list(target_features)]
875 µs ± 22.6 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

Second answer (using filter):

Clearly, the approach given here is much faster than the filter-based one.
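No timing for the filter-based answer is shown here, so the comparison is best re-run on your own data. A rough sketch using the standard `timeit` module instead of the `%%timeit` notebook magic (the function names `by_set_intersection` and `by_filter_stack` are made up for this example, and the numbers will vary with machine and pandas version):

```python
import timeit

import numpy as np
import pandas as pd

df_A = pd.DataFrame({
    'foo': [100, 111, 222], 'bar': [333, 444, 555],
    'foo2': [110, 101, 222], 'bar2': [333, 444, 555],
    'foo3': [100, 111, 222], 'bar3': [333, 444, 555],
})
df_b = pd.DataFrame({
    'ReqCol_A': ['foo', 'foo2'], 'bar': [333, 444], 'foo2': [100, 111],
    'bar2': [333, 444], 'ReqCol_B': ['bar3', ''], 'bar3': [333, 444],
})


def by_set_intersection():
    # First answer: set union of the two ReqCol columns, intersected with df_A's columns.
    features = set(df_b.ReqCol_A.unique()) | set(df_b.ReqCol_B.unique())
    return df_A.loc[:, list(set(df_A.columns) & features)]


def by_filter_stack():
    # Second answer: filter the ReqCol columns, blank out '', flatten to a list of names.
    names = df_b.filter(like='ReqCol').replace('', np.nan).stack().dropna().tolist()
    return df_A[names]


runs = 200
for func in (by_set_intersection, by_filter_stack):
    seconds = timeit.timeit(func, number=runs)
    print(f"{func.__name__}: {seconds / runs * 1e6:.1f} µs per call")
```

On a frame this small the measurement mostly reflects fixed pandas overhead, so it is worth repeating with a df_A of realistic width before drawing conclusions.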


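Try using filter to get only the columns whose names contain 'ReqCol', then stack to turn their values into a list, and use that list to select from the df_A DataFrame:

import numpy as np

# '' is replaced with NaN so it can be dropped; dropna() makes the removal of missing values explicit
df_A[df_b.filter(like='ReqCol').replace('', np.nan).stack().dropna().tolist()]

Output: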

   foo  bar3  foo2
0  100   333   110
1  111   444   101
2  222   555   222
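The one-liner above chains several steps. A small sketch that runs them one at a time may make the intermediate results easier to inspect (the variable names `req` and `names` are illustrative only):

```python
import numpy as np
import pandas as pd

df_b = pd.DataFrame({
    'ReqCol_A': ['foo', 'foo2'], 'bar': [333, 444], 'foo2': [100, 111],
    'bar2': [333, 444], 'ReqCol_B': ['bar3', ''], 'bar3': [333, 444],
})

# Step 1: keep only the columns whose name contains 'ReqCol'.
req = df_b.filter(like='ReqCol')
print(list(req.columns))

# Step 2: turn empty strings into NaN so they count as missing.
req = req.replace('', np.nan)

# Step 3: flatten the frame into a single Series and drop the missing cells.
names = req.stack().dropna().tolist()
print(names)
```

Note that `filter(like=...)` does substring matching on column labels, so any other column whose name happens to contain 'ReqCol' would be picked up as well.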

I'm a bit lost on what you are after. Can you provide your expected result?

@busybear I have now added it to the question. Thank you.

What if you have more than 10 'ReqCol_x' columns: would you type them in manually or use filter?

@ScottBoston In that case I would go with filter, but given the constraints here, I don't think filter is needed.
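For the case raised in the comments, where there are many 'ReqCol_x' columns, the two answers can be combined. The helper below is a hypothetical sketch, not code from either answer: `select_required_columns` and its `prefix` parameter are names invented here, and the extra `ReqCol_C` column exists only to illustrate the idea.

```python
import pandas as pd


def select_required_columns(source, spec, prefix='ReqCol'):
    """Return the columns of `source` named in any `spec` column containing `prefix`."""
    # Flatten every matching column of `spec` into one set of requested names.
    requested = set(spec.filter(like=prefix).to_numpy().ravel())
    # Keep the order of `source`; '' and missing values never match a column name.
    return source[[col for col in source.columns if col in requested]]


df_A = pd.DataFrame({
    'foo': [100, 111, 222], 'bar': [333, 444, 555],
    'foo2': [110, 101, 222], 'bar2': [333, 444, 555],
    'foo3': [100, 111, 222], 'bar3': [333, 444, 555],
})
spec = pd.DataFrame({
    'ReqCol_A': ['foo', 'foo2'],
    'ReqCol_B': ['bar3', ''],
    'ReqCol_C': ['bar', None],  # illustrative third column
})

result = select_required_columns(df_A, spec)
print(result)
```

This avoids listing each ReqCol column by hand while still returning the columns in df_A's order.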