Python: return only the DataFrame columns that satisfy a where clause

Starting from an arbitrary DataFrame, I want to return a DataFrame that contains only the columns with more than one distinct value.

I have:

X = df.nunique()

This returns a Series of per-column distinct counts. I then convert it from a Series to a DataFrame:

X = X.to_frame(name='dcount')

Then I use a where clause to keep only the values greater than 1:

X.where(X[['dcount']]>1)

The result looks like this:

                   dcount
    Id                5.0
    MSSubClass        3.0
    MSZoning          NaN
    LotFrontage       5.0
    LotArea           5.0
    Street            NaN
    Alley             NaN
    LotShape          2.0
    ...

But now I need only those column names (in the index of X) whose dcount is not NaN, so that I can finally go back to my original DataFrame df and redefine it as:

df = df[list_of_columns]

How can I do this? I've tried a dozen approaches but keep coming up empty. I suspect there is a way to do this in one or two lines of code.

You can use Boolean indexing and avoid converting the counts Series to a DataFrame at all:

counts = df.nunique()
df = df[counts[counts > 1].index]

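The key point is that the index of the counts Series holds the column labels. So you can filter the Series and then take the labels of the entries that remain from its index.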
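To make the "index holds the column labels" point concrete, here is a minimal sketch that inspects each intermediate step on the same small frame used in the demo. The names mask, keep and result are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 1, 1], 'B': [1, 2, 3],
                   'C': [4, 5, 5], 'D': [0, 0, 0]})

# One distinct-value count per column; the Series index is the column labels
counts = df.nunique()        # A: 1, B: 3, C: 2, D: 1

# Boolean mask over the same labels
mask = counts > 1            # A: False, B: True, C: True, D: False

# Filtering keeps only the True entries; .index yields their labels
keep = counts[mask].index
print(list(keep))            # ['B', 'C']

# Selecting with those labels keeps only the varying columns
result = df[keep]
```

Because the mask and the counts share the same index, no manual bookkeeping of column names is needed.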

Here is a demo:

import pandas as pd

df = pd.DataFrame({'A': [1, 1, 1], 'B': [1, 2, 3],
                   'C': [4, 5, 5], 'D': [0, 0, 0]})

# Count distinct values per column, then keep columns with more than one
counts = df.nunique()
df = df[counts[counts > 1].index]

print(df)

   B  C
0  1  4
1  2  5
2  3  5
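For comparison, the where() route from the question can also be finished: rows that fail the condition become NaN, so dropping them leaves exactly the wanted labels in the index. This is a sketch on the same demo frame; it yields the same columns as Boolean indexing, just with extra steps.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 1, 1], 'B': [1, 2, 3],
                   'C': [4, 5, 5], 'D': [0, 0, 0]})

# The question's route: counts as a one-column DataFrame named 'dcount'
X = df.nunique().to_frame(name='dcount')

# where() replaces values failing the condition with NaN ...
masked = X.where(X[['dcount']] > 1)

# ... and dropna() removes those rows, so the index holds the wanted labels
list_of_columns = masked.dropna().index.tolist()

df_varying = df[list_of_columns]
print(list_of_columns)   # ['B', 'C']
```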

Excellent, thanks! It seems I can skip counts entirely (although the step-by-step version is helpful) and write:

x[x.nunique()[x.nunique() > 1].index]

@ColinMac, I wouldn't recommend that, because it means computing x.nunique() (the really expensive part) twice.
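A sketch that keeps a one-liner feel while calling nunique() only once: a Boolean Series indexed by column labels can be passed as the column selector of .loc, as in df.loc[:, df.nunique() > 1]. The helper name keep_varying_columns and its count_nan flag are hypothetical; the flag simply maps to the dropna argument of DataFrame.nunique, which by default excludes NaN from the count.

```python
import numpy as np
import pandas as pd


def keep_varying_columns(df: pd.DataFrame, count_nan: bool = False) -> pd.DataFrame:
    """Hypothetical helper: keep only columns with more than one distinct value.

    nunique() is evaluated once; the resulting Boolean Series is aligned
    with the column labels, so .loc can use it to select columns directly.
    """
    # dropna=True (the pandas default) ignores NaN when counting distinct values
    counts = df.nunique(dropna=not count_nan)
    return df.loc[:, counts > 1]


df = pd.DataFrame({'A': [1, 1, 1], 'B': [1, 2, 3], 'C': [4, 5, 5],
                   'D': [0, 0, 0], 'E': [7, np.nan, 7]})

print(list(keep_varying_columns(df).columns))                  # ['B', 'C']
print(list(keep_varying_columns(df, count_nan=True).columns))  # ['B', 'C', 'E']
```

Column E shows why the flag may matter: with the default it has a single distinct value (7), but when NaN is counted it has two.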