Python: column values still show up after .isin()

python python-3.x pandas dataframe

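As requested, here is a minimal reproducible example that triggers the .isin() problem: values that are not in the .isin() list are not removed from the value counts, they are only set to zero: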

import os
import pandas as pd

df_example = pd.DataFrame({'Requesting as': {0: 'Employee', 1: 'Ex-      Employee', 2: 'Employee', 3: 'Employee', 4: 'Ex-Employee', 5: 'Employee', 6: 'Employee', 7: 'Employee', 8: 'Ex-Employee', 9: 'Ex-Employee', 10: 'Employee', 11: 'Employee', 12: 'Ex-Employee', 13: 'Ex-Employee', 14: 'Employee', 15: 'Employee', 16: 'Employee', 17: 'Ex-Employee', 18: 'Employee', 19: 'Employee', 20: 'Ex-Employee', 21: 'Employee', 22: 'Employee', 23: 'Ex-Employee', 24: 'Employee', 25: 'Employee', 26: 'Ex-Employee', 27: 'Employee', 28: 'Employee', 29: 'Ex-Employee', 30: 'Employee', 31: 'Employee', 32: 'Ex-Employee', 33: 'Employee', 34: 'Employee', 35: 'Ex-Employee', 36: 'Employee', 37: 'Employee', 38: 'Ex-Employee', 39: 'Employee', 40: 'Employee'}, 'Years of service': {0: -0.4, 1: -0.3, 2: -0.2, 3: 1.0, 4: 1.0, 5: 1.0, 6: 2.0, 7: 2.0, 8: 2.0, 9: 2.0, 10: 3.0, 11: 3.0, 12: 3.0, 13: 4.0, 14: 4.0, 15: 4.0, 16: 5.0, 17: 5.0, 18: 5.0, 19: 5.0, 20: 6.0, 21: 6.0, 22: 6.0, 23: 11.0, 24: 11.0, 25: 11.0, 26: 16.0, 27: 17.0, 28: 18.0, 29: 21.0, 30: 22.0, 31: 23.0, 32: 26.0, 33: 27.0, 34: 28.0, 35: 31.0, 36: 32.0, 37: 33.0, 38: 35.0, 39: 36.0, 40: 37.0}, 'yos_bins': {0: 0, 1: 0, 2: 0, 3: '0-1', 4: '0-1', 5: '0-1', 6: '1-2', 7: '1-2', 8: '1-2', 9: '1-2', 10: '2-3', 11: '2-3', 12: '2-3', 13: '3-4', 14: '3-4', 15: '3-4', 16: '4-5', 17: '4-5', 18: '4-5', 19: '4-5', 20: '5-6', 21: '5-6', 22: '5-6', 23: '10-15', 24: '10-15', 25: '10-15', 26: '15-20', 27: '15-20', 28: '15-20', 29: '20-40', 30: '20-40', 31: '20-40', 32: '20-40', 33: '20-40', 34: '20-40', 35: '20-40', 36: '20-40', 37: '20-40', 38: '20-40', 39: '20-40', 40: '20-40'}})


cut_labels = ['0-1','1-2', '2-3', '3-4', '4-5', '5-6', '6-10', '10-15', '15-20', '20-40']
cut_bins = (0, 1, 2, 3, 4, 5, 6, 10, 15, 20, 40)
df_example['yos_bins'] = pd.cut(df_example['Years of service'], bins=cut_bins, labels=cut_labels)

# Counts per bin on the full frame (every category is listed)
print(df_example['yos_bins'].value_counts())
print(len(df_example['yos_bins']))
print(len(df_example))

# Keep only the rows whose bin is one of the three selected ranges
test = df_example[df_example['yos_bins'].isin(['0-1', '1-2', '2-3'])]
print('test dataframe:\n', test)
print('\n')
# Unexpected: the other seven bins are still listed, each with a count of 0
print('test value counts of yos_bins:\n', test['yos_bins'].value_counts())
print('\n')
dic_test = test.to_dict()
print(dic_test)
print('\n')
print(test.value_counts())

I created bins for the "Years of service" column:
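A minimal sketch of that binning step, using the same bin edges and labels as the reproducible example above. The short `years` series is made up for illustration. With the default `right=True`, each interval includes its right edge, so 1.0 lands in '0-1', and values at or below 0 fall outside every bin and become NaN.

```python
import pandas as pd

# Hypothetical sample of service years, chosen to hit several bins
years = pd.Series([-0.4, 1.0, 2.0, 3.0, 11.0, 37.0], name="Years of service")

cut_bins = (0, 1, 2, 3, 4, 5, 6, 10, 15, 20, 40)
cut_labels = ['0-1', '1-2', '2-3', '3-4', '4-5', '5-6', '6-10', '10-15', '15-20', '20-40']

# pd.cut returns a categorical Series whose categories are all ten labels,
# whether or not every label actually occurs in the data
binned = pd.cut(years, bins=cut_bins, labels=cut_labels)

print(binned.dtype)                  # category
print(list(binned.cat.categories))   # all ten labels
print(binned.isna().sum())           # 1, because -0.4 lies outside every bin
```

The point to notice is that the result is categorical, so the full list of labels travels with the column even after rows are filtered out.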

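Then I applied .isin() to the dataframe column named "yos_bins" to filter a selection of column values. Excerpt from the df column

The column I use for slicing is called "yos_bins" (that is, the binned years of service). I only want to select 3 ranges (0-1 years, 1-2 years, 2-3 years), but the column obviously contains more ranges.

To my surprise, when I apply value_counts(), I still get all values of the yos_bins column from the df dataframe (but with a count of 0).

It looks like this:

This is not intended: all bins other than the 3 passed to isin() should be dropped. The resulting problem is that the 0 values show up in sns.countplot, so I end up with unwanted columns that have a count of zero.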
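The behaviour described above can be reproduced on a tiny frame. This is a sketch with made-up data, not the original dataset: the rows are filtered correctly, but because the column is categorical, value_counts() still lists every category and reports 0 for the ones that no longer occur.

```python
import pandas as pd

# Hypothetical categorical column with five declared categories
all_bins = ["0-1", "1-2", "2-3", "10-15", "20-40"]
df = pd.DataFrame({
    "yos_bins": pd.Categorical(["0-1", "1-2", "10-15", "20-40"], categories=all_bins)
})

# The row filter itself works: only rows in the selected bins remain
subset = df[df["yos_bins"].isin(["0-1", "1-2", "2-3"])]
print(len(subset))

# The categorical dtype still carries all five categories,
# so value_counts() reports the unused ones with a count of 0
counts = subset["yos_bins"].value_counts()
print(counts)
print(list(subset["yos_bins"].cat.categories))
```

So the zero rows come from the dtype's category list, not from rows that escaped the filter.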

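When I save the df with to_excel(), all "10-15" value fields show a "text date with 2-digit year" error. I have not loaded that dataframe back into Python, so I am not sure whether this causes a problem.

Does anyone know how I can create a test dataframe that contains only the 3 yos_bins values, instead of showing all yos_bins values with some of them at zero?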

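This is an ugly solution, needed because numpy and pandas are awkward when it comes to element-wise "is in" checks. In my experience, I do the comparison manually with numpy arrays.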

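import numpy as np

# df is the question's dataframe (df_example in the reproducible example)
yos_bins = np.array(df["yos_bins"])
yos_bins_sel = np.array(["0-1", "1-2", "2-3"])
# Compare every row value with every selected bin, then reduce across bins
mask = (yos_bins[:, None] == yos_bins_sel[None, :]).any(axis=1)
df[mask]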
   Requesting as  Years of service yos_bins
3       Employee               1.0      0-1
4    Ex-Employee               1.0      0-1
5       Employee               1.0      0-1
6       Employee               2.0      1-2
7       Employee               2.0      1-2
8    Ex-Employee               2.0      1-2
9    Ex-Employee               2.0      1-2
10      Employee               3.0      2-3
11      Employee               3.0      2-3
12   Ex-Employee               3.0      2-3

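Explanation (using x for yos_bins and y for yos_bins_sel):

(x[:, None] == y[None, :]).any(1) is the main takeaway. x[:, None] converts x from shape (n,) to (n, 1), and y[None, :] converts y from shape (m,) to (1, m). Comparing them with == produces a broadcast, element-wise boolean array of shape (n, m). We want our array to be (n,)-shaped, so we apply .any(1), which collapses the second dimension to True if at least one of its booleans is True (that is, if the element is in the yos_bins_sel array). The result is a boolean array that can be used to mask the original dataframe. Replace x with the array containing the values to compare and y with the array of values that x should be checked against, and you can do this for any dataset.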
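To make the shapes concrete, here is a small sketch of the broadcasting step on made-up arrays (the values of x and y are illustrative only). It also checks the result against np.isin, which performs the same element-wise membership test.

```python
import numpy as np

x = np.array(["0-1", "1-2", "5-6", "2-3"])  # values to test, shape (4,)
y = np.array(["0-1", "1-2", "2-3"])         # allowed values, shape (3,)

col = x[:, None]   # shape (4, 1)
row = y[None, :]   # shape (1, 3)

# Broadcasting compares every element of x with every element of y
pairwise = col == row            # shape (4, 3)

# A row is True if the corresponding x value matches any allowed value
mask = pairwise.any(axis=1)      # shape (4,)

print(col.shape, row.shape, pairwise.shape, mask.shape)
print(mask)

# np.isin produces the same mask for this input
print(np.array_equal(mask, np.isin(x, y)))
```

Note that the pairwise array has n × m entries, so for large inputs np.isin or Series.isin is likely the lighter choice.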

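Are you sure the test = … line is what creates the subset? Can you create an example that shows the same problem?

I added a reproducible example at the end.

Thank you, Mike. However, when I extend the code with df_new = df[mask] and print(df_new.yos_bins.value_counts()), it shows all 10 bins, not just the three you selected. I do not understand why it shows the other 7 bins with zeros instead of only the 3 selected bins. I want the others to disappear.

This is because yos_bins still keeps the dtype of the original column, so that operations between the two keep working smoothly, and that original dtype is a categorical dtype containing all the yos_bins categories. To give yos_bins its own dtype, do df_new["yos_bins"] = df_new["yos_bins"].astype(pd.CategoricalDtype(yos_bins_sel)). Note: this produces a warning, although I think it should not appear, since even using .loc does not stop it, but you can suppress it; read more about it here.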
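An alternative sketch for dropping the empty bins, assuming the column is the categorical produced by pd.cut: take an explicit .copy() of the filtered frame and call .cat.remove_unused_categories() on the column. The assumption here is that the warning in question is pandas' SettingWithCopyWarning, which the explicit copy is meant to avoid. The small frame is made up for illustration.

```python
import pandas as pd

# Hypothetical data, binned the same way as in the question
df = pd.DataFrame({"Years of service": [1.0, 2.0, 3.0, 11.0, 37.0]})
cut_bins = (0, 1, 2, 3, 4, 5, 6, 10, 15, 20, 40)
cut_labels = ['0-1', '1-2', '2-3', '3-4', '4-5', '5-6', '6-10', '10-15', '15-20', '20-40']
df["yos_bins"] = pd.cut(df["Years of service"], bins=cut_bins, labels=cut_labels)

wanted = ["0-1", "1-2", "2-3"]

# .copy() makes df_new an independent frame, so assigning a column to it
# is not treated as writing into a slice of df
df_new = df[df["yos_bins"].isin(wanted)].copy()

# Drop the categories that no longer occur in the filtered rows
df_new["yos_bins"] = df_new["yos_bins"].cat.remove_unused_categories()

counts = df_new["yos_bins"].value_counts()
print(counts)
print(list(df_new["yos_bins"].cat.categories))
```

After this step value_counts() lists only the three remaining bins, which should also keep the empty bars out of a count plot.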
