Python: How can I implement something like np.where(df[variable] in ['value1', 'value2'])?


Hello, I want to change a column's value to 'Other' whenever it is one of ['value1', 'value2'].

Here is my code:

random_sample['NAME_INCOME_TYPE_ind'] = np.where(random_sample['NAME_INCOME_TYPE'] in ['Maternity leave', 'Student']), 'Other')
I tried adding .any() in various places in this line, but that did not resolve the error:
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
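
For context, the error comes from the in operator itself: Python asks the list for a single True/False answer, which forces a whole Series of comparisons into one boolean. A minimal sketch that reproduces it, using a made-up two-row Series rather than the real random_sample data:

```python
import pandas as pd

# Hypothetical stand-in for random_sample['NAME_INCOME_TYPE']
s = pd.Series(['Working', 'Student'])

try:
    # list.__contains__ compares each list item with the Series, gets a
    # boolean Series back, and then tries to reduce it to a single bool
    s in ['Maternity leave', 'Student']
except ValueError as err:
    message = str(err)
    print(message)
```

Adding .any() elsewhere in the line does not help because the in expression fails before anything else is evaluated.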

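# join the values into one regex pattern: 'Maternity leave|Student'
l = '|'.join(['Maternity leave', 'Student'])
# boolean mask: True where the column contains either value
m = random_sample['NAME_INCOME_TYPE'].str.contains(l)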
You can also generate the mask m with a different method.
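
One way to build such a mask, assuming exact matches are what you want, is Series.isin, which tests each element for membership in a list. This is only a sketch on made-up data. The 'Student loan' row is a hypothetical value added to show the difference from str.contains, which matches substrings:

```python
import pandas as pd

# Hypothetical sample data standing in for random_sample
random_sample = pd.DataFrame(
    {'NAME_INCOME_TYPE': ['Working', 'Student', 'Maternity leave', 'Student loan']}
)
values = ['Maternity leave', 'Student']

# Exact, element-wise membership test
m = random_sample['NAME_INCOME_TYPE'].isin(values)

# Substring / regex match, as in the str.contains approach
m_contains = random_sample['NAME_INCOME_TYPE'].str.contains('|'.join(values))

print(m.tolist())
print(m_contains.tolist())
```

Here isin leaves 'Student loan' unmatched, while str.contains flags it because it contains 'Student'.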

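Then use np.where. Note, however, that you cannot specify only one of the two values to select from based on the condition; you must specify both x and y. In your case, you can use 'Other' and df['NAME_INCOME_TYPE'] as x and y: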

random_sample['NAME_INCOME_TYPE_ind'] = np.where(m, 
                                                'Other',
                                                random_sample['NAME_INCOME_TYPE'])
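
To illustrate why both x and y are needed: when called with only a condition, np.where returns the positions where the condition is true rather than a new array of values. A small sketch with a made-up mask and values:

```python
import numpy as np

# Hypothetical mask and values for illustration
m = np.array([False, True, False])
values = np.array(['word1', 'Student', 'word2'])

# Condition only: a tuple of index arrays, one per dimension
idx = np.where(m)
print(idx[0].tolist())

# Condition plus x and y: takes x where m is True and y elsewhere
out = np.where(m, 'Other', values)
print(out.tolist())
```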
Testing on a sample DataFrame:

import numpy as np
import pandas as pd

df = pd.DataFrame({'NAME_INCOME_TYPE':['word1','word2','Student']})

l = '|'.join(['Maternity leave', 'Student'])
# build the mask from df, the frame being tested
m = df['NAME_INCOME_TYPE'].str.contains(l)
df['NAME_INCOME_TYPE_ind'] = np.where(m, 'Other', df['NAME_INCOME_TYPE'])

       NAME_INCOME_TYPE NAME_INCOME_TYPE_ind
0            word1                word1
1            word2                word2
2          Student                Other
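For categorical variables

When working with categories, you can replace categories with another category instead of replacing strings. This benefits both memory and performance, because pandas uses factorization internally for categorical data.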

import pandas as pd

df = pd.DataFrame({'NAME_INCOME_TYPE': ['Employed', 'Maternity leave',
                                        'Benefits', 'Student']})

# turn object series to categorical
label_col = 'NAME_INCOME_TYPE'
df[label_col] = df[label_col].astype('category')

# define others
others = ['Maternity leave', 'Student']
others_label = 'Other'

# add new category and replace existing categories
df[label_col] = df[label_col].cat.add_categories([others_label])
df[label_col] = df[label_col].replace(others, others_label)

print(df)

  NAME_INCOME_TYPE
0         Employed
1            Other
2         Benefits
3            Other
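
To make the factorization point concrete, the sketch below shows the integer codes pandas stores for a categorical column and compares its memory usage with the equivalent string column. The data is made up, and exact byte counts depend on the pandas version, so none are quoted here:

```python
import pandas as pd

# Hypothetical column with many repeated labels
s_obj = pd.Series(['Employed', 'Student', 'Employed', 'Student'] * 1000)
s_cat = s_obj.astype('category')

# Each distinct label is stored once; rows hold small integer codes
categories = s_cat.cat.categories.tolist()
codes = s_cat.cat.codes.head(4).tolist()
print(categories, codes)

# Compare total memory, including the string payloads
mem_obj = s_obj.memory_usage(deep=True)
mem_cat = s_cat.memory_usage(deep=True)
print(mem_obj, mem_cat)
```

The saving grows with the number of rows per distinct label; a column of mostly unique strings would gain little from this.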
You can also write this more concisely using method chaining:

# define others
others, others_label = ['Maternity leave', 'Student'], 'Other'

# turn to categorical, add category, then replace
df['NAME_INCOME_TYPE'] = df['NAME_INCOME_TYPE'].astype('category')\
                                               .cat.add_categories([others_label])\
                                               .replace(others, others_label)