Python pandas: multiple 'yes/no' dummy variables


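I have a DataFrame with several categorical variables that I need to convert to dummy variables. Gender and region (4 types) are easy to handle with pd.get_dummies. After those, however, I have several variables that are yes/no. How can I make the dummy yes and no columns include the variable name? For example, the married variable would become married_yes and married_no.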

Here is my current code and a screenshot of the first five rows:

genderdummy=pd.get_dummies(bank_df['gender'])
regiondummy=pd.get_dummies(bank_df['region'])
marrieddummy=pd.get_dummies(bank_df['married'])
cardummy=pd.get_dummies(bank_df['car'])
savingsdummy=pd.get_dummies(bank_df['savings_acct'])
currentdummy=pd.get_dummies(bank_df['current_acct'])
mortgagedummy=pd.get_dummies(bank_df['mortgage'])
pepdummy=pd.get_dummies(bank_df['pep'])
newdata_df=pd.concat([genderdummy,regiondummy,marrieddummy,cardummy,savingsdummy,currentdummy,mortgagedummy,pepdummy], axis=1)
newdata_df.head()

Following the suggestion, I now have:

## HW Part 6:  Converting Categorical Variables and Exporting Data
genderdummy=pd.get_dummies(bank_df['gender'])
regiondummy=pd.get_dummies(bank_df['region'])
dummy_vars = [bank_df('married'), bank_df('car'),bank_df('savings_acct'),bank_df('current_acct'),bank_df('mortgage'),bank_df('pep')]
pd.get_dummies(bank_df[dummy_vars])
newdata_df=pd.concat([genderdummy,regiondummy,dummy_vars], axis=1)
newdata_df.head()

Use the prefix parameter of pd.get_dummies.

Output:

    text_cat    text_dog
0   1           0
1   0           1
2   1           0
3   0           1
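
The snippet that produced this output did not survive in the thread. As a minimal sketch of how the prefix parameter could yield columns of this shape, assume a hypothetical Series named animals with the values cat and dog (both the name and the data are illustrative, not from the original answer). dtype=int is passed so the result is 0/1, since recent pandas versions return booleans by default.

```python
import pandas as pd

# Hypothetical input: one categorical column with two levels.
animals = pd.Series(["cat", "dog", "cat", "dog"])

# prefix is prepended to each category name, joined with "_" by default.
# dtype=int asks for 0/1 columns instead of booleans.
text_dummies = pd.get_dummies(animals, prefix="text", dtype=int)
print(text_dummies)
```
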

If you change your approach slightly, this happens automatically. You just need to call pd.get_dummies on a DataFrame rather than on a Series:

import numpy as np
import pandas as pd

# Define sample data and columns for dummy variables
df = pd.DataFrame(np.random.choice(['yes', 'no'], size=(6, 3)), columns=['gender', 'region', 'married'])
dummy_vars = ['gender', 'married']

# Create dummy variables
pd.get_dummies(df[dummy_vars])

   gender_no  gender_yes  married_no  married_yes
0          0           1           1            0
1          1           0           0            1
2          0           1           1            0
3          1           0           1            0
4          1           0           1            0
5          0           1           1            0

Alternatively, you can pass the prefix parameter explicitly:

pd.get_dummies(df[dummy_vars], prefix=dummy_vars)

Update:

With your variables, it should look like this:

genderdummy = pd.get_dummies(bank_df['gender'])
regiondummy = pd.get_dummies(bank_df['region'])
dummy_vars = ['married', 'car', 'savings_acct', 'current_acct', 'mortgage', 'pep']
other_dummies = pd.get_dummies(bank_df[dummy_vars])
newdata_df = pd.concat([genderdummy, regiondummy, other_dummies], axis=1)
newdata_df.head()
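
A possible one-call variant, offered as a sketch rather than as part of the original answer: pd.get_dummies also accepts a columns argument, so the whole DataFrame can be encoded at once. The bank_df below is a made-up stand-in with invented values and a hypothetical income column. Note two differences from the code above: gender and region also receive prefixes here, and columns not listed in columns are passed through unchanged.

```python
import pandas as pd

# Made-up stand-in for bank_df; values and the income column are illustrative only.
bank_df = pd.DataFrame({
    "gender": ["FEMALE", "MALE", "FEMALE"],
    "region": ["TOWN", "RURAL", "TOWN"],
    "married": ["NO", "YES", "YES"],
    "car": ["NO", "YES", "NO"],
    "income": [100.0, 200.0, 300.0],
})

# Encode every listed column in one call; each dummy column is named
# "<original column>_<category>", and unlisted columns are kept as they are.
categorical_cols = ["gender", "region", "married", "car"]
newdata_df = pd.get_dummies(bank_df, columns=categorical_cols, dtype=int)
print(newdata_df.columns.tolist())
```
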

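Note that dummy_vars is just a list of the column names in your bank_df.

Sorry, I'm very new to Python, so I may have made a simple mistake. Here is what I tried on two of the yes/no variables to see whether it works: dummy_vars = [bank_df('married'), bank_df('car')] pd.get_dummies(df[dummy_vars]). As you might guess, bank_df is the name of the original df: TypeError Traceback (most recent call last) in () ----> 1 dummy_vars = [bank_df('married'), bank_df('car')] 2 pd.get_dummies(df[dummy_vars]) TypeError: 'DataFrame' object is not callable

dummy_vars should contain only the column names, as in the example I provided. So try this: dummy_vars = ['married', 'car'].

Sorry, I'm still getting an error. I thought I understood your code, but I guess I don't. The first line of code after importing numpy and pandas creates a DataFrame for the example, right? If so, I tried to adapt your code to the DataFrame I'm working with, bank_df. That's why I wrote: dummy_vars = [bank_df('married'), bank_df('car')]

Yes, df is just an example DataFrame. It stands in for your DataFrame bank_df. pd.get_dummies(bank_df[dummy_vars]) should work in your case, as long as dummy_vars is a list of column names. What error are you getting? You should edit the post to show this new error.

@immaprogrammingnoob See the update for how to change your code. FYI, bank_df('car') is incorrect syntax, hence TypeError: 'DataFrame' object is not callable.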
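
To illustrate the error discussed in these comments, here is a small sketch using a made-up two-row bank_df (the values are invented for illustration). Square brackets select columns, while parentheses try to call the DataFrame as if it were a function, which raises the TypeError quoted above.

```python
import pandas as pd

# Made-up stand-in for bank_df; values are illustrative only.
bank_df = pd.DataFrame({"married": ["YES", "NO"], "car": ["NO", "YES"]})

# Square brackets select a column and return a Series.
married_col = bank_df["married"]

# Parentheses attempt to call the DataFrame, which is not a function.
try:
    bank_df("married")
except TypeError as exc:
    error_message = str(exc)
    print(error_message)

# dummy_vars holds plain column-name strings, not the columns themselves.
dummy_vars = ["married", "car"]
other_dummies = pd.get_dummies(bank_df[dummy_vars])
print(other_dummies.columns.tolist())
```
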