Python 3.x: Merging the values ("column attributes") of a single column into groups to reduce the number of dummy variables for that column
For example, suppose a column has 14 distinct unique values (as reported by value_counts()) and some of them have something in common. In our example, when we group the 'Loan.Purpose' column with the 'Interest.Rate' column and compute the mean() of Interest.Rate for each unique Loan.Purpose value, several values share the same rounded average interest rate. For instance, 'car', 'educational' and 'major_purchase' each have a mean of 11.0. I want to merge those three values under the column name "LP_cem" because they have the same mean, and do the same for the other values, so that the number of dummy variables drops from 14 to 4. Basically, I want to merge the 14 distinct values into 3 or 4 groups based on their mean() and then create dummies from those groups, as shown below:
LP_cem LP_chos LP_dm LP_hmvw LP_renewable_energy
0 0 0 1 0 0
1 0 0 1 0 0
2 0 0 1 0 0
3 0 0 1 0 0
4 0 1 0 0 0
raw_data['Loan.Purpose'].value_counts()
I have grouped Loan.Purpose by the mean of Interest.Rate:
raw_data_8 = round(raw_data_5.groupby('Loan.Purpose')['Interest.Rate'].mean())
raw_data_8
Loan.Purpose
CHOS 15.0
DM 12.0
car 11.0
credit_card 13.0
debt_consolidation 14.0
educational 11.0
home_improvement 12.0
house 13.0
major_purchase 11.0
medical 12.0
moving 14.0
other 13.0
renewable_energy 10.0
small_business 13.0
vacation 12.0
wedding 12.0
Name: Interest.Rate, dtype: float64
Now I want to group together the values that have the same mean. I tried the code below, but it raises an error:
for i in range(len(raw_data_5.index)):
    if raw_data_5['Loan.Purpose'][i] in ['car','educational','major_purchase']:
        raw_data_5.iloc[i, 'Loan.Purpose'] = 'cem'
    if raw_data_5['Loan.Purpose'][i] in ['home_improvement','medical','vacation','wedding']:
        raw_data_5.iloc[i, 'Loan.Purpose'] = 'hmvw'
    if raw_data_5['Loan.Purpose'][i] in ['credit_care','house','other','small_business']:
        raw_data_5.iloc[i, 'Loan.Purpose'] = 'chos'
    if raw_data_5['Loan.Purpose'][i] in ['debt_consolidation','moving']:
        raw_data_5.iloc[i, 'Loan.Purpose'] = 'dcm'
error = TypeError                                 Traceback (most recent call last)
<ipython-input-51-cf7ef2ae1efd> in <module>
----> 1 for i in range(raw_data_5.index):
      2     if raw_data_5['Loan.Purpose'][i] in ['car','educational','major_purchase']:
      3         raw_data_5.iloc[i, 'Loan.Purpose'] = 'cem'
      4     if raw_data_5['Loan.Purpose'][i] in ['home_improvement','medical','vacation','wedding']:
      5         raw_data_5.iloc[i, 'Loan.Purpose'] = 'hmvw'
TypeError: 'Int64Index' object cannot be interpreted as an integer
Interest.Rate Loan.Length Loan.Purpose
0 8.90 36.0 debt_consolidation
1 12.12 36.0 debt_consolidation
2 21.98 60.0 debt_consolidation
3 9.99 36.0 debt_consolidation
4 11.71 36.0 credit_card
5 15.31 36.0 other
6 7.90 36.0 debt_consolidation
7 17.14 60.0 credit_card
8 14.33 36.0 credit_card
10 19.72 36.0 moving
11 14.27 36.0 debt_consolidation
12 21.67 60.0 debt_consolidation
13 8.90 36.0 debt_consolidation
14 7.62 36.0 debt_consolidation
15 15.65 60.0 debt_consolidation
16 12.12 36.0 debt_consolidation
17 10.37 60.0 debt_consolidation
18 9.76 36.0 credit_card
19 9.99 60.0 debt_consolidation
20 21.98 36.0 debt_consolidation
21 19.05 60.0 credit_card
22 17.99 60.0 car
23 11.99 36.0 credit_card
24 16.82 60.0 vacation
25 7.90 36.0 debt_consolidation
26 14.42 36.0 debt_consolidation
27 15.31 36.0 debt_consolidation
28 8.59 36.0 other
29 7.90 36.0 debt_consolidation
30 21.00 60.0 debt_consolidation
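A note on the error shown above: the traceback line reads `range(raw_data_5.index)`, and `range()` accepts only integers, so passing the index object itself raises the TypeError. Two further things would probably trip the loop even with `len(...)` in place. The sample data has no row labelled 9, so `raw_data_5['Loan.Purpose'][i]` (a label lookup on an integer index) would fail at i = 9. Also, `.iloc` is position-based and is not meant to take a column name. A minimal sketch of these points, assuming a small made-up frame `df` (not the real data) whose index has a gap:

```python
import pandas as pd

# Hypothetical three-row frame; the index skips label 9, like the sample data.
df = pd.DataFrame(
    {"Loan.Purpose": ["car", "moving", "other"]},
    index=[8, 10, 11],
)

# range() needs an integer; handing it the Index object raises TypeError.
try:
    range(df.index)
    range_error = None
except TypeError as exc:
    range_error = exc

# With an integer index, series[i] looks up the *label* i, so a missing
# label raises KeyError even though a row exists at that position.
try:
    df["Loan.Purpose"][9]
    missing_label_error = None
except KeyError as exc:
    missing_label_error = exc

# Iterating over the actual labels and assigning through .loc avoids both
# problems (still slow for large frames, since it works row by row).
for label in df.index:
    if df.loc[label, "Loan.Purpose"] in ["car", "educational", "major_purchase"]:
        df.loc[label, "Loan.Purpose"] = "cem"
```

The loop is kept only to mirror the original attempt; a vectorised replacement is usually simpler.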
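Can you provide some sample data so that we can reproduce this?
I have added some data. Please let me know whether you need anything else, or whether I should add more.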
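One possible way to get the merged dummies without a loop is to recode the column through a dictionary and then call `pd.get_dummies`. The sketch below is an illustration under assumptions, not a tested solution for the full data set: the miniature `raw_data_5` is made up from a few rows of the sample, the names `groups`, `lookup`, `merged` and `dummies` are hypothetical, `'credit_card'` is used where the attempted code has `'credit_care'`, and the label `'dm'` follows the desired `LP_dm` column rather than the `'dcm'` used in the loop.

```python
import pandas as pd

# Hypothetical miniature of the sample data; only the column being recoded.
raw_data_5 = pd.DataFrame(
    {
        "Loan.Purpose": [
            "debt_consolidation", "credit_card", "other",
            "moving", "car", "vacation",
        ]
    },
    index=[0, 4, 5, 10, 22, 24],
)

# Merged label -> original categories, following the rounded means in the question.
groups = {
    "cem": ["car", "educational", "major_purchase"],
    "hmvw": ["home_improvement", "medical", "vacation", "wedding"],
    "chos": ["credit_card", "house", "other", "small_business"],
    "dm": ["debt_consolidation", "moving"],
}

# Invert to original category -> merged label.
lookup = {purpose: label for label, members in groups.items() for purpose in members}

# Series.replace leaves values that are not in the dictionary
# (for example renewable_energy) unchanged.
merged = raw_data_5["Loan.Purpose"].replace(lookup)

# One indicator column per merged label, prefixed with "LP_".
dummies = pd.get_dummies(merged, prefix="LP")
```

To attach the result to the original frame, something like `pd.concat([raw_data_5.drop(columns="Loan.Purpose"), dummies], axis=1)` should work. Because unmapped values pass through `replace` untouched, `renewable_energy` would keep its own `LP_renewable_energy` column, as in the desired output.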