
Python 3.x: Merge a single column's unique values into a few groups to reduce the number of dummy variables for that column

Tags: python-3.x, pandas, data-mining, dummy-variable, data-preprocessing

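For example, suppose a column has 14 different unique values (as listed by value_counts()), and some of them have something in common. In our case, when we group 'Loan.Purpose' against the 'Interest.Rate' column and compute the mean() interest rate for each unique value of Loan.Purpose, several values turn out to share the same mean rate. For example, 'car', 'educational' and 'major_purchase' all have a mean of 11.0. I want to merge these values ('car', 'educational', 'major_purchase') under a single column named "LP_cem", because they have the same mean, and do the same for the other values.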

That way I can reduce the number of dummy variables from 14 to 4.

Basically, I want to merge the 14 different values into 3 or 4 groups according to their mean(), and then create dummies from those 3 or 4 group columns.

The result should look like this:

   LP_cem  LP_chos  LP_dm  LP_hmvw  LP_renewable_energy
0       0        0      1        0                    0
1       0        0      1        0                    0
2       0        0      1        0                    0
3       0        0      1        0                    0
4       0        1      0        0                    0
raw_data['Loan.Purpose'].value_counts()
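A minimal sketch of the intended end result, assuming the groups are written out by hand in a dictionary. The group names cem, hmvw, chos and dcm are taken from the relabelling loop in the question, and the LP prefix matches the desired column names. The small DataFrame is made-up illustration data, not the real dataset. Series.replace swaps each original category for its group label, and pd.get_dummies with prefix='LP' builds one indicator column per group.

```python
import pandas as pd

# Hypothetical toy data; only the 'Loan.Purpose' column matters here
df = pd.DataFrame({
    'Loan.Purpose': ['car', 'wedding', 'credit_card', 'moving',
                     'renewable_energy', 'educational'],
})

# Hand-written grouping (assumption): group label -> original categories
groups = {
    'cem': ['car', 'educational', 'major_purchase'],
    'hmvw': ['home_improvement', 'medical', 'vacation', 'wedding'],
    'chos': ['credit_card', 'house', 'other', 'small_business'],
    'dcm': ['debt_consolidation', 'moving'],
}
# Invert to a flat {category: group} dict for replace()
mapping = {cat: grp for grp, cats in groups.items() for cat in cats}

# Categories missing from the mapping (e.g. 'renewable_energy') are left unchanged
df['Loan.Purpose'] = df['Loan.Purpose'].replace(mapping)

# One indicator column per group, named LP_<group>
dummies = pd.get_dummies(df['Loan.Purpose'], prefix='LP')
print(dummies.columns.tolist())
```

Series.map(mapping) would also work, but it turns any category not in the dictionary into NaN, whereas replace leaves it as it is; that is why renewable_energy survives here as its own LP_renewable_energy column.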

I have grouped the data by Loan.Purpose using the mean of the interest rate:

raw_data_8 = round(raw_data_5.groupby('Loan.Purpose')['Interest.Rate'].mean())
raw_data_8

Loan.Purpose
CHOS                  15.0
DM                    12.0
car                   11.0
credit_card           13.0
debt_consolidation    14.0
educational           11.0
home_improvement      12.0
house                 13.0
major_purchase        11.0
medical               12.0
moving                14.0
other                 13.0
renewable_energy      10.0
small_business        13.0
vacation              12.0
wedding               12.0
Name: Interest.Rate, dtype: float64
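Instead of typing the groups by hand, they can be derived from the rounded means themselves. A sketch, assuming raw_data_8 is the rounded Series printed above (rebuilt here from that output, which lists 16 categories): grouping the Series by its own values collects the categories that share a mean, and each category is then mapped to a label built from its rate. The rate_<n> label format is a hypothetical naming choice.

```python
import pandas as pd

# Rounded mean interest rate per Loan.Purpose, copied from the output above
raw_data_8 = pd.Series({
    'CHOS': 15.0, 'DM': 12.0, 'car': 11.0, 'credit_card': 13.0,
    'debt_consolidation': 14.0, 'educational': 11.0, 'home_improvement': 12.0,
    'house': 13.0, 'major_purchase': 11.0, 'medical': 12.0, 'moving': 14.0,
    'other': 13.0, 'renewable_energy': 10.0, 'small_business': 13.0,
    'vacation': 12.0, 'wedding': 12.0,
}, name='Interest.Rate')

# Group the Series by its own values: {rounded mean: Index of categories}
groups = raw_data_8.groupby(raw_data_8).groups
for rate, cats in groups.items():
    print(rate, list(cats))

# Map every category to a label derived from its mean, e.g. 'rate_11'
# (hypothetical naming scheme)
mapping = {cat: f'rate_{int(rate)}' for cat, rate in raw_data_8.items()}
print(len(set(mapping.values())))  # number of dummy columns after merging
```

With the numbers printed above this yields 6 groups (means 10 through 15), not 4: DM falls into the 12.0 group, while CHOS and renewable_energy each stay on their own. The automatic grouping therefore differs slightly from the hand-written lists in the question.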
Now I want to combine the values that have the same mean. I tried the following code, but it raises an error:

for i in range(len(raw_data_5.index)):
    if raw_data_5['Loan.Purpose'][i] in ['car','educational','major_purchase']:
        raw_data_5.iloc[i, 'Loan.Purpose'] = 'cem'
    if raw_data_5['Loan.Purpose'][i] in ['home_improvement','medical','vacation','wedding']:
        raw_data_5.iloc[i, 'Loan.Purpose'] = 'hmvw'
    if raw_data_5['Loan.Purpose'][i] in ['credit_care','house','other','small_business']:
        raw_data_5.iloc[i, 'Loan.Purpose'] = 'chos'
    if raw_data_5['Loan.Purpose'][i] in ['debt_consolidation','moving']:
        raw_data_5.iloc[i, 'Loan.Purpose'] = 'dcm'

Error:

TypeError                                 Traceback (most recent call last)
<ipython-input-51-cf7ef2ae1efd> in <module>
----> 1 for i in range(raw_data_5.index):
      2     if raw_data_5['Loan.Purpose'][i] in ['car','educational','major_purchase']:
      3         raw_data_5.iloc[i, 'Loan.Purpose'] = 'cem'
      4     if raw_data_5['Loan.Purpose'][i] in ['home_improvement','medical','vacation','wedding']:
      5         raw_data_5.iloc[i, 'Loan.Purpose'] = 'hmvw'

TypeError: 'Int64Index' object cannot be interpreted as an integer
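The traceback comes from range(raw_data_5.index): range() needs an integer, but raw_data_5.index is an Int64Index object. Wrapping it in len(), as in the code above, removes that particular error, but two problems remain. First, raw_data_5['Loan.Purpose'][i] looks rows up by index label, and the sample data's index skips 9, so counting 0, 1, 2, ... eventually hits a label that does not exist. Second, .iloc only accepts integer positions, so .iloc[i, 'Loan.Purpose'] fails on the column name. A sketch of a minimal fix that keeps the loop, assuming the same group lists as the question with credit_care read as a typo for credit_card:

```python
import pandas as pd

# Small stand-in for raw_data_5 with a gap in the index (label 9 is missing),
# like the sample data in the question
raw_data_5 = pd.DataFrame(
    {'Loan.Purpose': ['credit_card', 'moving', 'car', 'vacation']},
    index=[8, 10, 22, 24],
)

groups = {
    'cem': ['car', 'educational', 'major_purchase'],
    'hmvw': ['home_improvement', 'medical', 'vacation', 'wedding'],
    'chos': ['credit_card', 'house', 'other', 'small_business'],  # 'credit_card', not 'credit_care'
    'dcm': ['debt_consolidation', 'moving'],
}

# Iterate over index labels and use .loc, which accepts labels on both axes
for idx in raw_data_5.index:
    purpose = raw_data_5.loc[idx, 'Loan.Purpose']
    for label, members in groups.items():
        if purpose in members:
            raw_data_5.loc[idx, 'Loan.Purpose'] = label
            break  # stop after the first matching group

print(raw_data_5['Loan.Purpose'].tolist())
```

The loop updates one cell at a time; on a large frame, a single vectorized call such as raw_data_5['Loan.Purpose'].replace(...) with a flat {category: group} dictionary is usually much faster.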


    Interest.Rate   Loan.Length Loan.Purpose
0   8.90                36.0    debt_consolidation
1   12.12               36.0    debt_consolidation
2   21.98               60.0    debt_consolidation
3   9.99                36.0    debt_consolidation
4   11.71               36.0    credit_card
5   15.31               36.0    other
6   7.90                36.0    debt_consolidation
7   17.14               60.0    credit_card
8   14.33               36.0    credit_card
10  19.72               36.0    moving
11  14.27               36.0    debt_consolidation
12  21.67               60.0    debt_consolidation
13  8.90                36.0    debt_consolidation
14  7.62                36.0    debt_consolidation
15  15.65               60.0    debt_consolidation
16  12.12               36.0    debt_consolidation
17  10.37               60.0    debt_consolidation
18  9.76                36.0    credit_card
19  9.99                60.0    debt_consolidation
20  21.98               36.0    debt_consolidation
21  19.05               60.0    credit_card
22  17.99               60.0    car
23  11.99               36.0    credit_card
24  16.82               60.0    vacation
25  7.90                36.0    debt_consolidation
26  14.42               36.0    debt_consolidation
27  15.31               36.0    debt_consolidation
28  8.59                36.0    other
29  7.90                36.0    debt_consolidation
30  21.00               60.0    debt_consolidation
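Before creating dummies, it can be worth checking that every category in the data is covered by some group, because a typo such as credit_care silently leaves all credit_card rows unmerged. A sketch using Series.isin on a few rows copied from the sample above; the groups dict is the hand-written grouping from the question, with the typo kept deliberately to show what the check catches.

```python
import pandas as pd

# A few rows copied from the sample data above
raw_data_5 = pd.DataFrame({
    'Interest.Rate': [8.90, 11.71, 15.31, 19.72, 17.99, 16.82],
    'Loan.Length': [36.0, 36.0, 36.0, 36.0, 60.0, 60.0],
    'Loan.Purpose': ['debt_consolidation', 'credit_card', 'other',
                     'moving', 'car', 'vacation'],
}, index=[0, 4, 5, 10, 22, 24])

# Grouping as written in the question, including the 'credit_care' typo
groups = {
    'cem': ['car', 'educational', 'major_purchase'],
    'hmvw': ['home_improvement', 'medical', 'vacation', 'wedding'],
    'chos': ['credit_care', 'house', 'other', 'small_business'],
    'dcm': ['debt_consolidation', 'moving'],
}
covered = {cat for cats in groups.values() for cat in cats}

# Categories present in the data but not assigned to any group
mask = ~raw_data_5['Loan.Purpose'].isin(covered)
uncovered = raw_data_5.loc[mask, 'Loan.Purpose'].unique()
print(list(uncovered))  # the typo leaves 'credit_card' ungrouped
```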

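Comment: Could you provide some sample data so that we can reproduce this?
Reply: I have added some data. Please let me know whether you need anything else or whether I should add more.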