Deep learning: How to remove the dummy variable trap when using one-hot encoding
Below is my CSV data extraction and transformation code:
Schema schema = new Schema.Builder()
        .addColumnsString("RowNumber")
        .addColumnInteger("CustomerId")
        .addColumnString("Surname")
        .addColumnInteger("CreditScore")
        .addColumnCategorical("Geography", Arrays.asList("France", "Spain", "Germany"))
        .addColumnCategorical("Gender", Arrays.asList("Male", "Female"))
        .addColumnsInteger("Age", "Tenure", "Balance", "NumOfProducts", "HasCrCard", "IsActiveMember", "EstimatedSalary", "Exited")
        .build();

TransformProcess transformProcess = new TransformProcess.Builder(schema)
        .removeColumns("RowNumber", "Surname", "CustomerId")
        .categoricalToInteger("Gender")
        .categoricalToOneHot("Geography")
        .build();

RecordReader reader = new CSVRecordReader(1, ',');
reader.initialize(new FileSplit(new ClassPathResource("Churn_Modelling.csv").getFile()));
TransformProcessRecordReader transformProcessRecordReader = new TransformProcessRecordReader(reader, transformProcess);
System.out.println("args = " + transformProcessRecordReader.next() + "");
I just tried printing the first record:
args=[619,1,0,0,1,42,2,0,1,1,1,101348.88,1]
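For example, 619 is followed by three values: 1, 0, 0.
I want 619 to be followed by only 0, 0.
Basically, I want to keep the first category as the base category and have the remaining categories expressed relative to it, to avoid any multicollinearity (the dummy variable trap).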
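To make the redundancy concrete: in a full one-hot encoding the Geography indicators always sum to 1, so any one of them can be recovered from the others. The plain-Java sketch below does not use DataVec; the class and method names are hypothetical and only illustrate what "drop the first category" means for the three states listed in the schema.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of DataVec: hand-rolled encoding for illustration only.
class DummyEncodingSketch {

    // Full one-hot: one indicator per state, exactly one of them is 1.
    static int[] oneHot(List<String> states, String value) {
        int index = states.indexOf(value);
        if (index < 0) {
            throw new IllegalArgumentException("Unknown state: " + value);
        }
        int[] encoded = new int[states.size()];
        encoded[index] = 1;
        return encoded;
    }

    // Drop-first encoding: the first state becomes the base category (all zeros).
    static int[] dropFirst(List<String> states, String value) {
        int[] full = oneHot(states, value);
        return Arrays.copyOfRange(full, 1, full.length);
    }

    public static void main(String[] args) {
        List<String> geography = Arrays.asList("France", "Spain", "Germany");
        for (String state : geography) {
            int[] full = oneHot(geography, state);
            int[] reduced = dropFirst(geography, state);
            // The dropped indicator is always 1 minus the sum of the remaining ones.
            int recoveredFirst = 1 - Arrays.stream(reduced).sum();
            System.out.println(state
                    + " full=" + Arrays.toString(full)
                    + " dropFirst=" + Arrays.toString(reduced)
                    + " recoveredFirst=" + recoveredFirst);
        }
    }
}
```

With this encoding France is represented by 0, 0, which is the shape asked for above.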
How can I do this? Can anyone give me some suggestions?

You can use
transformProcess.getFinalSchema()
to check the final schema after the transformation, and then drop the unwanted one-hot column by name:
TransformProcess transformProcess = ... same as before...
        .categoricalToOneHot("Geography")
        .removeColumns("Geography[France]")
        .build();
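As a rough check of what this removal should do to the record printed in the question, here is a plain-Java simulation (not DataVec code). The column names for Spain and Germany are an assumption that follows the `Geography[France]` pattern used above, and the class and method names are hypothetical; the actual names should be confirmed against the final schema.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java simulation of removing one named column from a record; not DataVec code.
class RemoveColumnSketch {

    // Returns a copy of the record without the value belonging to the named column.
    static List<String> removeColumn(List<String> columnNames, List<String> record, String toRemove) {
        int index = columnNames.indexOf(toRemove);
        if (index < 0) {
            throw new IllegalArgumentException("No such column: " + toRemove);
        }
        List<String> result = new ArrayList<>(record);
        result.remove(index); // remove(int) drops the element at that position
        return result;
    }

    public static void main(String[] args) {
        // Assumed column order after the original transform (names partly hypothetical).
        List<String> columns = Arrays.asList(
                "CreditScore", "Geography[France]", "Geography[Spain]", "Geography[Germany]",
                "Gender", "Age", "Tenure", "Balance", "NumOfProducts",
                "HasCrCard", "IsActiveMember", "EstimatedSalary", "Exited");
        // First record as printed in the question.
        List<String> record = Arrays.asList(
                "619", "1", "0", "0", "1", "42", "2", "0", "1", "1", "1", "101348.88", "1");

        // After removal, 619 is followed by the Spain and Germany indicators only.
        System.out.println(removeColumn(columns, record, "Geography[France]"));
    }
}
```

The remaining two indicators are both 0 for a French customer, so France acts as the base category.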
Thank you, Holger. Exactly what I was looking for!