Apache Spark: how does logistic regression determine the label and features?


apache-spark machine-learning


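I am using Spark MLlib with a logistic regression model for classification. I followed this link:

If I give it a .csv file as input, I am not sure how the model identifies the label and the features. Can someone explain?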

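Because that example loads its data in libsvm format, in which each line has the form label index1:value1 index2:value2 ...

If you use a .csv, you have to specify the label and feature columns explicitly.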
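To make the libsvm layout described above concrete, here is a minimal plain-Java sketch (no Spark involved) that splits one such line into its label and its sparse features. The class name LibsvmLine and the sample line are made up for illustration; this only mirrors the file layout and is not Spark's own reader.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper for illustration only; not part of Spark.
class LibsvmLine {
    /** Label taken from the first token of the line. */
    public final double label;
    /** Sparse features: index as written in the file -> value. */
    public final Map<Integer, Double> features;

    LibsvmLine(double label, Map<Integer, Double> features) {
        this.label = label;
        this.features = features;
    }

    /** Parses a line of the form "label index1:value1 index2:value2 ...". */
    public static LibsvmLine parse(String line) {
        String[] tokens = line.trim().split("\\s+");
        // The first token is the label; everything after it is a feature.
        double label = Double.parseDouble(tokens[0]);
        Map<Integer, Double> features = new TreeMap<>();
        for (int i = 1; i < tokens.length; i++) {
            String[] pair = tokens[i].split(":", 2);
            features.put(Integer.parseInt(pair[0]), Double.parseDouble(pair[1]));
        }
        return new LibsvmLine(label, features);
    }

    public static void main(String[] args) {
        // Made-up sample line: label 1, feature 3 = 0.5, feature 7 = 2.0.
        LibsvmLine row = parse("1 3:0.5 7:2.0");
        // prints: label=1.0 features={3=0.5, 7=2.0}
        System.out.println("label=" + row.label + " features=" + row.features);
    }
}
```

A CSV file carries no such convention, which is why the label and feature columns have to be named by hand in that case.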

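In the end I was able to fix it: I needed to use the VectorAssembler and StringIndexer transformers. Their setInputCol / setInputCols and setOutputCol methods let you name the columns that become the label and the features.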

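import org.apache.spark.ml.feature.StringIndexer;
import org.apache.spark.ml.feature.VectorAssembler;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

// Load the CSV, index the string column "Status" into a numeric "label", then assemble "Lead ID" into "features".
Dataset<Row> dataset = sparkSession.read().option("header", true).option("inferSchema", "true").csv("Book.csv");
dataset = new StringIndexer().setInputCol("Status").setOutputCol("label").fit(dataset).transform(dataset);
VectorAssembler assembler = new VectorAssembler()
                          .setInputCols(new String[]{"Lead ID"})
                          .setOutputCol("features");
dataset = assembler.transform(dataset);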

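Thanks for your reply. So if the input is libsvm, it takes the first column as the label, right, and the remaining columns as features? And if the input file is a .csv, how do we set the label and features?

training = spark.read.format("csv").load(datapath); or training = spark.read.csv(datapath);

Thanks, but where do I set the label and features in the given lines?

setLabelCol("label"), setPredictionCol("prediction")

When I pass a string-typed column to setLabelCol, it throws an IllegalArgumentException. So do we always need to convert it with StringIndexer first and then set it as the label?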
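On that last question: the label column has to be numeric, which is why a string column such as Status fails in setLabelCol until it has been converted, and StringIndexer is one way to do the conversion. The plain-Java sketch below (no Spark involved) imitates what StringIndexer does by default as far as I understand it: the most frequent string gets index 0.0, the next most frequent 1.0, and so on. The alphabetical tie-breaking, the class name LabelIndexSketch, and the sample Status values are assumptions for illustration, not taken from Book.csv.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for illustration only; not Spark's StringIndexer.
class LabelIndexSketch {
    /** Maps each distinct string to an index, most frequent first. */
    public static Map<String, Double> fit(List<String> values) {
        // Count occurrences; TreeMap keeps the keys in alphabetical order.
        Map<String, Integer> counts = new TreeMap<>();
        for (String v : values) {
            counts.merge(v, 1, Integer::sum);
        }
        List<Map.Entry<String, Integer>> ordered = new ArrayList<>(counts.entrySet());
        // Stable sort by descending count, so ties stay in alphabetical order (assumed tie rule).
        ordered.sort((a, b) -> Integer.compare(b.getValue(), a.getValue()));
        // Assign 0.0, 1.0, 2.0, ... in that order.
        Map<String, Double> index = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> e : ordered) {
            index.put(e.getKey(), (double) index.size());
        }
        return index;
    }

    public static void main(String[] args) {
        // Made-up "Status" values.
        List<String> status = Arrays.asList("Open", "Closed", "Open", "Lost", "Open", "Closed");
        // prints: {Open=0.0, Closed=1.0, Lost=2.0}
        System.out.println(fit(status));
    }
}
```

If the label column in the CSV is already numeric, this indexing step should not be needed and the column can be passed to setLabelCol directly.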
