Machine learning: Reducing SVM computation time in RapidMiner

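I am new to data mining and am learning RapidMiner. I need to implement an SVM for a project I am working on. However, I am stuck: whatever SVM configuration I try, it just runs for hours or even days with no indication of whether it is close to finishing.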

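I have already removed as many features as possible using a ReliefF filter and a forward-selection wrapper. I use the linear kernel, which should be the fastest, and the SVM's C value is 0. The dataset itself has 3950 examples with 14 dimensions, which I don't think is much.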
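A quick back-of-the-envelope check supports the claim that 3950 × 14 is a small dataset. The helper below is hypothetical and only counts values; it assumes 8-byte doubles and ignores RapidMiner's own overhead. Even a full n × n kernel matrix, which LIBSVM does not need to hold in full because it caches kernel columns, would come to roughly 119 MiB.

```python
def training_footprint(n_examples, n_features, bytes_per_value=8):
    """Rough sizes for an n x d numeric training set (hypothetical helper).

    Assumes every value is stored as an 8-byte double; real tools add overhead.
    """
    data_values = n_examples * n_features
    gram_entries = n_examples * n_examples  # a full n x n kernel (Gram) matrix
    return {
        "data_values": data_values,
        "data_bytes": data_values * bytes_per_value,
        "gram_entries": gram_entries,
        "gram_bytes": gram_entries * bytes_per_value,
    }


sizes = training_footprint(3950, 14)
# Number of data values, kernel-matrix entries, and kernel matrix size in MiB.
print(sizes["data_values"], sizes["gram_entries"], round(sizes["gram_bytes"] / 2**20, 1))
# → 55300 15602500 119.0
```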

The only reason I can think of for it taking so long is that I use 10-fold cross-validation, but even so, it shouldn't take days. So my questions are:
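The cost of cross-validation is easy to put in numbers: 10-fold cross-validation trains the SVM ten times, each time on 90% of the data, so with 3950 examples it should cost roughly ten fits on 3555 examples each. Below is a minimal sketch of a plain shuffled k-fold split, assuming a hypothetical `kfold_indices` helper; it is not RapidMiner's implementation, and RapidMiner's X-Validation can also stratify by label, which this sketch does not do.

```python
import random


def kfold_indices(n, k=10, seed=None):
    """Split range(n) into k shuffled folds; yield (train_idx, test_idx) per fold.

    Plain (non-stratified) shuffling is an assumption of this sketch.
    """
    if not 2 <= k <= n:
        raise ValueError("need 2 <= k <= n")
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    # Fold sizes differ by at most one when n is not divisible by k.
    base, extra = divmod(n, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        folds.append(idx[start:start + size])
        start += size
    for i, test in enumerate(folds):
        # Training set = every fold except the held-out one.
        train = [j for f_i, f in enumerate(folds) if f_i != i for j in f]
        yield train, test


# 3950 examples, 10 folds: 10 SVM fits, each trained on 3555 and tested on 395.
splits = list(kfold_indices(3950, k=10, seed=1))
print(len(splits), {len(tr) for tr, _ in splits}, {len(te) for _, te in splits})
# → 10 {3555} {395}
```

So the whole validation should take on the order of ten single trainings; if one training on about 3555 examples already takes hours, the bottleneck is the SVM fit itself rather than the cross-validation.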

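1. Looking at how I implemented my SVM in the process below, is there anything I can do to reduce the running time?
2. Is there any way in RapidMiner to see what is happening inside the SVM, to understand why it takes so long? Or at least to check which cross-validation iteration is currently running?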

The process itself, which already works on the preprocessed file (I can't share the dataset), is as follows:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.3.008">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="5.3.008" expanded="true" name="Process">
    <parameter key="logverbosity" value="all"/>
    <parameter key="logfile" value="D:\testexrff.xrff"/>
    <process expanded="true">
      <operator activated="true" class="read_xrff" compatibility="5.3.008" expanded="true" height="60" name="Read XRFF (4)" width="90" x="45" y="165">
        <parameter key="data_file" value="C:\Users\glintthssig\Desktop\wrapper"/>
      </operator>
      <operator activated="true" class="x_validation" compatibility="5.3.008" expanded="true" height="112" name="Validation (11)" width="90" x="246" y="120">
        <parameter key="use_local_random_seed" value="true"/>
        <process expanded="true">
          <operator activated="true" class="remap_binominals" compatibility="5.3.008" expanded="true" height="76" name="Remap Binominals (5)" width="90" x="45" y="30">
            <parameter key="attribute_filter_type" value="single"/>
            <parameter key="attribute" value="REINTERNAMENTO"/>
            <parameter key="negative_value" value="N"/>
            <parameter key="positive_value" value="S"/>
          </operator>
          <operator activated="true" class="nominal_to_numerical" compatibility="5.3.008" expanded="true" height="94" name="Nominal to Numerical" width="90" x="45" y="165">
            <list key="comparison_groups"/>
          </operator>
          <operator activated="true" class="support_vector_machine_libsvm" compatibility="5.3.008" expanded="true" height="76" name="SVM (2)" width="90" x="179" y="165">
            <parameter key="kernel_type" value="linear"/>
            <list key="class_weights"/>
          </operator>
          <connect from_port="training" to_op="Remap Binominals (5)" to_port="example set input"/>
          <connect from_op="Remap Binominals (5)" from_port="example set output" to_op="Nominal to Numerical" to_port="example set input"/>
          <connect from_op="Nominal to Numerical" from_port="example set output" to_op="SVM (2)" to_port="training set"/>
          <connect from_op="SVM (2)" from_port="model" to_port="model"/>
          <portSpacing port="source_training" spacing="0"/>
          <portSpacing port="sink_model" spacing="0"/>
          <portSpacing port="sink_through 1" spacing="0"/>
        </process>
        <process expanded="true">
          <operator activated="true" class="remap_binominals" compatibility="5.3.008" expanded="true" height="76" name="Remap Binominals (8)" width="90" x="45" y="165">
            <parameter key="attribute_filter_type" value="single"/>
            <parameter key="attribute" value="REINTERNAMENTO"/>
            <parameter key="negative_value" value="N"/>
            <parameter key="positive_value" value="S"/>
          </operator>
          <operator activated="true" class="nominal_to_numerical" compatibility="5.3.008" expanded="true" height="94" name="Nominal to Numerical (4)" width="90" x="179" y="165">
            <list key="comparison_groups"/>
          </operator>
          <operator activated="true" class="apply_model" compatibility="5.3.008" expanded="true" height="76" name="Apply Model (11)" width="90" x="45" y="30">
            <list key="application_parameters"/>
          </operator>
          <operator activated="true" class="performance" compatibility="5.3.008" expanded="true" height="76" name="Performance (11)" width="90" x="212" y="30"/>
          <connect from_port="model" to_op="Apply Model (11)" to_port="model"/>
          <connect from_port="test set" to_op="Remap Binominals (8)" to_port="example set input"/>
          <connect from_op="Remap Binominals (8)" from_port="example set output" to_op="Nominal to Numerical (4)" to_port="example set input"/>
          <connect from_op="Nominal to Numerical (4)" from_port="example set output" to_op="Apply Model (11)" to_port="unlabelled data"/>
          <connect from_op="Apply Model (11)" from_port="labelled data" to_op="Performance (11)" to_port="labelled data"/>
          <connect from_op="Performance (11)" from_port="performance" to_port="averagable 1"/>
          <portSpacing port="source_model" spacing="0"/>
          <portSpacing port="source_test set" spacing="0"/>
          <portSpacing port="source_through 1" spacing="0"/>
          <portSpacing port="sink_averagable 1" spacing="0"/>
          <portSpacing port="sink_averagable 2" spacing="0"/>
        </process>
      </operator>
      <connect from_op="Read XRFF (4)" from_port="output" to_op="Validation (11)" to_port="training"/>
      <connect from_op="Validation (11)" from_port="averagable 1" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>

The process looks fine.

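There is currently a strange problem that can sometimes be solved by using the Materialize Data operator. I suggest placing it somewhere inside the cross-validation, just before the SVM operator.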

With luck it will just work; if it doesn't, we'll have to try other approaches.

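Consider getting the RapidMiner source code and running it under a profiler. You could also try other SVM operators to see whether they run faster. To see which cross-validation iteration is running, look at the status bar at the bottom; alternatively, use the Log operator. You can also use the Sample operator to reduce the data size, so you can measure how long 1%, 5%, 10%, and so on take and use that to estimate the time for the whole dataset.

Oddly, I tried the kernels that in theory should be the slowest, RBF and sigmoid, and got results within an hour. So I don't know why it runs indefinitely with the linear and polynomial kernels, unless those kernels simply cannot find a separating hyperplane for this dataset. The SVM operator I use is LibSVM, which is one of the most commonly used ones.

If you reduce the number of examples to a very small fraction, say 1%, does it finish in a reasonable amount of time?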
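The Sample-based estimate suggested above can be made concrete by timing a few sample sizes and fitting a curve through them. This is a minimal sketch, assuming training time grows roughly as a power law t ≈ a·n^b; that is an assumption, not something SVM solvers guarantee, and a run that never converges will not follow any curve. The function names are hypothetical, and the timings would have to come from your own runs with the Sample operator.

```python
import math


def fit_power_law(sizes, seconds):
    """Least-squares fit of log(t) = log(a) + b * log(n); returns (a, b).

    Assumes training time follows t ~ a * n**b, which is only an approximation.
    """
    if len(sizes) != len(seconds) or len(set(sizes)) < 2:
        raise ValueError("need matching lists with at least two distinct sizes")
    xs = [math.log(n) for n in sizes]
    ys = [math.log(t) for t in seconds]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx                       # slope in log-log space = growth exponent
    a = math.exp(mean_y - b * mean_x)   # intercept back-transformed to a scale factor
    return a, b


def estimate_full_runtime(sizes, seconds, full_size):
    """Extrapolate the fitted power law to the full dataset size (one fit)."""
    a, b = fit_power_law(sizes, seconds)
    return a * full_size ** b
```

For example, with samples of 40, 198, and 395 examples (about 1%, 5%, and 10% of 3950) and the times you measured for them, `estimate_full_runtime([40, 198, 395], measured, 3950)` gives a rough figure for a single training run; multiply it by 10 for the 10-fold cross-validation. If even the 1% sample fails to finish, extrapolation is pointless and the problem is the solver, not the data size.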