ruby-libsvm for multi-class problems

ruby machine-learning

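For multi-class prediction, I followed the example given with the library, but it returns a slightly inaccurate prediction. The test sentence ("The teacher yelled at the student who was late to class but later apologized.") should return EDUCATION, not HEALTH.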

require 'libsvm'

# Let's take our documents and create word vectors out of them.
#
documents = [ # 0 is JOKES, 1 is EDUCATION and 2 is HEALTH
            [0, "Why did the chicken cross the road? Because a car was coming"],
            [0, "You're an elevator tech? I bet that job has its ups and downs"],
            [0, "Why did the chicken cross the road? To get the worm"],

            [1, "The university admitted more students this year and dropout rate is lessening."],
            [1, "The students turned in their homework at school before summer break."], 
            [1, "The students and teachers agreed on a plan for study."], 

            [2, "The cold outbreak was bad but not an epidemic."],
            [2, "The doctor and the nurse advised be to get rest because of my cold."],
            [2, "The doctor had to go to the hospital."]
         ]

# Let's create a dictionary of unique words and then we can
# create our vectors.  This is a very simple example.  If you
# were doing this in a production system you'd do things like
# stemming and removing all punctuation (in a less casual way).
#
dictionary = documents.map(&:last).map(&:split).flatten.uniq
dictionary = dictionary.map { |x| x.gsub(/\?|,|\.|\-/,'') }

training_set = []
documents.each do |doc|
  @features_array = dictionary.map { |x| doc.last.include?(x) ? 1 : 0 }
  training_set << [doc.first, Libsvm::Node.features(@features_array)]
end

# Let's set up libsvm so that we can test our prediction
# using the test set
#
problem = Libsvm::Problem.new
parameter = Libsvm::SvmParameter.new

parameter.cache_size = 1 # in megabytes
parameter.eps = 0.001
parameter.c   = 10

# Train classifier using training set
#
problem.set_examples(training_set.map(&:first), training_set.map(&:last))
model = Libsvm::Model.train(problem, parameter)

# Now let's test our classifier using the test set
#
test_set = [1, "The teacher yelled at the student who was late to class but later apologized."]
test_document = test_set.last.split.map{ |x| x.gsub(/\?|,|\.|\-/,'') }

doc_features = dictionary.map{|x| test_document.include?(x) ? 1 : 0 }
pred = model.predict(Libsvm::Node.features(doc_features))
puts pred # returns 2.0 BUT should have been 1.0
result = case pred
    when 0.0 then "predicted #{pred} as joke"
    when 1.0 then "predicted #{pred} as education"
    when 2.0 then "predicted #{pred} as health"
end
puts result

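There is nothing specifically wrong with the code itself. The cause is simply a lack of training data.
Try using "The university admitted more students this year and dropout rate is lessening." as the test instance. It is identical to one of the examples in the training set, and the program correctly classifies it as EDUCATION.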
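Three examples per class are not enough to train an SVM. The best approach is to use more training data and to tune the parameter C with cross-validation.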
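As a rough illustration of tuning C with cross-validation, here is a minimal sketch in plain Ruby. The names `k_folds`, `cross_validate`, `best_c` and `NEAREST_NEIGHBOUR_FIT` are hypothetical helpers introduced for this example, not part of ruby-libsvm. The nearest-neighbour stand-in exists only so the sketch runs without the gem; in practice the `fit` callable would wrap the `Libsvm::Problem` / `Libsvm::Model.train` calls from the question, as outlined in the trailing comment.

```ruby
# Split the indices 0...n into k folds in round-robin fashion.
def k_folds(n, k)
  (0...n).group_by { |i| i % k }.values
end

# Mean held-out accuracy over k folds.
# `fit` is a hypothetical callable: (train_labels, train_features) -> predictor,
# where the predictor responds to call(features) and returns a label.
def cross_validate(labels, features, k, fit)
  all_idx = (0...labels.size).to_a
  correct = 0
  k_folds(labels.size, k).each do |test_idx|
    train_idx = all_idx - test_idx
    predictor = fit.call(labels.values_at(*train_idx),
                         features.values_at(*train_idx))
    correct += test_idx.count { |i| predictor.call(features[i]) == labels[i] }
  end
  correct.to_f / labels.size
end

# Pick the candidate C with the highest cross-validated accuracy.
# `fit_for_c` maps a value of C to a `fit` callable as described above.
def best_c(candidates, labels, features, k, fit_for_c)
  candidates.max_by { |c| cross_validate(labels, features, k, fit_for_c.call(c)) }
end

# Stand-in classifier (NOT an SVM): 1-nearest-neighbour by dot product.
NEAREST_NEIGHBOUR_FIT = lambda do |ys, xs|
  lambda do |x|
    nearest = xs.each_index.max_by { |j| xs[j].zip(x).sum { |a, b| a * b } }
    ys[nearest]
  end
end

# With ruby-libsvm, fit_for_c could wrap the calls used in the question
# (here xs would hold Libsvm::Node.features(...) values):
#   fit_for_c = lambda do |c|
#     lambda do |ys, xs|
#       problem = Libsvm::Problem.new
#       parameter = Libsvm::SvmParameter.new
#       parameter.cache_size = 1
#       parameter.eps = 0.001
#       parameter.c = c
#       problem.set_examples(ys, xs)
#       model = Libsvm::Model.train(problem, parameter)
#       ->(x) { model.predict(x) }
#     end
#   end
```

A call such as `best_c([0.1, 1, 10, 100], labels, features, 3, fit_for_c)` would then return the candidate with the best held-out accuracy. With only nine documents the estimate will be very noisy, which is another reason to gather more data first.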
As for the code, I am not especially familiar with the details of the multi-class implementation.
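One detail in the question's code may be worth checking independently of the training-set size. The training loop calls `doc.last.include?(x)` on a String, which is a substring match (the dictionary entry "a" matches any sentence containing the letter a), while the test code calls `test_document.include?(x)` on an Array of tokens, which is a whole-word match. The dictionary is also de-duplicated before punctuation is stripped, so entries such as "cold" can appear twice. The sketch below shows one way to make both paths use the same features. `tokenize`, `build_dictionary` and `featurize` are hypothetical helper names, and lowercasing is an added assumption. Whether this changes the prediction for the example sentence has not been verified here.

```ruby
# Normalise a sentence into lowercase word tokens, dropping punctuation.
# (Lowercasing is an assumption; the original code is case-sensitive.)
def tokenize(text)
  text.downcase.scan(/[a-z']+/)
end

# Build the vocabulary from cleaned tokens, so "cold" and "cold."
# collapse into a single dictionary entry.
def build_dictionary(texts)
  texts.flat_map { |t| tokenize(t) }.uniq
end

# Binary bag-of-words vector: 1 if the dictionary word occurs
# as a whole token in the text, 0 otherwise.
def featurize(text, dictionary)
  tokens = tokenize(text)
  dictionary.map { |word| tokens.include?(word) ? 1 : 0 }
end
```

Both the training documents and the test sentence would then go through `featurize(text, dictionary)` before being passed to `Libsvm::Node.features`, as in the question.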