ruby-libsvm for multi-class problems

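For multi-class prediction, I followed the example that ships with the library, but it returns a slightly inaccurate prediction.

The test sentence ("The teacher yelled at the student who was late to class but later apologized.") should be classified as EDUCATION, not HEALTH.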

require 'libsvm'

# Let's take our documents and create word vectors out of them.
#
documents = [ # 0 is JOKES, 1 is EDUCATION and 2 is HEALTH
            [0, "Why did the chicken cross the road? Because a car was coming"],
            [0, "You're an elevator tech? I bet that job has its ups and downs"],
            [0, "Why did the chicken cross the road? To get the worm"],

            [1, "The university admitted more students this year and dropout rate is lessening."],
            [1, "The students turned in their homework at school before summer break."], 
            [1, "The students and teachers agreed on a plan for study."], 

            [2, "The cold outbreak was bad but not an epidemic."],
            [2, "The doctor and the nurse advised be to get rest because of my cold."],
            [2, "The doctor had to go to the hospital."]
         ]

# Let's create a dictionary of unique words and then we can
# create our vectors.  This is a very simple example.  If you
# were doing this in a production system you'd do things like
# stemming and removing all punctuation (in a less casual way).
#
dictionary = documents.map(&:last).map(&:split).flatten.uniq
dictionary = dictionary.map { |x| x.gsub(/\?|,|\.|\-/,'') }

training_set = []
documents.each do |doc|
  features = dictionary.map { |x| doc.last.include?(x) ? 1 : 0 }
  training_set << [doc.first, Libsvm::Node.features(features)]
end

# Let's set up libsvm so that we can test our prediction
# using the test set
#
problem = Libsvm::Problem.new
parameter = Libsvm::SvmParameter.new

parameter.cache_size = 1 # in megabytes
parameter.eps = 0.001
parameter.c   = 10

# Train classifier using training set
#
problem.set_examples(training_set.map(&:first), training_set.map(&:last))
model = Libsvm::Model.train(problem, parameter)

# Now let's test our classifier using the test set
#
test_set = [1, "The teacher yelled at the student who was late to class but later apologized."]
test_document = test_set.last.split.map{ |x| x.gsub(/\?|,|\.|\-/,'') }

doc_features = dictionary.map{|x| test_document.include?(x) ? 1 : 0 }
pred = model.predict(Libsvm::Node.features(doc_features))
puts pred # returns 2.0 BUT should have been 1.0
result = case pred
    when 0.0 then "predicted #{pred} as joke"
    when 1.0 then "predicted #{pred} as education"
    when 2.0 then "predicted #{pred} as health"
end
puts result

There is no specific problem with the code itself. The cause is simply a lack of training data.

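Try using "The university admitted more students this year and dropout rate is lessening." as the test instance, which is identical to one of the examples in the training set. The program correctly classifies it as EDUCATION.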


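Three examples per class are not enough to train an SVM. The best approach is to use more training data and to tune the parameter C with cross-validation.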
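
The tuning advice above can be sketched as a small k-fold loop. This is a minimal pure-Ruby sketch, not rb-libsvm's own API: `k_fold_accuracy`, `best_c`, and the `train` callable are hypothetical names, and the nearest-neighbour stand-in trainer ignores C so that the block runs without the gem. With rb-libsvm, the callable would presumably wrap the `set_examples`, `Libsvm::Model.train`, and `predict` calls from the question, with `parameter.c` set to the candidate value.

```ruby
# Hypothetical helpers for illustration; none of these names come from rb-libsvm.

# Estimate accuracy for one value of C with k-fold cross-validation.
# `train` is any callable (labels, vectors, c) -> predictor, where the
# predictor is a callable vector -> predicted label.
def k_fold_accuracy(labels, vectors, folds, c, train)
  indices = (0...labels.size).to_a
  correct = 0
  folds.times do |fold|
    # Every `folds`-th example is held out; the rest are used for training.
    test_idx  = indices.select { |i| i % folds == fold }
    train_idx = indices - test_idx
    predictor = train.call(labels.values_at(*train_idx),
                           vectors.values_at(*train_idx), c)
    correct += test_idx.count { |i| predictor.call(vectors[i]) == labels[i] }
  end
  correct.to_f / labels.size
end

# Pick the candidate C with the highest cross-validated accuracy.
def best_c(labels, vectors, folds, candidates, train)
  candidates.max_by { |c| k_fold_accuracy(labels, vectors, folds, c, train) }
end

# Stand-in trainer (assumption): 1-nearest-neighbour by Hamming distance.
# It ignores C; a real trainer would pass C on to the SVM.
nearest_neighbour = lambda do |labels, vectors, _c|
  lambda do |v|
    nearest = (0...vectors.size).min_by do |i|
      vectors[i].zip(v).count { |a, b| a != b }
    end
    labels[nearest]
  end
end

labels  = [0, 0, 0, 1, 1, 1]
vectors = [[1, 0], [1, 0], [1, 0], [0, 1], [0, 1], [0, 1]]

puts k_fold_accuracy(labels, vectors, 3, 10, nearest_neighbour)
puts best_c(labels, vectors, 3, [0.1, 1, 10, 100], nearest_neighbour)
```

With only nine documents, any such estimate will be noisy, which is consistent with the point that more training data is needed first.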

As for the code, I am not particularly familiar with how the multi-class case is implemented.
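
Separately from the amount of data, one detail in the question's code may be worth checking: the training loop calls `include?` on the document string (a substring match), while the test code calls `include?` on an array of tokens (a whole-word match), so the two kinds of vectors are not built the same way. The sketch below shows the difference. `tokenize` and `feature_vector` are hypothetical helpers, not part of rb-libsvm, and the lowercasing and token regex are assumptions.

```ruby
# Hypothetical helpers for illustration; not part of rb-libsvm.

# Split text into lowercase word tokens (lowercasing is an assumption).
def tokenize(text)
  text.downcase.scan(/[a-z0-9']+/)
end

# 1 if the dictionary word occurs as a whole token, else 0.
def feature_vector(dictionary, text)
  tokens = tokenize(text)
  dictionary.map { |word| tokens.include?(word) ? 1 : 0 }
end

sentence   = "The doctor had to go to the hospital."
dictionary = ["a", "doctor", "school"]

# String#include? matches substrings, so "a" is found inside "had".
substring_features = dictionary.map { |w| sentence.include?(w) ? 1 : 0 }
# Array#include? on tokens matches whole words only.
token_features = feature_vector(dictionary, sentence)

p substring_features # => [1, 1, 0]
p token_features     # => [0, 1, 0]
```

Using one helper for both the training documents and the test sentence keeps the feature definitions identical; whether that changes the prediction here is not something this sketch establishes.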