加权斜率1算法?(从Python移植到R)
我在读关于(和更多)的书 形式上)从不同的用户获取项目评分,并给定至少包含1个评分和1个缺失值的用户向量,预测缺失评分 我找到了一个,但我很难将它移植到(我更习惯于这样)。下面是我的尝试。有没有关于如何让它工作的建议 各位,先谢谢你们加权斜率1算法?(从Python移植到R),python,r,prediction,recommendation-engine,Python,R,Prediction,Recommendation Engine,我在读关于(和更多)的书 形式上)从不同的用户获取项目评分,并给定至少包含1个评分和1个缺失值的用户向量,预测缺失评分 我找到了一个,但我很难将它移植到(我更习惯于这样)。下面是我的尝试。有没有关于如何让它工作的建议 各位,先谢谢你们 # take a 'training' set, tr.set and a vector with some missing ratings, d pred=function(tr.set,d) { tr.set=rbind(tr.set,d) n
# take a 'training' set, tr.set and a vector with some missing ratings, d
pred=function(tr.set,d) {
tr.set=rbind(tr.set,d)
n.items=ncol(tr.set)
# tally frequencies to use as weights
freqs=sapply(1:n.items, function(i) {
unlist(lapply(1:n.items, function(j) {
sum(!(i==j)&!is.na(tr.set[,i])&!is.na(tr.set[,j])) })) })
# estimate product-by-product mean differences in ratings
diffs=array(NA, dim=c(n.items,n.items))
diffs=sapply(1:n.items, function(i) {
unlist(lapply(1:n.items, function(j) {
diffs[j,i]=mean(tr.set[,i]-tr.set[,j],na.rm=T) })) })
# create an output vector with NAs for all the items the user has already rated
pred.out=as.numeric(is.na(d))
pred.out[!is.na(d)]=NA
a=which(!is.na(pred.out))
b=which(is.na(pred.out))
# calculated the weighted slope one estimate
pred.out[a]=sapply(a, function(i) {
sum(unlist(lapply(b,function (j) {
sum((d[j]+diffs[j,i])*freqs[j,i])/rowSums(freqs)[i] }))) })
names(pred.out)=colnames(tr.set)
return(pred.out) }
# end function
# test, using example from [3]
alice=c(squid=1.0, octopus=0.2, cuttlefish=0.5, nautilus=NA)
bob=c(squid=1.0, octopus=0.5, cuttlefish=NA, nautilus=0.2)
carole=c(squid=0.2, octopus=1.0, cuttlefish=0.4, nautilus=0.4)
dave=c(squid=NA, octopus=0.4, cuttlefish=0.9, nautilus=0.5)
tr.set2=rbind(alice,bob,carole,dave)
lucy2=c(squid=0.4, octopus=NA, cuttlefish=NA, nautilus=NA)
pred(tr.set2,lucy2)
# not correct
# correct(?): {'nautilus': 0.10, 'octopus': 0.23, 'cuttlefish': 0.25}
不久前,我使用相同的参考(Bryan O'Sullivan的python代码)编写了Slope One的R版本。我正在粘贴下面的代码,以防万一
predict <- function(userprefs, data.freqs, data.diffs) {
seen <- names(userprefs)
preds <- sweep(data.diffs[ , seen, drop=FALSE], 2, userprefs, '+')
preds <- preds * data.freqs[ , seen]
preds <- apply(preds, 1, sum)
freqs <- apply(data.freqs[ , seen, drop=FALSE], 1, sum)
unseen <- setdiff(names(preds), seen)
result <- preds[unseen] / freqs[unseen]
return(result[is.finite(result)])
}
update <- function(userdata, freqs, diffs) {
for (ratings in userdata) {
items <- names(ratings)
n <- length(ratings)
ratdiff <- rep(ratings, n) - rep(ratings, rep(n, n))
diffs[items, items] <- diffs[items, items] + ratdiff
freqs[items, items] <- freqs[items, items] + 1
}
diffs <- diffs / freqs
return(list(freqs=freqs, diffs=diffs))
}
userdata <- list(alice=c(squid=1.0, cuttlefish=0.5, octopus=0.2),
bob=c(squid=1.0, octopus=0.5, nautilus=0.2),
carole=c(squid=0.2, octopus=1.0, cuttlefish=0.4, nautilus=0.4),
dave=c(cuttlefish=0.9, octopus=0.4, nautilus=0.5))
items <- c('squid', 'cuttlefish', 'nautilus', 'octopus')
n.items <- length(items)
freqs <- diffs <- matrix(0, nrow=n.items, ncol=n.items, dimnames=list(items, items))
result <- update(userdata, freqs, diffs)
print(result$freqs)
print(result$diffs)
userprefs <- c(squid=.4)
predresult <- predict(userprefs, result$freqs, result$diffs)
print(predresult)
predict我试图将代码格式化为更可读的格式,但我不熟悉R。对不起,这不是好的款式。