R: Applying dist() on a data frame's rows
I can do this:
data <- read.csv("data.csv")
p1 <- subset(data, player_name == 'Player1')
p2 <- subset(data, player_name == 'Player2')
dist(rbind(p1[,c("gp","points")], p2[,c("gp","chances_for","chances_for_help")]))
But obviously this doesn't work. Is there a quick fix for this?
Sample data:
player_name,gp,points
Player 1,82,95
Player 2,80,88
Player 3,81,84
Player 4,82,90
Player 5,82,77
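The `rbind` call in the question fails because the two subsets select different columns, and the names passed to `subset` ('Player1') do not match the sample data ('Player 1'). A minimal sketch of the two-player distance, assuming the sample data above and the same two numeric columns for both players (the `cols` and `d12` names are introduced here for illustration only):

```r
# Sample data from the question, inlined so the sketch is self-contained
data <- read.csv(text = "player_name,gp,points
Player 1,82,95
Player 2,80,88
Player 3,81,84
Player 4,82,90
Player 5,82,77")

cols <- c("gp", "points")  # assumption: compare players on these two columns

p1 <- subset(data, player_name == "Player 1")
p2 <- subset(data, player_name == "Player 2")

# Both rows must carry identical columns for rbind() to stack them;
# dist() then returns the Euclidean distance between the two rows.
d12 <- dist(rbind(p1[, cols], p2[, cols]))
print(d12)
```

This only compares one pair at a time, which is why the answers below compute the distances for the whole data set in one call instead.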
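@thelatemail has basically given you the complete answer already. Taking his approach a step further, you can extend it as follows (I'm using the dplyr library).
First, create a row id: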
library(dplyr)
data <- data %>% mutate(rowid = row_number())
To add the player names, you just create some kind of player index data frame and do more joins using the same idea.
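I think running dist once on the whole set and then picking the closest cases from each row is more appropriate than running dist nrow times.
Is there a simple example?
I know what that gives me, but it gives me the values in a matrix. How do I apply that back to the original data set so that it is tied to the player names?
apply(out, 1, function(x) order(x)[2:4]) returns the row numbers of the closest matches in the original dat data set.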
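The suggestion in the comments above can be sketched end to end. The thread does not show how `out` was built, so treating it as `as.matrix(dist(...))` over `gp` and `points` is an assumption; the `nearest` name is also introduced here for illustration:

```r
# Sample data from the question, inlined so the sketch is self-contained
data <- read.csv(text = "player_name,gp,points
Player 1,82,95
Player 2,80,88
Player 3,81,84
Player 4,82,90
Player 5,82,77")

# One dist() call for the whole set, expanded to a square matrix
# (assumption: Euclidean distance on gp and points)
out <- as.matrix(dist(data[, c("gp", "points")]))

# For each row, order() ranks all rows by distance. Position 1 is the row
# itself (distance 0), so positions 2:4 are the three nearest other rows.
nearest <- t(apply(out, 1, function(x) order(x)[2:4]))
print(nearest)
```

With the sample data, the first row of `nearest` is 4, 2, 3: Player 4 is closest to Player 1, followed by Player 2 and Player 3.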
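# `out` is the full distance matrix from the comments; the thread does not show how it
# was built, so as an assumption: as.matrix(dist(data[, c("gp", "points")]))
# for each row, take the row ids (column names of `out`) of the three nearest other rows
dist_data <- as.data.frame(t(apply(out, 1, function(x) colnames(out)[order(x)[2:4]])))
dist_data <- dist_data %>% mutate(rowid = row_number())
data <- data %>% left_join(dist_data, by = "rowid")
# convert via as.character() first: on R < 4.0 these columns are factors,
# and as.numeric() on a factor returns level codes rather than the row ids
data$V1 <- as.numeric(as.character(data$V1))
data$V2 <- as.numeric(as.character(data$V2))
data$V3 <- as.numeric(as.character(data$V3))
# now we have to remap V1, V2, V3 to the player names;
# we can do this by creating a lookup table of names indexed by row id
name_index <- dplyr::select(data, player_name, rowid)
data %>%
  left_join(rename(name_index, closest_name1 = player_name, V1 = rowid), by = "V1") %>%
  left_join(rename(name_index, closest_name2 = player_name, V2 = rowid), by = "V2") %>%
  left_join(rename(name_index, closest_name3 = player_name, V3 = rowid), by = "V3") %>%
  dplyr::select(-V1, -V2, -V3)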
  player_name gp points rowid closest_name1 closest_name2 closest_name3
1    Player 1 82     95     1      Player 4      Player 2      Player 3
2    Player 2 80     88     2      Player 4      Player 3      Player 1
3    Player 3 81     84     3      Player 2      Player 4      Player 5
4    Player 4 82     90     4      Player 2      Player 1      Player 3
5    Player 5 82     77     5      Player 3      Player 2      Player 4
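As an alternative to the chain of joins, the row numbers can index `player_name` directly in base R. This is only a sketch under the same assumptions as the answer (Euclidean distance on `gp` and `points`, three nearest players); the `closest_name*` column names follow the answer's output, while `nearest` is a name introduced here for illustration:

```r
# Sample data from the question; keep names as character, not factor
data <- read.csv(text = "player_name,gp,points
Player 1,82,95
Player 2,80,88
Player 3,81,84
Player 4,82,90
Player 5,82,77", stringsAsFactors = FALSE)

# Full distance matrix, then the three nearest other rows for each row
out <- as.matrix(dist(data[, c("gp", "points")]))
nearest <- t(apply(out, 1, function(x) order(x)[2:4]))

# Look the names up by row number: no joins or type conversions needed
for (k in 1:3) {
  data[[paste0("closest_name", k)]] <- data$player_name[nearest[, k]]
}
print(data)
```

Because `order()` already returns integer row positions, this avoids the round trip through column names and `as.numeric()`.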