How can I use R to evaluate changes in a categorical variable over time?
I have a dataset in which two teams play each other in an annual game. The games are split into two divisions, East and West. Based on the result of the previous year's game, I want to determine which team is the defending champion in a given year. I want to do this for both divisions. Here is my dataset:
data <- data.frame(
  Team = c("Hot Dogs", "Hamburgers", "Hot Dogs", "Hamburgers", "Hot Dogs",
           "Hamburgers", "Pho", "Ramen", "Pho", "Ramen", "Pho", "Ramen"),
  Division = c("West", "West", "West", "West", "West", "West", "East", "East",
               "East", "East", "East", "East"),
  Year = c("2017", "2017", "2018", "2018", "2019", "2019", "2017", "2017",
           "2018", "2018", "2019", "2019"),
  Score = c("37", "2", "26", "32", "37", "9", "22", "31", "25", "32", "24", "18"))
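Ideally, I would add a Result column to the original data that indicates whether a given team is the defending champion going into that game. Something like this: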
data$Result <- c("Initial Champion", "NA", "Champion", "NA", "NA", "Champion", "NA",
"Initial Champion", "NA", "Champion", "NA", "Champion")
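Is there a simple way to do this in R, ideally with the tidyverse libraries?
Thanks in advance for any suggestions.
In the answer below, I use dplyr to identify the Initial Champion and the Champion. Initial Champion means the team with the highest score in its division in the first year that appears in the data. In later years, the team with the highest score in its division is labeled Champion.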
library(dplyr)
data <- data.frame(
  Team = c("Hot Dogs", "Hamburgers", "Hot Dogs", "Hamburgers", "Hot Dogs",
           "Hamburgers", "Pho", "Ramen", "Pho", "Ramen", "Pho", "Ramen"),
  Division = c("West", "West", "West", "West", "West", "West", "East", "East",
               "East", "East", "East", "East"),
  Year = c("2017", "2017", "2018", "2018", "2019", "2019", "2017", "2017",
           "2018", "2018", "2019", "2019"),
  Score = c("37", "2", "26", "32", "37", "9", "22", "31", "25", "32", "24", "18"),
  stringsAsFactors = FALSE)
result <-
  data %>%
  group_by(Year, Division) %>% # First we group by each year and division
  # For each division/year we get the highest score; the team with this score
  # is considered the champion
  mutate(high_score = as.character(max(as.numeric(Score), na.rm = TRUE)),
         result = ifelse(high_score == Score, "Champion", NA_character_)) %>%
  # To determine the initial champion we compare with the first year:
  # if the row belongs to the first year in the data, it is the initial champion
  mutate(result =
           ifelse(min(data$Year) == Year & result == "Champion", "Initial Champion", result)) %>%
  # Drop the high_score column because it is not needed in the final output
  select(-high_score)
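The code above labels the winner of each year's game. The Result column shown in the question instead labels the team that won the previous year's game, that is, the defending champion. A minimal sketch of that variant follows. It assumes that every division has exactly one game per year with no missing years, that ties can be broken arbitrarily, and that the first year's own winner is the "Initial Champion". The names winners, defending, Defender, Label, and with_result are illustrative and do not come from the original posts.

```r
library(dplyr)

data <- data.frame(
  Team = c("Hot Dogs", "Hamburgers", "Hot Dogs", "Hamburgers", "Hot Dogs",
           "Hamburgers", "Pho", "Ramen", "Pho", "Ramen", "Pho", "Ramen"),
  Division = c("West", "West", "West", "West", "West", "West", "East", "East",
               "East", "East", "East", "East"),
  Year = c("2017", "2017", "2018", "2018", "2019", "2019", "2017", "2017",
           "2018", "2018", "2019", "2019"),
  Score = c("37", "2", "26", "32", "37", "9", "22", "31", "25", "32", "24", "18"),
  stringsAsFactors = FALSE)

# One row per division and year: the team with the highest numeric score.
# Assumption: ties are broken arbitrarily (with_ties = FALSE).
winners <- data %>%
  mutate(Year = as.integer(Year), Score = as.numeric(Score)) %>%
  group_by(Division, Year) %>%
  slice_max(Score, n = 1, with_ties = FALSE) %>%
  ungroup() %>%
  select(Division, Year, Winner = Team)

# Within each division, the defending champion is the previous year's winner.
# The first year has no previous game, so its own winner is used instead.
# Assumption: years are consecutive, so lag() really is "the previous year".
defending <- winners %>%
  arrange(Division, Year) %>%
  group_by(Division) %>%
  mutate(Defender = coalesce(lag(Winner), Winner),
         Label = if_else(row_number() == 1, "Initial Champion", "Champion")) %>%
  ungroup() %>%
  mutate(Year = as.character(Year)) %>%
  select(Division, Year, Defender, Label)

# Attach the label to the defending team's row; every other row gets NA.
with_result <- data %>%
  left_join(defending, by = c("Division", "Year")) %>%
  mutate(Result = if_else(Team == Defender, Label, NA_character_)) %>%
  select(-Defender, -Label)
```

Here with_result keeps the original rows and row order of data and adds a Result column. Unlabeled rows hold a real NA rather than the string "NA" used in the question.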
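First, we build a table of all the winners, labeling the first one in each division as Initial Champion and the rest as Champion. Score is stored as text, so it is converted to a number before sorting. Otherwise "9" would sort above "37".
library(dplyr)
X = data %>%
  # as.character() first so the conversion also works when Score is a factor
  arrange(Year, desc(as.numeric(as.character(Score)))) %>%
  group_by(Division) %>%
  # keep the first (highest-scoring) row for each year within a division
  filter(!duplicated(Year)) %>%
  mutate(result = rep(c("Initial Champion", "Champion"), times = c(1, n() - 1)))
# A tibble: 6 x 5
# Groups: Division [2]
  Team       Division Year  Score result
  <fct>      <fct>    <fct> <fct> <chr>
1 Hot Dogs   West     2017  37    Initial Champion
2 Ramen      East     2017  31    Initial Champion
3 Hamburgers West     2018  32    Champion
4 Ramen      East     2018  32    Champion
5 Hot Dogs   West     2019  37    Champion
6 Pho        East     2019  24    Champion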
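The asker wants the original data frame with the label attached, not a separate table. One possible next step is a left join of the winners table back onto the data. This is only a sketch: the name labelled is illustrative, Score is compared numerically, and the label marks each year's winner (as in the table above), not the previous year's winner.

```r
library(dplyr)

data <- data.frame(
  Team = c("Hot Dogs", "Hamburgers", "Hot Dogs", "Hamburgers", "Hot Dogs",
           "Hamburgers", "Pho", "Ramen", "Pho", "Ramen", "Pho", "Ramen"),
  Division = c("West", "West", "West", "West", "West", "West", "East", "East",
               "East", "East", "East", "East"),
  Year = c("2017", "2017", "2018", "2018", "2019", "2019", "2017", "2017",
           "2018", "2018", "2019", "2019"),
  Score = c("37", "2", "26", "32", "37", "9", "22", "31", "25", "32", "24", "18"),
  stringsAsFactors = FALSE)

# Winners per division and year, with scores compared as numbers.
X <- data %>%
  arrange(Year, desc(as.numeric(Score))) %>%
  group_by(Division) %>%
  filter(!duplicated(Year)) %>%
  mutate(result = rep(c("Initial Champion", "Champion"), times = c(1, n() - 1))) %>%
  ungroup()

# Join the label back onto the original rows; non-winners get NA.
labelled <- data %>%
  left_join(select(X, Team, Division, Year, result),
            by = c("Team", "Division", "Year"))
```

Because left_join keeps every row of data in its original order, labelled has the same 12 rows plus a result column.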
Sorry, could you explain the desired result df?
I want the original data frame, but with the Result column added based on the evaluation I described above. I showed what the Result column should look like in my original post. Hope this helps.
Check…