R: Change values within groups based on a condition
I'm starting with the following data:
df <- data.frame(Person=c("Ada","Ada","Bob","Bob","Carl","Carl"), Day=c(1,2,2,1,1,2), Fruit=c("Apple","X","Apple","X","X","Orange"))
  Person Day  Fruit
1    Ada   1  Apple
2    Ada   2      X
3    Bob   2  Apple
4    Bob   1      X
5   Carl   1      X
6   Carl   2 Orange
Any suggestions on which direction to take?
Another solution using case_when from dplyr:
library(dplyr)
# Change column types to character instead of factor
df[] <- lapply(df, as.character)
# Optional: this line converts all columns to an appropriate type, e.g. Day will be integer
df <- readr::type_convert(df)
df %>%
  group_by(Person) %>%
  mutate(
    # Group-wise flags, computed before Fruit is overwritten
    Contains_Apple  = any(Fruit == "Apple"),
    Contains_Orange = any(Fruit == "Orange"),
    # An "X" becomes whichever fruit the person does not have yet
    Fruit = case_when(
      Fruit == "X" & !Contains_Apple  ~ "Apple",
      Fruit == "X" & !Contains_Orange ~ "Orange",
      TRUE ~ Fruit
    )
  )
# A tibble: 6 x 5
# Groups:   Person [3]
  Person   Day Fruit  Contains_Apple Contains_Orange
  <chr>  <int> <chr>  <lgl>          <lgl>
1 Ada        1 Apple  TRUE           FALSE
2 Ada        2 Orange TRUE           FALSE
3 Bob        2 Apple  TRUE           FALSE
4 Bob        1 Orange TRUE           FALSE
5 Carl       1 Apple  FALSE          TRUE
6 Carl       2 Orange FALSE          TRUE
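Here is an idea using case_when: check whether each group already has "Apple" or "Orange", then assign the opposite value wherever Fruit is "X".
Note that I added stringsAsFactors = FALSE when creating the example data frame, to avoid creating factor columns.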
library(dplyr)
library(tidyr)
df %>%
  group_by(Person) %>%
  mutate(Fruit = case_when(
    Fruit %in% "X" & any(Fruit %in% "Apple")  ~ "Orange",
    Fruit %in% "X" & any(Fruit %in% "Orange") ~ "Apple",
    TRUE ~ Fruit
  )) %>%
  ungroup()
# # A tibble: 6 x 3
# Person Day Fruit
# <chr> <dbl> <chr>
# 1 Ada 1.00 Apple
# 2 Ada 2.00 Orange
# 3 Bob 2.00 Apple
# 4 Bob 1.00 Orange
# 5 Carl 1.00 Apple
# 6 Carl 2.00 Orange
Data
df <- data.frame(Person=c("Ada","Ada","Bob","Bob","Carl","Carl"),
Day=c(1,2,2,1,1,2),
Fruit=c("Apple","X","Apple","X","X","Orange"),
stringsAsFactors = FALSE)
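To illustrate the stringsAsFactors remark: before R 4.0.0, data.frame() turned strings into factors by default, and mixing a factor column with the character replacements in case_when can fail or behave unexpectedly depending on the dplyr version. The sketch below uses a hypothetical copy named df_factor (not part of the original answer) and base R only. It shows how to detect a factor column and convert it before running the grouped code.

```r
# Hypothetical copy of the example data, deliberately built with factor columns
df_factor <- data.frame(
  Person = c("Ada", "Ada", "Bob", "Bob", "Carl", "Carl"),
  Day    = c(1, 2, 2, 1, 1, 2),
  Fruit  = c("Apple", "X", "Apple", "X", "X", "Orange"),
  stringsAsFactors = TRUE
)

# Check whether Fruit was created as a factor
fruit_was_factor <- is.factor(df_factor$Fruit)

# Convert factor columns back to character; other columns are left alone
factor_cols <- vapply(df_factor, is.factor, logical(1))
df_factor[factor_cols] <- lapply(df_factor[factor_cols], as.character)
```

After the conversion, df_factor can be passed to the same group_by / mutate pipeline as df.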
Simple loop:
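fruity_loop <- function(frame) {
  ops <- c('Apple', 'Orange')
  for (x in seq_len(nrow(frame))) {
    if (frame$Fruit[x] == 'X') {
      # Check the fruits recorded for the same person, not the previous row,
      # so the result does not depend on row order
      own <- frame$Fruit[frame$Person == frame$Person[x]]
      frame$Fruit[x] <- if (ops[1] %in% own) ops[2] else ops[1]
    }
  }
  return(frame)
}
fruity_loop(df)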
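Thanks, I like this approach! For some reason I can't reproduce your result, though; "Carl" ends up with Orange on both days. I'm trying to work out the difference between the example dataset and my real one.
Thanks a lot! I can't reproduce your result (I'm starting to think the problem is on my side, since I hit the same reproducibility issue with another answer). The first three columns stay unchanged, and the two new "Contains_..." columns are filled with TRUE.
Hmm, I don't know what is causing this. When in doubt, restart your R session and run the code again. Otherwise, maybe update dplyr if you can? Also, try updating your question with the code you are using to get this result. I'll see whether I run into the same problem.
A simple restart actually did it, thanks! Works like a charm!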
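The symptoms described in these comments (every "Contains_..." value TRUE, "Carl" getting Orange twice) are what you would see if the grouping were ignored. One possible cause, which is an assumption here since the thread never confirms it, is another attached package masking dplyr's mutate; plyr::mutate, for example, does not respect group_by. A minimal sketch that sidesteps masking by calling the dplyr functions with an explicit namespace:

```r
library(dplyr)

df <- data.frame(Person = c("Ada", "Ada", "Bob", "Bob", "Carl", "Carl"),
                 Day = c(1, 2, 2, 1, 1, 2),
                 Fruit = c("Apple", "X", "Apple", "X", "X", "Orange"),
                 stringsAsFactors = FALSE)

# Explicit dplyr:: prefixes guarantee that the dplyr versions are used,
# regardless of which other packages are attached
result <- df %>%
  dplyr::group_by(Person) %>%
  dplyr::mutate(Fruit = dplyr::case_when(
    Fruit == "X" & any(Fruit == "Apple")  ~ "Orange",
    Fruit == "X" & any(Fruit == "Orange") ~ "Apple",
    TRUE ~ Fruit
  )) %>%
  dplyr::ungroup()

# Lists every attached package that defines a function called mutate;
# more than one entry means the unprefixed name is masked
mutate_sources <- find("mutate")
```

If mutate_sources lists a package ahead of dplyr, restarting the session (as suggested above) or keeping the dplyr:: prefixes should give the grouped result.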