R: How to mutate all observations that share the same ID?
I am working with the following dataset:
Country State Town Color YLocation Height
958 115 A Red 1.23 Tall
958 115 A Blue 5.97 Short
958 115 A Yellow 4.83 Short
958 116 B Red 3.93 Tall
958 116 B Blue 2.27 Short
958 116 B Yellow 9.91 Short
959 180 A Blue 6.69 Short
959 180 A Red 5.49 Tall
959 180 A Green 3.27 Short
959 180 A Red 3.99 Short
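I want to create a new column that shows the town's Y location taken from the row where the color is Red and the height is Tall. So I would like the table above to become: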
Country State Town Color YLocation Height RedLocation
958 115 A Red 1.23 Tall 1.23
958 115 A Blue 5.97 Short 1.23
958 115 A Yellow 4.83 Short 1.23
958 116 B Red 3.93 Tall 3.93
958 116 B Blue 2.27 Short 3.93
958 116 B Yellow 9.91 Short 3.93
959 180 A Blue 6.69 Short 5.49
959 180 A Red 5.49 Tall 5.49
959 180 A Green 3.27 Short 5.49
959 180 A Red 3.99 Short 5.49
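In short, I want the new column to identify the town's Y location where Color equals Red and Height equals Tall. Unfortunately, there is no unique identifier for Town/State/Country (and because some of the data in those columns is numeric, I cannot create one), so I am guessing the solution will use group_by to make sure the RedLocation variable maps to the correct observations.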
We can use match to get the index after grouping:
library(dplyr)

df1 <- df1 %>%
  group_by(Country, State, Town) %>%
  # match() gives the position of the first 'Red' within each group
  mutate(RedLocation = Ylocation[match('Red', Color)]) %>%
  ungroup()
Output:
df1
# A tibble: 9 x 6
# Country State Town Color Ylocation RedLocation
# <int> <int> <chr> <chr> <dbl> <dbl>
#1 958 115 A "Red" 1.23 1.23
#2 958 115 A "Blue " 5.97 1.23
#3 958 115 A "Yellow" 4.83 1.23
#4 958 116 B "Red" 3.93 3.93
#5 958 116 B "Blue " 2.27 3.93
#6 958 116 B "Yellow" 9.91 3.93
#7 959 180 A "Blue " 6.69 5.49
#8 959 180 A "Red" 5.49 5.49
#9 959 180 A "Green" 3.27 5.49
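To make the role of match in the grouped mutate above concrete, here is a minimal base R sketch of what happens inside a single group. The vectors color and yloc are hypothetical stand-ins for one group's Color and Ylocation values.

```r
# Colors and Y locations within one hypothetical group, in row order
color <- c("Blue", "Red", "Green", "Red")
yloc  <- c(6.69, 5.49, 3.27, 3.99)

idx <- match("Red", color)  # position of the FIRST "Red", here 2
yloc[idx]                   # 5.49, the single value mutate() recycles over the group

# With no "Red" in the group, match() returns NA and the lookup yields NA
no_red <- yloc[match("Red", c("Blue", "Green"))]
no_red
```

Because match returns only the first hit, a group with several Red rows still gets exactly one value, and a group with no Red row gets NA rather than an error.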
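With the additional Height column (the follow-up requirement raised in the comments), the output is:
# A tibble: 10 x 8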
# Country State Town Color Height Ylocation RedLocation TallLocation
# <int> <int> <chr> <chr> <chr> <dbl> <dbl> <dbl>
# 1 958 115 A "Red" Tall 1.23 1.23 1.23
# 2 958 115 A "Blue " Short 5.97 1.23 1.23
# 3 958 115 A "Yellow" Short 4.83 1.23 1.23
# 4 958 116 B "Red" Tall 3.93 3.93 3.93
# 5 958 116 B "Blue " Short 2.27 3.93 3.93
# 6 958 116 B "Yellow" Short 9.91 3.93 3.93
# 7 959 180 A "Blue " Short 6.69 5.49 5.49
# 8 959 180 A "Red" Tall 5.49 5.49 5.49
# 9 959 180 A "Green" Short 3.27 5.49 5.49
#10 959 180 A "Red" Short 3.99 5.49 5.49
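The comments on this page mention combining match with paste to handle a second condition, but the updated code itself is not shown. The following is only a sketch of one way that idea could look, not the answerer's actual code. The data frame df_q is a hypothetical name for the question's 10-row table, rebuilt by hand, and only a single RedLocation column is computed.

```r
library(dplyr)

# The question's table, rebuilt by hand (df_q is a hypothetical name)
df_q <- data.frame(
  Country   = c(958L, 958L, 958L, 958L, 958L, 958L, 959L, 959L, 959L, 959L),
  State     = c(115L, 115L, 115L, 116L, 116L, 116L, 180L, 180L, 180L, 180L),
  Town      = c("A", "A", "A", "B", "B", "B", "A", "A", "A", "A"),
  Color     = c("Red", "Blue", "Yellow", "Red", "Blue", "Yellow",
                "Blue", "Red", "Green", "Red"),
  Ylocation = c(1.23, 5.97, 4.83, 3.93, 2.27, 9.91, 6.69, 5.49, 3.27, 3.99),
  Height    = c("Tall", "Short", "Short", "Tall", "Short", "Short",
                "Short", "Tall", "Short", "Short")
)

res <- df_q %>%
  group_by(Country, State, Town) %>%
  # Paste both columns together so a single match() call tests both conditions
  mutate(RedLocation = Ylocation[match("Red Tall", paste(Color, Height))]) %>%
  ungroup()

res$RedLocation
```

In the last group there are two Red rows, but only the one that is also Tall matches the pasted key, so all four rows receive 5.49.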
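Or, using data.table:
library(data.table)
# Look up the Red value in j per group; a Color filter in i would assign only to the Red rows
setDT(df1)[, RedLocation := Ylocation[match('Red', Color)], by = .(Country, State, Town)]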
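In data.table, a condition placed in i limits := to the rows that satisfy it, so the remaining rows are left as NA. The sketch below contrasts that with doing the lookup inside j for each group. The table dt and the column names InI and InJ are hypothetical, chosen only for illustration.

```r
library(data.table)

dt <- data.table(
  Town      = c("A", "A", "B", "B"),
  Color     = c("Red", "Blue", "Blue", "Red"),
  Ylocation = c(1.23, 5.97, 2.27, 3.93)
)

# Filter in i: only the Red rows receive a value, the other rows stay NA
dt[Color == "Red", InI := Ylocation[1], by = Town]

# Lookup in j: every row of a group receives that group's Red value
dt[, InJ := Ylocation[match("Red", Color)], by = Town]

dt
```

Here InI is filled only on the two Red rows, while InJ carries 1.23 across town A and 3.93 across town B, which is the behaviour the question asks for.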
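Or, in base R, we can subset the rows where Color is 'Red' and then merge:
# Red rows only, dropping the Color column (column 4)
df2 <- subset(df1, Color == 'Red')[-4]
names(df2)[4] <- "RedLocation"
# Left join back on the shared key columns (Country, State, Town)
merge(df1, df2, all.x = TRUE)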
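One caveat with the merge approach: if a group contains more than one Red row, as the last group in the question's 10-row table does, the lookup table has several rows for that group and merge repeats the original rows once per match. A possible guard, sketched below with a hypothetical df_demo, is to keep only the first Red row per group before merging.

```r
# One group with two Red rows (df_demo is a hypothetical example)
df_demo <- data.frame(
  Country   = c(959L, 959L, 959L, 959L),
  State     = c(180L, 180L, 180L, 180L),
  Town      = c("A", "A", "A", "A"),
  Color     = c("Blue", "Red", "Green", "Red"),
  Ylocation = c(6.69, 5.49, 3.27, 3.99)
)

keys <- c("Country", "State", "Town")

# Lookup table: Red rows only, then keep the first Red row per group
red <- df_demo[df_demo$Color == "Red", c(keys, "Ylocation")]
red <- red[!duplicated(red[keys]), ]
names(red)[names(red) == "Ylocation"] <- "RedLocation"

out <- merge(df_demo, red, by = keys, all.x = TRUE)
nrow(out)        # 4, the same number of rows as df_demo
out$RedLocation  # 5.49 for every row
```

Naming the key columns explicitly in by also avoids relying on column positions, which shift as soon as extra columns such as Height are present.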
Another base R option, using ave:
within(
  df,
  RedLocation <- ave(
    ifelse(Color == "Red", Ylocation, NA),
    Country,
    State,
    Town,
    FUN = na.omit
  )
)
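The ave call above relies on na.omit returning exactly one value per group. As far as I can tell, a group with no Red row would leave nothing to assign, and a group with several Red rows would have its values recycled across the rows. The sketch below swaps in a small helper, first_non_na (a hypothetical name, not part of base R), that always returns a single value; df_demo is likewise made up for illustration.

```r
df_demo <- data.frame(
  Town      = c("A", "A", "B", "B"),
  Color     = c("Red", "Blue", "Blue", "Green"),
  Ylocation = c(1.23, 5.97, 2.27, 9.91)
)

# First non-NA value of a vector, or NA when there is none
first_non_na <- function(v) v[!is.na(v)][1]

df_demo$RedLocation <- ave(
  ifelse(df_demo$Color == "Red", df_demo$Ylocation, NA),
  df_demo$Town,
  FUN = first_non_na
)

df_demo$RedLocation
```

Town A gets 1.23 on both rows, and town B, which has no Red row, gets NA instead of stopping with an error.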
Data
> dput(df)
structure(list(Country = c(958L, 958L, 958L, 958L, 958L, 958L,
959L, 959L, 959L), State = c(115L, 115L, 115L, 116L, 116L, 116L,
180L, 180L, 180L), Town = c("A", "A", "A", "B", "B", "B", "A",
"A", "A"), Color = c("Red", "Blue", "Yellow", "Red", "Blue",
"Yellow", "Blue", "Red", "Green"), Ylocation = c(1.23, 5.97,
4.83, 3.93, 2.27, 9.91, 6.69, 5.49, 3.27)), class = "data.frame", row.names = c(NA,
-9L))
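I like the match approach. I am wondering whether it can be combined with another requirement. For example, suppose there is another column called Height and I want to repeat the same task described above, but only where Height equals Tall (this is just hypothetical). In short, I want to add a second requirement to the problem above. Would it be as simple as adding a second match call?
@rogues77 It is not clear from the comment.
@rogues77 I am not completely sure about your comment. However, I updated the answer based on the description. It is not tested, though.
Just updated the post.
As always, you are a genius. The idea of using paste is simple but elegant. Thank you very much.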
Output of the ave option:
  Country State Town Color Ylocation RedLocation
1 958 115 A Red 1.23 1.23
2 958 115 A Blue 5.97 1.23
3 958 115 A Yellow 4.83 1.23
4 958 116 B Red 3.93 3.93
5 958 116 B Blue 2.27 3.93
6 958 116 B Yellow 9.91 3.93
7 959 180 A Blue 6.69 5.49
8 959 180 A Red 5.49 5.49
9 959 180 A Green 3.27 5.49