R: How do I mutate all observations that share the same ID?

r dataframe

I am working with the following dataset:

Country     State      Town    Color      YLocation    Height
958         115         A       Red         1.23        Tall
958         115         A       Blue        5.97        Short
958         115         A       Yellow      4.83        Short
958         116         B       Red         3.93        Tall
958         116         B       Blue        2.27        Short
958         116         B       Yellow      9.91        Short
959         180         A       Blue        6.69        Short
959         180         A       Red         5.49        Tall
959         180         A       Green       3.27        Short
959         180         A       Red         3.99        Short

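I want to create a new column that shows, for each town, the Y location of the row where Color is Red and Height is "Tall". In other words, I want the table above to become: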

Country     State      Town    Color      YLocation  Height  RedLocation     
958         115         A       Red         1.23    Tall          1.23
958         115         A       Blue        5.97    Short         1.23
958         115         A       Yellow      4.83    Short         1.23
958         116         B       Red         3.93    Tall          3.93
958         116         B       Blue        2.27    Short         3.93
958         116         B       Yellow      9.91    Short         3.93
959         180         A       Blue        6.69    Short         5.49
959         180         A       Red         5.49    Tall          5.49
959         180         A       Green       3.27    Short         5.49
959         180         A       Red         3.99    Short         5.49

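In short, I want the new column to hold the town's Y location taken from the row where Color equals Red and Height equals Tall. Unfortunately, Town/State/Country has no unique identifier (some of the data in these columns is numeric, so I cannot build one), so I am guessing the solution will use the group_by command to make sure the RedLocation variable maps to the correct observations.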

We can use match to get the index of the 'Red' row within each group:

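library(dplyr)
df1 <- df1 %>%
    # Treat each Country/State/Town combination as one group
    group_by(Country, State, Town) %>%
    # match() gives the position of the first 'Red' row in the group
    mutate(RedLocation = Ylocation[match('Red', Color)]) %>%
    ungroup()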
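
To see why this works, it may help to look at match on its own: it returns the position of the first match, or NA when there is none. Below is a small base R illustration; the vectors `colors` and `yloc` are made up for this sketch and simply stand in for one group's Color and Ylocation columns.

```r
# Hypothetical vectors standing in for one group's Color and Ylocation columns
colors <- c("Blue", "Red", "Green", "Red")
yloc   <- c(6.69, 5.49, 3.27, 3.99)

# match() returns the index of the FIRST element equal to "Red"
idx <- match("Red", colors)
idx
# [1] 2

# A length-one index yields a single value, which mutate() then
# recycles across every row of the group
yloc[idx]
# [1] 5.49

# If a group has no "Red" row, match() returns NA and so does the lookup
no_red <- yloc[match("Red", c("Blue", "Green"))]
no_red
# [1] NA
```

Because only the first match is used, a second "Red" row in the same group (3.99 here) is ignored.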

Output:

df1
# A tibble: 9 x 6    
#  Country State Town  Color    Ylocation RedLocation
#    <int> <int> <chr> <chr>        <dbl>       <dbl>
#1     958   115 A     "Red"         1.23        1.23
#2     958   115 A     "Blue "       5.97        1.23
#3     958   115 A     "Yellow"      4.83        1.23
#4     958   116 B     "Red"         3.93        3.93
#5     958   116 B     "Blue "       2.27        3.93
#6     958   116 B     "Yellow"      9.91        3.93
#7     959   180 A     "Blue "       6.69        5.49
#8     959   180 A     "Red"         5.49        5.49
#9     959   180 A     "Green"       3.27        5.49

Output of the updated answer, which also takes the Height column into account:

# A tibble: 10 x 8
#   Country State Town  Color    Height Ylocation RedLocation TallLocation
#     <int> <int> <chr> <chr>    <chr>      <dbl>       <dbl>        <dbl>
# 1     958   115 A     "Red"    Tall        1.23        1.23         1.23
# 2     958   115 A     "Blue "  Short       5.97        1.23         1.23
# 3     958   115 A     "Yellow" Short       4.83        1.23         1.23
# 4     958   116 B     "Red"    Tall        3.93        3.93         3.93
# 5     958   116 B     "Blue "  Short       2.27        3.93         3.93
# 6     958   116 B     "Yellow" Short       9.91        3.93         3.93
# 7     959   180 A     "Blue "  Short       6.69        5.49         5.49
# 8     959   180 A     "Red"    Tall        5.49        5.49         5.49
# 9     959   180 A     "Green"  Short       3.27        5.49         5.49
#10     959   180 A     "Red"    Short       3.99        5.49         5.49
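
The comments on this answer mention combining match with paste to handle the second condition. The following is a minimal sketch of that idea, not necessarily the exact code behind the output above. It assumes the ten rows from the question, stored in a hypothetical data frame `df_q`, and uses the answer's spelling `Ylocation` for the column name.

```r
library(dplyr)

# The ten rows from the question, including the Height column
# (df_q is a name chosen for this sketch)
df_q <- data.frame(
  Country   = c(958L, 958L, 958L, 958L, 958L, 958L, 959L, 959L, 959L, 959L),
  State     = c(115L, 115L, 115L, 116L, 116L, 116L, 180L, 180L, 180L, 180L),
  Town      = c("A", "A", "A", "B", "B", "B", "A", "A", "A", "A"),
  Color     = c("Red", "Blue", "Yellow", "Red", "Blue", "Yellow",
                "Blue", "Red", "Green", "Red"),
  Ylocation = c(1.23, 5.97, 4.83, 3.93, 2.27, 9.91, 6.69, 5.49, 3.27, 3.99),
  Height    = c("Tall", "Short", "Short", "Tall", "Short", "Short",
                "Short", "Tall", "Short", "Short")
)

res <- df_q %>%
  group_by(Country, State, Town) %>%
  # paste() fuses the two columns into one key per row, e.g. "Red Tall",
  # so a single match() can test both conditions at once
  mutate(RedLocation = Ylocation[match("Red Tall", paste(Color, Height))]) %>%
  ungroup()

res$RedLocation
# [1] 1.23 1.23 1.23 3.93 3.93 3.93 5.49 5.49 5.49 5.49
```

In the last group the "Red"/"Short" row (3.99) is skipped and the "Red"/"Tall" row (5.49) is picked. An equivalent without paste would be `Ylocation[which(Color == "Red" & Height == "Tall")[1]]`.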

Or with data.table:

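library(data.table)
# Assign to every row of each group; filtering on Color == 'Red' in i would fill only the 'Red' rows
setDT(df1)[, RedLocation := Ylocation[match('Red', Color)], by = .(Country, State, Town)]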

Or, in base R, we can subset the rows where Color is 'Red' and then merge:

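# Keep the 'Red' rows and drop column 4 (Color)
df2 <- subset(df1, Color == 'Red')[-4]
# Column 4 is now Ylocation; rename it
names(df2)[4] <- "RedLocation"
# Left join on the shared columns (Country, State, Town)
merge(df1, df2, all.x = TRUE)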
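
One thing to watch with the merge approach: if a group contains more than one 'Red' row, the join would return extra rows. The sketch below is one possible way to guard against that, under the assumption that a Height column is present. The names `df_m`, `keys`, `red` and `out` are made up for this illustration.

```r
# Hypothetical small frame with two "Red" rows in the same group
df_m <- data.frame(
  Country   = c(959L, 959L, 959L, 959L),
  State     = c(180L, 180L, 180L, 180L),
  Town      = c("A", "A", "A", "A"),
  Color     = c("Blue", "Red", "Green", "Red"),
  Ylocation = c(6.69, 5.49, 3.27, 3.99),
  Height    = c("Short", "Tall", "Short", "Short")
)

keys <- c("Country", "State", "Town")

# Keep only the rows meeting both conditions, then at most one row per group
red <- subset(df_m, Color == "Red" & Height == "Tall")
red <- red[!duplicated(red[keys]), c(keys, "Ylocation")]
names(red)[names(red) == "Ylocation"] <- "RedLocation"

# Merging on explicit keys leaves the row count of df_m unchanged
out <- merge(df_m, red, by = keys, all.x = TRUE)
nrow(out)
# [1] 4
```

Note that merge may reorder rows, so the result is not guaranteed to keep the original row order.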

Another base R option, using ave:
within(
  df,
  RedLocation <- ave(
    ifelse(Color == "Red", Ylocation, NA),
    Country,
    State,
    Town,
    FUN = na.omit
  )
)
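
The ave call above relies on every group containing exactly one 'Red' row, because na.omit then returns a single value that gets recycled across the group. As far as I can tell, a group with no 'Red' row would leave nothing to assign. Here is a hedged variant that takes the first non-missing value instead; the frame `df_a` is hypothetical and groups by Town only, for brevity.

```r
# Hypothetical frame: group "B" has no "Red" row
df_a <- data.frame(
  Town      = c("A", "A", "B", "B"),
  Color     = c("Red", "Blue", "Green", "Blue"),
  Ylocation = c(1.23, 5.97, 3.27, 6.69)
)

df_a$RedLocation <- ave(
  ifelse(df_a$Color == "Red", df_a$Ylocation, NA),
  df_a$Town,
  # First non-missing value in the group; evaluates to NA when there is none
  FUN = function(v) v[!is.na(v)][1]
)

df_a$RedLocation
# [1] 1.23 1.23   NA   NA
```

Groups without a 'Red' row end up with NA rather than stopping the whole computation.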

Data
> dput(df)
structure(list(Country = c(958L, 958L, 958L, 958L, 958L, 958L, 
959L, 959L, 959L), State = c(115L, 115L, 115L, 116L, 116L, 116L,
180L, 180L, 180L), Town = c("A", "A", "A", "B", "B", "B", "A",
"A", "A"), Color = c("Red", "Blue", "Yellow", "Red", "Blue",
"Yellow", "Blue", "Red", "Green"), Ylocation = c(1.23, 5.97,
4.83, 3.93, 2.27, 9.91, 6.69, 5.49, 3.27)), class = "data.frame", row.names = c(NA,
-9L))

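Comments:

I like the match command. I wonder whether it can be combined with another requirement. For example, suppose there is another column called Height and I want to repeat the same task described above, but only where Height equals Tall (this is just hypothetical). In short, I would like to add a second requirement to the question above. Would it be as simple as adding a second match command?

@rogues77 That isn't clear from the comment.

@rogues77 I'm not entirely sure what you mean. However, I updated the answer based on the description. It is not tested, though. I just updated the post.

As always, you are a genius. The idea of using paste is simple but elegant. Thank you very much.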

Output of the ave option:

  Country State Town  Color Ylocation RedLocation
1     958   115    A    Red      1.23        1.23
2     958   115    A   Blue      5.97        1.23
3     958   115    A Yellow      4.83        1.23
4     958   116    B    Red      3.93        3.93
5     958   116    B   Blue      2.27        3.93
6     958   116    B Yellow      9.91        3.93
7     959   180    A   Blue      6.69        5.49
8     959   180    A    Red      5.49        5.49
9     959   180    A  Green      3.27        5.49
