Using dplyr to update blank levels of a given factor based on matching levels of other factors

I have a data frame like this:

df <- read.table(sep = ",", header = TRUE, stringsAsFactors = FALSE, 
text = "
plantfam,lepfam,lepsp\n
             Asteraceae,Geometridae,Eois sp\n
             Asteraceae,Erebidae,\n
             Poaceae,Erebidae,\n
             Poaceae,Noctuidae,\n
             Asteraceae,Saturnidae,Polyphemous sp\n
             Melastomaceae,Noctuidae,\n
             Asteraceae,,\n
             Melastomaceae,,\n
             ,Noctuidae,\n
             ,Erebidae,\n
             Poaceae, Erebidae,\n")

I tried:

condition <- quote(lepsp == "" & plantfam != "" & lepfam != "")
subset1 <- df %>% filter(condition) %>% group_by(lepfam) %>% 
mutate(lepsp= 
paste0(lepfam,"_morphosp",match(plantfam,unique(plantfam))))
subset2 <- df %>% filter(condition) %>% setdiff(df, .)
union(subset1, subset2) %>% arrange(lepsp)

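I think the problem may simply be that, in your df, the last row has a space before Erebidae, which makes R treat that value as different from the Erebidae in the other rows.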
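To illustrate the point about the stray space, here is a minimal sketch in base R. The two-row raw input is made up for this illustration (it is not the asker's full data), and strip.white = TRUE is shown only as one possible way to avoid the problem at read time, not as what the original code did.

```r
# Hypothetical two-row input: the second row has a space before "Erebidae"
raw <- "plantfam,lepfam,lepsp
Poaceae,Erebidae,
Poaceae, Erebidae,"

# Read as-is: the leading space is kept, so the two values differ
with_space <- read.table(text = raw, sep = ",", header = TRUE,
                         stringsAsFactors = FALSE)

# Read with strip.white = TRUE: whitespace around unquoted fields is dropped
stripped <- read.table(text = raw, sep = ",", header = TRUE,
                       stringsAsFactors = FALSE, strip.white = TRUE)

n_with_space <- length(unique(with_space$lepfam))  # values treated as distinct
n_stripped   <- length(unique(stripped$lepfam))    # values collapse to one

# trimws() gives the same effect after the data has already been read
n_trimmed <- length(unique(trimws(with_space$lepfam)))
```

Because group_by(lepfam) groups on exact string values, a value with a leading space would end up in its own group unless it is trimmed first.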

I noticed this as I was finishing my answer. Here is how I would do what you want: inside mutate, I introduce a group number, lepfam_number, before pasting.

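library(dplyr)
df %>%
  group_by(lepfam) %>%
  # number each plantfam by order of first appearance within its lepfam group
  mutate(lepfam_number = match(plantfam, unique(plantfam)),
         # fill lepsp only where it is blank and both family names are present
         lepsp = ifelse(lepsp == "" & lepfam != "" & trimws(plantfam) != "",
                        paste0(lepfam, "_morphosp", lepfam_number),
                        lepsp)
  )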

                     plantfam      lepfam               lepsp lepfam_number
                        <chr>       <chr>               <chr>         <int>
1                  Asteraceae Geometridae             Eois sp             1
2                  Asteraceae    Erebidae  Erebidae_morphosp1             1
3                     Poaceae    Erebidae  Erebidae_morphosp2             2
4                     Poaceae   Noctuidae Noctuidae_morphosp1             1
5                  Asteraceae  Saturnidae      Polyphemous sp             1
6               Melastomaceae   Noctuidae Noctuidae_morphosp2             2
7                  Asteraceae                                             1
8               Melastomaceae                                             2
9                               Noctuidae                                 3
10                               Erebidae                                 3
11                    Poaceae    Erebidae  Erebidae_morphosp2             2

Data

df <- read.table(sep = ",", header = TRUE, stringsAsFactors = FALSE, 
                 text = "
plantfam,lepfam,lepsp\n
             Asteraceae,Geometridae,Eois sp\n
             Asteraceae,Erebidae,\n
             Poaceae,Erebidae,\n
             Poaceae,Noctuidae,\n
             Asteraceae,Saturnidae,Polyphemous sp\n
             Melastomaceae,Noctuidae,\n
             Asteraceae,,\n
             Melastomaceae,,\n
             ,Noctuidae,\n
             ,Erebidae,\n
             Poaceae,Erebidae,\n")

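What is condition?

It selects the rows where lepsp is blank and both a plantfam and a lepfam name are associated with them!

If you have time, I'd like to understand how match works here. As I understand it, Poaceae is second in unique(plantfam). In rows 3 and 4 it gets 2 and 1 respectively. Is that because of the preceding group_by(lepfam)? Maybe I'm misunderstanding how grouping works. Thanks for your help.

@LukeC Yes. Because I group by lepfam first, unique(plantfam) is computed within each group, so Poaceae always gets the same number within a particular group (2 within Erebidae, 1 within Noctuidae).

@P Lapointe Got it, that makes sense. Thanks for clarifying!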
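The exchange above about match and group_by can be checked on a small example. This is only a sketch: the toy data frame and the column name n are invented for illustration, and the ungrouped version is included purely for contrast.

```r
library(dplyr)

# Hypothetical four-row data frame, loosely modelled on the question's columns
toy <- data.frame(
  lepfam   = c("Erebidae", "Erebidae", "Noctuidae", "Noctuidae"),
  plantfam = c("Asteraceae", "Poaceae", "Poaceae", "Melastomaceae"),
  stringsAsFactors = FALSE
)

# Ungrouped: unique(plantfam) is taken over the whole column,
# so Poaceae gets the same number in every row
ungrouped <- toy %>%
  mutate(n = match(plantfam, unique(plantfam)))

# Grouped: unique(plantfam) is recomputed inside each lepfam group,
# so the numbering restarts at 1 for every lepfam
grouped <- toy %>%
  group_by(lepfam) %>%
  mutate(n = match(plantfam, unique(plantfam))) %>%
  ungroup()

ungrouped$n  # 1 2 2 3
grouped$n    # 1 2 1 2
```

In the grouped result, Poaceae is numbered 2 within Erebidae and 1 within Noctuidae, which is the pattern seen in rows 3 and 4 of the answer's output.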
