Use dplyr to update blank levels based on matches of a given factor within other factor levels
I have a data frame like this:
df <- read.table(sep = ",", header = TRUE, stringsAsFactors = FALSE,
text = "
plantfam,lepfam,lepsp\n
Asteraceae,Geometridae,Eois sp\n
Asteraceae,Erebidae,\n
Poaceae,Erebidae,\n
Poaceae,Noctuidae,\n
Asteraceae,Saturnidae,Polyphemous sp\n
Melastomaceae,Noctuidae,\n
Asteraceae,,\n
Melastomaceae,,\n
,Noctuidae,\n
,Erebidae,\n
Poaceae, Erebidae,\n")
I tried:
library(dplyr)
condition <- quote(lepsp == "" & plantfam != "" & lepfam != "")
# !! injects the stored expression so filter() evaluates it against the columns
subset1 <- df %>% filter(!!condition) %>% group_by(lepfam) %>%
  mutate(lepsp =
           paste0(lepfam, "_morphosp", match(plantfam, unique(plantfam))))
subset2 <- df %>% filter(!!condition) %>% setdiff(df, .)
union(subset1, subset2) %>% arrange(lepsp)
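I think the problem may simply be that in your df, the last row has a space before Erebidae, which makes R treat it as different from the other Erebidae rows.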
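A quick base R check of that effect: the stray space makes " Erebidae" a distinct string, so unique() and group_by() see it as its own level. This is a minimal sketch; the two fixes shown (trimws() after reading, or strip.white = TRUE in read.table()) are suggestions, not part of the original answer, and df_clean is a hypothetical two-row example.

```r
# Two labels that look alike but differ by a leading space
x <- c("Erebidae", " Erebidae")
print(x[1] == x[2])       # FALSE: the leading space makes them different strings
print(length(unique(x)))  # 2, so grouping would treat them as separate groups

# Option 1: trim leading/trailing whitespace after the data are read
x_trimmed <- trimws(x)
print(length(unique(x_trimmed)))  # 1

# Option 2: let read.table() strip whitespace from unquoted fields while parsing
# (strip.white only takes effect when sep is specified)
df_clean <- read.table(sep = ",", header = TRUE, stringsAsFactors = FALSE,
                       strip.white = TRUE,
                       text = "plantfam,lepfam
Poaceae,Erebidae
Poaceae, Erebidae")
print(unique(df_clean$lepfam))  # "Erebidae"
```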
I only noticed this as I was finishing my answer. Here is how I would do what you want: I introduce a group number, lepfam_number, in mutate before pasting.
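library(dplyr)
df %>%
  group_by(lepfam) %>%
  # number each plantfam by order of first appearance within its lepfam group
  mutate(lepfam_number = match(plantfam, unique(plantfam)),
         # fill blank lepsp only where both plantfam and lepfam are present
         lepsp = ifelse(lepsp == "" & lepfam != "" & trimws(plantfam) != "",
                        paste0(lepfam, "_morphosp", lepfam_number),
                        lepsp)
  )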
   plantfam      lepfam      lepsp               lepfam_number
   <chr>         <chr>       <chr>                       <int>
 1 Asteraceae    Geometridae Eois sp                         1
 2 Asteraceae    Erebidae    Erebidae_morphosp1              1
 3 Poaceae       Erebidae    Erebidae_morphosp2              2
 4 Poaceae       Noctuidae   Noctuidae_morphosp1             1
 5 Asteraceae    Saturnidae  Polyphemous sp                  1
 6 Melastomaceae Noctuidae   Noctuidae_morphosp2             2
 7 Asteraceae    ""          ""                              1
 8 Melastomaceae ""          ""                              2
 9 ""            Noctuidae   ""                              3
10 ""            Erebidae    ""                              3
11 Poaceae       Erebidae    Erebidae_morphosp2              2
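If you would rather not fix the input by hand, one option is to trim whitespace inside the pipeline before grouping. This is a sketch rather than the answer's exact code: it assumes dplyr 1.0.0 or later (for across() and where()) and runs on the question's original data, stray space included.

```r
library(dplyr)

# The question's data, including the stray space before "Erebidae" in the last row
df <- read.table(sep = ",", header = TRUE, stringsAsFactors = FALSE,
                 text = "plantfam,lepfam,lepsp
Asteraceae,Geometridae,Eois sp
Asteraceae,Erebidae,
Poaceae,Erebidae,
Poaceae,Noctuidae,
Asteraceae,Saturnidae,Polyphemous sp
Melastomaceae,Noctuidae,
Asteraceae,,
Melastomaceae,,
,Noctuidae,
,Erebidae,
Poaceae, Erebidae,")

result <- df %>%
  # Trim every character column so " Erebidae" becomes "Erebidae"
  mutate(across(where(is.character), trimws)) %>%
  group_by(lepfam) %>%
  mutate(
    # Position of each plantfam among the plantfams seen so far in this lepfam group
    lepfam_number = match(plantfam, unique(plantfam)),
    # Fill only blank lepsp values that have both a plantfam and a lepfam
    lepsp = ifelse(lepsp == "" & lepfam != "" & plantfam != "",
                   paste0(lepfam, "_morphosp", lepfam_number),
                   lepsp)
  ) %>%
  ungroup()

print(result)
```

Because the whitespace is trimmed up front, the trimws() inside the condition is no longer needed. The final ungroup() is optional; it just keeps the lepfam grouping from carrying over into later steps.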
Data (same as in the question, but without the space before Erebidae in the last row):
df <- read.table(sep = ",", header = TRUE, stringsAsFactors = FALSE,
text = "
plantfam,lepfam,lepsp\n
Asteraceae,Geometridae,Eois sp\n
Asteraceae,Erebidae,\n
Poaceae,Erebidae,\n
Poaceae,Noctuidae,\n
Asteraceae,Saturnidae,Polyphemous sp\n
Melastomaceae,Noctuidae,\n
Asteraceae,,\n
Melastomaceae,,\n
,Noctuidae,\n
,Erebidae,\n
Poaceae,Erebidae,\n")
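What is condition?
The rows where lepsp is blank and that have both a plantfam and a lepfam name associated with them! If you have time, I'd like to understand how match works here. As far as I can tell, Poaceae is second in unique(plantfam), yet in rows 3 and 4 it is counted as 2 and 1. Is that because of the preceding group_by(lepfam)? Or am I misunderstanding group_by? Thanks for your help.
@LukeC Yes. Because I group by lepfam first, unique(plantfam) is computed within each lepfam group, so in that particular group (Erebidae) Poaceae will always get 2.
@P Lapointe Got it, that makes sense. Thanks for the clarification!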
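To make the group-wise numbering concrete: match(x, unique(x)) returns, for each value, the position of its first appearance in x, and group_by() hands each lepfam group its own x. The vectors below are copied by hand from the data; the base R ave() version at the end is only an illustration of the same per-group logic, not part of the original answer.

```r
# plantfam values in the Erebidae group, in row order (rows 2, 3, 10, 11)
erebidae <- c("Asteraceae", "Poaceae", "", "Poaceae")
print(unique(erebidae))                   # "Asteraceae" "Poaceae" ""
print(match(erebidae, unique(erebidae)))  # 1 2 3 2: Poaceae is 2 here

# plantfam values in the Noctuidae group (rows 4, 6, 9)
noctuidae <- c("Poaceae", "Melastomaceae", "")
print(match(noctuidae, unique(noctuidae)))  # 1 2 3: Poaceae is 1 here

# Same per-group numbering over the full columns in base R:
# ave() splits the row indices by lepfam and applies FUN within each group
plantfam <- c("Asteraceae", "Asteraceae", "Poaceae", "Poaceae", "Asteraceae",
              "Melastomaceae", "Asteraceae", "Melastomaceae", "", "", "Poaceae")
lepfam <- c("Geometridae", "Erebidae", "Erebidae", "Noctuidae", "Saturnidae",
            "Noctuidae", "", "", "Noctuidae", "Erebidae", "Erebidae")
lepfam_number <- ave(seq_along(plantfam), lepfam,
                     FUN = function(i) match(plantfam[i], unique(plantfam[i])))
print(lepfam_number)  # 1 1 2 1 1 2 1 2 3 3 2
```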