R: Conditionally replace NA with values from other rows
I have a large dataset in which one variable has a relatively large number of missing values. However, because I know the variable depends on temporal and spatial aspects, I can easily impute each missing value by taking the value from another row with exactly matching temporal and spatial values. Suppose the data is generated as follows:
temporal <- c("Monday", "Monday", "Tuesday", "Tuesday","Wednesday", "Wednesday", "Thursday", "Thursday", "Friday", "Friday","Monday", "Monday", "Tuesday", "Tuesday","Wednesday", "Wednesday", "Thursday", "Thursday", "Friday", "Friday")
spatial <- c("North", "South","North", "South","North", "South","North", "South","North", "South", "North", "South","North", "South","North", "South","North", "South","North", "South")
value <- c(NA,2,3,4,5,6,7,NA,9,10,1,NA,3,4,5,6,7,8,9,NA)
df <- as.data.frame(cbind(temporal, spatial, value))
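In this example, I want to replace each row where value is NA with the value from another row that has matching spatial and temporal values. The final result should look like this: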
temporal spatial value
1 Monday North 1
2 Monday South 2
3 Tuesday North 3
4 Tuesday South 4
5 Wednesday North 5
6 Wednesday South 6
7 Thursday North 7
8 Thursday South 8
9 Friday North 9
10 Friday South 10
11 Monday North 1
12 Monday South 2
13 Tuesday North 3
14 Tuesday South 4
15 Wednesday North 5
16 Wednesday South 6
17 Thursday North 7
18 Thursday South 8
19 Friday North 9
20 Friday South 10
I tried to do this using the group_by function from the tidyverse:
library(tidyverse)
df <- df %>%
group_by(temporal, spatial) %>%
mutate(value, unique(value[is.na(value)]))
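Am I approaching this the right way? If so, why doesn't my code work as (I believe) it should? If not, what would be an appropriate approach?
Thanks! :)
Here is a dplyr approach. We group by temporal and spatial, then arrange by temporal, spatial, and value, because arrange places NA values after any non-NA values. We then use mutate to set value to the number in the first row of each group: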
library(dplyr)
df %>%
group_by(temporal, spatial) %>%
arrange(temporal, spatial, value) %>%
mutate(value = value[1])
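Note that arrange() reorders the rows. If the original row order matters, the first non-NA value of each group can be picked without sorting. The following is only a minimal sketch under assumptions: it rebuilds the example data with data.frame() so that value stays numeric, and the name filled is made up for illustration.

```r
library(dplyr)

# Same example data as in the question, built with data.frame()
# (an assumption of this sketch) so that value stays numeric.
df <- data.frame(
  temporal = rep(rep(c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday"),
                     each = 2), times = 2),
  spatial  = rep(c("North", "South"), times = 10),
  value    = c(NA, 2, 3, 4, 5, 6, 7, NA, 9, 10, 1, NA, 3, 4, 5, 6, 7, 8, 9, NA)
)

filled <- df %>%
  group_by(temporal, spatial) %>%
  # Take the first non-NA value within each group; rows are not reordered.
  # If a group had no non-NA value at all, this would yield NA.
  mutate(value = value[!is.na(value)][1]) %>%
  ungroup()
```

Like the arrange() version, this overwrites every value in a group with a single value, so it is only appropriate when all non-NA values within a group are known to agree.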
A more concise approach uses tidyr::fill, which also preserves the original row order:
library(tidyverse)
df %>%
group_by(temporal, spatial) %>%
fill(value, .direction = "downup")
# A tibble: 20 x 3
# Groups: temporal, spatial [10]
temporal spatial value
<chr> <chr> <chr>
1 Monday North 1
2 Monday South 2
3 Tuesday North 3
4 Tuesday South 4
5 Wednesday North 5
6 Wednesday South 6
7 Thursday North 7
8 Thursday South 8
9 Friday North 9
10 Friday South 10
11 Monday North 1
12 Monday South 2
13 Tuesday North 3
14 Tuesday South 4
15 Wednesday North 5
16 Wednesday South 6
17 Thursday North 7
18 Thursday South 8
19 Friday North 9
20 Friday South 10
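The value column is shown as <chr> above because cbind() builds a character matrix before as.data.frame() is applied, so every column becomes text. If a numeric column is wanted, one option is to convert it before filling. This is a hedged sketch rather than part of the original answer, and the name filled_num is hypothetical.

```r
library(dplyr)
library(tidyr)

temporal <- rep(rep(c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday"),
                    each = 2), times = 2)
spatial  <- rep(c("North", "South"), times = 10)
value    <- c(NA, 2, 3, 4, 5, 6, 7, NA, 9, 10, 1, NA, 3, 4, 5, 6, 7, 8, 9, NA)

# cbind() coerces everything to character, as in the question.
df <- as.data.frame(cbind(temporal, spatial, value))

filled_num <- df %>%
  # as.character() first, so the conversion is also safe if value is a factor.
  mutate(value = as.numeric(as.character(value))) %>%
  group_by(temporal, spatial) %>%
  # Fill NAs from the neighbouring rows of the same group, down then up.
  fill(value, .direction = "downup") %>%
  ungroup()
```

Building the data with data.frame(temporal, spatial, value) in the first place would avoid the coercion altogether.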
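Your mutate will not work because you did not assign the result to a variable. Your mutate() should look like this: mutate(value = unique(value[!is.na(value)])). Note the !, which selects the non-NA values; without it the expression would only ever pick up NA. That is not my approach, though. What I do below is create a lookup table containing the distinct non-NA values, then join it to the original dataset. valuedis should be the value you want.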
temporal <- c("Monday", "Monday", "Tuesday", "Tuesday","Wednesday", "Wednesday", "Thursday", "Thursday", "Friday", "Friday","Monday", "Monday", "Tuesday", "Tuesday","Wednesday", "Wednesday", "Thursday", "Thursday", "Friday", "Friday")
spatial <- c("North", "South","North", "South","North", "South","North", "South","North", "South", "North", "South","North", "South","North", "South","North", "South","North", "South")
value <- c(NA,2,3,4,5,6,7,NA,9,10,1,NA,3,4,5,6,7,8,9,NA)
df <- as.data.frame(cbind(temporal, spatial, value))
library(dplyr)
dfdis <- df %>%
filter(!is.na(value)) %>%
distinct(temporal,spatial,value) %>%
rename(valuedis = value)
df2 <- left_join(df,dfdis, by = c("temporal","spatial"))
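The join leaves the looked-up value in a separate valuedis column. To fold it back into value, one possibility is dplyr::coalesce(), which takes the first non-NA value position by position. The sketch below is an assumption-laden illustration, not the answerer's code: it builds the data with data.frame() to keep value numeric, and the names lookup and df_filled are made up.

```r
library(dplyr)

df <- data.frame(
  temporal = rep(rep(c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday"),
                     each = 2), times = 2),
  spatial  = rep(c("North", "South"), times = 10),
  value    = c(NA, 2, 3, 4, 5, 6, 7, NA, 9, 10, 1, NA, 3, 4, 5, 6, 7, 8, 9, NA)
)

# Lookup table: one row per (temporal, spatial) pair with its known value.
lookup <- df %>%
  filter(!is.na(value)) %>%
  distinct(temporal, spatial, value) %>%
  rename(valuedis = value)

df_filled <- df %>%
  left_join(lookup, by = c("temporal", "spatial")) %>%
  # Keep the original value where present, otherwise use the looked-up one.
  mutate(value = coalesce(value, valuedis)) %>%
  select(-valuedis)
```

This relies on each (temporal, spatial) pair having a single distinct non-NA value. If a pair had several, the lookup table would contain several rows for it and the join would duplicate rows in the result.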