Replace multiple observations in one column with values from another column in R
I am trying to replace the values in two columns with values from two other columns. This is a fairly basic question that has been asked for python, but I am using R.
I have a df like this (only on a much larger scale [>20000]):
For 63 squirrels, I need to replace their locx and locy values.
I usually replace values with the following code:
library(dplyr)
df <- df %>%
  mutate(locx = ifelse(squirrel_id == "6391", "12.5", locx),
         locy = ifelse(squirrel_id == "6391", "15.5", locy),
         locx = ifelse(squirrel_id == "8443", "2.5", locx),
         locy = ifelse(squirrel_id == "8443", "80", locy)) # etc. for 63 squirrels
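But this adds 126 extra lines of code, and I suspect there is an easier way to do it.
I do have all the new locx and locy values in a separate df, but I don't know how to join the two data frames by squirrel_id.
The df with the values that need to replace those in the old df: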
squirrel_id new_locx new_locy
6391 12.5 15.5
8443 2.5 80
6025 -55.0 0.0
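How can I do this more efficiently?

You can left join the two data frames and then use an if_else statement to pick the right locx and locy: the new value where the join found a match, the old value otherwise. Try: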
library(dplyr)
df %>%
  left_join(df2, by = "squirrel_id") %>%
  # as suggested by @echasnovski, coalesce() also works here:
  # locx = coalesce(new_locx, locx), locy = coalesce(new_locy, locy)
  mutate(locx = if_else(is.na(new_locx), locx, new_locx),
         locy = if_else(is.na(new_locy), locy, new_locy)) %>%
  select(-new_locx, -new_locy)
# output
  squirrel_id  locx locy  dist
1        6391  12.5 15.5  50.0
2        6391  12.5 15.5  20.0
3        6391  12.5 15.5  15.5
4        8443   2.5 80.0 800.0
5        6025 -55.0  0.0   0.0
6        5000  18.5 12.5  10.0 # squirrel_id 5000 is an example of an id
                               # present in df but not in df2
Data
df <- structure(list(squirrel_id = c(6391L, 6391L, 6391L, 8443L, 6025L,
5000L), locx = c(17.5, 17.5, 17.5, 20.5, -5, 18.5), locy = c(10,
10, 10, 1, -0.5, 12.5), dist = c(50, 20, 15.5, 800, 0, 10)), class = "data.frame", row.names = c(NA,
-6L))
df2 <- structure(list(squirrel_id = c(6391L, 8443L, 6025L), new_locx = c(12.5,
2.5, -55), new_locy = c(15.5, 80, 0)), class = "data.frame", row.names = c(NA,
-3L))
Using @ANG's data, here is a data.table solution. It joins and updates the original df by reference:
library(data.table)
setDT(df)
setDT(df2)
df[df2, on = c('squirrel_id'), `:=` (locx = new_locx, locy = new_locy) ]
df
squirrel_id locx locy dist
1: 6391 12.5 15.5 50.0
2: 6391 12.5 15.5 20.0
3: 6391 12.5 15.5 15.5
4: 8443 2.5 80.0 800.0
5: 6025 -55.0 0.0 0.0
6: 5000 18.5 12.5 10.0
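For comparison, the same lookup-and-overwrite idea can be sketched in base R with match(). This is not taken from either answer; it is a minimal sketch that assumes each squirrel_id appears at most once in df2, and the names idx, hit and df_new are made up for illustration.

```r
# Same example data as in the answers above
df <- data.frame(
  squirrel_id = c(6391L, 6391L, 6391L, 8443L, 6025L, 5000L),
  locx = c(17.5, 17.5, 17.5, 20.5, -5, 18.5),
  locy = c(10, 10, 10, 1, -0.5, 12.5),
  dist = c(50, 20, 15.5, 800, 0, 10)
)
df2 <- data.frame(
  squirrel_id = c(6391L, 8443L, 6025L),
  new_locx = c(12.5, 2.5, -55),
  new_locy = c(15.5, 80, 0)
)

# For each row of df, the row of df2 with the same squirrel_id (NA if none)
idx <- match(df$squirrel_id, df2$squirrel_id)
hit <- !is.na(idx)

# Overwrite only the matched rows; unmatched ids (here 5000) keep old values
df_new <- df
df_new$locx[hit] <- df2$new_locx[idx[hit]]
df_new$locy[hit] <- df2$new_locy[idx[hit]]
```

Because match() returns only the first matching position, a duplicated squirrel_id in df2 would silently use the first row, so it may be worth checking anyDuplicated(df2$squirrel_id) beforehand.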
Note that you can use coalesce(x, y) instead of if_else(is.na(x), y, x).
Thanks for pointing that out, @echasnovski. I will edit my post.
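The coalesce() suggestion can be written out in full as follows. This is a minimal sketch using the example data from the answer; the result name updated is an assumption introduced for illustration.

```r
library(dplyr)

# Same example data as in the answer
df <- data.frame(
  squirrel_id = c(6391L, 6391L, 6391L, 8443L, 6025L, 5000L),
  locx = c(17.5, 17.5, 17.5, 20.5, -5, 18.5),
  locy = c(10, 10, 10, 1, -0.5, 12.5),
  dist = c(50, 20, 15.5, 800, 0, 10)
)
df2 <- data.frame(
  squirrel_id = c(6391L, 8443L, 6025L),
  new_locx = c(12.5, 2.5, -55),
  new_locy = c(15.5, 80, 0)
)

updated <- df %>%
  left_join(df2, by = "squirrel_id") %>%
  # coalesce() takes the first non-missing value, so the new coordinate wins
  # when the join found one and the old coordinate is kept otherwise
  mutate(locx = coalesce(new_locx, locx),
         locy = coalesce(new_locy, locy)) %>%
  select(-new_locx, -new_locy)
```

Two caveats apply to this sketch: a genuinely missing (NA) replacement value in df2 would leave the old value in place, and a squirrel_id that appears more than once in df2 would duplicate rows in the join.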