R: Fill in N/As based on the ID in a dataset
My dataset in R looks like the one below. I have multiple IDs and years, but the street, state, and country information is not always present:
ID Year Street State Country
1 2000 123 Main St CA USA
1 2001 N/A N/A N/A
1 2002 N/A N/A N/A
...
1 2017 N/A N/A N/A
2 2001 123 Bloom Rd CA USA
2 2002 123 Bloom Rd CA USA
2 2003 N/A N/A N/A
...
2 2017 N/A N/A N/A
...
My goal is to fill in the N/As with the appropriate values, i.e. the values that belong to each ID. So for ID "1", the Street N/As should read "123 Main St", and so on.
Thanks, everyone!
Here is a solution using both data.table and dplyr:
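# Read the sample data; treat "N/A" as missing and strip the padding after each comma,
# otherwise the fields arrive as the string " N/A" and is.na() never matches
df <- read.table(text = "ID, Year, Street, State, Country
1, 2000, 123 Main St, CA, USA
1, 2001, N/A, N/A, N/A
1, 2002, N/A, N/A, N/A
1, 2017, N/A, N/A, N/A
2, 2001, 123 Bloom Rd, CA, USA
2, 2002, 123 Bloom Rd, CA, USA
2, 2003, N/A, N/A, N/A
2, 2017, N/A, N/A, N/A", header = TRUE, sep = ",", na.strings = "N/A",
                 strip.white = TRUE, stringsAsFactors = FALSE)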
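library(dplyr)
# Within each ID, replace every value with the first non-missing value of that column
# (funs() is deprecated; across() needs dplyr >= 1.0.0)
df %>%
  group_by(ID) %>%
  mutate(across(c(Street, State, Country), ~ .x[!is.na(.x)][1]))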
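library(data.table)
setDT(df)  # convert to a data.table by reference
coltochange <- c("Street", "State", "Country")
# Within each ID, overwrite each column with its first non-missing value
df[, (coltochange) := lapply(.SD, function(x) x[!is.na(x)][1]), .SDcols = coltochange, by = ID]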
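The same "first non-missing value per ID" idea can also be sketched in base R without extra packages. This is only an illustration under assumptions: the sample data below is hypothetical, it assumes the missing entries are real NA values (not the string "N/A"), and the helper name first_non_na is made up for this example.

```r
# Hypothetical sample data; missing entries are real NA values
df <- data.frame(
  ID      = c(1, 1, 1, 2, 2, 2),
  Year    = c(2000, 2001, 2002, 2001, 2002, 2003),
  Street  = c("123 Main St", NA, NA, "123 Bloom Rd", "123 Bloom Rd", NA),
  State   = c("CA", NA, NA, "CA", "CA", NA),
  Country = c("USA", NA, NA, "USA", "USA", NA),
  stringsAsFactors = FALSE
)

cols <- c("Street", "State", "Country")

# Illustrative helper: first non-missing element of a vector (NA if there is none)
first_non_na <- function(x) x[!is.na(x)][1]

# ave() applies the helper within each ID and recycles the result over the group
df[cols] <- lapply(df[cols], function(col) ave(col, df$ID, FUN = first_non_na))
```

An ID that has no non-missing value at all simply stays NA with this approach.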
Try the tidyverse approach:
library(tidyverse)

# Import the data. readr does not treat "N/A" as missing by default,
# so those entries arrive as ordinary strings.
# With readr >= 2.0, read_table() splits on runs of whitespace, so no separate() step is needed.
df <- read_table("ID Year Street State Country
1 2000 123_Main_St CA USA
1 2001 N/A N/A N/A
1 2002 N/A N/A N/A
1 2017 N/A N/A N/A
2 2001 123_Bloom_Rd CA USA
2 2002 123_Bloom_Rd CA USA
2 2003 N/A N/A N/A
2 2017 N/A N/A N/A") %>%
  group_by(ID) %>% # group rows that share the same ID
  mutate(across(c(Street, State, Country), ~ .x[.x != "N/A"][1])) # replace "N/A" with the first real value for the same ID (remember "N/A" is still a string after import)
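As a possible alternative, tidyr's fill() can propagate values within each ID. The sketch below uses hypothetical sample data and assumes the missing entries are real NA values. Note that fill() copies the nearest non-missing value rather than the first one, which gives the same result only when each ID has a single address.

```r
library(dplyr)
library(tidyr)

# Hypothetical sample data; missing entries are real NA values
df <- data.frame(
  ID      = c(1, 1, 1, 2, 2, 2),
  Year    = c(2000, 2001, 2002, 2001, 2002, 2003),
  Street  = c("123 Main St", NA, NA, NA, "123 Bloom Rd", NA),
  State   = c("CA", NA, NA, NA, "CA", NA),
  Country = c("USA", NA, NA, NA, "USA", NA),
  stringsAsFactors = FALSE
)

filled <- df %>%
  group_by(ID) %>%
  # "downup" fills downwards first, then upwards for leading NAs in a group
  fill(Street, State, Country, .direction = "downup") %>%
  ungroup()
```

Grouping by ID first keeps values from leaking from one ID into the next.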
Try: library(dplyr); df1 %>% group_by(ID) %>% mutate_at(vars('Street', 'State', 'Country'), funs(.[. != 'N/A'][1]))
Thanks. Unfortunately, it only works for some observations, not for the whole dataset.
Could it be that you sometimes imported NA as a string and sometimes as a logical?
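If the columns do mix the string "N/A" with real NA values, one way to handle both is to normalise them first and then fill per ID. The following is a minimal sketch under that assumption; the data frame df1 is hypothetical sample data, and it also assumes stray whitespace around "N/A" may be present.

```r
library(dplyr)

# Hypothetical data mixing the string "N/A" (with and without padding) and real NA
df1 <- data.frame(
  ID      = c(1, 1, 1, 2, 2, 2),
  Year    = c(2000, 2001, 2002, 2001, 2002, 2003),
  Street  = c("123 Main St", "N/A", NA, "123 Bloom Rd", " N/A", NA),
  State   = c("CA", "N/A", NA, "CA", " N/A", NA),
  Country = c("USA", "N/A", NA, "USA", " N/A", NA),
  stringsAsFactors = FALSE
)

cols <- c("Street", "State", "Country")

filled <- df1 %>%
  # Step 1: trim whitespace and turn the string "N/A" into a real NA
  mutate(across(all_of(cols), ~ na_if(trimws(.x), "N/A"))) %>%
  group_by(ID) %>%
  # Step 2: within each ID, use the first non-missing value
  mutate(across(all_of(cols), ~ .x[!is.na(.x)][1])) %>%
  ungroup()
```

After the first step every missing entry is a real NA, so a single is.na() test covers all rows.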