R data.table: split one row into multiple rows based on a string value
I have a data.table that looks like this:
dt <- data.table("title"=c("First Title", "Second Title", "Third Title", "Fourth Title"),
"sha"=c("12345", "2345; 66543; 33423", "22222; 12345678;", "666662345; 444"))
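Thanks in advance.

Here is an approach using tstrsplit that should work for you (it uses the dt with the extra date column defined under Data below):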
library(data.table)
dt[, lapply(.SD, function(x) unlist(tstrsplit(x, "; ?"))),
   .SDcols = "sha", by = c("title", "date")]
title date sha
1: First Title 1/1/2020 12345
2: Second Title 1/2/2020 2345
3: Second Title 1/2/2020 66543
4: Second Title 1/2/2020 33423
5: Third Title 1/3/2020 22222
6: Third Title 1/3/2020 12345678
7: Fourth Title 1/4/2020 666662345
8: Fourth Title 1/4/2020 444
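To make clear why the call above groups by title and date, here is a minimal sketch of what tstrsplit returns on its own. The input strings are made up for illustration. tstrsplit transposes the result of strsplit, so each list element is one "column" of pieces, padded with NA where a string has fewer pieces.

```r
library(data.table)

# One input string: one list element per piece, so unlist() gives the pieces in order.
one <- unlist(tstrsplit("2345; 66543; 33423", "; ?"))
print(one)

# Two input strings of different lengths: the result is column-wise and NA-padded,
# so unlist() would interleave the pieces of different rows.
two <- tstrsplit(c("a;b", "c"), ";")
print(two)
```

Because of this column-wise layout, unlist(tstrsplit(...)) only yields the pieces in row order when each group holds a single row, which is what by = c("title", "date") provides for this data.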
Data
dt <- data.table("title"=c("First Title", "Second Title", "Third Title", "Fourth Title"),
"sha"=c("12345", "2345; 66543; 33423", "22222; 12345678;", "666662345; 444"),
"date" = c("1/1/2020","1/2/2020","1/3/2020","1/4/2020"))
Here is my dplyr solution (separate_rows comes from tidyr):
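library(dplyr)
library(tidyr)

dt %>%
  group_by(title) %>%
  separate_rows(sha, sep = ";") %>%   # one row per ";"-separated piece
  mutate(sha = as.numeric(sha)) %>%   # empty pieces become NA
  filter(!is.na(sha))                 # drop the pieces left by a trailing ";"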
This should give you:
# A tibble: 8 x 2
# Groups: title [4]
title sha
<chr> <dbl>
1 First Title 12345
2 Second Title 2345
3 Second Title 66543
4 Second Title 33423
5 Third Title 22222
6 Third Title 12345678
7 Fourth Title 666662345
8 Fourth Title 444
Here is another solution using data.table:
dt[, .(sha = unlist(tstrsplit(sha, ";", type.convert = TRUE))), by = "title"]
# title sha
# 1: First Title 12345
# 2: Second Title 2345
# 3: Second Title 66543
# 4: Second Title 33423
# 5: Third Title 22222
# 6: Third Title 12345678
# 7: Fourth Title 666662345
# 8: Fourth Title 444
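After this operation, is it possible to keep the other columns, not just title and sha? For example, I also have a date column.
@Blind0ne A simpler approach: dt %>% separate_rows(sha, sep = ";") %>% na_if("") %>% drop_na()
@Neel Thanks! Actually, you don't need drop_na(), but you do need to filter out the empty strings: separate_rows(dt, sha) %>% filter(sha != "").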
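On the question of keeping the other columns: one possible sketch, not taken from the answers above, is to group by every column except sha. It swaps in base strsplit and trimws for tstrsplit, and the names keep_cols and long are only illustrative. It assumes the remaining columns are plain atomic columns that can be used in by.

```r
library(data.table)

dt <- data.table(
  title = c("First Title", "Second Title", "Third Title", "Fourth Title"),
  sha   = c("12345", "2345; 66543; 33423", "22222; 12345678;", "666662345; 444"),
  date  = c("1/1/2020", "1/2/2020", "1/3/2020", "1/4/2020")
)

# Group by every column except the one being split, so all of them are carried along.
keep_cols <- setdiff(names(dt), "sha")

# Split on ";", strip surrounding whitespace, one output row per piece.
long <- dt[, .(sha = trimws(unlist(strsplit(sha, ";", fixed = TRUE)))), by = keep_cols]

# Drop any empty pieces (for example from ";;" in the input).
long <- long[sha != ""]
print(long)
```

Here sha stays a character column; wrapping it in as.numeric() afterwards would convert it if numeric values are needed.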