Removing columns from a data.table in R based on a condition
How can I delete columns from a data.table in R based on their values? Suppose I have a one-row data.table:
dt = data.table("col1" = "a", "col2" = "b", "col3" = "c",
"col4" = 'd', "col5" = "e", "col6" = 9, "col7" = 0, "col8" = 7,
"col9" = 0, "col10" = 99)
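The first five columns are categorical and columns 6-10 are numeric. In the numeric columns, the same numbers are repeated across all rows.
I have two questions.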
cols_chosen = c("col6", "col7","col8","col9","col10")
condition = c(FALSE, dt[, lapply(.SD, function(x) sum(x)< 1), .SDcols = cols_chosen])
dt[, which(condition) := NULL]
I took the statements above from a previous answer.
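A note on why the snippet above stops with "argument to 'which' is not logical": `c(FALSE, dt[, lapply(...)])` combines a logical value with a data.table, which is a list, so `condition` ends up as a list rather than a logical vector. The following is only a minimal sketch of one way around that, keeping the asker's `sum(x) < 1` test; the names `is_zero` and `drop_cols` are made up for illustration and are not from the original post.

```r
library(data.table)

dt <- data.table(col1 = "a", col2 = "b", col3 = "c", col4 = "d", col5 = "e",
                 col6 = 9, col7 = 0, col8 = 7, col9 = 0, col10 = 99)

cols_chosen <- c("col6", "col7", "col8", "col9", "col10")

# sapply() returns a named logical vector, one element per chosen column
is_zero <- sapply(dt[, cols_chosen, with = FALSE], function(x) sum(x) < 1)

# names of the columns that satisfy the condition (hypothetical helper name)
drop_cols <- names(is_zero)[is_zero]

# remove them by reference; skip the call when nothing qualifies
if (length(drop_cols) > 0) dt[, (drop_cols) := NULL]

names(dt)
```

Dropping by name instead of by position avoids having to line the logical vector up with the five categorical columns that were never tested.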
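library(data.table)
dt = data.table("col1" = "a", "col2" = "b", "col3" = "c",
"col4" = 'd', "col5" = "e", "col6" = 9, "col7" = 0, "col8" = 7,
"col9" = 0, "col10" = 99)
not0 = function(x) is.numeric(x) && !anyNA(x) && all(x != 0)
dt[, .(
## your categorical columns
col1, col2, col3, col4, col5,
## new column pasted from the non-0 numeric columns
new = as.numeric(paste0(unlist(.SD), collapse = ""))
),
## this filters the columns provided in the .SD column subset
.SDcols = not0,
## we group by each row, so it will handle input with multiple rows
by = .(row = seq_len(nrow(dt)))
][, row := NULL ## this removes the extra grouping column
][] ## this prints
#    col1 col2 col3 col4 col5  new
# 1:    a    b    c    d    e 9799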
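Alternatively, if you want to update the existing table in place:
is0 = function(x) is.numeric(x) && !anyNA(x) && all(x == 0)
## remove the columns that contain only 0
dt[, which(sapply(dt, is0)) := NULL]
## add the new column
dt[, new := as.numeric(
paste0(unlist(.SD), collapse = "")
), .SDcols = is.numeric, by = .(row = seq_len(nrow(dt)))
][]
#    col1 col2 col3 col4 col5 col6 col8 col10  new
# 1:    a    b    c    d    e    9    7    99 9799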
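Thanks for the solution, but how can I split it into two steps? The first part should only remove the 0 columns, and the second part should create the new column. The code handles the sample data I provided exactly as you describe, but when I try it on my own dataset, the dt call also pastes the 0 columns into the "new" column together with the non-zero columns.
Probably you have an integer column, unlike in the example dataset? I have updated the not0 function; it should now also work for integer fields and for multiple rows. Please try again. @user3612324
@user3612324 I added another example that does exactly what you asked for: it first removes the 0 columns and then creates the new column.
Thanks for the solution. This seems to work very well when the data is in a data frame. However, I am having a hard time getting it to work with data.table.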
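The integer-column and multi-row cases raised in the comments can be checked on a small table. This is a sketch under assumptions: the three-row table is invented for illustration, `zero_cols` and `num_cols` are hypothetical names, and `do.call(paste0, .SD)` is swapped in for the by-row grouping used in the answer because it pastes whole columns at once.

```r
library(data.table)

# hypothetical three-row table; col7 is an integer column of zeros
dt <- data.table(col1 = c("a", "a", "a"),
                 col6 = c(9, 9, 9),
                 col7 = c(0L, 0L, 0L),
                 col8 = c(7, 7, 7))

# TRUE for numeric (double or integer) columns made up entirely of zeros
is0 <- function(x) is.numeric(x) && !anyNA(x) && all(x == 0)

# step 1: drop the all-zero columns by reference
zero_cols <- names(dt)[sapply(dt, is0)]
if (length(zero_cols) > 0) dt[, (zero_cols) := NULL]

# step 2: paste the remaining numeric columns together, row by row
num_cols <- names(dt)[sapply(dt, is.numeric)]
dt[, new := as.numeric(do.call(paste0, .SD)), .SDcols = num_cols]

names(dt)
```

Because `is.numeric()` is TRUE for both integer and double vectors, the integer column of zeros is removed in step 1 and never reaches the paste in step 2.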
The code in the question fails with:
Error in which(condition) : argument to 'which' is not logical
## data.frame version of the sample data
dt <- data.frame("col1" = "a", "col2" = "b", "col3" = "c",
"col4" = 'd', "col5" = "e", "col6" = 9, "col7" = 0, "col8" = 7,
"col9" = 0, "col10" = 99)
## step 1: keep only the columns whose first-row value is not 0
dt <- dt[,dt[1,] != 0]
dt
#   col1 col2 col3 col4 col5 col6 col8 col10
# 1    a    b    c    d    e    9    7    99
## step 2: paste the numeric columns of the first row into a new column
numTag <- unlist(lapply(X = dt[1,], FUN = is.numeric))
dt$new_col <- rep(as.numeric(paste(as.character(dt[1,numTag]), collapse = '', sep = '')), nrow(dt))
dt
#   col1 col2 col3 col4 col5 col6 col8 col10 new_col
# 1    a    b    c    d    e    9    7    99    9799
## alternative: start again from the original dt, keep the 0 columns
## and only skip them when pasting
numTag <- unlist(lapply(X = dt[1,], FUN = is.numeric))
numTag <- numTag & (dt[1,] != 0)
dt$new_col <- rep(as.numeric(paste(as.character(dt[1,numTag]), collapse = '', sep = '')), nrow(dt))
dt
#   col1 col2 col3 col4 col5 col6 col7 col8 col9 col10 new_col
# 1    a    b    c    d    e    9    0    7    0    99    9799
library(data.table)
library(dplyr)
library(tidyr)
dt = data.table("col1" = "a", "col2" = "b", "col3" = "c",
"col4" = 'd', "col5" = "e", "col6" = 9, "col7" = 0, "col8" = 7,
"col9" = 0, "col10" = 99)
## which columns contain only zeros?
zero_vars <- dt %>%
dplyr::select_if(~max(.x) == 0) %>%
colnames()
## which columns are non-zero numeric vars?
numeric_vars <- dt %>%
dplyr::select(-all_of(zero_vars)) %>%
dplyr::select_if(is.numeric) %>%
colnames()
## create new table
collapsed_dt <-
dt %>%
dplyr::select(all_of(numeric_vars)) %>% ## select only non-zero numeric vars
mutate_all(as.character) %>%
unite( col = "collapsed_var", sep = "") ## unite them to new var 'collapsed_var'
## re-join the collapsed var to the original table
dt %>%
dplyr::select_if(is.character) %>% ## only character variables
cbind(collapsed_dt) ## bind the collapsed_dt
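The same two steps can also be written more compactly with tidyselect's `where()`. This is a sketch that assumes dplyr 1.0 or later and a recent tidyr; it is a variation on the answer above, not the answerer's own code, and `result` is a name chosen for illustration.

```r
library(data.table)
library(dplyr)
library(tidyr)

dt <- data.table(col1 = "a", col2 = "b", col3 = "c", col4 = "d", col5 = "e",
                 col6 = 9, col7 = 0, col8 = 7, col9 = 0, col10 = 99)

result <- dt %>%
  as_tibble() %>%
  ## step 1: drop numeric columns that contain only zeros
  select(!where(~ is.numeric(.x) && all(.x == 0))) %>%
  ## step 2: unite the remaining numeric columns into one character column
  unite("collapsed_var", where(is.numeric), sep = "")

result
```

Note that `unite()` produces a character column and removes the columns it combines; wrap the result in `as.numeric()` if a number is needed.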