Selecting non-contiguous columns by name in data.table


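I have a database on disk from which I occasionally want to select multiple columns in data.table using column names. Previous answers only cover column selection by index, which is not suitable in my case.

An example is shown below: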

library(data.table) # provides setDT() and the [ , j ] syntax used below
library(gapminder)
data(gapminder)
setDT(gapminder)

names(gapminder) # [1] "country"   "continent" "year"      "lifeExp"   "pop"       "gdpPercap"

# I would like to select columns from `country` to `year` and pop

gapminder[,country:year] # this one works



gapminder[,country:year + pop] # doesn't work
gapminder[,c(country:year,pop)] # doesn't work either

gapminder[,.SD, .SDcols = c(country:year,pop)] # doesn't work

This is driving me crazy. Any suggestions would be greatly appreciated.

I'm not sure whether data.table really has a simple solution for this, but you could cbind the column range with the single named column:

library(data.table)
cbind(gapminder[,country:year], gapminder[, 'pop'])

However, the desired behavior can be achieved in dplyr:

library(dplyr)
gapminder %>% select(country:year, pop)


#       country continent year      pop
#1: Afghanistan      Asia 1952  8425333
#2: Afghanistan      Asia 1957  9240934
#3: Afghanistan      Asia 1962 10267083
#4: Afghanistan      Asia 1967 11537966
#5: Afghanistan      Asia 1972 13079460
#6: Afghanistan      Asia 1977 14880372

Another option:

gapminder[, c(.SD, .(pop=pop)), .SDcols=country:year]

Or, if you have more columns:

cols <- setNames(c("pop", "lifeExp"), c("pop", "lifeExp"))
gapminder[, c(.SD, mget(cols)), .SDcols=country:year]

Update

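After consulting the data.table FAQ, in particular this section:

1.2 Why does DT[, "region"] return a 1-column data.table rather than a vector? See the answer above. Try DT$region instead. Or DT[["region"]].

1.3 Why does DT[, region] return a vector for the "region" column? I'd like a 1-column data.table. Try DT[, .(region)] instead. .() is an alias for list() and ensures a data.table is returned.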
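The distinction drawn in the two FAQ entries above can be checked on a toy table. This is only a minimal sketch: the table `DT` and its columns `region` and `sales` are made up for illustration and are not part of the question's data.

```r
library(data.table)

# Hypothetical toy table, used only to illustrate the FAQ entries
DT <- data.table(region = c("north", "south"), sales = c(10, 20))

v  <- DT[, region]     # bare column name in j: a plain vector
d1 <- DT[, .(region)]  # .() is an alias for list(): a 1-column data.table
d2 <- DT[, "region"]   # quoted name in j: also a 1-column data.table

is.data.table(v)   # FALSE
is.data.table(d1)  # TRUE
is.data.table(d2)  # TRUE

# Binding two data.tables keeps both column names
combined <- cbind(DT[, .(sales)], DT[, .(region)])
names(combined)
```

Because both pieces passed to cbind are data.tables here, the column names survive, which is the property the rest of this answer relies on.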

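I realized there is a simpler solution.

To use cbind while preserving the column names, you need to pass two data.tables. The column ends up named V4 because you are passing a vector to cbind.

However, you can control whether data.table returns a vector or a 1-column data.table. Here is how that looks in your case: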

newest_gapminder2 <- cbind(gapminder[, country:year], gapminder[, 'pop'])

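This is slightly better, but I wanted something even more streamlined. I considered chaining in data.table and combining it with method 2, taking advantage of the := NULL technique.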

All three techniques are comparable. However, I'm not sure how they would perform on a multi-GB dataset.

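This method assigns an arbitrary name, V4, to the pop column, which makes me hesitant to use this approach.

@MatthewSon, I have addressed your issue in the update.

The quick fix looks quite neat, but is there any way to use multiple colons (:) in data.table? I mean something like selecting (a:c) and (e:g).

Probably not directly, but you could write a function that collects the columns and then pass the result to .SDcols.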
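One way the suggestion in the last comment could look is sketched below. The helper `cols_between()` is hypothetical (it is not part of data.table), and the table `dt7` with columns a to g is made up for illustration. It expands "from:to" strings and single names into a character vector of column names, which .SDcols accepts.

```r
library(data.table)

# Hypothetical helper, not a data.table function: expand specs such as
# "a:c" (a range of adjacent columns) or "pop" (a single column) into names.
cols_between <- function(dt, ...) {
  nms <- names(dt)
  idx <- unlist(lapply(c(...), function(spec) {
    ends <- strsplit(spec, ":", fixed = TRUE)[[1]]
    pos <- match(ends, nms)            # positions of the endpoint names
    if (anyNA(pos)) stop("unknown column in spec: ", spec)
    if (length(pos) == 1L) pos else seq(pos[1], pos[2])
  }))
  nms[unique(idx)]
}

# Toy table with seven columns a..g (assumed for this example)
dt7 <- data.table(a = 1:3, b = 4:6, c = 7:9, d = 10:12,
                  e = 13:15, f = 16:18, g = 19:21)

keep <- cols_between(dt7, "a:c", "e:g")
res  <- dt7[, .SD, .SDcols = keep]     # selects a, b, c, e, f, g by name
```

With the question's data, the same helper would be called as `cols_between(gapminder, "country:year", "pop")`. Resolving the names once up front keeps the selection name-based and avoids the cbind step, at the cost of passing the ranges as strings.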

newest_gapminder3 <- cbind(gapminder[, country:year], gapminder[, .(pop)])

## create a data table for this example
dt <- data.table("col1"=1:5, "col2"=2:6, "col3"=letters[2:6], "col4"=letters[1:5], "col5"=3:7)
dim(dt)
dt

## the goal is to create a subset of this data frame that contains col1, col3, col4, and col5

# method 1
## subset out a vector and give the column name
col1 <- dt[, col1]

## use cbind on the object and the data table subset
## the object name takes the place of the column name in the table
new_dt <- cbind(col1, dt[, col3:col5])

## check that the result is a data.table
class(new_dt)
dim(new_dt)
new_dt

dt_alt <- cbind(dt[, col1], dt[, col3:col5])

# method 2
## take two different subsets/slices and cbind them 
new_dt2 <- cbind(dt[, col1:col2], dt[, col3:col5])

## take out col2
new_dt2[, col2 := NULL]

class(new_dt2)
dim(new_dt2)
new_dt2

# method 3
## thinking about how data.table works, can the := NULL be chained? 
## spoiler: it can!
## this feels like kind of a hack but...
new_dt3 <-cbind(dt[,col1:col2][, col2:=NULL], dt[,col3:col5])

class(new_dt3)
dim(new_dt3)
new_dt3

gapminder <- cbind(gapminder[, country:year], gapminder[, pop:gdpPercap][, gdpPercap := NULL])

   user  system elapsed 
   0.00    0.00    0.02