R data.table: assignment by reference modifies the wrong object


I'm seeing some unexpected behavior when modifying a column of a data.table by group:

library(data.table)

# creating a data.frame
data <- data.frame(sequence = rep(c("A","B","C","D"), c(2,3,3,2)), trim = 0, random_value = NA)
data[c(1:4, 10), "trim"] <- 1

# copying data to data_temp
data_temp <- data

# assigning some random value to data_temp so that it should no longer be a
# copy of "data"
data_temp[1, "random_value"] <- rnorm(1)

# converting data_temp to data.table
setDT(data_temp)

# expanding trim parameter to group and subsetting
data_temp <- data_temp[, trim := sum(trim), by = sequence][trim == 0]
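As a result, the by-reference assignment to the "trim" variable also takes place in the original data.frame "data", even though only "data_temp" was modified.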

For compatibility reasons, I'm using data.table_1.11.4 and R version 3.4.3.


Is this error caused by the old versions, or am I doing something wrong and need to change my code to avoid it?

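As @Roland pointed out in the comments on the original question, an object that data.table will later modify must be copied explicitly with the copy() function. Otherwise data.table does not treat the copied object as a distinct object and modifies the columns of the same name in both objects. As @Imo checked, only the columns that are changed in one of the two data.frames by ordinary assignment rather than by reference (such as "random_value" in the example) are actually copied/unlinked.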
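One way to see which columns are still linked is to compare their memory addresses. The following is a minimal sketch, assuming data.table's address() helper is an acceptable way to inspect this; the helper variables shared and shared_after_copy are hypothetical names introduced only for illustration, and the result of the first check may depend on the R version in use.

```r
library(data.table)

data <- data.frame(sequence = rep(c("A","B","C","D"), c(2,3,3,2)), trim = 0, random_value = NA)
data_temp <- data                          # plain assignment, no copy()
data_temp[1, "random_value"] <- rnorm(1)   # ordinary (non-reference) modification

# For each column, check whether both objects point at the same vector in memory.
shared <- sapply(names(data), function(col) {
  address(data[[col]]) == address(data_temp[[col]])
})
print(shared)

# After an explicit deep copy, no column shares memory with the original.
data_copy <- copy(data)
shared_after_copy <- sapply(names(data), function(col) {
  address(data[[col]]) == address(data_copy[[col]])
})
print(shared_after_copy)
```

Any column reported as shared in the first check is one that a later := on data_temp could also change in data.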

The copy() function easily solves the problem:

library(data.table)

# creating a data.frame
data <- data.frame(sequence = rep(c("A","B","C","D"), c(2,3,3,2)), trim = 0, random_value = NA)
data[c(1:4, 10), "trim"] <- 1

# copying data to data_temp explicitly
data_temp <- copy(data)

# assigning some random value to data_temp so that it should no longer be a
# copy of "data" - if the copy() function isn't used, that just unlinks the 
# "random_value" column, but not the others
data_temp[1, "random_value"] <- rnorm(1)

# converting data_temp to data.table
setDT(data_temp)

# expanding trim parameter to group and subsetting
data_temp <- data_temp[, trim := sum(trim), by = sequence][trim == 0]
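To confirm that the fix works, the original data.frame can be compared against its expected contents after the grouped update. This is a small sketch of such a check; expected_trim is a hypothetical helper, typed out by hand so that it cannot itself be linked to the column being tested.

```r
library(data.table)

data <- data.frame(sequence = rep(c("A","B","C","D"), c(2,3,3,2)), trim = 0, random_value = NA)
data[c(1:4, 10), "trim"] <- 1

# Expected contents of data$trim, written as a literal on purpose:
# an assignment such as `x <- data$trim` would only create another reference.
expected_trim <- c(1, 1, 1, 1, 0, 0, 0, 0, 0, 1)

data_temp <- copy(data)                    # explicit deep copy
data_temp[1, "random_value"] <- rnorm(1)
setDT(data_temp)
data_temp <- data_temp[, trim := sum(trim), by = sequence][trim == 0]

# The original data.frame must be untouched by the := above.
stopifnot(identical(data$trim, expected_trim))
print(data_temp)   # only the rows of sequence "C" remain
```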

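Read help("copy").

Ah, thanks. Good to know that copy() is also necessary when the objects I'm copying are not data.table objects but data.frames, only one of which will later become a data.table.

@Roland I was surprised to see data_temp[1, "random_value"] ...

@David. It's not clear that this is a duplicate. Although the advice "create a copy before doing anything" would solve both problems, the copy behavior of <- differs between data.frame and data.table objects. This can be seen by repeating Matt Dowle's example with a data.frame and checking the memory locations of the vectors. That would more accurately reflect the situation above.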
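The difference mentioned in the last comment can be illustrated with a small sketch. It is not the example referred to in the comment; the objects df1, df2, dt1, dt2 and dt3 are hypothetical and only contrast base R copy-on-modify with data.table's by-reference update.

```r
library(data.table)

# data.frame: `<-` followed by a base R modification leaves the original alone.
df1 <- data.frame(a = c(1, 2), b = c(11, 12))
df2 <- df1
df2$b <- df2$b * 10
print(df1$b)   # 11 12

# data.table: `<-` only binds a second name to the same object,
# so := through either name changes both.
dt1 <- data.table(a = c(1, 2), b = c(11, 12))
dt2 <- dt1
dt2[, b := b * 10]
print(dt1$b)   # 110 120

# copy() breaks the link: updating dt3 no longer touches dt1.
dt3 <- copy(dt1)
dt3[, b := 0]
print(dt1$b)   # still 110 120
```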