R: get all numeric data from the last two columns, whose position varies by row
I have sample data like this:
library(data.table)

dt1 <- setDT(structure(list(V1 = c(301L, 301L, 301L, 301L, 301L), V2 = 1:5,
V3 = c(61950L, 61951L, 61953L, 155220L, 155218L), V4 = c("i",
"you", "you", "we", "they"), V5 = c("believe", "think", "are",
"laugh", "smile"), V6 = c("we", "they", "okay", "490", "490"
), V7 = c("can", "500", "with", "31", "31"), V8 = c("use",
"32", "that", "", ""), V9 = c("datatable", "", "500", "",
""), V10 = c("always", "", "32", "", ""), V11 = c("500",
"", "", "", ""), V12 = c("32", "", "", "", "")), .Names = c("V1",
"V2", "V3", "V4", "V5", "V6", "V7", "V8", "V9", "V10", "V11",
"V12"), row.names = c(NA, -5L), class = "data.frame"))
    V1 V2     V3   V4      V5   V6   V7   V8        V9    V10 V11 V12
1: 301  1  61950    i believe   we  can  use datatable always 500  32
2: 301  2  61951  you   think they  500   32
3: 301  3  61953  you     are okay with that       500     32
4: 301  4 155220   we   laugh  490   31
5: 301  5 155218 they   smile  490   31
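Rules:
- a) Columns V1, V2 and V3 in the sample data are always numeric and stay unchanged in the sample output.
- b) The last two values in each row are always numeric, but their column position differs from row to row: in the sample data above they sit in V11 and V12 for row 1, and in V7 and V8 for row 2.
- c) Between the first three numeric columns and the last two numeric values there is text data: for example, V4:V10 are all text in row 1, and V4:V6 are all text in row 2.
- d) None of that data is blank (empty strings appear only after a row's last two numeric values).
- e) The sample output must have the same first three columns as the sample data; newcol1 in the sample output simply merges the text columns of that row.
- f) newcol2 and newcol3 in the sample output are always the last two numeric values of each row (note again that their column position differs from row to row).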
You can do it like this:
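rowid_vars = c("V1","V2","V3")
# melt to long format, drop blank cells, then summarise each original row
melt(dt1, id=rowid_vars)[value!="", .(
  nc1 = paste(value[-(.N-1:0)], collapse=" "),  # everything except the last two values
  nc2 = as.integer(value[.N-1]),                # second-to-last value
  nc3 = as.integer(value[.N])                   # last value
), by=rowid_vars]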
    V1 V2     V3                                   nc1 nc2 nc3
1: 301  1  61950 i believe we can use datatable always 500  32
2: 301  2  61951                        you think they 500  32
3: 301  3  61953                you are okay with that 500  32
4: 301  4 155220                              we laugh 490  31
5: 301  5 155218                            they smile 490  31
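The index `-(.N-1:0)` can look cryptic, so here is a small illustration on a plain character vector. It is only a sketch: `x` is row 2 of the sample data typed out by hand, and `n` stands in for `.N`. Because `:` is evaluated before `-`, `n - 1:0` is `c(n - 1, n)`, i.e. the last two positions.

```r
# Non-blank values of row 2 after melting, in column order (typed by hand)
x <- c("you", "think", "they", "500", "32")
n <- length(x)             # plays the role of .N inside the data.table call

last_two  <- n - 1:0       # `:` binds tighter than `-`, giving c(n - 1, n)
text_part <- x[-last_two]  # a negative index drops those two positions

nc1 <- paste(text_part, collapse = " ")
nc2 <- as.integer(x[n - 1])
nc3 <- as.integer(x[n])
```

This relies on the values of each row arriving in column order, which is how they come out of `melt` within each `by` group.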
I suppose you could avoid this problem by reading the data in differently, but I don't know how.
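As one possible way to handle this at read time, here is a rough sketch in base R. It assumes something the question does not state: that the raw input has one record per line with fields separated by single spaces, and that the free text itself never ends in a number. The names `raw_lines`, `pattern` and `res` are made up for illustration.

```r
# Hypothetical raw input: one record per line, fields separated by single spaces
raw_lines <- c(
  "301 1 61950 i believe we can use datatable always 500 32",
  "301 2 61951 you think they 500 32",
  "301 4 155220 we laugh 490 31"
)

# Three leading integers, free text, two trailing integers
pattern <- "^([0-9]+) ([0-9]+) ([0-9]+) (.*) ([0-9]+) ([0-9]+)$"
parts <- regmatches(raw_lines, regexec(pattern, raw_lines))

# Each element of `parts` is c(full_match, group1, ..., group6); drop the full match
m <- do.call(rbind, lapply(parts, `[`, -1))

res <- data.frame(
  V1  = as.integer(m[, 1]),
  V2  = as.integer(m[, 2]),
  V3  = as.integer(m[, 3]),
  nc1 = m[, 4],
  nc2 = as.integer(m[, 5]),
  nc3 = as.integer(m[, 6]),
  stringsAsFactors = FALSE
)
```

Lines that do not match the pattern come back as empty vectors from `regmatches`, so real input would need a check for that before the `rbind`. The result can be passed to `setDT()` if a data.table is wanted.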