R: How to use data.table's .N in i?

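In the documentation, it says ".N can also be used in i." How do I do that?

For example, I want the code below to keep only the rows whose group has exactly one element: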

> library(data.table)
> set.seed(734)
> dt <- data.table(x = c(rep("a", 5), rep("b", 3), "c", "d", "e"),
                   y = runif(11))
> dt
    x          y
 1: a 0.46431448
 2: a 0.57148294
 3: a 0.30197960
 4: a 0.06394102
 5: a 0.08793526
 6: b 0.62994539
 7: b 0.64693916
 8: b 0.79671939
 9: c 0.60865117
10: d 0.86025196
11: e 0.21562992

> dt[.N == 1, .(y), by = .(x)]
Empty data.table (0 rows) of 2 cols: x,y

If not as in the example above, how would I use .N in i with data.table?

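The logical expression based on .N should not go in i. Instead, get the row indices (.I) from an expression in j, extract them ($V1), and use them to subset the rows: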
dt[dt[, .I[.N == 1], by = .(x)]$V1]
#   x         y
#1: c 0.6086512
#2: d 0.8602520
#3: e 0.2156299


The same expression can also be used to subset .SD (possibly slower).
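A sketch of what subsetting .SD could look like, assuming the dt from the question (recreated here). Both forms work per group inside j: the first keeps .SD's rows only when the group has one row, and the second returns .SD only for qualifying groups, relying on data.table dropping groups whose j result is NULL. The result names res_sd and res_if are illustrative.

```r
library(data.table)

# Same data as in the question
set.seed(734)
dt <- data.table(x = c(rep("a", 5), rep("b", 3), "c", "d", "e"),
                 y = runif(11))

# Subset .SD inside j: a group keeps its rows only if it has exactly one row
res_sd <- dt[, .SD[.N == 1], by = .(x)]

# Variant: return .SD only for qualifying groups; groups where if()
# yields NULL are left out of the result
res_if <- dt[, if (.N == 1L) .SD, by = .(x)]

print(res_sd)
print(res_if)
```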

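Regarding the usage described in ?.N:

".SD, .BY, .N, .I and .GRP are read-only symbols for use in j. .N can be used in i as well."

However, it does not say in what context. If we use only an i expression, it works:
dt[.N > 2] # works

and with both i and j, it also works:
dt[.N > 2, .(x)]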
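Without by, .N in i is simply the number of rows in the whole table (11 here). That is why dt[.N > 2] "works": the condition is a single TRUE, so every row is kept, while .N == 1 is a single FALSE and keeps nothing. A minimal sketch, again recreating the dt from the question; the variable names are illustrative.

```r
library(data.table)

# Same data as in the question
set.seed(734)
dt <- data.table(x = c(rep("a", 5), rep("b", 3), "c", "d", "e"),
                 y = runif(11))

n_total  <- dt[, .N]     # 11: number of rows in the whole table
last_row <- dt[.N]       # .N used as a row number selects the last row
all_rows <- dt[.N > 2]   # a single TRUE: every row is kept
no_rows  <- dt[.N == 1]  # a single FALSE: no rows are kept
```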


To see what data.table does in each case, use verbose = TRUE:

dt[.N ==1, .SD, by = .(x), verbose = TRUE]
#i clause present and columns used in by detected, only these subset: x 
#lapply optimization changed j from '.SD' to 'list(y)'
#Old mean optimization is on, left j unchanged.
#Making each group and running j (GForce FALSE) ... 
#  memcpy contiguous groups took 0.000s for 1 groups
#  eval(j) took 0.000s for 1 calls
#0.046s elapsed (0.268s cpu) 
#Empty data.table (0 rows and 2 cols): x,y

dt[dt[, .I[.N == 1], by = .(x), verbose = TRUE]$V1]
#Detected that j uses these columns: <none> 
#Finding groups using forderv ... 0.032s elapsed (0.033s cpu) 
#Finding group sizes from the positions (can be avoided to save RAM) ... 0.033s elapsed (0.194s cpu) 
#lapply optimization is on, j unchanged as '.I[.N == 1]'
#GForce is on, left j unchanged
#Old mean optimization is on, left j unchanged.
#Making each group and running j (GForce FALSE) ... dogroups: growing from 0 to 2 rows
#dogroups: growing from 2 to 4 rows
#Wrote less rows (3) than allocated (4).

#  memcpy contiguous groups took 0.000s for 5 groups
#  eval(j) took 0.000s for 5 calls
#0.046s elapsed (0.273s cpu) 

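Please see the Details section of ?data.table and the basic general form of data.table syntax: "Take DT, subset/reorder rows using i, then calculate j, grouped by by". So first you index in i, then you compute/select in j. For example, in your dt[.N == 1, .(y), by = .(x)], you first subset rows in i using the logical condition .N == 1. .N is 11, so the condition evaluates to FALSE and zero rows are selected in i. Then you try to do "stuff" in j on these zero rows. So you can use .N in i as you wish, but keep the general form of data.table syntax in mind: "Take DT, subset/reorder rows using i, then calculate j, grouped by by". For practical examples, see the usage of .N in …

Thanks @Henrik. It seems the answer is that I misunderstood the order of operations: data.table does i, then by, then j. So using .N in i does not reflect what is in by.

That is true for whatever you put in i, not only .N. Just keep reading the (long but) excellent ?data.table, and play with simple examples. Another relevant quote (from the by argument): "Then data.table groups by by and j is evaluated within each group." So it is j that is evaluated by group, not i. Good luck! Cheers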
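Since i is applied before grouping, one way to respect that order is to compute the group size in a grouped j first, and then filter on it with a plain i in a second step. A minimal sketch, assuming it is acceptable to add a helper column (called n here purely for illustration) to a copy of dt; := modifies by reference, hence copy().

```r
library(data.table)

# Same data as in the question
set.seed(734)
dt <- data.table(x = c(rep("a", 5), rep("b", 3), "c", "d", "e"),
                 y = runif(11))

# Group sizes on their own: one row per group, in a column named N
sizes <- dt[, .N, by = .(x)]

# Step 1 (grouped j): store each group's size in a helper column n,
# on a copy so that dt itself is left unchanged
tmp <- copy(dt)
tmp[, n := .N, by = .(x)]

# Step 2 (plain i): the condition is now a per-row logical vector
singletons <- tmp[n == 1L, .(x, y)]
print(singletons)
```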