Regex: extracting breakpoints when the intervals are closed on the left
I am looking at the examples for cut() (run example(cut)), in particular this part:
cut> aaa <- c(1,2,3,4,5,2,3,4,5,6,7)
cut> cut(aaa, 3)
[1] (0.994,3] (0.994,3] (3,5] (3,5] (3,5] (0.994,3]
[7] (3,5] (3,5] (3,5] (5,7.01] (5,7.01]
Levels: (0.994,3] (3,5] (5,7.01]
cut> cut(aaa, 3, dig.lab = 4, ordered = TRUE)
[1] (0.994,2.998] (0.994,2.998] (2.998,5.002] (2.998,5.002]
[5] (2.998,5.002] (0.994,2.998] (2.998,5.002] (2.998,5.002]
[9] (2.998,5.002] (5.002,7.006] (5.002,7.006]
Levels: (0.994,2.998] < (2.998,5.002] < (5.002,7.006]
cut> ## one way to extract the breakpoints
cut> labs <- levels(cut(aaa, 3))
cut> cbind(lower = as.numeric( sub("\\((.+),.*", "\\1", labs) ),
cut+ upper = as.numeric( sub("[^,]*,([^]]*)\\]", "\\1", labs) ))
lower upper
[1,] 0.994 3.00
[2,] 3.000 5.00
[3,] 5.000 7.01
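Now, how can I extract the breakpoints with the same cbind() command when the intervals are closed on the left? (Other approaches are welcome too.)
Just use the following as your pattern, together with gsub: "\\[|\\]|\\(|\\)"
For example: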
out <- levels(cut(aaa, 3, dig.lab = 4, ordered = TRUE, right = FALSE))
gsub("\\[|\\]|\\(|\\)", "", out)
# [1] "0.994,2.998" "2.998,5.002" "5.002,7.006"
FYI: the same pattern works whether the intervals are closed on the left or on the right. Using your original example:
labs <- levels(cut(aaa, 3))
labs
# [1] "(0.994,3]" "(3,5]" "(5,7.01]"
read.csv(text = gsub("\\[|\\]|\\(|\\)", "", labs), header = FALSE)
# V1 V2
# 1 0.994 3.00
# 2 3.000 5.00
# 3 5.000 7.01
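As a minimal sketch, the gsub pattern above can be wrapped in a small helper that returns the same lower/upper matrix as the cbind() call in the question. The name extract_breaks is hypothetical (it is not part of base R), and the label vectors are copied from the outputs shown above rather than recomputed, since the exact labels cut() produces for a given number of breaks may differ between R versions.

```r
# Hypothetical helper (not part of base R): parse interval labels produced by
# cut() into a numeric matrix of lower/upper breakpoints.
extract_breaks <- function(labels) {
  # Drop the bracket or parenthesis on either side, whichever side is closed.
  stripped <- gsub("\\[|\\]|\\(|\\)", "", labels)
  # Each label is now "lower,upper"; split on the literal comma.
  parts <- strsplit(stripped, ",", fixed = TRUE)
  cbind(lower = as.numeric(vapply(parts, `[`, character(1), 1L)),
        upper = as.numeric(vapply(parts, `[`, character(1), 2L)))
}

# Labels copied from the outputs shown above.
left_labs  <- c("[0.994,2.998)", "[2.998,5.002)", "[5.002,7.006)")
right_labs <- c("(0.994,3]", "(3,5]", "(5,7.01]")

res_left  <- extract_breaks(left_labs)   # left-closed intervals
res_right <- extract_breaks(right_labs)  # right-closed intervals
print(res_left)
print(res_right)
```

Splitting on the comma after stripping the delimiters keeps the helper independent of which side is closed, which is the point of the single pattern.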
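Update: a completely different alternative
Clearly, R has to compute these values and hold them inside the function in order to produce the output you see, so it is not too difficult to modify the function so that it returns something different.
If you look at the code of cut.default, you will find that its last few lines are: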
if (codes.only)
code
else factor(code, seq_along(labels), labels, ordered = ordered_result)
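Change those last few lines so that the function returns a list containing the cut output as its first item, plus the computed ranges (taken directly from inside the cut function rather than extracted from the pasted-together factor labels).
For example, in a modified copy called CUT, I changed these lines as follows: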
if (codes.only)
FIN <- code
else FIN <- factor(code, seq_along(labels), labels, ordered = ordered_result)
list(output = FIN, ranges = data.frame(lower = ch.br[-nb], upper = ch.br[-1L]))
And, with right = FALSE:
cut(aaa, 3, dig.lab = 4, ordered = TRUE, right = FALSE)
# [1] [0.994,2.998) [0.994,2.998) [2.998,5.002) [2.998,5.002) [2.998,5.002)
# [6] [0.994,2.998) [2.998,5.002) [2.998,5.002) [2.998,5.002) [5.002,7.006)
# [11] [5.002,7.006)
# Levels: [0.994,2.998) < [2.998,5.002) < [5.002,7.006)
CUT(aaa, 3, dig.lab = 4, ordered = TRUE, right = FALSE)
# $output
# [1] [0.994,2.998) [0.994,2.998) [2.998,5.002) [2.998,5.002) [2.998,5.002)
# [6] [0.994,2.998) [2.998,5.002) [2.998,5.002) [2.998,5.002) [5.002,7.006)
# [11] [5.002,7.006)
# Levels: [0.994,2.998) < [2.998,5.002) < [5.002,7.006)
# $ranges
# lower upper
# 1 0.994 2.998
# 2 2.998 5.002
# 3 5.002 7.006
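If editing a copy of cut.default is not an option, a related sketch is to pass an explicit breaks vector to cut(), so the breakpoints are already in hand and nothing has to be parsed back out of the labels. This is a different technique from the modified CUT above, not a reimplementation of it. The breaks below are copied from the ranges shown above, and the names brks, binned, ranges and result are illustrative assumptions.

```r
aaa <- c(1, 2, 3, 4, 5, 2, 3, 4, 5, 6, 7)

# Assumed breakpoints, copied from the $ranges output shown above.
brks <- c(0.994, 2.998, 5.002, 7.006)

# Bin with left-closed intervals using the explicit breaks.
binned <- cut(aaa, breaks = brks, dig.lab = 4, right = FALSE)

# The ranges come straight from the breaks vector, with no label parsing.
ranges <- data.frame(lower = head(brks, -1), upper = tail(brks, -1))

# Same list shape as the modified function returns: factor first, ranges second.
result <- list(output = binned, ranges = ranges)
print(result)
```

The trade-off is that you choose the breakpoints yourself instead of letting cut() derive them from a number of intervals.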
Sorry for commenting so late, but I am only now checking the cut.default example.
The second option is more comfortable for me. Thanks!
For comparison, the same two calls with the default right = TRUE:
cut(aaa, 3)
# [1] (0.994,3] (0.994,3] (3,5] (3,5] (3,5] (0.994,3] (3,5] (3,5]
# [9] (3,5] (5,7.01] (5,7.01]
# Levels: (0.994,3] (3,5] (5,7.01]
CUT(aaa, 3)
# $output
# [1] (0.994,3] (0.994,3] (3,5] (3,5] (3,5] (0.994,3] (3,5] (3,5]
# [9] (3,5] (5,7.01] (5,7.01]
# Levels: (0.994,3] (3,5] (5,7.01]
#
# $ranges
# lower upper
# 1 0.994 3
# 2 3 5
# 3 5 7.01