R: finding the most common letter combination
For simplicity, suppose I have 10 rows of 5 characters each, where every character can be A-Z. For example:
KJGXI
GDGQT
JZKDC
YOTQD
SSDIQ
PLUWC
TORHC
PFJSQ
IIZMO
BRPOJ
WLMDX
AZDIJ
ARNUA
JEXGA
VFPIP
GXOXM
VIZEM
TFVQJ
OFNOG
QFNJR
ZGUBZ
CCTMB
HZPGV
ORQTJ
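I want to know which three-letter combination is the most common. The combination does not need to be ordered, and the letters do not need to be adjacent. For example: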
ABCXY
CQDBA
=ABC
I could probably brute-force it with endless loops, but I am wondering whether there is a better way.

Here is one solution:
x <- c("KJGXI", "GDGQT", "JZKDC", "YOTQD", "SSDIQ", "PLUWC", "TORHC", "PFJSQ", "IIZMO", "BRPOJ", "WLMDX", "AZDIJ",
"ARNUA", "JEXGA", "VFPIP", "GXOXM", "VIZEM", "TFVQJ", "OFNOG", "QFNJR", "ZGUBZ", "CCTMB", "HZPGV", "ORQTJ")
temp <- do.call(cbind, lapply(strsplit(x, ""), combn, m = 3))
temp <- apply(temp, 2, sort)
temp <- apply(temp, 2, paste0, collapse = "")
sort(table(temp), decreasing = TRUE)
Two combinations are tied for the highest count.
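To pull out only the tied winners instead of reading them off the sorted table, something like the following may help. It is a minimal sketch that repeats the pipeline above; the names `counts` and `top` are introduced here for illustration and are not part of the original answer. On the sample data it should report FJQ and IMZ, each with a count of 3.

```r
x <- c("KJGXI", "GDGQT", "JZKDC", "YOTQD", "SSDIQ", "PLUWC", "TORHC", "PFJSQ",
       "IIZMO", "BRPOJ", "WLMDX", "AZDIJ", "ARNUA", "JEXGA", "VFPIP", "GXOXM",
       "VIZEM", "TFVQJ", "OFNOG", "QFNJR", "ZGUBZ", "CCTMB", "HZPGV", "ORQTJ")

# Same pipeline as above: every 3-letter combination per row,
# letters sorted within each combination, then pasted into one string
temp <- do.call(cbind, lapply(strsplit(x, ""), combn, m = 3))
temp <- apply(temp, 2, function(v) paste0(sort(v), collapse = ""))

counts <- table(temp)

# Keep every combination whose count equals the maximum, so ties are preserved
top <- counts[counts == max(counts)]
print(top)
```

Comparing against `max(counts)` rather than taking the first element of the sorted table avoids silently dropping one of the tied combinations.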
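The code does the following:
- Split each element of x into its five letters, then generate every possible three-element combination of those five letters
- Sort the letters of each combination alphabetically
- Paste the three letters together
- Count the occurrences of each combination and sort the result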
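The steps can be traced on the two-row example from the question (ABCXY and CQDBA). This is only an illustrative sketch; the intermediate names `letters_by_row`, `combos`, `keys`, and `counts` are assumptions made here so each stage can be inspected.

```r
x <- c("ABCXY", "CQDBA")

# Step 1: split each string into single letters, then build all
# 3-letter combinations; each row contributes choose(5, 3) = 10 columns
letters_by_row <- strsplit(x, "")
combos <- do.call(cbind, lapply(letters_by_row, combn, m = 3))

# Step 2: sort the letters within each column so that order no longer matters
combos <- apply(combos, 2, sort)

# Step 3: paste the three letters of each column into one string
keys <- apply(combos, 2, paste0, collapse = "")

# Step 4: count each string and sort by frequency
counts <- sort(table(keys), decreasing = TRUE)
print(head(counts, 3))
```

Here "ABC" is the only combination present in both rows, so it is the only key with a count of 2.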
Use combn to get all the combinations, then use paste0 to collapse them back into strings and count them:
txt <- c("KJGXI", "GDGQT", "JZKDC", "YOTQD", "SSDIQ", "PLUWC", "TORHC",
"PFJSQ", "IIZMO", "BRPOJ", "WLMDX", "AZDIJ", "ARNUA", "JEXGA",
"VFPIP", "GXOXM", "VIZEM", "TFVQJ", "OFNOG", "QFNJR", "ZGUBZ",
"CCTMB", "HZPGV", "ORQTJ")
txt2 <- strsplit(txt, split = "")
txt2 <- lapply(txt2, sort)
txt3 <- lapply(txt2, combn, m = 3)
txt4 <- lapply(txt3, function(x){apply(x, 2, paste0, collapse = "")})
table(unlist(txt4))
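One thing to watch for: when a row contains a repeated letter, the same combination is generated more than once for that row. For example, IIZMO produces IMZ twice, once per I. If the goal is to count how many rows contain a combination, a possible variant is to deduplicate within each row first. This is a sketch under that assumption; `rows_with` and `row_counts` are names made up for illustration.

```r
txt <- c("KJGXI", "GDGQT", "JZKDC", "YOTQD", "SSDIQ", "PLUWC", "TORHC",
         "PFJSQ", "IIZMO", "BRPOJ", "WLMDX", "AZDIJ", "ARNUA", "JEXGA",
         "VFPIP", "GXOXM", "VIZEM", "TFVQJ", "OFNOG", "QFNJR", "ZGUBZ",
         "CCTMB", "HZPGV", "ORQTJ")

# For each row: sort the letters, build all 3-letter strings via combn's
# FUN argument, and keep each distinct string only once per row
rows_with <- lapply(strsplit(txt, ""), function(ch) {
  unique(combn(sort(ch), 3, paste0, collapse = ""))
})

# Each combination is now counted at most once per row
row_counts <- table(unlist(rows_with))
print(row_counts[row_counts == max(row_counts)])
```

Counted this way, FJQ is the only combination that appears in three rows of the sample data.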
Can you show us the brute-force approach?
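For comparison, a brute-force version might look roughly like this. It is a sketch, not code from the thread: it enumerates all choose(26, 3) = 2600 triples of distinct letters and loops over every row for each one. The names `cand`, `hits`, and `best` are hypothetical. Because it only tries distinct letters, it counts rows rather than repeated letters within a row.

```r
txt <- c("KJGXI", "GDGQT", "JZKDC", "YOTQD", "SSDIQ", "PLUWC", "TORHC",
         "PFJSQ", "IIZMO", "BRPOJ", "WLMDX", "AZDIJ", "ARNUA", "JEXGA",
         "VFPIP", "GXOXM", "VIZEM", "TFVQJ", "OFNOG", "QFNJR", "ZGUBZ",
         "CCTMB", "HZPGV", "ORQTJ")
rows <- strsplit(txt, "")

# Every candidate triple of distinct letters, one per column (3 x 2600)
cand <- combn(LETTERS, 3)
hits <- integer(ncol(cand))

# Nested loops: for each candidate, count the rows containing all three letters
for (j in seq_len(ncol(cand))) {
  for (r in rows) {
    if (all(cand[, j] %in% r)) hits[j] <- hits[j] + 1L
  }
}

# Report every candidate that reaches the maximum count
best <- which(hits == max(hits))
print(apply(cand[, best, drop = FALSE], 2, paste0, collapse = ""))
```

This does 2600 x 24 membership checks no matter what the data looks like, whereas the combn-based answers only generate the combinations that actually occur.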