Finding repeated values in R


So, in a string that contains several 1s, it is possible that the digit '1' appears at several positions. What I want is (3)

Perhaps the following route will help:

Convert the string to a vector of integers, one per character:

v <- as.integer(strsplit(s, "")[[1]])
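As a quick illustration of this step, the sketch below splits an example string into one integer per character and lists where the 1s are. The value of `s` is an assumption, chosen to match the ten-character string used in the code further down this page; `ones` is a name introduced only for this sketch.

```r
# Hypothetical input; any string of 0s and 1s is handled the same way.
s <- "1101101101"

# strsplit() with an empty pattern splits into single characters;
# [[1]] picks the character vector for the single input string.
v <- as.integer(strsplit(s, "")[[1]])

# Positions at which the digit 1 occurs.
ones <- which(v == 1L)

print(v)
print(ones)
```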

This is not a complete answer, but here are some ideas (partly based on the comments):

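The code below does exactly what you asked for. Try it with `str_groups('1101101101')`. It returns a list of 3-element vectors, each holding the start position, the pitch and the number of repeats. Note that the first triplet is (1, 3, 4), because the character at position 10 is also a 1.

Final version, optimized and bug-free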

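I don't understand how you decide which '1' belongs to which group. Are you assuming a fixed step? As I said, if you take the first two there, they do not satisfy the minimum-repeat rule.

I have edited it. There are actually not 9 positions.

I don't have time to work on this right now, but you can use `acf` to identify significant seasonality. For example, `acf(c(as.numeric(strsplit("1101101", "")[[1]])))` will show you that.

Would the 1s at positions 1 and 4 produce more triplets? I'm still not clear. Please add more complete examples to your question!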

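How will `rle` help me? `rle` basically finds the run-length encoding, and I don't want that. I also have gaps between consecutive 1s. I did `m`

@user3797829: Did you try `matrix(v, ...)`, with `v` prepared as in step 1? (This is a sequence of steps, not a list of alternatives...)

I see what you mean. For example, when the matrix has one row, the output I get is `Run Length Encoding lengths: int [1:6] 2 1 2 1 values: int [1:6] 1 0 1 0`. I can read that output, but I will be doing this for many rows. I want some logic by which R analyzes the triplets automatically and gives them to me (not necessarily in that form; other forms are fine too).

That is run-length encoding. It means that the value 1 occurs 2 times, then the value 0 occurs 1 time.

I want something that gives me the output above. :)

This is not exactly what I wanted, but it is very helpful. Thanks a lot. :)

This is really great. Thank you very much, Julian. This is amazing. You deserve a bigger bounty than this.

Also, Julian, could you give me some insight into reducing the complexity of the code? Thanks. :)

You're welcome. Honestly, I don't think the run-time complexity can be reduced, because the task itself requires looking at the rest of the list for every element of the list. The lines marked with #### are already there to avoid repeated work, but the order of complexity is still quadratic. (Edit) You could optimize this by first generating a list of the positions of all 1-digits. However, the code would get more complicated.

I was thinking of something similar, like generating a list of all the 1s and finding all possible arithmetic progressions.

I tried this: `str_groups('110110')`. I got 4 triplets. Now I have a doubt. Two of the triplets I got are `1 3 5` and `1 6 3`. `1 3 5` corresponds to 1s at positions 1 4 7 10 13, and `1 6 3` corresponds to 1s at positions 1 7 13. So they carry the same information, which seems redundant. Is there any way to remove this?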
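One possible way to drop such redundant triplets is sketched here. It assumes a triplet is `c(startpos, pitch, repeats)`, as returned by `str_groups`. Each triplet is expanded into the positions it covers, and a triplet is discarded when all of its positions are contained in another triplet. The helper names `triplet_positions` and `drop_redundant` are hypothetical, and the example input simply reuses the two triplets quoted in the comment above.

```r
# Positions covered by a triplet c(startpos, pitch, repeats).
triplet_positions <- function(t) seq(t[1], by = t[2], length.out = t[3])

# Keep only triplets whose positions are not contained in another triplet.
drop_redundant <- function(triplets) {
    pos <- lapply(triplets, triplet_positions)
    keep <- vapply(seq_along(pos), function(i) {
        covered <- vapply(seq_along(pos), function(j) {
            if (i == j) return(FALSE)
            # j covers i if it contains every position of i; when both
            # cover exactly the same positions, only the earlier one is kept.
            all(pos[[i]] %in% pos[[j]]) &&
                (length(pos[[j]]) > length(pos[[i]]) || j < i)
        }, logical(1))
        !any(covered)
    }, logical(1))
    triplets[keep]
}

triplets <- list(c(1, 3, 5), c(1, 6, 3))
kept <- drop_redundant(triplets)
print(kept)
```

This compares every pair of triplets, so it is quadratic in the number of triplets; that seems acceptable for short result lists but is a design choice of this sketch, not something taken from the answer.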
m <- matrix(v, nrow=...)

rle(m[1, ]); rle(m[2, ]); ...
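To see what `rle` reports for a single row, here is a minimal sketch on a plain vector. The input string is an assumption for illustration, and the matrix step is left out because the number of rows is not given above.

```r
# Hypothetical example row: the digits of a 0/1 string.
v <- as.integer(strsplit("1101101101", "")[[1]])

# rle() collapses each run of equal values into (length, value).
r <- rle(v)
print(r$lengths)
print(r$values)

# inverse.rle() rebuilds the original vector from the encoding.
restored <- inverse.rle(r)
```

Here the first run is two 1s followed by a single 0, which is how the "value 1 occurs 2 times, then value 0 occurs 1 time" reading in the comments comes about.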

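# Example string, split into a numeric vector of digits
z <- "1101101101"
zz <- as.numeric(strsplit(z,"")[[1]])

# Autocorrelation; the lag of its first local maximum is taken as the period
a1 <- acf(zz)
first.peak <- which(diff(sign(diff(a1$acf[,,1])))==-2)[1]

# Sliding windows of length first.peak (embed() lists each window reversed)
ee <- embed(zz,first.peak)
pp <- apply(ee,1,paste,collapse="")
# Pairwise equality of windows, reduced to one row per distinct window
mm <- outer(pp,pp,"==")
aa <- apply(mm[!duplicated(mm),],1,which)
sapply(aa,length)  ## 3 3 2   ## number of repeats
sapply(aa,function(x) unique(diff(x)))  ## 3 3 3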

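str_groups <- function (s) {
    # Split the string into digits and locate the 1s
    digits <- as.numeric(strsplit(s, '')[[1]])
    index1 <- which(digits == 1)
    len <- length(digits)
    back <- length(index1)
    if (back == 0) return(list())
    # Largest pitch that still leaves room for three members
    maxpitch <- (len - 1) %/% 2
    # patterns[pos, pitch]: remaining length of a pattern of that pitch at pos
    patterns <- matrix(0, len, maxpitch)
    result <- list()

    for (pitch in seq_len(maxpitch)) {
        # Divisors of pitch, including 1 and excluding pitch itself
        divisors <- which(pitch %% seq_len(pitch %/% 2) == 0)
        # Step back to the last 1 at which a pattern of this pitch can start
        while (index1[back] > len - 2 * pitch) {
            back <- back - 1
            if (back == 0) return(result)
        }
        for (startpos in index1[1:back]) {
            # Already part of a pattern with this pitch
            if (patterns[startpos, pitch] != 0) next
            pos <- seq(startpos, len, pitch)
            # A pattern needs at least three 1s
            if (digits[pos[2]] != 1 || digits[pos[3]] != 1) next
            # Count how long the pattern continues
            repeats <- length(pos)
            if (repeats > 3) for (i in 4:repeats) {
                if (digits[pos[i]] != 1) {
                    repeats <- i - 1
                    break
                }
            }
            # Skip if a pattern with a smaller pitch envelops this one
            continue <- FALSE
            for (subpitch in divisors) {
                sublen <- patterns[startpos, subpitch]
                if (sublen > pitch / subpitch * (repeats - 1)) {
                    continue <- TRUE
                    break
                }
            }
            if (continue) next
            # Record the remaining pattern length at each member position
            for (i in 1:repeats) patterns[pos[i], pitch] <- repeats - i + 1
            result <- append(result, list(c(startpos, pitch, repeats)))
        }
    }

    return(result)
}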

PROCEDURE str_groups WITH INPUT $s (a string of the form /(0|1)*/):
    digits := array containing the digits in $s
    index1 := positions of the digits in $s that are equal to 1
    len := pointer to last item in $digits
    back := pointer to last item in $index1
    IF there are no items in $index1, EXIT WITH empty list
    maxpitch := the greatest possible interval between 1-digits, given $len
    patterns := array with $len rows and $maxpitch columns, initially all zero
    result := array of triplets, initially empty

    FOR EACH possible $pitch FROM 1 TO $maxpitch:
        divisors := array of divisors of $pitch (including 1, excluding $pitch)
        UPDATE $back TO the last position at which a pattern could start;
            IF no such position remains, EXIT WITH result
        FOR EACH possible $startpos IN $index1 up to $back:
            IF $startpos is marked as part of a pattern, SKIP TO NEXT $startpos
            pos := possible positions of pattern members given $startpos, $pitch
            IF either the 2nd or 3rd $pos is not 1, SKIP TO NEXT $startpos
            repeats := the number of positions in $pos
            IF there are more than 3 positions in $pos THEN
                count how long the pattern continues
                UPDATE $repeats TO the length of the pattern
            END IF (more than 3 positions)
            FOR EACH possible $subpitch IN $divisors:
                check $patterns for pattern with interval $subpitch at $startpos
                IF such a pattern is found AND it envelops the current pattern,
                    SKIP TO NEXT $startpos
                    (using helper variable $continue to cross two loop levels)
                END IF (pattern found)
            END FOR (subpitch)
            FOR EACH consecutive position IN the pattern:
                UPDATE $patterns at row of position and column of $pitch TO ...
                    ... the remaining length of the pattern at that position
            END FOR (position)
            APPEND the triplet ($startpos, $pitch, $repeats) TO $result
        END FOR (startpos)
    END FOR (pitch)

    EXIT WITH $result
END PROCEDURE (str_groups)
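
For cross-checking the procedure above on small inputs, a much simpler brute-force sketch can help. It is not the answer's algorithm: it keeps no `patterns` table and does not suppress patterns enveloped by a smaller pitch, so on some inputs it may report more triplets. The function name `naive_groups` and the parameter `min_repeats` are hypothetical.

```r
# Brute force: for every pitch and every 1, count the run of equally spaced 1s.
naive_groups <- function(s, min_repeats = 3) {
    digits <- as.integer(strsplit(s, "")[[1]])
    len <- length(digits)
    result <- list()
    # Guard against negative values for empty or very short input.
    maxpitch <- max(0, (len - 1) %/% 2)
    for (pitch in seq_len(maxpitch)) {
        for (startpos in which(digits == 1L)) {
            # Skip starts that only continue a run begun one pitch earlier.
            prev <- startpos - pitch
            if (prev >= 1 && digits[prev] == 1L) next
            pos <- seq(startpos, len, by = pitch)
            # Index of the first non-1 among pos, minus one, is the run length.
            run <- match(TRUE, digits[pos] != 1L, nomatch = length(pos) + 1L) - 1L
            if (run >= min_repeats) {
                result <- append(result, list(c(startpos, pitch, run)))
            }
        }
    }
    result
}

res <- naive_groups("1101101101")
print(res)
```

For this particular string the sketch yields the triplets (1, 3, 4) and (2, 3, 3): 1s at positions 1, 4, 7, 10 and at positions 2, 5, 8.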