R: Replace duplicates in a matrix


I have prepared the following test code for you:

##### TEST HERE
test = tibble::tribble(
  ~Name1, ~Name2, ~Name3,
  "Paul Walker", "Paule Walkr", "Heiko Knaup",
  "Ferdinand Bass", "Ferdinand Base", "Michael Herre"
)
library(stringdist)
#> Error in gsub("[()c\"]", "", soundexspalten): object 'soundexspalten' not found
soundexmatrix1 = gsub("0000", "", soundexmatrix0)
#> Error in gsub("0000", "", soundexmatrix0): object 'soundexmatrix0' not found

Created on 2021-06-03 by the reprex package (v2.0.0)

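Now I want to replace all duplicates in soundexmatrix1 with the string "DUPLICATE", so that the dimensions of the matrix stay the same and all duplicates can be seen at a glance.

Is there a way to do this?

Thanks for your help!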

To check for duplicates within each row (see Update), this should do what you want, and in a cleaner way:

# Feel free to load the packages you're using.
# library(stringdist)
# library(tibble)

test <- tibble::tribble(
  ~Name1,           ~Name2,           ~Name3,
  "Paul Walker",    "Paule Walkr",    "Heiko Knaup",
  "Ferdinand Bass", "Ferdinand Base", "Michael Herre"
)

# Get phonetic codes cleanly.
result <- as.matrix(apply(X = test, MARGIN = 2,
                          FUN = stringdist::phonetic, method = c("soundex"), useBytes = FALSE))

# Find all blank codes ("0000").
blanks <- result == "0000"

# # Find all duplicates, as compared across ENTIRE matrix; ignore blank codes.
# all_duplicates <- !blanks & duplicated(result, MARGIN = 0)

# Find duplicates, as compared within EACH ROW; ignore blank codes.
row_duplicates <- !blanks & t(apply(X = result, MARGIN = 1, FUN = duplicated))

# Replace blank codes ("0000") with blanks (""); and replace duplicates (found
# within rows) with "DUPLICATE".
result[blanks] <- ""
result[row_duplicates] <- "DUPLICATE"

# View result.
result
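
The t() in the row-wise step matters because apply() with MARGIN = 1 hands back each row's result as a column. A minimal base-R sketch of that step, assuming a small hand-typed matrix of codes (codes is a made-up stand-in for result, so stringdist is not needed to run it):

```r
# Hypothetical stand-in for `result`: Soundex-like codes typed in by hand.
codes <- matrix(
  c("P442", "P442", "H225",
    "F635", "F635", "M246",
    "0000", "0000", "H225"),
  nrow = 3, byrow = TRUE,
  dimnames = list(NULL, c("Name1", "Name2", "Name3"))
)

# apply() over rows returns one COLUMN per input row ...
by_row <- apply(codes, 1, duplicated)

# ... so transpose to line the flags up with the original cells.
row_dups <- t(by_row)

# Same masking as above: blank codes ("0000") never count as duplicates.
blanks <- codes == "0000"
flags <- !blanks & row_dups
flags
```

Without the transpose, the flags would land on the wrong cells even though the matrix here happens to be square.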

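Update: At the poster's request, I have modified the code to compare duplicates only within each row, rather than across the entire result matrix. For example, with this updated test: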

test <- tibble::tribble(
    ~Name1,           ~Name2,           ~Name3,
    "Paul Walker",    "Paule Walkr",    "Heiko Knaup",
    "Ferdinand Bass", "Ferdinand Base", "Michael Herre",
    "",               "01234 56789",    "Heiko Knaup"
# | ^^              | ^^^^^^^^^^^^^   | ^^^^^^^^^^^^^                   |
# | Coded as "0000" | Coded as "0000" | Duplicate in matrix, NOT in row |
)

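Hi, this is very helpful, but I only want to put "DUPLICATE" where there is a duplicate within the same row, not for every duplicate across the whole matrix.

Ah, I think I have just the thing. Give me a few hours. I assume you're on Central European Summer Time, @MaxH.?

Hi @MaxH., I've just updated the code so that it only compares duplicates within rows.

Please don't add "Update" notes; just edit the post so that it is the best it can be at that time. If a question is edited in a way that invalidates reasonable answers, roll it back.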

For the updated test (three rows), the code yields the following result matrix:

     Name1  Name2       Name3 
[1,] "P442" "DUPLICATE" "H225"
[2,] "F635" "DUPLICATE" "M246"
[3,] ""     ""          "H225"
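
Note that duplicated() leaves the first occurrence in each row untouched, which is why "P442" stays visible under Name1. If every member of a duplicate group should be marked instead (an assumption about what might be wanted, not something the question asks for), one possible variant combines a forward and a backward pass. The helper all_dups and the hand-typed codes matrix are hypothetical names for illustration only:

```r
# Hypothetical helper (not part of the answer above): TRUE for every element
# that occurs more than once in x, including its first occurrence.
all_dups <- function(x) duplicated(x) | duplicated(x, fromLast = TRUE)

# Hand-typed stand-in for the Soundex codes of the three-row `test`.
codes <- matrix(
  c("P442", "P442", "H225",
    "F635", "F635", "M246",
    "0000", "0000", "H225"),
  nrow = 3, byrow = TRUE
)

# Build both masks from the untouched codes before overwriting anything.
blanks <- codes == "0000"
row_dups <- !blanks & t(apply(codes, 1, all_dups))

marked <- codes
marked[blanks] <- ""
marked[row_dups] <- "DUPLICATE"
marked
```

Both cells of each duplicated pair are then replaced, while "H225" in the third row is still left alone because it only repeats across rows.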