R: Replace duplicates in a matrix
I have prepared the following test code for you:
##### Test here
test = tibble::tribble(
  ~Name1, ~Name2, ~Name3,
  "Paul Walker", "Paule Walkr", "Heiko Knaup",
  "Ferdinand Bass", "Ferdinand Base", "Michael Herre"
)
library(stringdist)
#> Error in gsub("[()c\"]", "", soundexspalten): object 'soundexspalten' not found
soundexmatrix1 = gsub("0000", "", soundexmatrix0)
#> Error in gsub("0000", "", soundexmatrix0): object 'soundexmatrix0' not found
Created on 2021-06-03 by the reprex package (v2.0.0)
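Now I want to replace all duplicates in soundexmatrix1 with the string "DUPLICATE", so that the dimensions of the matrix stay the same and all duplicates can be seen at a glance.
Is there a way to do this?
Thanks for your help!

To check for duplicates within each row (see Update), this should achieve what you want, and in a cleaner way: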
# Feel free to load the packages you're using.
# library(stringdist)
# library(tibble)
test <- tibble::tribble(
~Name1, ~Name2, ~Name3,
"Paul Walker", "Paule Walkr", "Heiko Knaup",
"Ferdinand Bass", "Ferdinand Base", "Michael Herre"
)
# Get phonetic codes cleanly.
result <- as.matrix(apply(X = test, MARGIN = 2,
FUN = stringdist::phonetic, method = c("soundex"), useBytes = FALSE))
# Find all blank codes ("0000").
blanks <- result == "0000"
# # Find all duplicates, as compared across ENTIRE matrix; ignore blank codes.
# all_duplicates <- !blanks & duplicated(result, MARGIN = 0)
# Find duplicates, as compared within EACH ROW; ignore blank codes.
row_duplicates <- !blanks & t(apply(X = result, MARGIN = 1, FUN = duplicated))
# Replace blank codes ("0000") with blanks (""); and replace duplicates (found
# within rows) with "DUPLICATE".
result[blanks] <- ""
result[row_duplicates] <- "DUPLICATE"
# View result.
result
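The transpose in the row-wise step is easy to overlook: apply() with MARGIN = 1 returns each row's result as a column, so the logical mask has to be flipped back with t() before it lines up with the matrix. A minimal base-R sketch, assuming a hand-typed matrix of codes (the name `codes` and its contents are illustrative stand-ins, not output computed by stringdist):

```r
# Hand-typed codes standing in for the phonetic() output (illustrative only).
codes <- matrix(
  c("P442", "P442", "H225",
    "F635", "F635", "M246"),
  nrow = 2, byrow = TRUE,
  dimnames = list(NULL, c("Name1", "Name2", "Name3"))
)

# apply() over rows returns one COLUMN per input row, so this mask is 3 x 2.
raw_mask <- apply(X = codes, MARGIN = 1, FUN = duplicated)
dim(raw_mask)

# Transposing restores the original 2 x 3 shape, so the mask can index `codes`.
row_mask <- t(raw_mask)
codes[row_mask] <- "DUPLICATE"
codes
```

Without the t(), the mask would have the wrong shape and the replacement would hit the wrong cells or fail, depending on the matrix dimensions.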
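Update
Per the poster's request, I have modified the code to compare duplicate codes only within each row, rather than across the entire result matrix. For example, with this test data: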
test <- tibble::tribble(
~Name1, ~Name2, ~Name3,
"Paul Walker", "Paule Walkr", "Heiko Knaup",
"Ferdinand Bass", "Ferdinand Base", "Michael Herre",
"", "01234 56789", "Heiko Knaup"
# | ^^ | ^^^^^^^^^^^^^ | ^^^^^^^^^^^^^ |
# | Coded as "0000" | Coded as "0000" | Duplicate in matrix, NOT in row |
)
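Hi, this is very helpful, but I only want to put "DUPLICATE" where there is a duplicate within the same row, not for all duplicates across the whole matrix.
Ah, I think I have just the thing. Give me a few hours. I assume you are on Central European Summer Time, @MaxH.?
Hi @MaxH., I have just updated the code so that duplicates are only compared within rows.
Please don't add an "Update" section; just edit the post so that it is the best version at that time. If a question is edited in a way that invalidates reasonable answers, roll it back.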
For the three-row test data, the solution yields the following result:
     Name1  Name2       Name3
[1,] "P442" "DUPLICATE" "H225"
[2,] "F635" "DUPLICATE" "M246"
[3,] ""     ""          "H225"
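To see why the "H225" in the third row is left alone, it can help to compare the whole-matrix mask with the row-wise mask. The sketch below uses base R only and starts from hand-typed codes matching the result shown here ("0000" standing for the two blank codes), so it does not depend on stringdist; the names `codes`, `matrix_dups` and `row_dups` are illustrative.

```r
# Hand-typed codes for the three-row test data (illustrative only).
codes <- matrix(
  c("P442", "P442", "H225",
    "F635", "F635", "M246",
    "0000", "0000", "H225"),
  nrow = 3, byrow = TRUE
)
blanks <- codes == "0000"

# Whole-matrix comparison: duplicated() on the flattened vector, reshaped back.
# The scan runs column by column, which is R's storage order for matrices.
matrix_dups <- !blanks & matrix(duplicated(as.vector(codes)), nrow = nrow(codes))

# Row-wise comparison, as in the answer.
row_dups <- !blanks & t(apply(X = codes, MARGIN = 1, FUN = duplicated))

matrix_dups[3, 3]  # TRUE: "H225" already appeared in row 1
row_dups[3, 3]     # FALSE: "H225" is unique within row 3
```

Both masks skip the blank codes, so the two "0000" cells in row 3 are never marked as duplicates of each other.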